Characterization of the Floral Transcriptome of Moso Bamboo (Phyllostachys edulis) at Different Flowering Developmental Stages by Transcriptome Sequencing and RNA-Seq Analysis

Background As an arborescent and perennial plant, Moso bamboo (Phyllostachys edulis (Carrière) J. Houzeau, synonym Phyllostachys heterocycla Carrière) is characterized by its infrequent sexual reproduction with flowering intervals ranging from several to more than a hundred years. However, little bamboo genomic research has been conducted on this due to a variety of reasons. Here, for the first time, we investigated the transcriptome of developing flowers in Moso bamboo by using high-throughput Illumina GAII sequencing and mapping short reads to the Moso bamboo genome and reference genes. We performed RNA-seq analysis on four important stages of flower development, and obtained extensive gene and transcript abundance data for the floral transcriptome of this key bamboo species. Results We constructed a cDNA library using equal amounts of RNA from Moso bamboo leaf samples from non-flowering plants (CK) and mixed flower samples (F) of four flower development stages. We generated more than 67 million reads from each of the CK and F samples. About 70% of the reads could be uniquely mapped to the Moso bamboo genome and the reference genes. Genes detected at each stage were categorized to putative functional categories based on their expression patterns. The analysis of RNA-seq data of bamboo flowering tissues at different developmental stages reveals key gene expression properties during the flower development of bamboo. Conclusion We showed that a combination of transcriptome sequencing and RNA-seq analysis was a powerful approach to identifying candidate genes related to floral transition and flower development in bamboo species. The results give a better insight into the mechanisms of Moso bamboo flowering and ageing. This transcriptomic data also provides an important gene resource for improving breeding for Moso bamboo.


Introduction
Transcriptome sequencing is a convenient way of rapidly obtaining information on the expressed fraction of a genome [1]. Next-generation sequencing (NGS) technology, such as the Illumina Solexa, Roche 454 and ABI SOLiD platforms, has emerged as a cost-effective approach for high-throughput sequencing. It has revolutionized biological research by rapidly providing genomic and transcriptomic data [2][3][4]. RNA-Seq refers to the whole-transcriptome shotgun sequencing of fragmented mRNA or cDNA, resulting in overlapping short reads covering the entire transcriptome [5,6]. RNA-Seq analysis is powerful, enabling large-scale functional assignments of genes via the assembly of large sequenced transcriptome library. With this technique, quantitative analysis of gene expression can be easily performed without potential bias, allowing a more sensitive and accurate profiling of the transcriptome [7,8]. Despite its obvious potentials, this combined approach has not yet been applied to Moso bamboo (Phyllostachys edulis (Carrière) J. Houzeau, synonym Phyllostachys heterocycla Carrière) [9] research.
Bamboo is one of the most important non-timber forest products in the world. About 2.5 billion people depend on bamboo, ranging from food to raw materials for construction and manufacture [10][11][12]. Bamboos (Bambusoideae) belong to the monophyletic BEP clade (Bambusoideae, Ehrhartoideae, Pooideae) in the grass family (Poaceae), and consist of woody and herbaceous species. Woody bamboos are arborescent, perennial plants characterized by their woody stems and a prolonged vegetative phase lasting decades or even longer before flowering [13]. In addition, they often flower synchronously and die collectively after flowering. Two types of bamboo are usually recognized -sympodial bamboos, which form clumps of culms, and monpodial bamboos which form groves. Under normal circumstances, flowering of monopodial bamboo rarely occurs. However, collective death in some bamboo species has occasionally occurred after flowering over a large area. The reasons for this remain unclear, but it is a devastating blow to bamboo resource base. Therefore, it is important to determine the specific pathways and genes involved in bamboo flowering and flower development.
As a large woody bamboo with high environmental, economic and cultural value in Asia [21], Moso bamboo is a perennial plant characterized by its infrequent sexual reproduction with flowering intervals ranging from several to more than a hundred years. Flowering depends on many factors, including environment, nutrition, climate and the plants' physiological status. Several hypotheses have been proposed for the flowering mechanisms [22,23]. However, there have been few molecular studies on flowering of Moso for various reasons including the difficulty of collecting tissue samples, lack of genomic knowledge, infrequent sexual reproduction and long periods of time between flowering. With the announcement of the genome sequence of Moso bamboo [21], it is now possible to identify and determine the molecular regulation mechanisms of all functional elements in Moso bamboo. The transcriptome represents a comprehensive set of transcribed regions throughout the genome. Therefore, understanding the transcriptome dynamics is essential for revealing functional elements of the genome and interpreting phenotypic variation produced by combinations of genotypic and environmental factors.
In the present study, we aimed to investigate the transcriptome of developing flowers in Moso bamboo through high-throughput Illumina GAII sequencing and mapping short reads to the Moso bamboo genome and reference genes. Here, compared with the previous study reference [as 21], for the first time we collected from the same plant both leaves in the vegetative stage and flowering spikelet samples in different floral developmental stages. As transcriptome results are affected by the impact of external environmental conditions and growth differences among plants, transcriptome results from the same plant are more accurate to elucidate the molecular mechanism of regulation of flower development of bamboo.
Transcripts from four flower growth phases of the same plant were isolated, quantified and sequenced. The transcriptome sequences were then blasted against public databases. Subsequently, the annotated genes were clustered into putative functional categories using the Gene Ontology (GO) framework and grouped into pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG). Finally, we assigned genes putative homologs in model species, including Arabidopsis thaliana, Oryza sativa, Brachypodium distachyon and other relatives in the grass family to determine whether a hierarchy of gene regulation may persist between these species. Taken together, we, for the first time, comprehensively characterized the molecular basis of the physiological processes during the flower development of Moso bamboo using large-scale high-throughput sequencing, and our results may lay a foundation for further gene expression and functional genomic studies in bamboos.

Results and Discussion
Sequence data quality control and filtering In order to get an overview of the Moso bamboo transcriptome, we generated cDNA libraries from leaves of non-flowering plants (abbreviated as CK) and equally mixed RNA extracted from panicles at four different flowering developmental stages (abbreviated as F) using paired-end sequencing by the Illumina platform. In total, we acquired more than 67 million reads from each of the CK and F samples. Through mapping all these short reads to the reference genome of Moso bamboo, we found that about 70% of the reads could be uniquely mapped to the genome. Gene coverage was determined as the percentage of a gene covered by reads. 15,497 and 167,767 reads were detected from each of the F and CK samples respectively, with gene coverage from 90% to 100% ( Figure S1).

The reads mapping to the reference genome dataset
To identify the genes corresponding to these clean reads in each library, the clean reads were mapped to the reference genes expressed in the Moso bamboo genome. Mapping results showed that 39

Putative Moso bamboo floral development transcription factors
One unique characteristic of Moso bamboo is the switch to flowering after a very long period of vegetative growth and the undertermined flowering time. In order to compare the gene expression patterns between flowering and vegetative tissues, we collected panicles and vegetative leaves from non-flowering Moso bamboo plants for Illumina sequencing analysis. More than 714 floral development-related genes were up-regulated in the panicle tissues (with at least a 2-fold difference in the expression level in panicles compared with the level in leaves), including 476 flowering genes in the category of transcription factor genes ( Table 2, Table S3 and Table S4).
Transcriptional regulation is mediated through the interplay between transcription factors and specific-regulatory regions of the genome, leading to gene activation or other changes [25]. Therefore, some transcription factors may be considered as master regulators, since their actions initiate a branching series of downstream effects, including the activation of other transcription factors, ultimately resulting in broad changes in the organism (as [19]). Our study focused on these key regulatory genes in the flower development of Moso bamboo.  [29]. OsMADS18 (rice FUL3-like gene) causes an early flowering phenotype and early initiation of axilliary shoot meristems. It indicates that OsMADS18 acts as a promoter in the differentiation activity of the vegetative shoot [30]. Dreni et al indicated that OsMADS3 and OsMADS58 might have a redundantly function in specifying floral organ identity, including the carpel [31].
Dof (DNA binding with one zinc finger) transcription factors are a group of plant-specific transcription factors containing a single Cys2/Cys2-type zinc-finger-like Dof motif [32,33]. There are 37 Dof transcription factors in the Arabidopsis genome, and all of them have a highly conserved DNA-binding domain and divergent sequences beyond this domain [33][34][35][36]. They are involved in diverse functions, such as seed dormancy [37], carbon metabolism [38], phytochrome signaling pathway [39], phenylpropanoid metabolism [40] and flowering response [41,42]. JcDof3 is a circadian clock regulated gene, and it might be involved in the regulation of flowering time in Jatropha curcas [43]. Li et al. suggested that OsDof12 over-expression induces an early flowering under long-day (LD) conditions, the transcription levels of Hd3a and OsMADS14 were up-regulated [44]. We detected 28 putative Dof transcription factors in panicles, which are homologs of Dof3, Dof4, Dof5, Dof12 and CDF (Cycling DOF Factor) family, revealing similar roles of Dof 3 and Dof 12 genes during the flower  [45,46] and stress responses [47]. Members of this family have been identified as important downstream components of MAPK signaling pathways (as [19]). The involvement of WRKY factors in plant defense is also well documented: some of them have been shown to confer disease resistance [48,49], triggering the expression of defense-related genes in systemic acquired resistance [50][51][52][53]. They may be induced by signaling hormones, such as salicylic acid [54,55], jasmonic acid [56] and gibberellic acid [57]. In the present study, putative homologs of WRKY transcription factors exhibited a similar over-representation, with 110 genes showing high abundance in panicles of Moso bamboo.
The basic helix-loop-helix family (bHLH) consists of genes regulating various processes of flower development, such as controlling the development of carpel margins, as well as the morphogenesis of sepals, petals, stamens and anthers in Arabidopsis thaliana and Eschscholzia californica [58]. In our study, 34 bHLH-like genes were highly expressed in panicle tissues.
MYB transcription factors contain DNA binding domains, and some have been identified as floral developmental regulators [59]. In Arabidopsis, MYB21, MYB24 and MYB57 are crucial factors for the development of stamen filament. It has been reported that R2R3-MYB genes regulate various metabolic pathways, contributing to the control of flavonoid biosynthesis, tryptophan biosynthesis and participating in phenylpropanoid metabolism [60,61]. In plants, basic leucine zipper motif (bZIP) transcription factors regulate a series of processes, including pathogen defense, light and stress signaling, seed maturation and flower development. We found that these upstream regulatory stress-responsive genes were highly expressed during flowering, suggesting a potential connection between drought, pathogen or senescence and bamboo flowering.

Detection of putative genes related to flowering time control and flower development
Previous studies of model plant species showed that the timing of floral induction is controlled by sophisticated regulatory networks that monitor changes in the environment, including autonomous pathway, photoperiod and circadian clock pathway, gibberellins pathway, ambient temperature pathway and age pathway [62,63]. The unusual and infrequent nature of bamboo flowering has attracted the curiosity of scientists and laypeople for centuries [64]. However, little research has been conducted about the molecular mechanism of bamboo flowering due to the difficulty of collecting flowering samples. Through the comparison of Moso bamboo genes found in this study with the NCBI and Uniprot databases, we identified at least 238 genes as homologs of known flowering-related genes from other plants (Table S3). These genes were compared with flowering genes from Oryza sativa, Hordeum vulgare, Brachypodium distachyon, Sorghum bicolor and Arabidopsis thaliana. However, we found that the genes employed in typical flowering promotion pathways (such as those autonomous pathway, ambient-temperature) and floral pathway integrator (FPI) genes (as [63,65]) were not highly expressed in these floral tissues in bamboo. Only late elongated hypocotyl (LHY), phytochrome-interacting facor 3 (PIF3), crytochrome-1 (CRY1), GI-GANTEA (GI) and CONSTANS (CO) genes were detected in Moso bamboo for the photoperiod and circadian clock pathway. We did not detect any genes homologous to components of the circadian clock and photoperiod pathway, such as early flowering 3 (ELF3), early flowering 4 (ELF4), circadian clock associated 1 (CCA1), timing of CAB1 (TOC1), pseudo response regulator (PRR5, PRR7 and PRR9) and phtyochrome (PHYA, PHYB) [66]. The CO gene plays a key role in integrating light and temporal information, and it is essential for determining the flowering time [67]. The CO protein consists of two zinc fingers, including a N-terminal Bdomain mediating protein-protein interactions and a C-terminal CCT domain for nuclear localization [68]. In Arabidopsis, CO, as a transcription factor, promotes flowering by inducing the expression of flowering locus T (FT) [69]. Rice orthologues of the Arabidopsis genes CO and FT have been identified as Hd1 and Hd3a, respectively; and Hd1 controls rice flowering by regulating Hd3a (as [57]). In the present study, we found that 41 putative CO transcription factors were involved in flower development. However, most of them were down-regulated in floral tissues of Moso bamboo. The upstream floral repressor factors, such as LHY and PIF3, were highly expressed, whereas the FPI genes (GI and CRY1) were down-regulated, which might result in the low expression of CO genes. Low expression of CO genes suggested that the flower initiation was more independent of these traditional promotion pathways in bamboo flowering. Genes involved in gibberellins pathway have high expression, which is known as the promotion pathways in sophisticated regulatory networks. GA regulates diverse developmental processes during plant growth, ranging from seed germination, leaf expansion, and stem elongation to floral initiation [70]. Putative homologs of GA-signaling pathway genes were highly expressed, including the gibberellin response modulator (GAMYB) [71] and gibberellin receptors (GID1, GID2) [72]. Gibberellin triggers degradation of the DELLAs (such as GAI, RGA, and RGL78-80) and up-regulates MYB21, MYB24 and MYB57, which are essential for stamen filament growth (as [59]). High expression of these genes indicated the connection between GA and bamboo flowering.
Unlike the flowering pathway genes, over 252 genes involved in the response to drought stress or to other correlative stresses (such as oxidative stress) were highly expressed. Some FMI-related genes and their upstream regulatory drought-responsive genes were highly expressed during flowering, such as Dof3, Dof5, Dof7, Dof10, Dof12 and bZIP transcription factors. Putative homologs of FMI genes [73][74][75], such as AP1 and AP2, MADS1, MADS14 and MADS15, were also highly expressed. High expression of these above-mentioned genes indicated a potential connection between drought or other environmental stress and bamboo flowering.

Changes in gene expression profiles during different flower developmental stages
DGEs (Differentially Expressed Genes) generates absolute rather than relative gene expression measurements and avoids many inherent limitations of microarrays (as [8]). DGEs were used to analyze the gene expression during four stages of flower development in Moso bamboo. Four cDNA libraries (F1, F2, F3 and F4) were sequenced, resulting in between 10 to 11 million high-quality reads. The read sequences were mapped to Moso bamboo reference transcriptome database. About 8,412,664 to 9,367,437 reads were mapped to a gene in the reference genome database, and up to 80% of the sequences in transcriptome could be unequivocally identified to the Moso bamboo genome. About 5,185,207 to 5,820,078 reads were mapped to a gene in the reference gene database, and up to 79% of the sequences in transcriptome could be perfect matched to the Moso bamboo genes (Table 3). Additional Figure S2 shows the quality assessment of reads, sequencing saturation analysis, randomness assessment and gene coverage. We identified differentially expressed genes between samples using a previously developed algorithm by Audic and Claverie [76]. As expected, the majority of gene expression changes occurred between the CK and F1/F2/F3/F4 samples, with more up-regulated genes observed ( Figure 1 & Table S5, S6, S7, and S8). Moreover, about 1% (Table S9) of the up-regulated genes were found to be orphan sequences, which were not annotated to the Gene Ontology database (http://www. Geneontology.org/) with nr annotation, but the expression level of genes were significantly different, so we speculate that these genes may take part in other unique processes and pathways involving in the flower development of Moso bamboo.

Expression levels of putative genes involved in flower development
A total of 80 genes were annotated as flowering regulating pathway in Moso bamboo transcriptome (Table S10). To find whether these genes were putatively involved in flowering regulating pathway, expression profiles of these genes were compared with CK sample by DGEs (shown in Figure 2). From CK to F1 and from CK to F2, MADS, Dof, GID1, GID2, LHY and MYB genes were up-regulated. Some of these genes exhibited a decreased expression over time. The expression of Dof genes was significantly decreased in the anthesis and embryo formation stages, suggesting that Dof genes played a greater role in the early stages of flower development. Flowering inhibitors, such as LHY, PIF3 genes, were up-regulated during flower developmental periods, which might result in the down-regulated expression of CO genes in floral tissues. Low expression of CO genes and high expression of stress-response genes and FMI genes suggested that the activation of FMI was more independent of the traditional flowering regulating pathway in Moso bamboo flowering.

Gene validation and expression analysis
To confirm experimentally that the genes obtained from clean reads and computational analysis were indeed expressed, 8 putative genes related to flower development were chosen for RT-PCR and qRT-PCR analysis (Figure 3 and Table S11).
In the RT-PCR analysis, every selected gene was PCR positive with a single band at the calculated size (data not shown). According to the qRT-PCR results (Figure 3

Conclusions
Little genetic or genomic research has been performed into the fascinating mystery of bamboo flowering despite the economic importance of several bamboo species (as [20]). We investigated the sequences and transcript abundance levels at different   genes suggested that FMI was more independent of these known promotion pathways in Moso bamboo flowering. Moreover, high expression of genes involved in stress-responsive pathways and GA-signaling pathway indicated the potential connection between adversity stress, GA and bamboo flowering (shown in Figure 4). The dataset will provide an important resource foundation for future genetic or genomes studies on bamboo species, and will help to give better insight into the mechanism of the flower development of Moso bamboo.

Ethics Statement
All plant protocols were reviewed and approved by the International Center for Bamboo and Rattan and Key Laboratory of Bamboo and Rattan Science and Technology, State Forestry Administration. All necessary permits were obtained for the field studies from Guangxi Provincial Academy of Forestry and Guilin Forestry Bureau, Dajing County in Guangxi Provence. The field work conducted for sampling did not affect the local ecology and did not involve endangered or protected species.  vegetative to the reproductive stage. At the second stage (F2), the inflorescence axis continued to extend, lateral buds started to differentiate, and panicles grow from the plant but without flowering. At the third stage (F3), anthesis, flowers with both pistils and stamens emerged from glumes; flowering spikelets were collected at this stage. At the last stage (F4), pistils and stamens had withered, and the embryo was forming.

Sample preparation and RNA extraction
Samples were immediately frozen in liquid nitrogen and stored at 280uC until further analysis. Total RNA was extracted using the Trizol reagent (Invitrogen, USA). The quality and purified RNA was initially assessed on an agarose gel and NanoDrop 8000 spectrophotometer (NanoDrop, Thermo Scientific, Germany), and then the integrity of RNA samples was further evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies, USA).

Library preparation for transcriptome analysis
Samples of two different growth periods were prepared for transcriptome analysis using the Illumina's kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Briefly, mRNA was extracted using magnetic beads with Oligo (dT) (for eukaryotes) or by removing rRNAs from the total RNA (for prokaryotes). Purified mRNA was then mixed with the fragmentation buffer and fragmented into short fragments, which were used as templates for cDNA synthesis. Purified short cDNA fragments were resolved in EB buffer for end reparation and single nucleotide A (adenine) addition, and then they were connected with adapters. Subsequently, suitable cDNA fragments were selected for the PCR amplification as templates. During the QC steps, quantification and qualification of the sample library were assessed using an Agilent 2100 Bioanaylzer and a ABI StepOne-Plus Real-Time PCR System. Finally, the library was sequenced using Illumina HiSeq 2000 at Beijing Genomics Institute (BGI) in Shenzhen, China.  Sequence data quality control and filtering As Moso bamboo genomic resources were available as the reference, the original image data was transferred into sequence data as raw data or raw reads via base calling after the libraries had been sequenced. Then, the clean reads were selected via analysis and assessment of base composition and quality, and filtering of raw reads by using Illumina PIPILINE software. Since the algorithms used in transcriptome construction of the reads provided by the Illumina platform may be severely inhibited by sequencing errors, a stringent cDNA sequence filtering process was employed to select clean reads. Firstly, reads with adaptors were removed. Secondly, reads in which unknown bases comprised more than 5% of the total were removed. Finally, low quality reads in which the percentage of low quality bases (base quality#20) was over 30%, were removed. All the clean reads were deposited in the National Center for Biotechnology Information (NCBI) and can be accessed in the Short Read Archive (SRA) linking to BioProject accession number: SRR1187864, SRR1185317.
Mapping short reads to the Moso bamboo genome and reference gene After reads containing sequencing adapters and low-quality reads (reads containing Ns.5) were removed, the remaining reads were aligned to the Moso bamboo genome using SOAPaligner/ SOAP2 [77], allowing up to two mismatches in each segment alignment. For annotation, transcripts were aligned to four public databases [no redundant protein database (Nr) in NCBI, nonredundant nucleotide database (Nt) in NCBI, Swiss-Prot and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database] by BLAST (E-value , . Gene ontology (GO) classification was analyzed by the Blast2GO software (v2.5.0) based on Nr annotation.

DGEs library preparation and sequencing
Tag library preparation for samples of four different flowering developmental periods (floral bud formation, inflorescence development, anthesis and embryo formation stages) was performed in parallel using the Illumina gene expression sample preparation kit. Briefly, mRNA was extracted using magnetic oligo (dT) beads from total RNA. Purified mRNA was then mixed with the fragmentation buffer and fragmented into short fragments, which were used as templates for cDNA synthesis. Purified short cDNA fragments were resolved in EB buffer for end reparation and single nucleotide A (adenine) addition, and then they were connected with adapters. Subsequently, suitable cDNA fragments were selected for the PCR amplification as templates. During the QC steps, quantification and qualification of the sample library were assessed using an Agilent 2100 Bioanaylzer and an ABI StepOnePlus Real-Time PCR System. Finally, the library was sequenced using Illumina HiSeq 2000 at Beijing Genomics Institute (BGI) in Shenzhen, China.

Evaluation of DGEs libraries
A statistical analysis of the frequency of each tag in different cDNA libraries was performed to compare the gene-expression in different developmental stages. Statistical comparison was conducted using the method previously described by Audic and Claverie [76]. After the expressional abundances in each library were normalized to transcript per million (RPKM), the most differentially regulated genes (differentially expressed genes, DEGs) were selected. The significance of differential transcript abundance was determined using the FDR (False Discovery Rate) control method [78] to justify the p-values, and only genes with an absolute fold change of $2 and a FDR significance score of # 0.001 were included in subsequent analysis. In addition, p value between two samples was determined using the following formula: Where N1 and N2 represent the total number of reads mapped to genome in each sample, and x and y represent the number of reads mapped to a common gene in phase 1 and phase 2, respectively.

Clustering of candidate gene expression profiles
Hierarchical clustering of log-transformed expression data was carried out using the Cluster 3.0 and Treeview programs [79]. Correlations between gene clusters were determined using Pearson's correlation. Heat maps were constructed using the University of Toronto BAR Heatmapper tool (http://www.bar. utoronto.ca/ntools/cgi-bin/ntools_heatmapper.cgi).

Quantitative real-time PCR (qRT-PCR)
To evaluate the validity of Illumina analysis and assess the expression profiles in terms of specific mRNA abundances, 8 putative genes were selected and detected by qRT-PCR, and to further investigate the expression profiles of these genes, qRT-PCR of flowering tissues at different flowering developmental stages were performed separately using leaves of no-flowering plants as the control. Reverse transcription reactions were performed using 2 mg of RNA by M-MLVRT (Promega, USA) according to the manufacturer's instructions. Sequences of 8 selected genes were obtained from the Moso bamboo genome database (http://www.ncgr.ac.cn/bamboo). Primers, picked by using the Primer 3 software (http://www.genome.wi. mit.edu/cgibin/primer/primer3.cgi), as shown in Additional file 13. Tonoplast intrinsic protein (TIP41), cited from Fan et al. [80], were used as the internal housekeeping gene control. Real-time PCR reactions were carried out with LightCycler480 System (Roche, USA) using SYBR Premix EX Taq kit (Roche, USA).The 20 ml reaction mixture contained 0.4 ml (10 mM) of each primer and 2 ml (50 ng) cDNA and 10 ml SYBR Green I Master according to the manufacturer's instructions. Amplification reactions were performed as the following: 95uC for 10 s, 60uC for 10 s, and 72uC for 20 s. All reactions were performed in triplicate, both technical and biological. Data was analyzed using Roche manager software.

Accession code
The Illumina HiSeq 2000 sequencing data for Moso bamboo flower transeriptome and RNA-seq have been deposited in NIH Short Read Archive database under the accession number SRR1187864 and SRR1185317.