Global scale transcriptome analysis reveal differentially expressed genes during early somatic embryogenesis in Dimocarpus longan Lour

Somatic embryogenesis (SE) is a process of somatic cells that dedifferentiate to the totipotent embryonic stem cells and generate embryos in vitro. Longan SE has been established and wildly used as a model system for studying embryogenesis in woody plants, and some SE-related genes had been characterized. In spite of that, a comprehensive overview of SE at a molecular level is still absent. With the aim of understanding the molecular mechanisms underlying SE in longan, we examined the transcriptome changes by using Illumina HiSeq platform from the four distinct developmental stages, including non-embryogenic callus (NEC), embryogenic callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), globular embryos (GE). Our transcriptome reveals the transcription regulation of auxin, cytokinin and other hormones signaling pathway, flavonoids biosynthesis, fatty acid biosynthesis, extracelluar protein encoding genes, and other SE-related genes during early SE. Furthermore, we characterizes the potential molecular markers to distinguish NEC and early SE of longan. The present work provides new insights into future functional studies, as a means of studying the molecular mechanisms in SE.


Conclusion
Our transcriptome reveals the transcription regulation of auxin, cytokinin and other hormones signaling pathway, flavonoids biosynthesis, fatty acid biosynthesis, extracelluar protein encoding genes, and other SE-related genes during early SE. Furthermore, we characterizes the potential molecular markers to distinguish NEC and early SE of longan.
The present work provides new insights into future functional studies, as a means of studying the molecular mechanisms in SE.

Background
Longan (Dimocarpus longan Lour.), a tropical/subtropical evergreen fruit tree within the Sapindaceae family, native to South China and Southeast Asia, is now widely cultivated in Southeast Asia, South Asia, Australia and Hawaii [1]. Logan embryo development status was close association with the seed size, fruit-set rate, fruit production and quality. Base on the observation of histological and cytological, the change of endogenous hormones and polyamines, proteomics analysis of the isozymes and protein, and molecular biology researches on SE-related genes mRNA differential display, homologous cloning, expression pattern by qRT-PCR have been used to illuminate the potential regulation mechanism of longan SE [2]{Z Z Fang, 2010 #39;Lai, 2010 #1}. However, elucidating the embryo development mechanism at a molecular level remains a great challenge due to its highly genetic heterozygosity and difficulties in accessibility of early embryos in vivo [3]. Plant SE shares close similarities at almost all development stages to normal zygotic embryogenesis [4,5], SE has been wildly used as a model system to study the molecular regulation mechanism of early embryogenesis in plants [6]. The longan SE system has been established and extensively used as a model system for investigating embryogenesis in woody plants, which revealed that the concentration of 2,4-D was the key factor in controlling longan high-consistency SE [1,7,8].
Transition from NEC to EC, and from EC to somatic embryo are the key steps of SE.
However, the molecular regulation mechanisms during longan SE remain largely unknown.
The aim of this study was to elucidate the molecular mechanism in the transition from NEC to EC, and during early SE by investigating the expression profiling using Illumina RNA-seq technology, and to identify the molecular marker genes during SE. This RNA-seq of comparative transcriptome analysis will gain new insight into the molecular and developmental mechanisms of longan SE.

RNA-Seq analysis of longan early SE aligned with the Dimocarpus longan Draft Genome
To provide a comprehensive understanding of longan SE at a transcriptional level, we sequenced the four cDNA libraries constructed from the four in vitro embryo developmental stages (NEC, EC, ICpEC and GE, Figure 1) Exon skipping is the least type in all samples, and intron retention is the most popular type of AS events in NEC, ICpEC and GE ( Figure 2).

Global analysis of gene expression across the four distinct developmental processes
There were 22,743,19,745,21,144,21,102 expressed genes in NEC, EC, ICpEC and GE stage. Among these, more than 75.3% of the expressed genes were present in all four developmental stages. Significant numbers of genes were unique expressed in one process only, 2645 genes were only expressed in NEC, however, only 366, 505 and 588 genes were unique present in EC, ICpEC and GE stage, respectively (Figure 3a), which suggested that distinct spatial transcriptional patterns were present in the four developmental processes. To evaluate the differences of molecular response among four samples, gene expression were normalized to FPKM by using RSEM software. After filtering with FPKM>60, a total of 2961 (11.40%), 3445 (13.26%), 3445 (13.26%) and 3442 (13.25%) genes were highly expressed in NEC, EC, ICpEC and GE, respectively ( Table 2).
To reveal the potential key genetic factors involved in early SE, we filtered the significantly differentially expressed genes (DEGs) with |log 2 fold change| ≥ 1 and FDR < 0.001 between these four pairwise comparisons among the four libraries as follow: NEC_vs_EC (Transition from NEC to EC), EC_vs_ICpEC, EC_vs_GE, ICpEC_vs_GE. Among these four comparisons (Figure 3b), a total of 10,642, 4,180, 5,846 and 1,785 DEGs were identified, respectively. Compared with NEC, EC had 4,887 up-regulated and 5,755 downregulated genes. Compared with EC, ICpEC had 2,689 up-regulated and 1,491 downregulated genes, GE had 3,451 up-regulated and 2,395 down-regulated genes. Compared with ICpEC, GE had 832 up-regulated and 953 down-regulated genes. DEGs analysis revealed that longan transcriptome undergoes significantly dynamic changes during SE, particularly during the transition period from NEC to EC. Therefore, the longan SE transcriptome datasets given here may serve as a valuable molecular resource for future studies.

Functional classification of DEGs base on GO and KEGG
To evaluate the potential functions of the DEGs, we used GO terms assignment to classify the functions of DEGs in pairwise comparisons under three GO main categories: biological process, cellular component and molecular function (Additional file 7: Fig. S1). In all pairwise comparisons mentioned above, the term with the largest proportion in "biological process" was 'metabolic process', followed by 'cellular process', 'single-organism process', 'respond to stimulus' and 'localization', the term with the largest proportion in "cellular component" were 'cell' and 'cell part', followed by 'organelle' and 'membrane', the term with the largest proportion in "molecular function" was 'catalytic activity', followed by 'binding', 'transporter activity', 'molecular transducer activity' and 'nucleic acid binding transcription factor activity'.
Such an observation suggested an essential role of hormones and their complicated crosstalk during early SE. Therefore, the plant hormone signaling pathway may be a key regulator in longan early SE.

Flavonoids and fatty acid biosynthesis related genes were differential expressed during longan SE
Flavonoid biosynthesis and fatty acid biosynthesis were the representative KEGG pathways, a total of 125 significant DEGs were assigned to 'flavonoid biosynthesis' across the early SE processes ( Figure 5). In the transition from NEC to EC, the flavonoid biosynthesis key genes, C4H, CHS, CHI, F3H, F3'5'H, DFR, LDOX/ANS, ANR, LAR, CCoAOMT were mainly expressed in NEC, drastic down-regulated from NEC to EC and remained very low expression level during early, except that F3H_Dlo_011012.1, F3'5'H_ Dlo_010496.1 , LAR_ Dlo_022420.1, CCoAOMT_ Dlo_005144.2 were up-regulated in EC, but down-regulated during early SE. Besides, most of the FLS and F3'H family were mainly expressed in NEC, significantly down regulated in EC and kept low FPKM during early SE, especially, 15 F3'H and 9 FLS belonged to NEC specific genes. Only four FLS and six F3'H were first upregulated from NEC to EC and then down-regulated or kept low expression level during early SE (Additional file 2: Table S2).
The fatty acid composition rapidly changed during SE in Daucus carota [57], and Gossypium hirsutum [33]. In our study, a total of 35 fatty acid biosynthesis related genes were differently expressed during SE (Additional file 3: Table S3). From NEC to EC, except ACCase (Dlo_000360.1), three FabG, two FabZ, SAD (Dlo_031652.1), most of the ACCase, FabD, FabF, FabG, FabZ, FabI, FatB and SAD were significantly up-regulated in EC. During early SE, most of the DEGs remained high expression, part of them with slightly up/downregulated expression. For example, ACCase (Dlo_023270.1) and SAD (Dlo_019646.1) were up-regulated from NEC to EC, and highly expressed during early SE. Our results indicated that flavonoids were mainly accumulated in NEC, while fatty acid were mainly accumulated in early SE, especially in EC.

Extracellular protein encoding genes effect on the transition from NEC to EC
It had been reported that extracellular protein germins and germin-like (GLPs), Arabinogalactan proteins (AGPs), chitinases (CHIs), lipid transfer proteins (LTPs) and glycoprotein were critical to SE, and can be served as protein marker during early SE [58].
In our study, 16 CHIs were differentially expressed, and most of them were preferential expressed in NEC, and remarkable down-regulated in EC, only seven CHIs were upregulated during early SE with low FPKM. Among the 14 identified LTPs, only LTP (Dlo_013012.1, Dlo_013014.1) were highly and specific transcripts in early SE, most of them were mainly expressed in NEC and down-regulated from NEC to EC. Meanwhile, 12 GLPs and two secreted glycoprotein genes (EP1-like) were mainly expressed in NEC and kept very low FPKM during early SE. Except AGP10 was first up-regulated in EC and downregulated during early SE, most of the AGPs were down-regulated in EC, and kept relative low expression level during early SE (Additional file 4: Table S4). The results indicated that most of the extracellular protein encoding genes were mainly expressed in NEC, they were predicted to take effect on the transition from NEC to EC.

Characterization of Molecular Markers for longan SE
Several genes have been reported to molecular marker of SE, such as somatic WUS-homeobox (WOX).. In order to characterize the full-scale of molecular markers for early SE, the comparative analysis of FPKM in nine tissues of longan including root, stem, leaf, flower, flower bud, young fruit, pericarp, pulp and seed which RNA-Seq at the same time [48] were employed to select the molecular marker genes during SE. For our purposes here, it is crucial to identify the reliable molecular marker genes for distinguishing NEC from the early SE stages. In our study, several embryogenesis-labeled genes that had been reported previously were differentially expressed in the four development processes (Additional file 5: Table S5). However, some of them showed down-regulated or slightly up-regulated in EC, and kept low expression level from NEC to GE, such as late embryogenesis abundant protein ( LEA14A, LEAD34, LEA76), SERK1, SERK3, WUS, WOX5, WOX3, AIL6, AGL15, CLV1, EMB8, suggesting that they were unseemly markers for longan SE.
In our study, a total of 55 genes were identified as representative molecular markers, which were closely related to SE, can be classified as two main categories: NEC markers and SE molecular markers by their specific expression profiles in all test-samples (Table   4). The SE marker genes were barely or undetected in NEC, highly expressed during early SE, it also can be divided into SE-specific marker and SE-expressed marker. The SEspecific markers were highly transcribed only in somatic embryos, including LEC1, LEC2, CYP78A5, CYP87A3 and three unknown genes (DlU1, DlU2, DlU3) (Table 4). These SEspecific genes may played an important role in longan SE. The SE-expressed markers were similar to SE-specific markers, except that these genes also highly expressed in one or some tested tissues included in this study, including LEC1-like ( (Table 4).
On the contrary, 28 representative NEC marker genes were highly and preferentially  Table 4). The NEC-specific marker genes maybe the key inhibitor of the transition from NEC to EC, while the SE markers may function as promoter in SE development.

qRT-PCR Verification of selected molecular markers
To experimentally confirm that the molecular markers were indeed expressed and played an important role during longan SE, 16 potential molecular markers, including 8 Base on the qRT-PCR results, all selected genes were expressed at varying levels at different development stages ( Figure 6). The selected molecular markers DlLEC1, DlPDF1.3,DlGH3.6,DlPIN1,DlWOX9,DlWOX2,DlGIF2,DlRGF3, DlPLT2 and DlAGL80 were barely or undetected in NEC, while they mainly expressed during early SE, they all highest expressed in EC and then down-regulated during SE, showed relative low expression in TE and CE, indicated that those molecular markers played an important role in EC induction and maintainance. Meanwhile, DlL1L, DlBBM, DlABI3 and DlLTP were highly expressed or up-regulated during SE processes, and minimally or undiscovered expressed in NEC, suggested that those marker genes may positive regulated the longan SE development. In addition, the transcription level of DlLEA5 and DlCHI were highly and specific expressed in NCE, they may the inhibitor of the transition from NEC to EC. Consequently, the NEC and EC molecular markers can effectively used to distinguish the non-embryogenic and embryogenic somatic cells, and played an important role in longan SE.

Auxin and cytokinin play an important role in longan SE
It is well know that auxin and cytokinin ( Over the past 20 years, longan SE has been established and widely used as model system for embryogenesis in woody plants, high concentration of 2,4-D in MS medium was require for inducing EC from immature zygotic embryo, while it suppressed further development of SE, moreover, 2,4-D and KT were the key factors in long term maintenance of longan EC [1,7,8]. Subsequent studies revealed that controlling the doses of 2,4-D could synchronized regulated the developmental processes of longan SE, withdrawal of 2,4-D from the medium triggered further embryo development [46,[68][69][70]. The level of endogenous IAA and CTK in early SE stages (EC, ICpEC and GE) were much higher than that in NEC, and IAA level reached the peak in GE and then significantly decreased at later stages which were cultured on 2,4-D free medium. In addition, the level of IAA higher than CTK at the same stage during early SE. The results indicated that high level of endogenous IAA and lower level of CTK were essential for early SE [3]. However, the molecular mechanism responsible for the endogenous IAA and CTK level changing during SE, and potential crosstalk with each other or other factors remains poorly-understood.
Large numbers of DEGs associated with IAA and zeatin biosynthesis, transport, signal transduction and degradation were identified, the activities of these DEGs could be related to SE. YUCCA_Dlo_013505.1 kept up-regulated expression during early SE. Other IAA synthesis related genes were mainly expressed in NEC with minimal FPKM. The increase of IAA level may due to these differentially expressed genes during early SE. However, more evidences is needed to prove the relationship between these DEGs and increased IAA level.
During SE induction of C. canephora, the balance of free IAA and IAA conjugates is essential for embryogenic potential [73], the conjugation of auxin is synthesized by GH3 family [74], we found that GH3 genes were minimal expressed in NEC, most of them dramatic up-regulated in EC and then down-regulated during early SE, indicated that not only the level of free IAA but also the conjugated IAA is important during longan SE.
Previous studies has revealed that auxin transports is complex and highly regulated for Auxin transcriptionally activates Aux/IAA, GH3 and SAUR family,the Aux/IAA family has 29 members in Arabidopsis, but not all members are induce by auxin [78]. SAUR is the most abundant family of early auxin-inducible genes, but only few members have been functional characterized, OsSAUR39 was reported to negatively regulate auxin biosynthesis and transport [79]. ARF show strongly disturbance during zygotic embryo development [80,81], and ARF5 seems to be importance for SE [82]. Further transcript analysis during SE revealed that the components of auxin signaling: Aux/IAA, ARF, SAUR and other auxin-responsive genes were wildly modulated during SE [9,13]. In our study, 11 ARF significantly up-regulated in EC and remained high during early SE, three ARF first down-regulated in EC and then up-regulated during SE. IAA family showed the similar expression pattern during SE, while most of SAUR were mainly expressed in NEC.

Molecular markers for longan early SE
The 55 molecular markers genes for longan SE belong to several distinct functional categories, they can be used to mark the embryogenic potential of plant cells and study various biochemical and physiological processes of plant embryogenesis and development. PIN1 was involved in auxin polar transport and cellular differentiation during embryogenesis [106,107]. Antisense expression of PIN1 disrupted the formation of somatic embryos and reduced the expression of SE-related genes, indicated that PIN1 was essential for SE induction [108]. GH3.6 is contribute to maintain auxin homeostasis by converting excess IAA to IAA-amino acid conjugates, over-expression of GH3.6 significantly enhanced the accumulation of IAA-Asp [109]. DlPIN1 and DlGH3.6 were specific expressed in early SE and down-regulated during SE, they were suggested as reliable markers for longan early SE.
In our study, CYP78A5 and CYP87A3 were most abundant in EC and follow by ICpEC stage.
In B. napus, CYP78A5 was identified as an early marker for microspore-derived embryos AtSERK1 is sufficient to mark embryogenic competence in culture [115]. However, ZmSERKs were detected in non-embryogenic callus [116], and the identification of SERK genes in rice [117], and wheat [118], suggest that their functions are not limit to embryogenesis. In our study, DlSERKs were expressed not only in SE stages, but also in non-embryogenic callus and other tissues, thus, DlSERK could not be used as a reliable marker for longan SE. 14 kDa proline-rich protein DC2.15 was connected with the initiation of embryogenesis by the removal of auxin [120]. Aquaporins are the major channels of water transport pass through biological membranes, and involved in cell expansion, organ movement and elongation [121]. It is widely acceptable that the extracellular proteins (such as GLPs, LTPs, CHIs) are required for plant differentiation and morphogenesis, they can be used as protein markers for SE [58]. Our study revealed that a total of 28 transcripts were specific and extreme-highly expressed in NEC, while barely or undetected during early SE stages, we suggested these genes as NEC molecular markers, for example, the LEA5, CNOT3, DC2.15, PIP1;2, PIP2;1, GLP3, NsLPT, CAT, POD, GST, et al., these genes might played an important role in the transition from NEC to EC, and were helpfully for distinguishing NEC and early SE. However, some of these markers belonged to the certain gene family with distinct expression patterns during SE, further study of these genes function on longan SE is required.

Conclusions
In summary, our study generated a high resolution transcriptome datasets for longan SE.
A comparative analysis of global gene expression patterns during early SE stages provided subsets of DEGs that regulated SE in longan. Our study revealed the expression profiles of genes involved in auxin and cytokinin signaling pathway, flavonoid and fatty acid biosynthesis pathway, extracellular protein, as well as the representative molecular marker genes, indicating their possible roles in longan SE. This transcriptomic data provides new insights into future functional studies, as a means of studying the molecular mechanisms in SE.

Plant Material and RNA Extraction
The synchronized cultures at different developmental stages, including non-embryogenic callus (NEC), friable-embryogenic callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), globular embryos (GE) of D. longan 'Honghezi', were obtained following previously methods [1,7,8], and stored at -80 0 C for RNA extraction. Total RNAs of the above samples were isolated using Trizol Reagent (Invitrogen, USA), then DNase I was used to digest any genomic DNA. Extracted RNAs were quantified by Agilent 2100 bioanalyzer (Agilent Technologies, USA) and evaluated the integrality by denaturing agarose gel electrophoresis and ethidium bromide staining. RNA samples with A260/A280 ratios between 1.9~2.1, 28S/18S ratios ≥1.0, and integrity numbers (RINs) more than 8.5 were selected to construct cDNA libraries.

Library Construction and RNA Sequencing
After enrichment with oligo(dT) 25 -attached magnetic beads, the purification mRNA was interrupted into short fragments by the fragmentation buffer. Used these short fragments as templates and SuperScript III (Invitrogen, USA) as reverse transcriptase to synthesize first strand cDNA. The second strand cDNA was subsequently synthesized using random primers and end repaired, then adaptors were ligated by T4 DNA ligase after adenylation at the 3'-end. Finally, suitable adaptor-ligated fragments were selected as templates for PCR amplification. The four resulting cDNA libraries were quantified by Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) and qRT-PCR (ABI StepOnePlus Real-Time PCR System, USA), and then RNA Sequencing (RNA-seq) was carried out with an Illumina HiSeq TM 2000 system at The Beijing Genomics Institute (BGI, Shenzhen, China).

RNA-Seq Reads Mapping and Differential Expression
The raw reads were cleaned by removing adapter reads, reads containing poly-N larger than 10%, and low quality reads (Q Phred < 20). Cleaned reads were then aligned to the longan reference genome using Bowtie software (http://bowtiebio.sourceforge.net/index.shtml) and TopHat2 (http://ccb.jhu.edu/software/tophat/index.shtml), read count for each gene was then obtained after mapping. Gene expression levels for each sample were estimated by RSEM The differentially expression analysis of the two varieties were performed using the ratio of their FPKM values, the False Discovery Rate (FDR) was used to determine the P-value threshold. The unique reads with the absolute value of log 2 (Fold change of FPKM)≥ 1 and the FDR < 0.001 were used as the thresholds to identify and compare significantly DEGs in the study.

Expression Annotation and Functional analysis of DEGs.
Gene function was annotated based on the databases of Blast Nr (NCBI non-redundant protein sequences), GO (Gene Ontology), and KEGG (Kyoto Encyclopedia of Genes and Genomes database).The GO and KEGG functional enrichment analysis of DEGs were performed to identify which DEGs were significantly enriched in GO terms or KEGG pathways. GO terms with corrected P-value ≤ 0.05 were considered as significantly enriched terms. The KEGG enrichment was determined by Rich factor, Q-value, and the number of enriched genes in this pathway. Q-value ≤ 0.05 was defined as those with genes that showed significant differential expression.

Gene Validation and Expression Analysis
Sixteen genes with potential roles in longan SE were chosen for validation using quantitative real-time PCR (qRT-PCR) with gene specific primers designed using DNAMAN 7.0. Relative mRNA levels from each gene in RNA isolated as described above from six synchronized cultures during longan SE, including NEC, EC, ICpEC, GE, TE and CE were quantified with respect to internal standards. All reactions were performed with three replicates in a LightCycler 480 qRT-PCR instrument (Roche Applied Science, Switzerland).
The expressive abundance of the sixteen selected genes were calculated relative to the expression of reference genes DlFSD, DlEF-1a, and Dlelf4a. Gene names, primer sequences, product sizes, and annealing temperatures are given in additional file 6: Table   S6. Tables   Table 1 Statistics