Global scale transcriptome analysis reveals differentially expressed genes involve in early somatic embryogenesis in Dimocarpus longan Lour

Background Somatic embryogenesis (SE) is a process of somatic cells that dedifferentiate to totipotent embryonic stem cells and generate embryos in vitro. Longan SE has been established and wildly used as model system for studying embryogenesis in woody plants, SE-related genes had been characterized. In spite of that, a comprehensive overview of SE at a molecular level is still absent. To understand the molecular mechanisms during longan SE, we examined the transcriptome changes by using Illumina HiSeq from the four distinct developmental stages, including non-embryogenic callus (NEC), embryogenic callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), globular embryos (GE). Results RNA-seq of the four samples generated a total of 243.78 million high quality reads, approximately 81.5% of the data were mapped to longan genome. The cDNA libraries of NEC, EC, ICpEC and GE, generated 22,743, 19,745, 21,144, 21,102 expressed transcripts, 1935, 1710, 1816, 1732 novel transcripts, 2645, 366, 505, 588 unique genes, respectively. Comparative transcriptome analysis showed that a total of 10,642, 4180, 5846 and 1785 genes were differentially expressed in the pairwise comparisons of NEC_vs_EC, EC_vs_ICpEC, EC_vs_GE, ICpEC_vs_GE, respectively. Among them, plant hormones signalling related genes were significantly enriched, especially the auxin and cytokinin signalling components. The transcripts of flavonoid biosynthesis related genes were mainly expressed in NEC, while fatty acid biosynthesis related genes mainly accumulated in early SE. In addition, the extracelluar protein encoding genes LTP, CHI, GLP, AGP, EP1 were related to longan SE. Combined with the FPKM value of longan nine tissues transcription, 27 SE specific or preferential genes (LEC1, LEC1-like, PDF1.3, GH3.6, AGL80, PIN1, BBM, WOX9, WOX2, ABI3, et al.) and 28 NEC preferential genes (LEA5, CNOT3, DC2.15, PR1–1, NsLTP2, DIR1, PIP1, PIP2.1, TIP2–1, POD-P7 and POD5 et al.) were characterized as molecular markers for longan early SE. qRT-PCR validation of SE-related genes showed a high correlation between RNA-seq and qRT-PCR data. Conclusion This study provides new insights into the role of the transcriptome during early SE in longan. Differentially expressed genes reveal that plant hormones signalling, flavonoid and fatty acid biosynthesis, and extracelluar protein related genes were involved in longan early SE. It could serve as a valuable platform resource for further functional studies addressing embryogenesis in woody plants.


Background
Longan (Dimocarpus longan Lour.), a tropical/subtropical evergreen fruit tree within the Sapindaceae family, native to South China and Southeast Asia, is now widely cultivated in Southeast Asia, South Asia, Australia and Hawaii [1]. Logan embryo development status was close association with the seed size, fruit-set rate, fruit production and quality. Base on the observation of histological and cytological, the change of endogenous hormones and polyamines, proteomics analysis of the isozymes and proteins, molecular biology researches on SE-related genes mRNA differential display, homologous cloning, and expression pattern by qRT-PCR have been used to illuminate the potential regulation mechanism of longan SE [2]. However, elucidating the embryo development mechanism at a molecular level remains a great challenge due to its highly genetic heterozygosity and difficulties in accessibility of early embryos in vivo [3]. Plant SE shares close similarities at almost all development stages to normal zygotic embryogenesis [4,5], SE has been wildly used as a model system to study the molecular regulation mechanism of early embryogenesis in plants [6]. The longan SE system has been established and extensively used as a model system for investigating embryogenesis in woody plants, which revealed that the concentration of 2,4-D was the key factor in controlling longan high-consistency SE [1,7,8].
To date, the transcript profiling of longan embryogenic callus (EC) had been illuminated by Lai and Lin [46], which revealed numerous embryogenesis-related and reproductive growth related unigenes in EC. Lin and Lai [47] had identified and profiled the conserved and novel miRNA during longan SE by using Solexa sequencing combined with computational, and qRT-PCR methods, and the potential roles of 20 conserved and 4 novel miRNA in longan SE were described by their tissue or stage-specific expression profiling. Recently, longan draft genome sequences become available [48], which provided the comprehensive genomic information for studying the molecular regulation of SE. Transition from NEC to EC, and from EC to somatic embryo are the key steps of SE. However, the molecular regulation mechanisms during longan SE remain largely unknown. To elucidate the molecular mechanism in the transition from NEC to EC, and during early SE by investigating the expression profiling using Illumina RNA-seq technology, and to identify the molecular marker genes during SE. This RNA-seq of comparative transcriptome analysis will gain new insight into the molecular and developmental mechanisms of longan SE.

RNA-Seq analysis of longan early SE aligned with the Dimocarpus longan draft genome
To provide a comprehensive understanding of longan SE at a transcriptional level, we sequenced the four cDNA libraries constructed from the four in vitro embryo developmental stages (NEC, EC, ICpEC, and GE, Fig. 1). A total of 243,783,126 clean reads (comprising approximately 24.38 G of nucleotides) were obtained after data cleaning and quality checks. After aligned with longan reference genome [48], 48 (39,768), followed by ICpEC (36,446), and NEC (35,084), and the smallest in EC (19,056). Exon skipping is the least type in all samples, and intron retention is the most popular type of AS events in NEC, ICpEC and GE (Fig. 2).
To reveal the potential key genetic factors involved in early SE, we filtered the significantly differentially expressed genes (DEGs) with |log 2 fold change | ≥ 1 and FDR < 0.001 between these four pairwise comparisons as follow: NEC_vs_EC, EC_vs_ICpEC, EC_vs_GE, and ICpEC_vs_GE. Among these four comparisons (Fig. 3b), a total of 10,642, 4180, 5846 and 1785 DEGs were identified, respectively. Compared with NEC, EC had 4887 up-regulated and 5755 down-regulated genes. Compared with EC, ICpEC had 2689 up-regulated and 1491 downregulated genes, GE had 3451 up-regulated and 2395 down-regulated genes. Compared with ICpEC, GE had 832 up-regulated and 953 down-regulated genes. DEGs analysis revealed that longan transcriptome undergoes significantly dynamic changes during SE, particularly during the transition period from NEC to EC. Therefore, the longan SE transcriptome datasets given here may serve as a valuable molecular resource for future studies.

Functional classification of DEGs base on GO and KEGG
To evaluate the potential functions of the DEGs, we used GO terms assignment to classify the functions of  DEGs in pairwise comparisons under three GO main categories: biological process, cellular component and molecular function (Additional file 1: Figure S1). In all pairwise comparisons, the term with the largest proportion in "biological process" was 'metabolic process', followed by 'cellular process', 'single-organism process', 'respond to stimulus' and 'localization', the term with the largest proportion in "cellular component" were 'cell' and 'cell part', followed by 'organelle' and 'membrane', the term with the largest proportion in "molecular function" was 'catalytic activity', followed by 'binding', 'transporter activity', 'molecular transducer activity' and 'nucleic acid binding transcription factor activity' .
As showed in Fig. 4c, TRIT1, a gene involved in cis-zeatin synthesis was up-regulated from NEC to GE. Cis-ZOG family involved in Cis-zeatin O-glycosylation were highly expressed in NEC, and significantly downregulated from NEC to EC. During early SE, five CisZOG were up-regulated from EC to ICpEC, four Cis-ZOG were down-regulated from ICpEC to GE. In transzeatin biosynthesis, six IPT1,4, five CYP735A, four CKX, three UGT76C were noteworthy down-regulated from NEC to EC; two IPT1,4, four CYP735A, one CKX, three UGT76C were up-regulated in EC. During early SE, IPT1,4 family, five CYP735A, two CKX, four UGT76C were up-regulated during early SE with minimal FPKM. Among the cytokinin signal pathway, two A_ARR, 10 B_ ARR, 15 CRE1 were mainly expressed in NEC, and down-regulated in EC. One A_ARR, five B_ARR, seven CRE1 were up-regulated in EC. 13 CRE1, seven B_ARR and all A_ARR showed up-regulated expression during early SE, two B_ARR and five CRE1 were downregulated during early SE (Fig. 4d).
In addition, numerous genes involved in abscisic acid, gibberellin, ethylene, salicylic acid, jasmonic acid and brassinosteroid signal transduction pathway were differentially expressed during longan SE (Additional file 4: Figure S4; Additional file 5: Table S1 a-h). Such an  observation suggested an essential role of hormones and their complicated crosstalk during early SE. Therefore, the plant hormones signaling pathway may be the key regulator during longan early SE.
In our study, 11 R2R3-MYB transcripts were differentially expressed. During longan SE, MYB12 and MYB111 were barely expressed in NEC, significant up-regulated from NEC to EC and remained high during early SE. MYB75, MYB113, MYB4 and MYB123 were significant down-regulated in EC, and kept relative low expression during early SE.
The fatty acid composition rapidly changed during SE in Daucus carota [57], and Gossypium hirsutum [33]. In our study, a total of 35 fatty acid biosynthesis related genes were differently expressed during SE (Additional file 7: Table S3). From NEC to EC, except ACCase (Dlo_ 000360.1), three FabG, two FabZ, SAD (Dlo_031652.1), most of the ACCase, FabD, FabF, FabG, FabZ, FabI, FatB and SAD were significantly up-regulated in EC. During early SE, most of the DEGs remained high expression, part of them with slightly up/down-regulated expression. For example, ACCase (Dlo_023270.1) and SAD (Dlo_019646.1) were up-regulated from NEC to EC, and highly expressed during early SE. Our results indicated that flavonoids were mainly expressed in NEC, while fatty acid were mainly accumulated in early SE stages, especially in EC.

Extracellular protein encoding genes effect on the transition from NEC to EC
It had been reported that extracellular protein germins and germin-like (GLPs), Arabinogalactan proteins (AGPs), chitinases (CHIs), lipid transfer proteins (LTPs) and glycoprotein were critical to SE, and can be served as protein with a rainbow color scale, each row represents a single gene, and each column represents a sample. The IDs and names of selected DEGs are indicated to the right of the histograms marker during early SE [58]. In our study, 16 CHIs were differentially expressed, and most of them were preferential expressed in NEC, and remarkable down-regulated in EC, only seven CHIs were up-regulated during early SE with low FPKM. Among the 14 identified LTPs, only LTP (Dlo_013012.1, Dlo_013014.1) were highly and specific expressed in early SE, most of them were mainly expressed in NEC and down-regulated from NEC to EC. Meanwhile, 12 GLPs and two secreted glycoprotein genes (EP1-like) were mainly expressed in NEC and kept very low FPKM during early SE. Except AGP10 was first upregulated in EC and down-regulated during early SE, most of the AGPs were down-regulated in EC, and kept relative low expression level during early SE (Additional file 8: Table S4). The results indicated that most of the extracellular protein encoding genes were mainly expressed in NEC, they were predicted to involve in the transition from NEC to EC.

Characterization of molecular markers for longan SE
Several genes have been reported to molecular marker of SE, such as somatic embryogenesis receptor-like kinase (SERK), leafy cotyledon1 (LEC1), BABYBOOM (BBM), wuschel (WUS), WUS-homeobox (WOX). In order to characterize the full-scale of molecular markers for early SE, the comparative analysis of FPKM in nine tissues of longan [48], including root, stem, leaf, flower, flower bud, young fruit, pericarp, pulp and seed [48] were employed to select the molecular marker genes during SE. For our purposes here, it is crucial to identify the reliable molecular marker genes for distinguishing NEC stage from EC, ICpEC and GE stages. In our study, several embryogenesis-labeled genes that had been reported previously were differentially expressed in each stage (Additional file 9: Table S5). However, some of them showed down-regulated or slightly up-regulated in EC, and kept low expression level from NEC to GE, such as late embryogenesis abundant protein (LEA14A, LEAD34, LEA76), SERK1, SERK3, WUS, WOX5, WOX3, AIL6, AGL15, CLV1, EMB8, suggesting that they were unseemly markers for longan SE.
On the contrary, 28 representative NEC marker genes were highly and preferentially expressed in NEC, barely or  Table 4). The NEC-specific marker genes maybe the key inhibitor of the transition from NEC to EC, while the SE markers may function on SE development.

qRT-PCR verification of selected molecular markers
To experimentally confirm that the molecular markers were indeed expressed and played a key role during longan SE, 16         were selected for qRT-PCR identification in the synchronized cultures at distinct developmental stages during longan SE, including NEC, EC, ICpEC, GE, torpedo-shaped embryos (TE) and cotyledonary embryos (CE). Base on the qRT-PCR results, all selected genes were expressed at varying levels at different development stages (Fig. 6). The selected molecular markers DlLEC1, DlPDF1.3, DlGH3.6, DlPIN1, DlWOX9, DlWOX2, DlGIF2, DlRGF3, DlPLT2 and DlAGL80 were barely or undetected in NEC, while they mainly expressed during early SE, they all highest expressed in EC and then down-regulated during SE, showed relative low expression in TE and CE, indicated that those molecular markers played an important role in EC induction and maintainance. Meanwhile, DlL1L, DlBBM, DlABI3 and DlLTP were highly expressed or up-regulated during SE processes, and minimally or undiscovered expressed in NEC, suggested that those marker genes may positive regulated the longan SE development. In addition, the transcription level of DlLEA5 and DlCHI were highly and specific expressed in NCE, they may the inhibitor of the transition from NEC to EC. qRT-PCR validation of SE-related genes also showed a high correlation between RNA-seq and qRT-PCR data (Additional file 10: Table S6).

Auxin and cytokinin play an important role in longan SE
It is well know that auxin and cytokinin (CTK) were key factors of plant cell division, differentiation, and SE induction [59]. Meanwhile, the level of endogenous IAA and CTK were influenced by the application of exogenous auxin and CTK [3,10,[60][61][62]. Auxin was consider as a central regulator in SE, probably due to the establishment of auxin gradients during SE induction [9]. So far, the exogenous application of auxin during SE has been well documented [9,10,27,60]. Among the auxin, 2,4-dichlorophenoxyacetic acid (2,4-D) was most effective and widely used for induction of SE in several plants [63][64][65]. The level of endogenous IAA was correlated with proembryogenic mass formation and high-frequency SE competency [66]. Previous study had also proved that dynamic change of endogenous IAA was among the first signals leading to the induction of SE [67].
Over the past 20 years, longan SE has been established and widely used as model system for embryogenesis in woody plants, high concentration of 2,4-D in MS medium was require for inducing EC from immature zygotic embryo, while it suppressed the further development of SE, moreover, 2,4-D and KT were the key factors in long term maintenance of longan EC [1,7,8]. Subsequent studies revealed that controlling the doses of 2,4-D could synchronized regulated the developmental processes of longan SE, withdrawal of 2,4-D from the medium triggered further embryo development [46,[68][69][70]. The level of endogenous IAA and CTK in early SE stages (EC, ICpEC and GE) were much higher than NEC stage, IAA level reached the peak in GE and then significantly decreased at later stages. In addition, the level of IAA higher than CTK at the same stage during early SE. The results indicated that high level of endogenous IAA and lower level of CTK were essential for early SE [3]. However, the molecular mechanism responsible for the endogenous IAA and CTK level changing during SE, and potential crosstalk with each other or other factors remains poorly-understood.
The increase of IAA during longan early SE might be due to the increased biosynthesis and transition of endogenous auxin precursor [9]. The tryptophan (Trp) dependent IAA biosynthesis was an important pathway in higher plants, exogenous applied the doses of Trp and IAA had similar enhancement during rice SE [71]. In our study, the expression level of ASA, IGS, TSA, TSB, the key genes in Trp synthesis, were drastic up-regulated in EC and remained high in early SE, only PAI showed NEC specific with low FPKM, suggested that the level of Trp during early SE was higher than NEC, high IAA level might be due to high level of auxin precursor during early SE. YUCCAs family encoding key enzymes in IAA biosynthesis, were required for SE induction in Arabidopsis [72], and three YUCCAs and AAO1, one NIT, CYP71A13 and three ST5a, showed up-regulated expression from NEC to EC, two YUCCAs, AAO1, ST5a were down-regulated during early SE, while YUCCA_Dlo_013505.1 kept upregulated expression during early SE. Other IAA synthesis genes were mainly expressed in NEC with minimal FPKM. The increase of IAA level may due to these differentially expressed genes during early SE. However, more evidences is needed to prove the relationship between these DEGs and increased IAA level.
During SE induction of C. canephora, the balance of free IAA and IAA conjugates was essential for embryogenic potential [73], the conjugation of auxin was synthesized by GH3 family [74], we found that GH3 family genes were minimal expressed in NEC, most of them dramatic upregulated in EC and down-regulated during early SE, indicated that the conjugated IAA also played an important role in longan SE. Previous studies had revealed that auxin transports was complex and highly regulated for embryogenic development [75]. TIR1 mediated Aux/IAA proteins degradation and auxin-regulated transcription in the present of auxin [76], while TIR1 genes were downregulated in EC and remained low during longan early SE. AUX1, which mediated influx of IAA into cells, were mainly expressed in NEC, and down-regulated during early SE. PIN1 played a fundamental role in maintaining the embryonic auxin gradients [77], were up-regulated in EC and kept high in ICpEC and GE in our study.
Auxin transcriptionally activated Aux/IAA, GH3 and SAUR family, the Aux/IAA family had 29 members in Arabidopsis, but not all members were induce by auxin [78]. SAUR was the most abundant family of early auxin-inducible genes, but only few members had been functional characterized, OsSAUR39 was reported to negatively regulate auxin biosynthesis and transport [79]. ARF showed strongly disturbance during zygotic embryo development [80,81], and ARF5 seemed to be importance for SE [82]. Further transcript analysis during SE revealed that the components of auxin signaling: Aux/ IAA, ARF, SAUR and other auxin-responsive genes were wildly modulated during SE [9,13]. In our study, 11 ARF significantly up-regulated in EC and remained high during early SE, three ARF first down-regulated in EC and then up-regulated during SE. IAA family showed the Fig. 6 qRT-PCR verification of the selected molecular markers during longan SE. Non-embryogenesis callus (NEC), friable-embryogenesis callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), globular embryos (GE), torpedo-shaped embryos (TE) and cotyledonary embryos (CE). DlFSD, DlEF1a, and Dlelf4a are used as reference genes. Data are means±SD (n = 3) similar expression pattern during SE, while most of SAUR were mainly expressed in NEC.
Other than auxin being a main inducer of SE, exogenously supplied CTK to induce SE was well established in a lot of species [83][84][85]. Large numbers of transcripts involved in zeatin biosynthesis and signal transduction were differentially expressed during cotton SE [10]. Meanwhile, endogenous CTK level were higher in SE than in NEC [3]. From NEC to EC, a total of 40 DEGs implicated in cytokinin signal transduction, including 22 CRE1 (seven up-regulated and 15 down-regulated), 15 B-ARR (five up-regulated and 10 down-regulated), 3 A-ARR (one up-regulated and two down-regulated). During early SE, a total of 32 DEGs involved in cytokinin signaling pathway, most of them were up-regulated during early SE. In zeatin synthesis pathway, TRIT1 was upregulated from NEC to ICpEC, most of CisZOG, IPT1,4, CYP735A, CKX, UGT76C were down-regulated in EC and remained low during early SE. However, IAA and zeatin biosynthesis and signal transduction related genes showed complex and integrated regulation during SE, further study of these genes is required in longan SE.

SE-related molecular marker genes play a key role during longan SE
The molecular marker genes for longan SE belong to several distinct functional categories, they can be used to mark the embryogenic potential of plant cells and study various biochemical and physiological processes of plant embryogenesis and development. A number of transcription factors (TFs) had been reported as key factors in SE induction. In Brassica napus, LEC1, LEC2, FUS3, ABI3, WOX9, WOX2, BBM, genes belonged to TFs, were identified as molecular markers for early microspore embryogenesis [86]. In our study, 10 molecular markers were TFs (DlLEC1, DlL1L, DlLEC2, DlABI3, DlFUS3, DlWOX9, DlWOX2, DlAGL80, DlBBM, DlPLT2), their functions on embryogenesis had been well characterized in various plants. Ectopic expression of LEC1 was sufficient to trigger embryogenic potential and to induce somatic embryo from Arabidopsis leaf surface [87]. ZmLEC1 was used as a reliable marker for early SE in maize as its expression pattern during SE was similar to that of AtLEC1 during zygotic embryogenesis [88]. Mutational analyses in Arabidopsis showed that LEC genes were essential for induction of SE [37]. Ectopic expression of a carrot C-LEC1 which was driven by AtLEC1 promoter, rescued the defects of lec1-1 mutant [89]. Moreover, ectopic-expressed AtLEC1 in tobacco induced the start of embryogenic transition [90]. The LEC1-like (L1L) was most closely related to LEC1 and required for normal embryo development, ectopic-expressed L1L in Arabidopsis can complement LEC1 functions [91]. Meanwhile, L1L expression was mainly accumulated in the early stage SE of Theobroma cacao [92], Vitis vinifera [93], and Helianthus annuus [94].
LEC2, ABI3, FUS3 were B3 domain-containing transcription factors, ectopically expressed AtABI3 do not induced SE but endowed the embryo with traits to seedling [95]. BBM and PLT2 were clustered to AP2/ERF transcription factor family, their functions on embryogenesis and root meristem were overlap [96][97][98]. Overexpression of BBM triggered spontaneous somatic embryo formation in Arabidopsis thaliana and Brassica napus, BBM was server as a marker for embryogenesis cells in Brassica napus [96]. Recently study show that BBM and PLT2 induced SE in a quantitatively and context dependent manner by LEC1-ABI3-FUS3-LEC2 (LAFL) network, and LAFL/AGL15 were required for BBM mediated embryogenesis [40]. In this assay, DlLEC1, DlLEC2, DlFUS3 were early SE-specific genes, DlL1L, DlBBM, DlABI3, DlPLT2 were highly expressed during the SE processes, they can be used as remarkable markers for longan early SE.. To date, AGL15 was the only MADS-BOX member which preferentially expressed in developing embryos and promote the initiation of SE [44,45], and AGL80 was essential for the central cell and endosperm development [99]. However, DlAGL15 was considered as poor marker. Firstly, we suggested another MADS-BOX gene DlAGL80, a SE-specific gene as a new marker for longan early SE.
WUS was a critical regulator for stem cell fate in the shoot apical meristem [100]. Over-expression of AtWUS initiated the acquisition of embryogenic competence in Gossypium hirsutum [41,42]. WUS was suggested as a useful gene marker for SE initiation [101]. Meanwhile, WOX genes marked cell fate during early embryogenesis in Arabidopsis [102], WOX2 was used as potential marker during early SE [103]. STIMPY/WOX9 played an important role in promoting cell proliferation and preventing precocious differentiation in emerging seedlings [104]. WOX2 and WOX9 were highly expressed at the early stage of SE in Picea abies, they may function together on conifer embryo patterning [105]. In addition, DlWUS was isolated from embryogenic callus and expressed in all the stage of SE, which consistent with our transcriptome date suggested that DlWUS genes were poor markers during longan SE. qRT-PCR verification demonstrated that DlWOX2 and DlWOX9 were specific expressed in early SE and down-regulated during SE, they might played an importance role in longan early SE.
PIN1 was involved in auxin polar transport and cellular differentiation during embryogenesis [106,107]. Antisense expression of PIN1 disrupted the formation of somatic embryos and reduced the expression of SE-related genes, indicated that PIN1 was essential for SE induction [108]. GH3.6 was contribute to maintain auxin homeostasis by converting excess IAA to IAA-amino acid conjugates, over-expression of GH3.6 significantly enhanced the accumulation of IAA-Asp [109]. DlPIN1 and DlGH3.6 were specific expressed in early SE and down-regulated during SE.
In our study, CYP78A5 and CYP87A3 were most abundant in EC and follow by ICpEC stage. In B. napus, CYP78A5 was identified as an early marker for microspore-derived embryos development [86]. PDF1.3 was closely related to Arabidopsis Protodermal factor 1, a gene exclusively expressed in L1 layer of vegetative, inflorescence, floral meristems and specific-expressed in protodermal cell during embryogenesis which related to cell fate determination [110]. In addition, AtGRP-5 was associated with somatic embryo formation in Arabidopsis and eggplant [111]. RGF3 and GIF2 were key genes of cell proliferation, showed SE-specific expression pattern during early SE. RGF3 belonged to root meristem growth factors family that played the redundant role in maintaining the post-embryonic root stem cell niche and by positive regulating cell proliferation [112]. GIF2 was required for cell proliferation and lateral organs grow [113]. Those SErelated genes DlPIN1, DlGH3.6, DlPDF1.3, DlGRP-1, DlRGF3 and DlGIF2 can be use to mark the early stage of longan early SE. Furthermore, DlLBP, DlKCS, DlZDS and DlRPL17, DlMAN7 and DlU1, DlU2, DlU3 were specific accumulated in early SE, despite that no functions on SE have been published yet for them, suggested that they might be the key genes for longan SE.
SERK played a key role in the acquisition of embryogenic competence in plant cells, DcSERK was identified as a suitable marker for SE as it only abundant in embryogenic cultures and ceased after the globular stage, but not in any other tissues [34]. In Dactylis glomerata, SERK showed the similar expression pattern with DcSERK and used as a convenient marker for cells competent to form embryos in monocots [114]. AtSERK1 was highly expressed during Arabidopsis embryogenic cell formation and early embryogenesis, suggested that AtSERK1 was sufficient to mark embryogenic competence in culture [115]. However, ZmSERKs were detected in non-embryogenic callus [116], and the identification of SERK genes in rice [117], and wheat [118], suggested that their functions were not limit to embryogenesis. In our study, DlSERKs were expressed not only in SE stages, but also in nonembryogenic callus and other tissues.
LEA5 belonged to the fifth group of late embryogenesis abundant proteins gene, were abundant in late embryogenesis of mature seed, and involved in the abiotic stresses responses [119]. CNOT3, might be a new CCR4-NOT complex gene in plant, which had proved in regulation of cell division in HeLa cell, while its functions on plant was poorly-understood. 14 kDa proline-rich protein DC2.15 was connected with the initiation of embryogenesis by the removal of auxin [120]. Aquaporins were the major channels of water transport pass through biological membranes, and involved in cell expansion, organ movement and elongation [121]. It is widely acceptable that the extracellular proteins (such as GLPs, LTPs, CHIs) were required for plant differentiation and morphogenesis, they were used as protein markers for SE [58]. Our study revealed that a total of 28 transcripts were specific and extreme-highly expressed in NEC, while barely or undetected during early SE stages, for example, the LEA5, CNOT3, DC2.15, PIP1;2, PIP2;1, GLP3, NsLPT, CAT, POD, GST, et al., these genes might play an important role in the transition from NEC to EC. However, some of these markers belonged to the certain gene family with distinct expression patterns during SE, further study of these genes function on longan SE is required.

Conclusions
In summary, our study generated a high resolution transcriptome datasets for longan SE. A comparative analysis of global gene expression patterns during early SE stages provided subsets of DEGs that regulated SE in longan. Our study revealed the expression profiles of genes involved in plant hormones such as auxin and cytokinin signaling pathway, flavonoid and fatty acid biosynthesis pathway, extracellular protein, as well as the representative molecular marker genes, indicating their possible roles in longan SE. This transcriptomic data provides new insights into future functional studies, as a means of studying the molecular mechanisms in SE.

Plant material and RNA extraction
The synchronized cultures at different developmental stages, including non-embryogenic callus (NEC), friableembryogenic callus (EC), incomplete compact proembryogenic cultures (ICpEC), globular embryos (GE), torpedo-shaped embryos (TE) and cotyledonary embryos (CE) of D. longan 'Honghezi' were obtained following previously methods [1,7,8,68]. To obtained.the synchronized cultures of NEC, EC, ICpEC, GE and TE, embryogenic calli was transferred to MS basal medium (2% sucrose and 6 g/L agar, pH 5.8) supplemented with 0.2% activated carbon and 1.0 mg/L, 0.5 mg/L, 0.1 mg/L, 0.05 mg/L 2,4-dichlorophenoxyacetic acid (2,4-D), respectively. Embryogenic calli was transferred to MS basal medium (5% sucrose and 6 g/L agar, pH 5.8) to obtained the synchronized cultures of CE. The synchronized cultures of different stage were cultured in three biological replicates, each replicates consisting of 10 culture bottles, and were confirmed by the histological observations as shown schematically in Fig. 1. We collected the NEC, EC, ICpEC and GE samples from 5 bottles in each of the three replicates, and stored at − 80°C for RNA extraction.
Total RNA was extracted separately from NEC, EC, ICpEC and GE in the three biological replicates using Trizol Reagent (Invitrogen, USA), then DNase I was used to digest any genomic DNA. Extracted RNAs were quantified by Agilent 2100 bioanalyzer (Agilent Technologies, USA) and evaluated the integrality by denaturing agarose gel electrophoresis and ethidium bromide staining. RNA samples with A260/A280 ratios between 1.9~2.1, 28S/18S ratios ≥1.0, and integrity numbers (RINs) more than 8.5 were selected to construct cDNA libraries. The RNA of the three biological replicates were mixed in equal amounts and used for cDNA library construction.

Library construction and RNA sequencing
After purification with oligo (dT) 25 -attached magnetic beads, the mRNA was interrupted into short fragments by divalent cations under elevated temperature. Then, these cleaved RNA fragments were used to synthesize first-strand cDNA using a random hexamer primer and the SuperScript III (Invitrogen, USA) reverse transcriptase. The second-strand cDNA was subsequently synthesized using random primers and end repaired, then adaptors were ligated by T4 DNA ligase after adenylation at the 3′-end. Eventually, suitable adaptor-ligated fragments were selected as templates for PCR amplification to generate the final cDNA library. The four resulting cDNA libraries were quantified by Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA) and qRT-PCR (ABI StepOnePlus Real-Time PCR System, USA), and then RNA Sequencing (RNA-seq) was carried out with an Illumina HiSeq™ 2000 system at The Beijing Genomics Institute (BGI, Shenzhen, China). The entire set of raw reads was submitted to NCBI Sequence Read Archive under the accession number: PRJNA565345.

RNA-Seq reads mapping and differential expression
The raw reads were cleaned by removing adapter reads, reads containing poly-N larger than 10%, and low quality reads (Q Phred < 20). Cleaned reads were then aligned to the longan reference genome using Bowtie software (http://bowtie-bio.sourceforge.net/ index.shtml) and TopHat2 (http://ccb.jhu.edu/software/tophat/index.shtml), read count for each gene was then obtained after mapping. Gene expression levels for each sample were estimated by RSEM (RNA-Seq by Expectation Maximization) software [122]. The expression levels of matched genes in each cDNA library were derived and normalized to FPKM (Fragments Per Kilobase of exon per Million fragments mapped) [123]. The differentially expression analysis of the pairwise comparison of RNA-Seq libraries were confirmed by using the Poisson Distribution analysis method [124], and the False Discovery Rate (FDR) was used to determine the P value threshold. The unique reads with the absolute value of log 2 (Fold_Change) ≥ 1 and the FDR < 0.001 were used as the thresholds to define as differentially expressed genes (DEGs) in the pairwise comparisons (NEC_vs_EC, EC_vs_ICpEC, EC_ vs_GE, ICpEC_vs_GE).

Expression annotation and functional analysis of DEGs
Gene function was annotated based on the databases of Blast Nr (NCBI non-redundant protein sequences), GO (Gene Ontology), and KEGG (Kyoto Encyclopedia of Genes and Genomes database). The GO and KEGG functional enrichment analysis of DEGs were performed to identify which DEGs were significantly enriched in GO terms or KEGG pathways. GO terms with corrected P-value ≤0.05 were considered as significantly enriched terms. The KEGG enrichment was determined by Rich factor, Q-value, and the number of enriched genes in this pathway. Q-value ≤0.05 was defined as those with genes that showed significant differential expression.

Quantitative real-time PCR analysis
For qRT-PCR validation, 500 ng total RNA extracted from each stage of longan SE (NEC, EC, ICpEC, GE, TE and CE) were transcribed into cDNA with random primers and Oligo dT primer using the SYBR Ex Script™ kit (Takara, China), Sixteen unique transcripts with potential roles in longan SE were chosen and their specific primers were designed using DNAMAN 7.0. qRT-PCR was performed on the LightCycler 480 instrument (Roche Applied Science, Switzerland) in a total volume of 20 μL in each well containing 10 μL of 2× SYBR Premix Ex Taq™; 0.8 μL of each specific primer (100 nM); 1.0 μL of cDNA template (in a 1:10 dilution); and 7.4 μL of ddH 2 O. The PCR conditions were: denaturation for 60 s at 95°C, and then 40 cycles of 10 s at 95°C and 20 s between 58°C and 61°C in function of the Tm of the primers. Primer annealing specificity was examined and verified by melting curve analysis. Four-point standard curves of a fivefold dilution series (1:5 to 1:625) from pooled cDNA were used to calculate PCR efficiency. The reactions were performed in 96-well PCR plates and each experiment consisted of three biological replicates. The expressive abundance of the sixteen selected genes were calculated relative to the expression of reference genes DlFSD, DlEF-1a, and Dlelf4a. Data were further processed in MS Excel. Gene names, primer sequences, product sizes, and annealing temperatures are given in Additional file 11: Table S7.