Identification and expression analysis of genes related to calyx persistence in Korla fragrant pear

The objective of this study was to increase understanding about genetic mechanisms affecting calyx persistence in Korla fragrant pear (Pyrus brestschneideri Rehd). Flowers were collected at early bloom, full bloom, and late bloom. The RNA was extracted from the flowers and then combined according to calyx type. Transcriptome and digital gene expression (DGE) profiles of flowers, ovaries, and sepals with persistent calyx (SC_hua, SC_ep, and SC_zf, respectively) were compared with those of flowers, ovaries, and sepals with deciduous calyx (TL_hua, TL_ep, and TL_zf, respectively). Temporal changes in the expression of selected genes in floral organs with either persistent or deciduous calyx were compared using real-time quantitative PCR (qRT-PCR). Comparison of the transcriptome sequences for SC_hua and TL_hua indicated 26 differentially expressed genes (DEGs) with known relationship to abscission and 10 DEGs with unknown function. We identified 98 MYB and 21 SPL genes from the assembled unigenes. From SC_zf vs TL_zf, we identified 21 DEGs with known relationship to abscission and 18 DEGs with unknown function. From SC_ep vs TL_ep, 12 DEGs with known relationship to abscission were identified along with 11 DEGs with unknown function. Ten DEGs were identified by both transcriptome sequencing and DGE sequencing. More than 50 DEGs were observed that were related to calyx persistence in Korla fragrant pear. Some of the genes were related to cell wall degradation, plant hormone signal transduction, and stress response. Other DEGs were identified as zinc finger protein genes and lipid transfer protein genes. Further analysis showed that calyx persistence in Korla fragment pear was a metabolic process regulated by many genes related to cell wall degradation and plant hormones.


Results and discussion
Transcriptome sequencing and assembly In total, 107202492 raw reads were generated by Illumina sequencing of SC_hua vs TL_hua (Table 1). There were 103466288 clean reads after removing low-quality sequences. Assembly of the clean reads resulted in 39891341 unigenes ranging in size between 201 and 16666 bp (Fig. 1). The N50 length of the unigenes was 1579 bp and the N90 length was 289 bp.

Sequence annotation
The unigenes were aligned with seven public databases [i.e., NR (NCBI non-redundant protein sequences), NT (NCBI nucleotide sequences), KEGG (Kyoto Encyclopedia of Genes and Genomes), SwissProt (A manually annotated and reviewed protein sequence database), PFAM (Protein family), GO (Gene Ontology) and KOG/COG (Clusters of Orthologous Groups of proteins)] ( Table 2). The results showed that 18605 unigenes (38.05 %) had significant matches in the NR database, 16700 unigenes (34.15 %) had significant matches in the NT database, and 17326 unigenes (35.43 %) had significant matches in the SwissProt database. In total, 26088 unigenes (53.35 %) were annotated in at least one database, with 3037 unigenes (6.21 %) being annotated in all seven databases.
Unigene metabolic pathway analysis was also conducted using KEGG. This process predicted a total of 258 pathways, representing 6925 unigenes (Fig. 4). The pathways involving the highest number of unique transcripts were 'carbohydrate metabolism' (662), followed by 'translation' (639) and 'signal transduction' (542). The above data is a very valuable genetic resource for studying calyx persistence in Korla Fragrant Pear.

Differential expression analysis in SC_hua vs TL_hua
Differentially expressed genes (DEGs) are defined as genes that are significantly enriched or depleted in one sample relative to another (q value < 0.005 and |log2 (foldchange)| >1). In the rest of this paper, up-regulated means that the gene expression level was greater in samples with persistent calyx than in samples with deciduous calyx.  Down-regulated means that the gene expression level was less in samples with persistent calyx than in samples with deciduous calyx. There were 103 DEGs among 48894 unigenes in SC_hua vs TL_hua. Among these, 47 DEGs were up-regulated and 56 DEGs were down-regulated (Fig. 5). To further characterize the function of the DEGs, GO enrichment analysis was conducted for all of the DEGs in SC_hua vs TL_hua with the whole transcriptome as the background (Additional file 1). In the BP category, the top three enriched terms were 'heterocycle biosynthetic process' , 'organic cyclic compound biosynthetic process' and 'cellular nitrogen compound biosynthetic process'. In the CC category, 'nuclear part' , 'membrane-enclosed lumen' , 'intracellular organelle lumen' , 'organelle lumen' and 'nuclear lumen' were the dominant enriched terms. In the MF category, 'nucleic acid binding transcription factor activity' and 'sequence-specific DNA binding transcription factor activity' were most highly enriched. A GO enrichment analysis was also conducted for the up-regulated DEGs (Additional file 2). In the BP category, 'biological regulation' , 'regulation of biological process' , and 'regulation of cellular process' were most highly enriched. In the CC category, 'membrane-enclosed lumen' , 'intracellular organelle lumen' , 'organelle lumen' and 'nuclear lumen' were the main enriched terms. In MF, the top two enriched terms were 'nucleic acid binding transcription factor activity' and 'sequence-specific DNA binding transcription factor activity'.
The KEGG pathway enrichment analysis for DEGs also revealed both common and tissue specific patterns of over-representation (Additional file 3). The top-four enriched pathways for DEGs in SC_hua vs TL_hua were 'cysteine and methionine metabolism' , 'porphyrin and chlorophyll metabolism' , 'phenylalanine metabolism' and 'isoquinoline alkaloid biosynthesis'. For up-regulated DEGs (Additional file 4), 'calcium signaling pathway' , 'porphyrin and chlorophyll metabolism' , 'phosphatidylinositol signaling system' and 'glycerolipid metabolism' were most highly enriched. For down-regulated DEGs (Additional file 5), 'cysteine and methionine metabolism' , 'isoquinoline alkaloid biosynthesis' and 'biosynthesis of amino acids' were the three main enriched pathways.

DGE sequencing
A DGE analysis was performed to compare gene expression in SC_ep, SC_zf, TL_ep, and TL_zf. After removing low-quality sequences, we obtained 12283115, 10084701,
The expression of ERF109 at the early bloom and late bloom stages was significantly (P = 0.01) greater in flowers with persistent calyx than in flowers with deciduous calyx. Regardless of whether the flower had a deciduous or a persistent calyx, ERF109 expression was significantly (P = 0.01) greater at the early bloom stage than at either the full bloom or late bloom stages (Fig. 7a). The expression of ERF109 at the late bloom stage was significantly (P = 0.01) greater in ovaries with persistent calyx than in sepals with persistent calyx (Fig. 7b). Regardless of bloom stage, the expression of ERF109 in ovaries with deciduous calyx was not significantly different than that in sepals with deciduous calyx (Fig. 7c).
The expression of NAC002 in flowers varied significantly depending on the type of calyx and the flower stage. Specifically, NAC002 expression at early bloom and late bloom was significantly (P = 0.01) greater in flowers with persistent calyx than in flowers with deciduous calyx; however, the opposite was observed at full   (Fig. 7d). The NAC002 expression in flowers with a persistent calyx was significantly (P = 0.01) highest at the late bloom and early bloom stages. In contrast, NAC002 expression in flowers with a deciduous calyx was significantly (P = 0.05) greatest at the full bloom stage. The expression of NAC002 in ovaries with persistent calyx was significantly greater than that in sepals with persistent calyx at the early bloom stage (P = 0.05) and at the full bloom stage (P = 0.01) (Fig. 7e). In contrast, at the late bloom stage, NAC002 expression in ovaries with persistent calyx was significantly (P = 0.01) less than that in sepals with persistent calyx. The expression of NAC002 in ovaries with deciduous calyx was significantly greater than that in sepals with deciduous calyx at the full bloom (P = 0.01) and late bloom stages (P = 0.05) (Fig. 7f ). The expression of MYB5 was significantly greater in flowers with persistent calyx than in flowers with deciduous calyx at the early bloom (P = 0.05) and late bloom (P = 0.01) stages (Fig. 7g). In contrast, at the full bloom stage, MYB5 expression was significantly (P = 0.05) less in flowers with persistent calyx than in flowers with deciduous calyx. The expression of MYB5 in sepals with persistent calyx was significantly greater than that in ovaries with persistent calyx at the full bloom (P = 0.05) and late bloom (P = 0.01) stages (Fig. 7h). In contrast, MYB5 expression at the early bloom stage was significantly (P = 0.01) less in sepals with persistent calyx than in ovaries with persistent calyx. The expression of MYB5 in sepals with deciduous calyx was significantly greater than that in ovaries with deciduous calyx at early bloom and full bloom ( Fig. 7i, P = 0.01).
Regardless of whether the flower had a deciduous or a persistent calyx, PGIG expression was significantly (P = 0.01) greater at the late bloom stage than at either the early bloom or full bloom stages (Fig. 7j). There was no significant difference in PGIG expression between flowers with persistent calyx and flowers with deciduous calyx. Regardless of whether the calyx was persistent or deciduous, the expression of PGIG in sepals was significantly greater than that in ovaries at the late bloom stage ( Fig. 7k and l, P = 0.01).
The expression of SPL9 at the early bloom and late bloom stages was greater in flowers with persistent calyx than in flowers with deciduous calyx; however the opposite was true at the full bloom stage. The expression of SPL9 in flowers with deciduous calyx was not significantly different from that in flowers with deciduous calyx. Regardless of whether the flower had a deciduous or a persistent calyx, SPL9 expression was significantly (P = 0.01) greater at the late bloom stage than at either the early bloom or full bloom stages (Fig. 7m). There was no significant difference in SPL9 expression between ovaries with persistent calyx and sepals with deciduous calyx (Fig. 7n). The expression of MYB5 in ovaries with deciduous calyx was significantly greater than that in sepals at the full bloom and late bloom stages ( Fig. 7o, P = 0.01).
The total expression pattern of the three genes ((ERF109 (comp36863_c0), NAC002 (comp41728_c0), and PGIG (comp49798_c0)) obtained with qRT-PCR was consistent with the RNA-seq data. This confirmed the validity of our results.

Plant hormone and organ abscission
Many hormones, especially IAA and ethylene, regulate organ abscission [30][31][32][33][34][35]. From 103 DEGs in SC_hua vs TL_hua, 11 genes were identified that were related to plant hormone metabolism. Five of these genes were related to ethylene-responsive transcription factor, two genes were related to auxin-induced protein, one gene was related to gibberellin-regulated protein, one gene was related to EREBP-like factor, one gene was related to the auxin responsive GH3 gene family, and one gene was related to brassinosteroid-regulated protein. From 64 DEGs in SC_ep vs TL_ep, seven genes were identified that were involved in plant hormone metabolism. Four of these genes were related to ethylene-responsive transcription factor, one gene was related to gibberellin 2- beta-dioxygenase 1, one gene was related to auxininduced protein, and one gene was related to abscisic acid 8'-hydroxylase 4. We also identified five genes related to ethylene-responsive transcription factor from 95 DEGs in SC_zf vs TL_zf (Table 6).

Genes related to cell wall degradation and organ abscission
The dissolution of the middle lamella is related to abscission, especially the loss of adhesion by separation layer cells due to the effects of cell wall degrading enzymes such as polygalacturonases. Several researchers have reported that cell wall modifying proteins such as expansin [36] and pectinesterase [37] have a role in abscission. Other researchers have observed that polygalacturonases have important function in the abscission process in oil palm [38], tomato [39], oilseed rape and Arabidopsis [40]. Beta-galactosidase [41], xyloglucan endotransglucosylase/hydrolase [42], and glucanase [43] genes have also been shown to be related to abscission.
We obtained eight genes related to cell wall degradation from DEGs in SC_hua vs TL_hua. These eight genes included one gene related to polygalacturonase, one gene related to polygalacturonase inhibition, one gene related to beta-galactosidase, one gene related to glucan endo-1,3-beta-glucosidase, one gene related to lignin catabolic process, one gene related to tissue regeneration, and two genes related to xyloglucan endotransglucosylase. One expansin gene was obtained from DEGs in SC_ep vs TL_ep. From DEGs in SC_zf vs TL_zf, we obtained genes related to glucan endo-1,3beta-glucosidase, beta-galactosidase, polygalacturonase inhibition, xyloglucan endotransglucosylase, and pectinesterase ( Table 7).

Function of SPL and MYB genes in organ abscission
The SPL genes play an important role in the growth process of plants, including morphogenesis, the transition between developmental stages, sporogenesis, floral and fruit development, stress response, and plant  hormone signal transduction [44]. In addition, SPL genes are induced during cell senescence leading to cell death [45,46]. The MYB genes participate in plant secondary metabolism [47] as well as the plant's response to hormones and environmental factors [48][49][50]. The MYB genes also regulate cellular differentiation, the cell life cycle [51,52], and the morphogenesis of organs such as leaves [53][54][55]. The MYB genes are also involved in abscission [11,56,46]. We obtained 98 MYB and 21 SPL genes from the 48894 annotated unigenes (Table 8).

Stress response genes and abscission
The sequencing results showed that many genes related to stress response exhibited differential expression. There was one heat shock factor protein, two dehydrationresponsive element-binding proteins, one dehydrationresponsive protein, two NAC transcription factor proteins, one NAC domain-containing protein [57,58], and one cysteine synthase-like gene [59] among the DEGs in SC_hua vs TL_hua. There were also genes related to the NAC domain-containing protein, the pathogenesis-related protein Bet v I family, the senescence-related protein gene, dehydration-responsive protein, and dehydrationresponsive element-binding protein from DEGs in SC_ep vs TL_ep. From the DEGs in SC_zf vs TL_zf, we obtained genes related to disease resistance response protein 206, dehydration-responsive protein, defensin-like protein, and senescence-related protein (Table 9).

Other genes and abscission
Several researchers have reported that zinc finger protein [60] and lipid-transfer protein [61,62] are involved in calyx abscission. We obtained one gene related to lipid-transfer protein from DEGs in SC_hua vs TL_hua.
One gene related to lipid-transfer protein as well as five zinc finger genes were obtained from DEGs in SC_zf vs TL_zf (Table 10).

Putative genes related to abscission
Other genes in this study showed high-level differential expression. However, the function of these genes is unknown. We defined these genes as putative genes related to abscission. There were ten putative genes among DEGs in SC_hua vs TL_hua, eleven putative genes among DEGs in SC_ep vs TL_ep, and eighteen putative genes among DEGs in SC_zf vs TL_zf (Table 11). The DEGs from transcriptome and DGE sequencing were subjected to a search against GO and KEGG databases. The results showed that many of the DEGs were involved in metabolic processes related to chlorophyll, plant hormone metabolism, carbohydrate metabolism, signal transduction and cell wall construction. The results were consistent with Qi's (2013) [12], and suggest that calyx persistence in Korla fragrant pear is regulated by many genes.

Conclusion
More than 50 DEGs were obtained through transcriptome and DGE sequencing. These DEGS were related to

Plant material
Three trees with high vigor and three trees with low vigor were selected in spring 2013 at the Shayidong Horticulture Field, Korla, Xinjiang Province. Flowers were collected from each tree at the early bloom, full bloom, and late bloom stages. The first flower to open in clusters on trees with high vigor has a persistent calyx (Fig. 8a, b). The fourth flower to open in clusters from trees with low vigor has a deciduous calyx (Fig. 8c, d).
The flowers were immediately frozen in liquid N and stored at −80°C.

Transcriptome sequencing
Solexa/Illumina sequencing was carried out by Novogene, Beijing, China. Total RNA was extracted from the flower samples using RNAout 1.0 (Tianenze, Beijing, China). The RNA degradation and contamination was monitored on  1 % agarose gels. The purity of the RNA was checked with a NanoPhotometer® (IMPLEN, CA, USA). The RNA concentration was measured using a Qubit®RNA Assay Kit and a Qubit®2.0 Fluorometer (Life Technologies, CA, USA). The RNA integrity was assessed using an RNA Nano 6000 Assay Kit and an Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA). After quality inspection, the RNA from flowers at the early, full, and late bloom stages were combined by calyx type. The combined RNA sample from flowers with a persistent calyx will be referred to as SC_hua. The combined RNA sample from flowers with a deciduous calyx will be referred to as TL_hua. These RNA samples were used for transcriptome sequencing. Three biological replicates were used. The RNA preparations used 3 μg RNA per sample. Sequencing libraries were generated using NEBNext®Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer's recommendations. Index codes were added to attribute sequences in each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5x). First strand cDNA was synthesized using Table 9 Genes related to stress     The PCR products were purified (AMPure XP system) and the library quality was assessed using an Agilent Bioanalyzer 2100. The clustering of the index-coded samples was performed on a cBot Cluster Generation System using Tru-Seq PE Cluster Kit v3-cBot-HS (Illumia) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina Hiseq 2000 platform and paired-end reads were generated.

Data analysis of transcriptome sequencing
Raw data (raw reads) in fastq format were first processed through in-house Perl scripts. Clean data (clean reads) were obtained by removing reads containing adapter sequences, reads containing poly-N, and low quality reads. The Q20, Q30, GC-content, and sequence duplication level of the clean data were calculated. All downstream analyses were based on clean data with high quality.
The left files (read1 files) from all libraries/samples were pooled into one large left.fq file. The right files (read2 files) were pooled into one large right.fq file. Transcriptome assembly was accomplished based on the left.fq and right.fq files using Trinity [63]. The min_ kmer_cov was set at 2 and all other parameters were set at default. Gene function was annotated based on the following databases: NR (NCBI non-redundant protein sequences); NT (NCBI non-redundant nucleotide sequences); PFAM (Protein family); KOG/COG (Clusters of Orthologous Groups of proteins); SwissProt (A manually annotated and reviewed protein sequence database); KO (KEGG Ortholog database); GO (Gene Ontology).

DGE sequencing
The RNA was extracted from sepals and ovaries at the early, full, and late bloom stages. The RNA was combined by calyx type. The combined RNA sample from sepals with a persistent calyx will be referred to as SC_ep. The combined RNA sample from sepals with a deciduous calyx will be referred to as TL _ep. The combined RNA sample from ovaries with a persistent calyx will be referred to as SC_zf. The combined RNA sample from ovaries with a deciduous calyx will be referred to as TL _zf. The methods of RNA extraction, RNA quantification, RNA qualification, clustering, and sequencing were the same as those described above for transcriptome sequencing.

Differential expression analysis Samples with biological replicates
Differential expression analysis of two conditions/groups was performed using the DESeqR package (1.10.1). The DESeq provides statistical routines for determining differential expression in digital gene expression data using a model based on negative binomial distribution. The resulting P values were adjusted using Benjamini and Hochberg's approach for controlling the false discovery rate. Genes were considered to be differentially expressed if DESeq found the adjusted P-value to be <0.05.

Samples without biological replicates
Prior to differential gene expression analysis, the read counts for each sequenced library were adjusted using edgeR software through one scaling normalized factor. Differential expression analysis of two samples was performed using DEGseq R package (2010). The P value was adjusted using the q value [64]. The q value < 0.005&|log2 (fold change)| > 1 was set as the threshold for significantly differential expression.

GO enrichment analysis
Gene Ontology (GO) enrichment analysis of the differentially expressed genes (DEGs) was implemented by GOseq R packages based on Wallenius non-central hyper-geometric distribution [65] which can be adjusted for gene length bias in DEGs.

KEGG pathway enrichment analysis
KEGG [66] is a database resource for understanding highlevel functions and utilities of biological systems (e.g., cell, organism, and ecosystem), from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies (http://www.genome.jp/kegg/). We used KOBAS [67] software to test the statistical enrichment of differentially expressed genes in KEGG pathways.

Protein Protein Interaction (PPI)
The sequences of the DEGs were BLASTx against the genome of a related species (the PPI of which exists in the STRING database: http://string-db.org/) to get the predicted PPIs of these DEGs. The PPIs were visualized in Cytoscape [68].

Real-time quantitative PCR
The expression of five genes (Gene ID: comp36863_c0, comp41728_c0, comp46544_c0, comp49798_c0, and comp49614_c0) that might be associated with calyx persistence in Korla Fragrant Pear were analyzed by qRT-PCR. Total RNA was separately extracted from the full flowers, sepals and ovaries using RNAout 1.0 (Tianenze, Beijing, China) at the early bloom, full bloom, and late bloom stages. The RNA samples were from (i) sepals with persistent calyx, (ii) ovaries with persistent calyx, (iii) sepals with deciduous calyx, (iv) ovaries with deciduous calyx, (v) full flowers with deciduous calyx, and (vi) full flowers with persistent calyx. Each group had three biological replications. Gene-specific primers were designed according to the reference unigene sequences using Primer Premier 5.0 (Table 12). The synthesis of cDNA was performed using a Reverse Transcriptase M-MLV kit (TaKaRa, Dalian, China). Real-time quantification was performed using a CFX manager (Bio-Rad, USA) and the SYBR Green Real-time PCR Master Mix (Toyobo, Osaka, Japan). The protocol of real-time PCR was as follows: initiation with a 30 s pre-denaturation at 95°C followed by 40 cycles of amplification with 5 s of denaturation at 95°C, 10 s of annealing at 56°C, 15 s of extension at 72°C and reading the plate for fluorescence data collection at 65°C. A melting curve was performed from 65 to 95°C to check the specificity to the amplified product. Each reaction was repeated three times. Korla fragrant pear actin gene (forward: 5'-CCATCCA GGCTGTTCTCTC-3' , and reverse: 5'-GCAAGGTCCA GACGAAGG -3') was used as a normalizer.