Generation and Analysis of Expressed Sequence Tags from Chimonanthus praecox (Wintersweet) Flowers for Discovering Stress-Responsive and Floral Development-Related Genes

A complementary DNA library was constructed from the flowers of Chimonanthus praecox, an ornamental perennial shrub blossoming in winter in China. Eight hundred sixty-seven high-quality expressed sequence tag sequences with an average read length of 673.8 bp were acquired. A nonredundant set of 479 unigenes, including 94 contigs and 385 singletons, was identified after the expressed sequence tags were clustered and assembled. BLAST analysis against the nonredundant protein database and nonredundant nucleotide database revealed that 405 unigenes shared significant homology with known genes. The homologous unigenes were categorized according to Gene Ontology hierarchies (biological, cellular, and molecular). By BLAST analysis and Gene Ontology annotation, 95 unigenes involved in stress and defense and 19 unigenes related to floral development were identified based on existing knowledge. Twelve genes, of which 9 were annotated as “cold response,” were examined by real-time RT-PCR to understand the changes in expression patterns under cold stress and to validate the findings. Fourteen genes, including 11 genes related to floral development, were also detected by real-time RT-PCR to validate the expression patterns in the blooming process and in different tissues. This study provides a useful basis for the genomic analysis of C. praecox.


Introduction
Chimonanthus praecox (L.) Link, wintersweet, belongs to the Calycanthaceae family. It is a perennial deciduous shrub that blossoms in winter, from late November to March. Its unique flowering time and long blooming period make it one of the most popular ornamental plants in China [1]. C. praecox is mainly a garden plant that also provides cut flowers. The flower is strongly fragrant and may be used as a source of essential oil, which has received much attention in New Zealand [2]. C. praecox thrives in cold environments and blooms in low-temperature seasons with little rainfall. The plant is therefore assumed to be rich in genes related to floral development and to adverse conditions, especially genes that respond to environmental stress factors. However, the molecular mechanisms that regulate floral development and stress tolerance in C. praecox flowers remain unclear.
Expressed sequence tags (ESTs) have been proven to be an efficient and rapid means to identify novel genes (and proteins) induced by environmental changes or stresses [3][4][5][6][7]. Genes related to flower form, longevity, and scent from roses, Phalaenopsis equestris, and Pandanus fascicularis were identified by ESTs [8][9][10]. The present study used transcriptomic analysis of C. praecox flowers to identify novel genes induced by environmental changes or related to floral development and ultimately to better understand the physiological and genetic basis of cold acclimation in flowers of woody plants.

Complementary DNA (cDNA) Library Construction and Sequencing.
The process of C. praecox blossoming includes the following: Stage 1, sprout period; Stage 2, flower-bud period; Stage 3, display-petal period; Stage 4, initiating bloom period; Stage 5, bloom period; Stage 6, wither period [11]. C. praecox flower buds or flowers at the six stages of development (Figure 1) were collected from the nursery at Southwest University, Chongqing, China, for cDNA library construction. The samples were immediately frozen in liquid nitrogen and then stored at −80 °C until RNA isolation. Total RNA was extracted from these flower buds or flowers using RNA isolation kits (W6771; Watson Biotechnologies Inc.), and RNA quality was assessed with a BioSpec-mini (Shimadzu). The final RNA sample for cDNA library construction was prepared by pooling equal amounts of total RNA from each stage.
cDNA synthesis from the pooled total RNA and library construction through directional cloning of cDNAs into the λTriplEx2 vector were performed using a SMART cDNA Library Construction Kit (K1051-1) according to the manufacturer's instructions. The size of the insert fragments and the recombinant rate were measured by PCR, as described by Gao and Hu [12], using a random selection of 50 clones. All clones were sequenced from the 5′ end using an ABI 3700 (Applied Biosystems).

Sequence Clustering, Annotation, and Functional Categorization.
Poor-quality sequences, sequences shorter than 100 bases, and vector sequences were removed from the raw sequences using SeqMan II (DNASTAR, Inc., Madison, WI), followed by manual inspection. The trimmed cDNA sequences were assembled into clusters using the assembly program within SeqMan II with its default parameters, which require a minimum match of 80% over 12 bp to initiate assembly. Consensus sequences for the unigenes were exported from SeqMan II.
Unigenes (contigs and singletons) were annotated using BLASTX against the NCBI nonredundant protein database, with a cut-off E value of ≤10⁻⁵ for the best hit [13]. Sequences without a reliable match (E value >10⁻⁵) were subsequently compared with the NCBI nonredundant nucleotide database using BLASTN (score >100) for complementary annotation [14]. All well-annotated unigenes were then further classified and mapped to the three Gene Ontology (GO) categories (biological process, cellular component, and molecular function) via AmiGO (http://amigo.geneontology.org/) [15].
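The two-step annotation rule described above can be sketched as follows. This is only an illustration of the decision logic, not the pipeline actually used in the study. The `BestHit` record and the `annotate` function are hypothetical names introduced here, and the best hits are assumed to have already been parsed from the BLAST output.

```python
from dataclasses import dataclass
from typing import Optional

BLASTX_MAX_EVALUE = 1e-5   # E-value cut-off for the best BLASTX hit, as stated in the text
BLASTN_MIN_SCORE = 100.0   # BLASTN score threshold, as stated in the text


@dataclass
class BestHit:
    """Hypothetical container for the best hit of one unigene in one search."""
    description: str
    evalue: float
    score: float


def annotate(blastx_hit: Optional[BestHit], blastn_hit: Optional[BestHit]) -> Optional[str]:
    """Return an annotation label, or None when the unigene has no reliable match."""
    # Step 1: accept the protein-level (BLASTX) hit if its E value is <= 1e-5.
    if blastx_hit is not None and blastx_hit.evalue <= BLASTX_MAX_EVALUE:
        return f"BLASTX: {blastx_hit.description}"
    # Step 2: otherwise fall back to the nucleotide-level (BLASTN) hit if score > 100.
    if blastn_hit is not None and blastn_hit.score > BLASTN_MIN_SCORE:
        return f"BLASTN: {blastn_hit.description}"
    # No reliable match in either database.
    return None
```

Under this rule, a unigene for which `annotate` returns `None` would fall into the group with only very weak or no database matches.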

Expression Analysis of Selected Genes Using Quantitative Real-Time RT-PCR.
For real-time expression studies, C. praecox seeds were kept under a 16 h light/8 h dark cycle in a growth chamber at 25 °C. Small seedlings were subsequently transferred to plastic pots containing a mixture of soilrite and vermiculite (1 : 1). The plantlets were raised to the sixth leaf stage and then subjected to cold treatment (4 °C) for 15 min, 1 h, and 6 h. Control plantlets were maintained at 25 °C. The tissues were harvested and snap-frozen in liquid nitrogen and kept at −80 °C until further use.
The total RNA of flowers at the different developmental stages, as well as of floral organs dissected from Stage 4 flowers, roots, stems, and leaves, was extracted using the method described above. Equal amounts of DNA-free RNA (5 μg) from the different tissues were reverse-transcribed using a PrimeScript RT Reagent Kit with gDNA Eraser (TaKaRa). The primers used for real-time PCR (Table 1) were designed with Primer Premier 5.0 (PREMIER Biosoft, USA). Real-time PCRs were performed in triplicate in 10 μL reactions. Table 1 shows the primer pair-specific temperatures. PCR was carried out on a Bio-Rad CFX96 real-time system according to the manufacturer's instructions under the following conditions: denaturation at 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 59 °C for 5 s. A dissociation curve analysis was performed on the real-time PCR system at the end of the experiment (60 to 95 °C; continuous fluorescence measurement) to check the annealing specificity of the oligonucleotides. The comparative Ct (threshold cycle) method of relative quantification was used to analyze the real-time quantitative RT-PCR data with Bio-Rad CFX Manager Software Version 1.6. Actin and tubulin were used as housekeeping genes for the calculation of relative transcript abundance. The sizes of the amplified products were confirmed by gel electrophoresis. Negative controls without template were run concurrently.
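For readers unfamiliar with the comparative Ct method, the calculation can be sketched as below. This is a minimal illustration under stated assumptions, not a description of what CFX Manager computes internally. It assumes 100% amplification efficiency (a doubling per cycle) and combines the two reference genes by taking the arithmetic mean of their Ct values. The function names and the example Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_ct: float, reference_cts: Sequence[float]) -> float:
    """Ct of the target gene normalized to the mean Ct of the reference genes."""
    # Assumption: reference genes (e.g., actin and tubulin) are averaged arithmetically.
    return target_ct - mean(reference_cts)


def relative_expression(
    target_ct_treated: float,
    reference_cts_treated: Sequence[float],
    target_ct_control: float,
    reference_cts_control: Sequence[float],
) -> float:
    """Fold change of the treated sample relative to the control (2^-ddCt)."""
    ddct = (
        delta_ct(target_ct_treated, reference_cts_treated)
        - delta_ct(target_ct_control, reference_cts_control)
    )
    # Assumption: perfect doubling per cycle, hence base 2.
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target crosses the threshold two cycles earlier
# after cold treatment, while both reference genes are unchanged.
fold = relative_expression(22.0, [18.0, 18.0], 24.0, [18.0, 18.0])
```

In this made-up example the normalized Ct drops by two cycles, which corresponds to a fourfold increase in relative transcript abundance.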

General Characteristics of the cDNA Library and the ESTs.
A cDNA library constructed from C. praecox flowers at different stages of development was used as a source of ESTs. The titers of the primary library and amplified library were 1.4 × 10⁶ and 1.0 × 10¹⁰ pfu/mL, respectively, with a recombinant rate of 96% for the original library. The sizes of the inserts ranged from 0.5 to 2.5 kb, and the average insert size was estimated to be 1.1 kb by PCR amplification of inserts from 50 randomly selected clones. These results indicate that the cDNA library was of suitable quality for EST sequencing.
In total, 896 random cDNA clones were successfully sequenced to generate ESTs. Removal of short sequences (<100 bp), vector sequences, and poor-quality sequences resulted in 867 high-quality ESTs, comprising a total of 584,201 bases of C. praecox sequence. The average read length of these ESTs was 673.8 bp. All 867 sequences were deposited in GenBank under accession numbers DW222667 to DW223533. The clustering of ESTs generated 94 contigs (containing 2 or more ESTs) and 385 singletons (containing only 1 EST), yielding 479 unigenes. The redundancy of the library was calculated as 44.8% [(1 − Number of Unigenes/Number of ESTs) × 100%]. Figure 2 shows the distribution of ESTs in unigenes after clustering. Forty-two contigs had more than 2 ESTs, with the largest one containing 77 ESTs.
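The summary statistics in this paragraph can be reproduced from the reported counts. The short sketch below applies the redundancy formula given in the text. The variable and function names are introduced here for illustration only.

```python
def library_redundancy(n_unigenes: int, n_ests: int) -> float:
    """Redundancy (%) = (1 - number of unigenes / number of ESTs) x 100."""
    return (1 - n_unigenes / n_ests) * 100


# Counts reported in the text.
N_ESTS = 867            # high-quality ESTs
N_CONTIGS = 94          # clusters with 2 or more ESTs
N_SINGLETONS = 385      # clusters with exactly 1 EST
TOTAL_BASES = 584_201   # total bases across all high-quality ESTs

n_unigenes = N_CONTIGS + N_SINGLETONS             # 479 unigenes
redundancy = library_redundancy(n_unigenes, N_ESTS)
average_read_length = TOTAL_BASES / N_ESTS

print(f"unigenes: {n_unigenes}")
print(f"redundancy: {redundancy:.1f}%")
print(f"average read length: {average_read_length:.1f} bp")
```

Rounded to one decimal place, these values agree with the 44.8% redundancy and 673.8 bp average read length stated above.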

Functional Annotation and Classification of C. praecox Unigenes.
The 479 unigenes were compared with the nonredundant protein and nucleotide sequence databases at NCBI using BLAST. Four hundred five unique sequences, corresponding to 84.6% of all the unigenes, shared significant homology with sequences in the public databases. Of these, 266 were similar to genes of known function, whereas 139 were similar to putative uncharacterized proteins (Table 2). The remaining 74 unigenes (15.4%) had only very weak or no matches and were considered putative novel genes in C. praecox flowers. Table 3 summarizes the highly expressed genes, defined as contigs containing more than 5 ESTs. The first and third most abundant ESTs were homologous to lipid transfer protein (LTP); the second most abundant encoded a protein related to the adenine nucleotide translocator, followed by seed-specific protein, mannose-specific lectin, and LEA III protein (Table 3). These ESTs possibly correspond to the most abundantly expressed genes in C. praecox flowers. Two hundred eighty-one ESTs belonged to unigenes represented by more than five ESTs (Table 3), 385 ESTs were singletons, and the remaining ESTs belonged to unigenes represented by two to four ESTs.
The 405 database-matched unigenes had BLAST hits with 99 organisms, among which the highest number was from Arabidopsis thaliana (32.6%; 132 unigenes), followed by Oryza sativa (24.0%; 97 unigenes) and Nicotiana tabacum (4.2%; 17 unigenes) (Table 4). The remaining 67 unigenes (16.5%) had BLAST hits with a single organism only. The wide distribution of matched organisms may reflect the fact that the genome of C. praecox differs significantly from the genomes of model plants and that the genomes of related plants have not yet been widely studied.
The initial annotations were further simplified into plant-specific annotations (plant GO slim; http://amigo.geneontology.org/) to obtain additional insights into the putative functions of the unigenes. Of the 479 C. praecox unigenes, 364 were assigned GO terms in at least one category (biological process, cellular component, or molecular function). Figures 3, 4, and 5 classify the unigenes according to terms in the biological process ontology, molecular function ontology, and cellular component ontology, respectively. Table 5 shows the nonredundant ESTs that share similarities with genes related to defense and stress response according to GO classifications and previously published data. As expected, the ESTs involved in stress and defense were highly abundant in the library because C. praecox thrives and blossoms in winter, which is consistent with previous reports of elevated levels of defense-related transcripts in developing flowers [16][17][18][19][20].

Sequences Related to Stress and
Twelve unigenes (22 ESTs) related to cold stress tolerance were found in the library; these were classified as "response to cold" according to GO terms [e.g., β-amylase, acyl-CoA-binding protein, 3-hydroxyisobutyryl-coenzyme A hydrolase, catalase, low-temperature and salt-responsive protein, glutathione S-transferase (GST), membrane channel protein, and abscisic-acid-induced protein] and "cold acclimation" (e.g., fatty acid biosynthesis 1 and WCOR413-like protein). In this class, the most abundant sequences encode GST. Five GSTs encoded by 16 ESTs were identified in the library. GST genes exhibited a diverse range of responses to jasmonates, salicylic acid, ethylene, and oxidative stress in Arabidopsis [21] and were induced by heavy metals and hypoxic stress in rice roots [22]. In the current study, however, fewer cold-related unigenes were obtained than expected. Some genes of unknown function that are related to cold stress tolerance likely exist in this library. Another class of genes, annotated with the GO terms "response to absence of light," "low light intensity," and "response to red or blue light," was also represented in the library. Eight unigenes (10 ESTs) were identified, including acyl-CoA-binding protein, sadtomato protein, and catalase, among others.
Of the stress- and defense-related unigenes, 39 were possibly related to development. Nine types of LTPs or LTP precursors were encoded by the highest number of ESTs in this study (Table 5). Research has suggested different functions for LTPs in the physiology of plants, such as cutin synthesis, β-oxidation, somatic embryogenesis, pollen development, allergenicity, plant signaling, and plant defense [23][24][25], but the true physiological role of LTPs in C. praecox flowers has yet to be determined. Fourteen ESTs encoding two kinds of LEA proteins, groups III and V, were identified. The presence of LEA proteins correlates well with freezing, water deficit, and salt stress [26][27][28], probably through the prevention of enzyme aggregation [29], and LEA proteins likely play a similar role in C. praecox. Some other development-related unigenes were also found, such as MYB, calmodulin, actin, CCR4-associated factor, ubiquitin-conjugating enzyme E2, and ABC transporter, which are involved in transcription factor activity, signal transduction, cell structure, nucleotide metabolism, protein metabolic process, and transporter activity, respectively.

Sequences Related to Floral Development.
Table 6 shows the 19 unigenes related to floral development. Five of these were homologous to MADS box transcription factor genes. Plant life critically depends on the function of MADS box genes encoding MADS domain transcription factors, which are present to a limited extent in nearly all major eukaryotic groups but constitute a large gene family in land plants [30]. MADS box genes control diverse developmental processes in flowering plants, ranging from root to flower and fruit development, and they especially control the transition from vegetative to reproductive development and the establishment of floral organ identity [31]. The present study has also identified six unigenes related to secondary metabolism that are probably involved in floral color and fragrance.
The role of chilling temperature in the dormancy of vegetative buds and the induction of flowering has been investigated in many temperate-region species, particularly in A. thaliana and Populus spp. [32,33]. The physiological processes of dormancy release and induction of flowering competence rely on longer-term chilling temperature and a period of vernalization, respectively [34,35]. The current study has identified dormancy- and vernalization-related genes; however, only one unigene (Cp82; E-value = 6.00E−15) was annotated as a dormancy-related protein in the library. The possible reason is either the small scale of sequencing in our study or that the processes of dormancy release and induction of flowering competence last only a short period in C. praecox.

Expression Analysis of Cold-Responsive and Floral Development-Related Genes.
Real-time RT-PCR was performed for 12 unigenes, including 9 selected from the GO Slim annotation belonging to "response to cold" (Cp88, Cp173, Cp215, Cp274, Cp359, Cp364, Cp375, Cp440, and Cp465), 1 annotated as a dormancy-related protein (Cp82), and 2 without functional annotation (Cp24 and Cp64), to analyze the changes in their expression due to cold stress (Figure 6). The data revealed that Cp82 responded to cold.
Real-time RT-PCR was also applied to validate the expression patterns of 11 genes related to floral development (Cp197, Cp203, Cp268, Cp297, Cp328, Cp330, Cp360, Cp383, Cp423, Cp436, and Cp458), Cp24, Cp64, and Cp82 (Figure 7). The expression patterns of these 14 genes were analyzed in roots, stems, leaves, outer tepals, middle tepals, inner tepals, stamens, and pistils. The results showed clear differences in their expression. The genes related to floral development, except for Cp203 (LLP-B3 protein) and Cp297 (SRG1-like protein), increased by more than twofold in middle tepals. Cp24 and Cp203 were actively expressed in stamens but showed only very slight accumulation in other tissues. Cp268 (caffeoyl-CoA O-methyltransferase-like protein) and Cp458 (MADS box protein 9) presented a peak in middle tepals but were only slightly expressed or not expressed at all in roots, stems, leaves, and pistils. Cp197 (AGL9.2) expression was higher in all reproductive organs and peaked in middle tepals but was not detected in roots, stems, and leaves. The expression of Cp328 (STYLOSA protein) and that of Cp383 (secondary cell wall-related glycosyltransferase family 47) were not detected in roots; moreover, Cp328 increased by more than fourfold in middle tepals, whereas Cp383 did so in pistils and stems. The accumulation of Cp82 (dormancy-related protein) transcripts was higher in stems, leaves, stamens, and pistils. The other genes were expressed in all the tissues examined, but Cp64 and Cp297 (SRG1-like protein) presented lower levels of expression. Cp360 (peroxisomal fatty acid β-oxidation multifunctional protein AIM1) increased by more than twofold in roots, middle tepals, and pistils.
Quantitative real-time PCR was also used to further validate the transcript levels of the 14 genes during the blooming process in C. praecox (Figure 8). The results showed that none of these genes was detected or more than very slightly expressed in Stage 1 and that Cp64, Cp82, Cp268, Cp297, Cp330, Cp360, and Cp383 had almost no accumulation in Stage 2. The transcript accumulations of Cp203 and Cp458 were sharply elevated to their highest level in Stage 2 but dramatically decreased at the subsequent stages of floral development. The expression of Cp24, Cp82, and Cp436 peaked in Stage 4. The transcript accumulations of the other 9 genes increased during the six developmental stages and reached their peak in Stage 6. The expression of Cp297 (SRG1-like protein) was significantly high in Stage 6 but very low in the other stages, suggesting an association with flower senescence. The SRG1 gene is reportedly expressed in senescing organs of Arabidopsis plants [36].
The current study found few references about bud dormancy in C. praecox. The expression pattern of Cp82 was particularly notable. However, the results indicate that Cp82 is not necessarily related to flower-bud dormancy. The dormancy-related protein CAA93825.1, which matched Cp82 (E-value = 1e−15), has been reported to play a role in dormancy breaking and in the germination of Trollius ledebourii seeds [37]. These data warrant further research into the relationship of Cp82 to dormancy breaking and germination in C. praecox seeds.

Conclusions
A cDNA library was constructed to generate an EST collection from C. praecox, thereby providing a preliminary view into the genomic properties of this species. This collection of high-quality ESTs represents the first EST data set for C. praecox. Eight hundred sixty-seven valid EST sequences were generated, and 479 unigenes were assembled, among which 266 unigenes (55.53%) were identified according to their significant similarities with proteins of known functions. The EST sequences have been deposited in GenBank under accession numbers DW222667 to DW223533. Stress response genes and floral development-related genes were also identified. This study evaluated the expression patterns of 23 genes, including 2 novel ones, using real-time RT-PCR. Further investigations in this direction would help identify promising candidate genes with key roles in the stress tolerance that woody plants need to bloom.