De novo transcriptome analysis of Cnidium monnieri (L.) Cuss and detection of genes related to coumarin biosynthesis

Cnidium monnieri (L.) Cuss (C. monnieri) is one of the most widely used traditional herbal medicines, exhibiting a wide range of pharmacological functions for treating asynodia, trichomonas vaginitis, and osphyalgia. Its important medicinal value comes from its abundance of coumarins. To identify genes involved in coumarin biosynthesis and accumulation, we analyzed transcriptome data from flower, leaf, root and stem tissues of C. monnieri. A total of 173,938 unigenes with a mean length of 1,272 bp, GC content of 38.79%, and N50 length of 2,121 bp were assembled using the Trinity program. Of these, 119,177 unigenes were annotated in public databases. We identified differentially expressed genes (DEGs) based on expression profile analysis. These DEGs exhibited higher expression levels in flower tissue than in leaf, stem or root tissues. We identified and analyzed numerous genes encoding enzymes involved in coumarin biosynthesis, and verified genes encoding key enzymes using quantitative real-time PCR. Our transcriptome data will make great contributions to research on C. monnieri and provide clues for identifying candidate genes involved in coumarin metabolic pathways.


INTRODUCTION
Cnidium monnieri (L.) Cuss, an annual plant of the Umbelliferae family native to China and Vietnam, exhibits a diverse set of pharmacological activities including anti-osteoporotic ), anti-adipogenic (Shin et al., 2010 and anti-fungal properties (Wang et al., 2009). Many chemical compounds have been isolated and purified from C. monnieri, including coumarins, volatile oils, chromones, triterpenoids, glycosides and glucides (Li et al., 2015). Coumarins are the most abundant compounds in C. monnieri, of which osthole is considered the dominant chemical constituent with a broad range of anti-bacterial, anti-hepatitis and anti-tumor effects (Cai et al., 2000;Liu et al., 1999;. Coumarins are a large class of natural products found in higher plants (Venugopala, Rashmi & Odhav, 2013). They have attracted a great deal of interest due to a wide range of pharmacological activities including anti-inflammatory, anti-diabetic and anti-tumor functions (Chong et al., 2002). Coumarins originate from the phenylpropanoid pathway and are subclassified into simple coumarins, furanocoumarins, pyrano-coumarins and others (Yang et al., 2015). Simple coumarins include osthole, scopoletin, umbelliferone and esculetin (Bourgaud et al., 2006). The biosynthesis of coumarins starts with the formation of phenylalanine (Kosuke et al., 2006). Phenylalanine ammonia-lyase (PAL) catalyzes phenylalanine transformation into trans-cinnamic acid; the conversion of trans-coumaric acid to p-coumaroyl-CoA is then catalyzed by cinnamate 4 hydroxylase (C4H) and 4-coumarate-CoA ligase (4CL); p-coumaroyl-CoA is further converted to p-coumaroylshikimic acid and 2,4-dihydroxycinnamoyl-CoA catalyzed by shikimate hydroxycinnamoyl transferase (HCT) and trans-4-coumaroyl-CoA 2-hydroxylase (C2 H), respectively; the 2,4-dihydroxycinnamoyl-CoA and p-coumaroylshikimic acid undergo further reactions to produce various coumarins (Kai et al., 2008;Vialart et al., 2012). Although, the enzymes participating in each step of coumarin biosynthesis have been determined, information regarding the genes encoding these enzymes is still insufficient.
RNA-sequencing (RNA-Seq) has been widely applied as a powerful method for discovering and predicting functional genes, and revealing gene expression patterns, especially in non-model species for which reference genome sequences are insufficient (Chu & Corey, 2012). To date, dozens of medicinal plants have been subjected to RNA-Seq analysis, including recent transcriptome analyses of Melilotus albus (Luo et al., 2017), Artemisia argyi (Liu et al., 2017), Arisaema heterophyllum (Wang et al., 2018), Clinopodium chinense (Shi et al., 2019) and Polygonatum cyrtonema (Wang et al., 2019). However, such analytical approaches have not been used to obtain comprehensive transcriptomic resources for C. monnieri.
The purpose of this study was to uncover functional genes associated with or playing regulatory roles in coumarin biosynthesis. We conducted transcriptome sequencing and gene expression profiling in C. monnieri using BGI-500 sequencing technology. Numerous potential genes and differentially expressed genes (DEGs) involved in coumarin biosynthesis were screened using de novo transcriptome sequencing. Our findings will increase our understanding of the molecular mechanisms of coumarin biosynthesis in C. monnieri and provide clues for exploring functional genes in other plants having close relationships with C. monnieri.

Sample preparation and RNA extraction
Whole C. monnieri plants were collected from Tongcheng city in Anhui Province, China in May 2018, with verbal permission of the Manager Zhongquan Fang (Anhui Bowen agriculture and Forestry Development Co., Ltd), and authenticated by Professor Dequn Wang (Anhui University of Chinese Medicine). Prior to the experiment, the plants were grown at a temperature of 22−26 • C/14−18 • C (day/night) and relative humidity of 65-80%. All samples were rinsed in ultrapure water, and leaves, roots, flowers and stems, which were harvested from three individual C. monnieri plants, were placed in centrifuge tubes, frozen in liquid nitrogen immediately and stored at −80 • C.
Total RNA was extracted from each tissue using RNA plant Plus Reagent (Tiangen, Beijing, China) according to the manufacturer's instructions. The extracted RNA was checked using a NanoDrop 2000 (Thermo, CA, USA), and the concentration of the isolated RNA, 28S/18S and integrity (RIN) were estimated using an RNA Nano 6000 Assay Kit and the Agilent Bioanalyzer 2100 system (Agilent, CA, USA) ( Table S1).

Determination of osthole content
Dried C. monnieri samples from flowers, roots, leaves and stems were used for isolation of osthole as previously reported (Jialei et al., 2014). Dried powder (0.1 g) from each sample was mixed with ethanol (25 mL), incubated for 2 h at room temperature (25 • C) and subjected to ultrasonic extraction for 60 min (100 W, 50 kHz). The supernatant was collected, and detected using UV-spectrophotometry at 322nm (SHIMADZU Coporation, kyoto, Japan). Osthole (purity >99.5%) (Aladdin Industrial Corporation, Shanghai, China) was used to construct a standard curve of the relationship between concentration and absorbance (Fig. S1A). The yield (percentage) of osthole was calculated as follows: Osthole content of extraction g C. monnieri tissue powder weight g × 100%

cDNA library construction and sequencing
Total RNA was treated with RNase-free DNase I (TaKaRa, China) to remove DNA residues and then mixed with magnetic beads containing oligo (dT) to purify mRNA. After purification, the mRNA was fragmented into small pieces under elevated temperature. The cleaved RNA fragments were copied into first-strand cDNA using reverse transcriptase and random primers. Second-strand cDNA was synthesized using DNA Polymerase I and RNase H. These cDNA fragments had the addition of a single adenine at the 3 end for subsequent ligation of adapters. The products were then purified and enriched using PCR amplification. The quality of each sample library was evaluated using an Agilent 2100 Bioanalyzer system (ABI, New York, NY, USA). Each cDNA library was sequenced using the BGISEQ-500 system at Beijing Genomics Institute (BGI) (Shenzhen, Guangdong province, China) with paired-end (PE) sequencing length of 100 bp.

Quantitative real-time PCR (qRT-PCR) analysis of key genes in coumarin biosynthesis
Total RNA from each sample was subjected to reverse transcription using SuperScript TM III First-Strand Synthesis SuperMix (Invitrogen, US). qRT-PCR analysis of each gene was performed on a QuantStudio R Real-time PCR system (Life Technologies, US) using Power SYBR R Green PCR Master Mix (Roche, China). The actin gene (CL3748.Contig7) of C. monnieri was used as a reference. Primers for all selected genes were designed using Premier 6.0 and are listed in Table S2. Each qRT-PCR contained 1µl of diluted cDNA, 0.5 µl of each primer (10 µM) and 10.0 µl of Power SYBR R Green Master Mix. Reactions were conducted under the following conditions: denaturation at 95 • C for 1 min, followed by 40 cycles of 95 • C for 15 s and 63 • C for 25 s. Each qRT-PCR was performed using three biological replicates. The 2 − Ct method was used to calculate the relative expression level of genes (Livak & Schmittgen, 2001).

Analysis of differentially expressed genes (DEGs)
After assembly, clean reads were mapped to unigenes using Bowtie2 (version 2.2.5) (Langmead & Salzberg, 2012), and then gene expression levels were calculated with RSEM (version 1.1.12) (Dewey & Li, 2011). Since we had no RNAseq bioreplicates, we used a Poisson distribution method to identify candidate genes that might be differentially expressed. Both fold change (≥2.0) and false discovery rate (≤0.001) are used as criteria to narrow down the list of differentially expressed genes between flowers and other tissues (Bullard et al., 2010;Lee et al., 2008;Li et al., 2013). qRT-PCR with bioreplicates was used to confirm differential expression of a subset of these. GO and KEGG analyses were again performed on the DEGs following the method described by Audic (Audic & Claverie, 1997).

Identification of transcription factors
The open reading frames (ORFs) of unigenes were initially detected using getorf (version EMBOSS:6.5.7.0) (Rice, Longden & Bleasby, 2000) and further aligned to the plant transcription factor database (PlntfDB) through comparison with Pfam 23.0 using the hmmsearch method (Mistry et al., 2013). The function of unigenes was identified according to the characteristics of each transcription factor family described by PlantfDB.

Statistical analysis
The contents of osthole and all data from qRT-PCR were presented as mean ± standard deviation. Statistics were done by GraphPad Prism 6.0. Heatmap and boxplot were performed using R software (version 3.4.4). The one-way analysis of variance (ANOVA) was used to compare the data between groups with Duncan t test. P value <0.05 was considered to be statistically significant.

Osthole content determination in different tissues of C. monnieri
We extracted osthole from dried leaves, stems, flowers and roots of C. monnieri. Osthole content was highest in flowers (1.372%) and lowest in stems (0.283%) (Fig. S1B). Osthole quantification revealed significant differences among different tissues, with this active ingredient mainly distributed in flowers and secondly in leaves.

RNA sequencing and de novo assembly
Four mRNA libraries were extracted from leaf, stem, flower and root tissue of C. monnieri using BGISEQ-500 sequencing technology. After eliminating low-quality reads, we obtained a total of 10.6 Gb, 10.47 Gb, 10.81 Gb and 10.38 Gb of clean reads from 83.42 Mb, 81.65 Mb, 84.27 Mb and 82.14 Mb sets of raw data from leaf, stem, flower and root tissue, respectively, with all Q30 values greater than 85% (Table S3). The clean reads were assembled using Trinity and the TGI clustering tool (TGICL). After assembly, a total of 173,938 unigenes were obtained from the four libraries with a mean length of 1,272 bp, GC content of 38.79% and N50 value of 2,121 bp (Table S4). Of these unigenes, 45.80% (79,658) and 64.13% (111,562) exceeded 1,000 bp and 500 bp in length, respectively (Fig. S2). The quality of the assembled transcripts was assessed using a single copy orthologous database (BUSCO), and 98% of the unigenes showed a perfect match (Fig. S3).
We determined expression values of transcripts with fragments per kilobase of transcript per million mapped reads (FPKM) >1 for each tissue and found 55,421, 52,787, 48,287 and 44,717 expressed unigenes in flower, leaf, stem and root tissues, respectively ( Fig. 2A). The overall expression level of unigenes in flowers was higher than that in leaves, stems or roots (Fig. 2B).

Expression analysis of key enzyme genes
We selected six unigenes encoding five key enzymes and used quantitative real-time PCR (qRT-PCR) to determine their relative expression levels in the leaf, stem, flower and  root tissues of C. monnieri (Fig. S7). Relative expression of the CL3766.Contig1 (C4H ), CL16061.Contig1 (4CL) and CL8379.Contig6 (F6H1) genes was highest in root tissue, whereas relative expression of the CL11351.Contig3 (PAL) and CL2358.Contig9 (BGA) genes was highest in flower tissue. Unigene 20867 (BGA) had a high expression level in leaf tissue. The expression levels determined by qRT-PCR were consistent with the FPKM values.

Identification of DEGs
We identified 12,635 unigenes uniquely expressed in flowers and 77,762 shared unigenes exhibiting expression in all four tissues (Fig. 5A). Total DEGs were identified among the four tissues using gene FPKM values. A comparison between flowers and leaves revealed a total of 46,460 DEGs, of which 26,944 were up-regulated and 19,516 were down-regulated in flowers (Fig. 5B). A further comparison of flowers with roots revealed 47,901 DEGs, of which 29,754 were up-regulated and 18,147 were down-regulated in flowers (Fig. 5B). Comparison among flowers and stems revealed 36,666 DEGs, of which 21,683 were up-regulated and 14,983 were down-regulated in flowers (Fig. 5B). All DEGs were annotated in the KEGG database to further describe and evaluate their biological functions. This analysis revealed 32,060 DEGs in flower versus leaf associated with 137 pathways; these DEGs were mainly enriched in the ''plant hormone signal transduction,'' ''plant-pathogen interaction,'' ''circadian rhythm -plant'' and ''photosynthesis -antenna proteins'' pathways (Table S6). In flower versus root, 34,786 DEGs were annotated to 137 pathways and primarily enriched in ''biosynthesis of secondary metabolites,'' ''photosynthesis,'' ''photosynthesis -antenna proteins'' and ''phenylpropanoid biosynthesis'' pathways (Table S7). In flower versus stem, 27,513 DEGs were annotated to 137 pathways and primarily enriched in ''plant hormone signal transduction,'' ''indole alkaloid biosynthesis,'' ''glucosinolate biosynthesis'' and ''phenylpropanoid biosynthesis'' pathways (Table S8). A total of 18,881 up-regulated DEGs showed flower-specific expression with |log2 (fold changes) | >2. We inferred the nature of these DEGs via GOSlim functional analysis. Sequence homology revealed that each of the 18,881 flower-specific DEGs mapped to at least one ontology term, including 2,814 for biological processes, 3,232 for cellular components and 4,508 for molecular function; many genes were enriched in ''cellular process'' and ''catalytic activity'' subcategories within the biological processes and molecular function categories, respectively (Fig. S8).
KEGG enrichment analysis showed that 129 up-regulated DEGs with flower-specific expression encoding BGA (99 DEGs), 4CL (2 DEGs), HCT (13 DEGs), COMT (7 DEGs) and F6 H1 (8 DEGs) were related to coumarin biosynthesis (Table 3). These key enzymes catalyzed the production of different structural coumarins, such as osthole, scopoletin, umbelliferone and esculetin (Fig. 4). The high expression levels of these key enzyme genes were consistent with the high accumulation of coumarins in C. monnieri flowers, as revealed by UV spectrophotometry. Thus, these genes may play critical roles in regulating the production and accumulation of coumarins in C. monnieri flowers.
Transcription factors play an important role in the growth and development of higher plants as well as their responses to the environment (Chen, 2013). We assigned 3,799 unigenes to the MYB, AP2-EREBP, bHLH, NAC, C3H, WRKY and C2H2 families. MYB TFs comprise one of the largest TF families and are important for controlling biochemical processes such as responses to biotic and abiotic stresses, metabolism and signal transduction of hormones (Qing et al., 2009). In the secondary metabolism of plants, MYB TFs mainly participate in the phenylpropanoid metabolic pathway; coumarins are one of the important plant secondary metabolites belonging to the phenylpropanoids (Borevitz et al., 2000;Chezem et al., 2017;Niu, Jiang & Xu, 2016). Overexpression of genes encoding MYB TFs in Arabidopsis results in increased biosynthesis and accumulation of scopoletin, which belongs to the coumarins (Chezem et al., 2017). In this study, we identified 564 unigenes encoding MYB TFs, including 166 TFs up-regulated in flower versus leaf, 145 TFs up-regulated in flower versus stem and 122 TFs up-regulated in flower versus root (Table 4). Among these unigenes, 15 encoding MYB TF were related to phenylpropanoid biosynthesis, and coumarins are derived from the phenylpropanoid pathway (Bourgaud et al., 2006). These unigenes might be crucial for regulating yields of coumarin in C. monnieri.

CONCLUSIONS
We performed comprehensive transcriptome analysis of flower, leaf, stem and root tissues of C. monnieri. Numerous genes involved in coumarin biosynthesis were identified by RNA sequencing. Validation of several genes by qRT-PCR produced results consistent with RNA-Seq data. Our results increase the understanding of relevant metabolic pathways in C. monnieri and provide effective gene expression profile information for C. monnieri.