Comparative Transcriptomics of Metabolites in Wild and Cultivated Pinellia ternata

Background: As a traditional Chinese medicine(TCM) used to treat cough, Pinellia ternata (PT) has been utilized for thousands of years. Studies focused on the genetic makeup underlying the differences between wild and cultivated PT are rare, while wild herb is quite effective and indispensable in the clinical use of TCM. There is still rare researches related to the cause of differences between wild and cultivated PT. With the application of transcriptome technology in the eld of traditional Chinese medicine, High-throughput RNA sequencing (RNA-seq) could be employed to nd out how environment affect the accumulation of active constituent in medicinal herbs. Result: Overall, 133 503 unigenes were assembled, and 1329 genes were differentially expressed in the cultivated PT compared to the wild PT. Of those, 70 DEGs (differentially expressed genes) were signicantl[1]y downregulated and 9 were upregulated in the cultivated PT compared to the wild PT after ltering out genes involved in the biosynthesis and metabolism of amino acids, glutathione, phenylpropane and other constituents. These genes were assumed to be of special interest with respect to the better ecacy of the wild PT. Meanwhile, RT-qPCR was utilized to analysis the expression patterns of 9 selected genes and HPLC were applied to quantify the content of adenine and guanosine in PT which veried the RNA-Seq result. Conclusion: 79 DEGs were found in wild and cultivated PT, which played an important role in the synthesis and metabolism of the active ingredients of PT. And the expression of DEGs is higer in wild PT, indicated the importance of wild medicinal herb resources. The experiment produced robust resources for future investigations on the regulation of active ingredients in PT and Araceae and the promotion of the cultivation and clinical use of PT.


Background
Pinellia ternata (PT), is the dried tuber of Pinellia ternata (Thunb.) Breit, which belongs to the Araceae family. PT has been used for the treatment of cough and in ammation since ancient times (Xie et al. 2016). Currently, PT is widely used clinically for its antitumor, antioxidant, antitussive, expectorant, As a plant adapted to humid environments, genuine P. ternata is distributed in the southern margin of the Huanghuai Plain in China. However, the heavy use of herbicides has had a serious impact on this species.
Additionally, the rapidly developinged economy has caused rising labor costs, and wild resources are scarce. Therefore, P. ternata planting has already shifted to the western and northern regions, such as Guizhou and Gansu (Mu et al. 2013). The accumulation of SMs in PT varies in different growth environments, and its active ingredients are highly variable (Lee et al. 2009). The authenticity and quality of TCMs are directly related to their safety and e ciency in clinical applications (Lv et al. 2016). Thus, the inconsistency of the quality of wild and cultivated PT may directly affect its e cacy. Since the use of PT is still increasing, to ensure the uniformity of its clinical effects, it is necessary to conduct a comprehensive study of PT in different regions and provide support for solving the quality control problem.
The application of omics in the eld of medical research is expanding, and as an emerging highthroughput sequencing technology, RNA-Seq is gradually being applied to transcriptome research due to its low cost, high speed and high precision (Xiao et al. 2016; Marioni et al. 2008), providing a particularly effective technology for the discovery of genes from many organisms and tissue types (Birol et  To date, he investigation of the transcriptome of PT has been performed on the heat stress response of PT leaves and the biosynthesis of alkaloids in PT (Ma et al. 2018;Zhu et al. 2013). However, there has been little research focused on the differences between PT from planted and wild origins. Focusing on this distinction will help to reveal the quality formation of PT.
In this study, RNA-Seq technology based on de novo assembly was applied to conduct transcriptomics on PT from seven different origins (cultivated: S1-S6; wild: S7, CON). The transcript levels of 9 target genes were determined by real-time PCR, which provided a basis for revealing the molecular mechanism of the quality differences in PT from wild and cultivated elds.
Here, we report the rst high coverage comparative transcriptome analysis of wild and cultivated PT from different origins. We analyzed the transcriptomes of PT from six cultivated areas and two wild areas using three biological replicates. We identi ed and compared DEGs in cultivated and wild PT. Our comprehensive transcriptomic analyses provides a basis for revealing the molecular mechanism of the quality difference between wild and cultivated PT as well as robust resources for future investigations on the regulation of active ingredients in PT and Araceae.

Methods
Preparation of plant materials PT tubers were collected from 8 locations in China during 2018 (Table 1). Collected tuber samples were rinsed clean frozen immediately in liquid nitrogen and stored at -80°C before RNA extraction.
Library construction for transcriptome analysis Total RNA was extracted using the mirVana miRNA Isolation Kit (Ambion) following the manufacturer's protocol. RNA integrity was evaluated using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). The samples with an RNA integrity number (RIN) ≥ 7 were subjected to subsequent analyses. The libraries were constructed using the TruSeq Stranded mRNA LTSample Prep Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Then, these libraries were sequenced on an Illumina sequencing platform (HiSeqTM 2500 or Illumina HiSeq X Ten), and 125 bp /150 bp pairedend reads were generated. The raw data were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under accession No. PRJNA623739. Quality control and de novo assembly Transcriptome sequencing and analyses were conducted by OE Biotech Co., Ltd. (Shanghai, China). Raw data (raw reads) were processed using Trimmomatic. The reads containing poly-N and the low-quality reads were removed to obtain the clean reads. After removing adapter and low-quality sequences, the clean reads were assembled into expressed sequence tag clusters (contigs) and de novo assembled into transcripts by using the paired-end method in Trinity (version: 2. 4).

Annotation of gene functions
The function of the unigenes was annotated by alignment of the unigenes with the NCBI non-redundant (NR), SwissProt, and Clusters of orthologous groups for eukaryotic complete genomes (KOG) database using Blastx with a threshold E-value of 1e −10 . The proteins with the highest hits to the unigenes were used to assign functional annotations. Based on the SwissProt annotation, Gene Ontology (GO) classi cation was performed by mapping the relationships between SwissProt and the GO terms. The unigenes were mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to annotate their potential metabolic pathways.
Unigene quanti cation, analysis of differentially expressed unigenes (DEGs), cluster analysis, GO and KEGG enrichment The FPKM and read count values of each unigene were calculated using bowtie2 and eXpress, respectively. DEGs were identi ed using the DESeq (2012) functions estimate Size Factors and nbinom Test. A P value < 0. 05 and fold change > 2 or < 0 were used as the thresholds for differential gene expression. Five was set as the threshold for signi cantly differential gene expression. Hierarchical cluster analysis of DEGs was performed to explore transcript expression patterns. GO enrichment and KEGG pathway enrichment analyses of DEGs were respectively performed using R based on the hypergeometric distribution.
Real-time quantitative PCR (RT-qPCR) for expression validation The expression patterns of 9 genes associated with metabolite and biosynthesis pathways related to the active ingredients in PT were analyzed using RT-qPCR (gene-speci c primers are provided in Additional le 1: Table S1). Total RNA was isolated from the roots using TRIzol (Tiangen Bio Co., Ltd., Beijing, China), and cDNA was synthesized using the PrimeScript™ RT Reagent Kit with a gDNA Eraser. RT-qPCR was conducted using SYBR® Premix Ex Taq™ II according to the manufacturer's instructions (Takara, Tokyo, Japan). The GAPD genes (5-TGCTGGGAATGATGTTGAATG-3 and 5-TTGGCATTGTTGAGGGTTTG-3) were used as internal standards. The PCR program was as follows: 30 s of predenaturation at 95°C, followed by 40 cycles of 5 s at 95°C and 40 s at 60°C and the steps for dissociation curve generation (10 s at 95°C, 60 s at 60°C, and 15 s at 95°C). The relative gene expression of each target gene was determined using the 2 −ΔΔCt method.
Determination of adenine and guanosine HPLC were applied to determine the content of adenine and guanosine in wild and cultivateed PT. Two gram of the dry powder of PT were weighed and transferred into 50 mL erlenmeyer ask. 10mL of puri ed water was used for extraction. The vials containing the samples and the solvents were ultrasonic extracted at room temperature for 60 min, and then the solutions were ltered with 0.22μm micro-porous membrane before injection. Stock solution, approximately 1 mg of each adenine and guanosine reference standard were accurately weighed into a centrifuge tube, dissolved in 1 mL methanol by sonication for 1 min. Working standard solutions were prepared by 20 fold dilution of the stock solution with methanol prior to HPLC analysis.

Results
De novo assembly of transcripts cDNAs prepared from tubers of PT were sequenced using the Illumina HiSeq platform.
As a result of sequencing, more than 48 595 258 clean reads and a total of 62 348 392 clean bases were obtained from 24 PT samples. The Q30 percentage was over 93.56% (Table 2), and the distribution of GC content frequency ranged from 49.63% % to 51.82%, showing that the throughput and sequencing quality were high enough for further analysis.
A nal set of unigenes was obtained by the splicing method in Trinity software and paired-end and clustered by CD-HIT software. The de novo assembly generated 57 154 unigenes that were considered for downstream analysis (Table 3, Fig. 1). The length distributions of the unigenes are shown in Table 3. There were 36 007 unigenes with lengths above 1000 bp, and the maximum length was 13 803 bp. Based on these unigenes, 76 894 unigenes above 500 bp were constructed. The length of unigenes ranged from 301 bp to 13803 kb, with an average length of 861 25 bp.

Functional annotation and classi cation
Unigenes were compared to the NR, KOG, GO, Swiss-Prot, eggNOG, and KEGG databases by diamond software, and the Pfam database was used to perform unigene functional analysis using HMMER software. A total of 53 323 unigenes (39.94% of the total) were matched in the NR database, 31 981 unigenes (23.96% of the total) were matched in the KOG database, 36 197 unigenes (27.11% of the total) were matched in the Swiss-Prot database, 18,831 unigenes (14.11% of the total) were matched in the KEGG database, 33 006 unigenes (24.72% of the total) were matched in the GO database, 123 unigenes (0.09% of the total) were matched in the Pfam database, and 48 128 unigenes (36.05% of the total) were matched in the eggNOG database with an E-value <1e−5 (Table 4).
To explore the pathways in which these annotated genes were contained, KEGG analysis with E values ≤1e−5 was also conducted. A total of 18 831 (14.11%) annotated unigenes were assigned to 182 pathways that corresponded to four KEGG modules: metabolism, environmental information processing, genetic information processing and cellular processes. (Fig. 3, Additional le 3: Table S3).
Among them, 9 229 unigenes were assigned to metabolic pathways. As shown in Fig. 3, the dominant pathways related to metabolism were mainly classi ed as carbohydrate metabolism (1 833), followed by global and overview map metabolism (1 232 unigenes), amino acid metabolism (1 108 unigenes), and energy metabolism (1 062 unigenes). Metabolic pathway analysis enables us to determine which genes in the same pathway cooperate with each other as they exercise their biological functions in PT.
Overall, the possible functions of the assembled unigenes were assessed by similarity matches with the GO and KEGG databases. The results of these database searches help us to better understand the biological features of PT.
Differentially expressed genes (DEGs) in PT from wild and cultivated elds Unigene expression was calculated using their FPKM values (Fragments Per kb Per Million Reads), and principal component analysis (Fig. 4) was performed according to the expression levels of each sample, showing that the samples from the same origin were concentrated, and the samples from the S5 and S6 groups in Hebei Province had similarities. Based on the multiple difference and signi cant difference test results, 16 242 differentially expressed unigenes were screened, of which 8 598 were upregulated and 7 644 were downregulated (P value < 0.05, fold change > 2). Unsupervised hierarchical clustering analysis was performed on the differentially expressed unigenes. As shown in the heat map in Fig. 5, there were signi cant differences between the cultivated and wild PT samples. Overall, these results indicate that there are signi cant differences between PT in different regions and between cultivated and wild PT, which help in categorizing cultivated accessions.
To better understand the biological functions of the speci c transcripts of wild and cultivated plants, KEGG and GO databases were both utilized for pathway analysis of the different unigenes (combined with KEGG annotation results). Signi cantly differentially expressed genes related to metabolic pathways of active ingredients in PT (screening for unigenes of a single group of samples≥ 3 and P value≤0. 01) were selected and are shown in Additional le 4: Table S4; 70 of these genes were signi cantly downregulated and 9 genes were upregulated.

Validation of RNA-seq analysis by RT-qPCR
To con rm the RNA-Seq results, the average expression levels of 9 selected genes, β-glucosidase (βGlu), β-fructofuranosidase (FFase), sucrose-phosphate synthase (SPS), glutamate decarboxylase (GDA), glutamate decarboxylase (GS), 5'-nucleotidase (5'-NT), caffeic acid 3-O-methyltransferase (COMT), glutathione S-transferase (GST) and glutathione peroxidase (GPX) were quanti ed via RT-qPCR (Additional le 5: Table S5), and the average expression levels (FPKM values) between wild and cultivated samples were compared. There was a high degree of correlation between the RT-qPCR results and the RNA-Seq results (Fig. 6). The expression levels of the 9 selected genes were higher in wild PT than in cultivated PT. The RT-qPCR data indicated that genes were regulated as indicated in the transcriptomic data, which further supported the accuracy of the transcriptomic results.

Determination of adenine and guanosine
The Chromatogram of catechin standards and PT are shown in Fig. 7. 10 cultivated PT and 3 wild PT dry extracts were analysed and the results are shown in Table 5 and Fig. 8. As expected, wild PT contained a higher quantity of adenine and guanosine than cultivated PT.

Discussion
Pinellia ternata is one of most important medicinal herbs in China, which makes it vital for genomic studies. With the transformation of cultivation areas and scarcity of wild PT resources, to ensure clinical e cacy, there is a growing need for the genetic characterization of wild and cultivated PT. Using transcriptomic evidence, we provided an in-depth view of the differences between wild and cultivated PT, which may explain the better clinical e cacy of wild PT. Among the DEGs screened in this experiment, compared with cultivated PT, wild PT had signi cantly more upregulated DEGs, which were involved in the synthesis and metabolic pathways of the active ingredients widely known to exist in PT (including amino acids, nucleosides, etc.).
Ascorbic acid (AsA) plays important roles as a major antioxidant and free-radical scavenger in plants. The L-galactose pathway is the main pathway for AsA biosynthesis in higher plants (LinsterC and ClarkeS 2008). As a key enzyme, GGP catalyzes GDP-L-galactose to L-galactose-1-phosphate, which is bene cial to the formation of AsA and thus responds to abiotic stress. In addition to GPX (downregulated) in cultivated PT), APX also plays a vital role in plant antioxidation and abiotic stress, including abiotic stress responses induced by light and temperature (Yoshimura et al. 2000;Kim et al. 2007). CAT has a high a nity for hydrogen peroxide in plant cells. It regulates the plant stress response mainly by removing hydrogen peroxide generated during mitochondrial electron transfer and fatty acid oxidation (Jiang et al. 2002;Xing et al. 2007).
We also found that the DEGs involved in the metabolism of glutathione (GSH) and related to the plant stress response were also upregulated in wild PT, including GPX, GST, glutathione reductase (GR), and aminopeptidase N (pepN).
GSH is one of the main sulfur reducers in living organisms and can be resistant to stress, such as cold damage and drought (Grill 1978;Schupp and Rennenberg, 1988). It combines the electrophilic groups of certain harmful substances with the thiol group of reduced glutathione to form more soluble and nontoxic derivatives of these substances (Coleman 1997). Increased GST activity protects plants from stresses, for example, from organic pollutants and heavy metals (Kirchhoff and P ugmacher et al. 2000;P ugmacher et al. 2007). GR deoxidizes oxidized glutathione disul de to reduced glutathione, which helps to eliminate reactive oxygen species (ROS) and participates in the plant ascorbic acid GSH cycle. Therefore, wild PT may suffer more environmental stress, making its AsA biosynthesis and metabolic activity signi cantly stronger than those of cultivated PT. bglX can hydrolyze β-glucoside and cellulose to glucose together with other cellulases; SPS and SPP recombine degraded sucrose for transport, storage and metabolism. The catalytic effect of SPS is irreversible. Huber (1983) pointed out that SPS activity is negatively correlated with starch accumulation but directly proportional to sucrose formation. Inv hydrolyzes sucrose into fructose and glucose, and fructose needs to be phosphorylated by ScrK or hexokinase before further metabolism. ScrK plays a major role in the metabolism and distribution of library tissues (David 2007), and it can be used as a hexose sensor and signal molecule to regulate the metabolism and growth of plants (Rolland et al. 2006). Trehalose is the most stable sugar of natural disaccharides catalyzed by TPS and TPP. Studies have shown that trehalose could enhance the resistance of plant cells to various abiotic stresses (Elbein et al. 2003). TPP has little effect on trehalose accumulation (Blázquez et al. 1998); thus, TPS may have a decisive effect on trehalose synthesis. It can be concluded that starch and sucrose metabolism in PT from wild elds was more active than in PT from cultivated elds.
Following starch and sucrose metabolism, genes become involved in glycolysis and the TCA cycle. ALDO catalyzes the conversion of fructose 1,6-diphosphate to glyceraldehyde 3-phosphate and dihydroxyacetone phosphate during glycolysis. This reaction is reversible. GOLS and PK are upregulated genes in cultivated PT that generate 1,3-bisphosphoglycerate and pyruvate during rst and second substrate level phosphorylation, respectively. PK is one of the most critical rate-limiting enzymes in glycolysis. MDH catalyzes the dehydrogenation reaction of malic acid to oxaloacetate, which is accompanied by the reduction of NAD to NADH that may improve the TCA cycle (Chang et al. 2013). ACO is an important iron-thioproteinase in plant cells. In the TCA cycle, ACO catalyzes the production of isocitrate from citric acid in the cell via the intermediate product acis. This indicated that the TCA cycle was stronger, while glycolysis was weaker in wild PT than in cultivated PT.
GS and GAD are two vital enzymes in the amino acid metabolism pathway. GS reduces the toxicity of excessive ammonium ions to plants by catalyzing the synthesis of glutamine. Glutamine is the main storage and transport form of ammonia. NH4+ participates in the process of glutamate formation through the glutamine synthetase/glutamic acid synthase cycle. As an osmotic regulator, proline is formed through the glutamic acid pathway and ornithine pathway and could improve the salt tolerance of plants. Nancy's (Roosens et al. 1999) study proved that GS could promote the synthesis of proline.
Glutamate-semialdehyde (GSA) synthesizes proline through the glutamate pathway under the catalysis of P5CR (Rocha et al. 2012). Many studies have shown that the overexpression of the P5CR gene can improve plant resistance to drought. For example, under salt stress, P5CR-overexpressing sweet potatoes have increased proline contents and enhanced salt tolerance via the regulation of osmotic pressure (Gut et al. 2009). GAD is a pyridoxal phosphate-dependent enzyme that catalyzes the decarboxylation of glutamic acid to γ-aminobutyric acid. GAD activity is mainly regulated by the combination of pH and calcium ion/calmodulin (CAM) (Gut et al. 2009). When exposed to environmental stimulation (hypoxia, low temperature, acid stimulation, high salt penetration, etc.), H+ and Ca2+ in the plant will increase signi cantly and promote GAD activity.
MCCA and carbamoyl phosphate synthetase belong to the same enzyme family that could convert the leucine intermediate metabolite 3-methylcrotonyl-CoA to 3-methylpentadienoyl-CoA (Wurtele and Nikolau 2000). In Arabidopsis thaliana, the gene numbered AT1G03090 encodes MCCA, a subunit of methylcrotonyl-CoA carboxylase, which is biotinylated to participate in the degradation of leucine and further affects plant development (Ding et al. 2012). In plants containing purine alkaloids, such as cocoa and coffee, MCCA is essential in the purine alkaloid synthesis pathway by participating in the synthesis of the precursors of these substances (Zrenner et al. 2006;Ashihara et al. 2008).
During serine metabolism, hydroxypyruvate is reduced to glyceric acid by HPR and returned to the chloroplast, and glycerol is catalyzed to the nal product 3-phosphoglycerate by glycerol 3 kinase (Bauwe et al. 2010).
Polyamines (PAs) in plants mainly include putrescine (Put), sub-spermidine (Spd), and spermine (Spm). PAO can oxidatively degrade polyamines and reduce excess free PAs. Its diverse degradation products are also vital for plant growth and development and stress adaptation ). In summary, amino acid metabolism in PT from wild elds was more active than in PT from cultivated elds.
PLA2 catalyzes the hydrolysis of the acyl bond at the 2-position of glycerol phosphate to generate free fatty acids and lysates, which play an important role in cell signal transduction, biosynthesis of bioactive lipids, and the release of linolenic acid from phospholipids. GDPD catalyzes the hydrolysis of glycerol phosphodiesters into an alcohol molecule and glycerol triphosphate. Its products are involved in many biochemical pathways. Alpha-linolenic acid is the substrate of a synthetic pathway of jasmonic acid. AOC synthesizes 12-oxo-plant dienoic acid (OPDA) in this pathway. OPDA then undergoes reduction and oxidation to synthesize jasmonate. Jasmonic acid can induce tuber formation during plant growth and development (McConn 1996). Triacylglycerol TAG is a storage lipid and an important carbon source to support seedling development after seed germination. DGAT catalyzes the last step of the TAG synthesis pathway and is the only rate-limiting enzyme in this pathway (Settlage et al. 1998). MGAT2 is a membrane-bound acyltransferase that belongs to the diacylglycerol acyltransferase (DGAT) gene family. ACOT hydrolyzes CoA into free fatty acids (FFA) and coenzyme A (CoASH) and participates in essential physiological processes by maintaining appropriate levels of intracellular lactoyl CoA, FFA and CoASH

Biosynthesis of other secondary metabolites
As an abundant and widely studied effective substance in PT, nucleoside alkaloids are considered to be more suitable for quality control than succinic acid (Lua et al. 2011) which makes an interesting study topic in future research on the differences between wild and cultivated PT an interesting target of study in future PT research. We found increased expression of 2 DEGs related to nucleoside alkaloids in wild accessions: 5'-NT and adenosine deaminase (ADA). Various nucleotides in plants are catalyzed to nucleosides through 5'-NT. ADA, a thiolase, catalyzes the conversion of adenine nucleoside to hypoxanthine nucleoside and then generates hypoxanthine through the action of nucleoside phosphorylase, which plays a vital role in purine nucleoside metabolism. This may indicate that nucleoside alkaloid biosynthesis was stronger in wild PT than in cultivated PT.
Vitamin E is a kind of fat-soluble vitamin necessary for the human body. It is an important effector substance on the cell membrane of plants that can remove singlet oxygen and inactivate reactive oxygen radicals under oxidative stress (Valentin et al. 2005). The tocopherol O-methyltransferase (γ-TMT) and tocopherol cyclase (TC) genes were upregulated in this study, and are responsible for the synthesis of vitamin E. γ-TMT is the last step in the biosynthetic pathway of vitamin E that catalyzes the methylation of γ and δ-tocopherol to α and β-tocopherol. TC catalyzes different intermediates in tocopherol synthesis to form aromatic rings. Kumar R's (2005) study found that overexpression of the TC gene in rapeseed could increase the total tocopherol levels and change the proportion of each component of tocopherol. It can be concluded that the biosynthesis of other SMs in wild PT was stronger than that in cultivated PT.

Conclusion
At present, studies of Pinellia ternata (PT) are mainly focused on the mechanism of its toxicity. However, since PT is a widely distributed species with an ancient history of human interaction, there is a growing need for the genetic characterization of PT accessions to understand the genetic basis controlling its various nutraceutical traits. However, the genetic study of the pharmacodynamic components of PT is limited.
In this work, we sequenced tuber transcriptomes of 8 different wild and cultivated plant accessions of PT with the RNA-Seq method, generating an important genomic resource for understanding the native biodiversity of the herb. In particular, the identi cation of sequence and expression polymorphisms of transcripts related to secondary metabolite metabolism resulted in a total of 114 979 080 paired end reads, which were assembled into unigenes with an average length of 861 25 bp. Further analysis of 79 genes related to metabolic pathways that were signi cantly differentially expressed allowed the exploration of the molecular mechanisms of PT quality formation under different growing conditions and the quality control of PT. RT-qPCR and HPLC determination results show that RNA-Seq is a reliable method for transcriptome research of PT. The data reported here will help further genomic projects to expand our knowledge of this valuable TCM.   Figure 1 Distribution of unigenes.     Chromatogram of adenine and guanosine standards A and a PT extract B).

Figure 8
Box plot of the content of adenine and guanosine in wild and cultivated PT.