Transcriptome analyses provide insights into the difference of alkaloids biosynthesis in the Chinese goldthread (Coptis chinensis Franch.) from different biotopes

Coptis chinensis Franch., the Chinese goldthread (‘Weilian’ in Chinese), one of the most important medicinal plants from the family Ranunculaceae, and its rhizome has been widely used in Traditional Chinese Medicine for centuries. Here, we analyzed the chemical components and the transcriptome of the Chinese goldthread from three biotopes, including Zhenping, Zunyi and Shizhu. We built comprehensive, high-quality de novo transcriptome assemblies of the Chinese goldthread from short-read RNA-Sequencing data, obtaining 155,710 transcripts and 56,071 unigenes. More than 98.39% and 95.97% of core eukaryotic genes were found in the transcripts and unigenes respectively, indicating that this unigene set capture the majority of the coding genes. A total of 520,462, 493,718, and 507,247 heterozygous SNPs were identified in the three accessions from Zhenping, Zunyi, and Shizhu respectively, indicating high polymorphism in coding regions of the Chinese goldthread (∼1%). Chemical analyses of the rhizome identified six major components, including berberine, palmatine, coptisine, epiberberine, columbamine, and jatrorrhizine. Berberine has the highest concentrations, followed by coptisine, palmatine, and epiberberine sequentially for all the three accessions. The drug quality of the accession from Shizhu may be the highest among these accessions. Differential analyses of the transcriptome identified four pivotal candidate enzymes, including aspartate aminotransferaseprotein, polyphenol oxidase, primary-amine oxidase, and tyrosine decarboxylase, were significantly differentially expressed and may be responsible for the difference of alkaloids contents in the accessions from different biotopes.

Despite the prominent roles of the Chinese goldthread in medicine, our understanding of its metabolic pathways of active alkaloids components is limited by a lack of genomic resources. In recent years, a number of studies focused on effective separation methods and the effectiveness of alkaloids and therapy against various diseases (Tjong et al., 2011;Cui et al., 2016;Friedemann et al., 2016;Kim et al., 2016;Zhang et al., 2011). However, the identification, purification and characterization of functional genes involved in metabolic pathways of active alkaloids components has received less attention, and only limited number of genes, such as (S)-Tetrahydroberberine oxidase ((S)-THBO) (Okada et al., 1988) and (S)-norcoclaurine synthase ((S)-NCS) (Bonamore et al., 2010) that involved in the biosynthetic pathway of berberine, has been investigated and reported. Moreover, although the distribution of the Chinese goldthread is limited to narrow regions west of China, such as Chongqing, Shaanxi and Guizhou province, due to variants of climate, topography and other environmental factors, the chemical composition of the alkaloids and the quantity of each alkaloids component in different biotope is different, resulting in different drug efficacy (Geng et al., 2010).
Deciphering genomes of medicinal species is of great importance in understanding and improving these poorly investigated species and will enable insights into the biochemistry and evolution of genes responsible for secondary metabolism biosynthetic pathways (Chen et al., 2012;Zhu et al., 2015;Yan et al., 2015;He et al., 2016a;Kellner et al., 2015). A pilot project on whole genome sequencing of the Chinese goldthread revealed that its genome size is nearly 1,116 Mb and the heterozygosity rate is as high as ∼1.1% (Sun et al., 2014), suggesting that producing a draft genome is not feasible if only using short read Illumina sequencing technology. More sophisticated method and platforms, such as combining with BAC library sequencing and/or third-generation sequencing platforms, maybe achieve a usable assembly version, but this is prohibitive to small research communities due to the expense, time, and expertise required. Fortunately, another promising technology is whole transcriptome shotgun sequencing (also called RNA Sequencing, RNA-Seq). High-quality transcriptome data would facilitate researches of functional genes and is also valuable for comparative biology studies. By using short paired-end reads, RNA-Seq has been proved to be an effective tool for transcripts reconstruction and quantification in a large variety of species (He et al., 2016b;Yang et al., 2014).
The specific goals of this study are to: (1) generate high-quality transcripts and unigenes sequences for the Chinese goldthread, which will provide reference transcriptome for further analyses; (2) identify pivotal candidate genes responsible for active metabolites biosynthesis in the Chinese goldthread. To set up the foundation for genomic studies of the Chinese goldthread, we sequenced its transcriptome using RNA-Seq. Based on the content of active components in producing areas, we analysed the active metabolites biosynthesis and explored pivotal candidate genes responsible for active metabolites biosynthesis.

Plant materials
The Chinese goldthread accession T01 was collected in Zhenping, Shaanxi, China; accession T02 was collected from Zunyi, Guizhou, China; and accession T03 was collected in Shizhu, Chongqing, China. Among them, only T03 was collected from genuine producing area. Whole plant of each accession was harvested and all samples were immediately frozen in liquid nitrogen and stored at −80 • C for later use.

Detection of alkaloids by HPLC
HPLC analysis was performed on an Agilent 1200 Series LC system (Agilent Technologies, Ratingen, Germany), including a quaternary pump, an on-line degasser, a diode array detector, an auto-sampler and a column compartment. The chromatographic separation was performed on an Xtimate C18 column (4.6 mm × 250 mm, 5 µm). The mobile phase consisted of acetonitrile (A) and 30 mmol/L ammonium bicarbonate buffer containing 0.7% (v/v) ammoniae aqua and 0.1% (v/v) triethylamine (B). The gradient elution program was as follows: 10-25% A at 0-15 min, 25-30% A at 15-25 min, and 30-45% A at 25-40 min. The flow rate was kept at 1.0 mL/min. Column temperature was kept constant at 30 • C, and the injection volume was 10 µL. The detection wavelength was set at 270 nm. Sample preparation was performed according to our previously reported method (Fan et al., 2012). The dried powders of the Chinese goldthread samples (0.1 g) were accurately weighed into a clean Erlenmeyer flask, and macerated in 50 mL of hydrochloric acid-methanol solution (1:100, v/v). The sample was extracted for 30 min in an ultrasonic bath at room temperature and the loss of weight due to evaporation of solvent was replenished with hydrochloric acid-methanol solution (1:100, v/v). The sample solutions were filtered through a 0.45 µm membrane filter before HPLC analysis. The concentrations of berberine, palmatine, coptisine, epiberberine, columbamine, and jatrorrhizine in each sample with three replications were determined by using established calibration curves. The concentration (mg/g) is defined as the milligram of alkaloid per one gram dried tissue. For each kind of alkaloid, statistical analysis on one-way analysis of variance (ANOVA) was performed using Microcal Origin software (version 8.0).

RNA isolation, cDNA library preparation and Illumina sequencing
Total RNA of the three samples was isolated from three accessions respectively using a modified CTAB method and purified by Micropoly (A) PuristTM mRNA purification kit (Ambion, Austin, TX, USA) according to manuals. The purity, concentration and integrity of the RNA were verified using NanoDrop 1000 spectrophotometer, Qubit 2.0 fluorometer and Agilent 2100 Bioanalyzer, respectively. Then, the mRNA was isolated using oligo (dT) magnetic beads and broken into fragments with a certain amount of Fragmentation Buffer. Fragments were used as templates to produce the first strand cDNA synthesis using random hexamers-primer. Buffer, dNTPs, RNaseH and DNA polymerase I were next added into the mixture to produce the second strand cDNA synthesis. The mixed cDNA was purified with AMPure XP beads. Then, ends of the purified cDNA were repaired and poly (A) was added. This was followed by the selection of suitable fragments using AMPure XP beads. PCR amplifications of fragments were performed to create final cDNA libraries for sequencing. After library construction, the concentration and insert size of the three libraries were tested using Qubit 2.0 fluorometer and Agilent 2100 Bioanalyzer, respectively. The valid concentration of the three libraries was accurately quantified with Q-PCR method to ensure reliability. The cDNA libraries were sequenced on the HiSeq2500 platform (Illumina, USA) with reads length 100 bp, and the sequencing reads were submitted to NCBI (

De novo assembly of transcripts and unigenes
The raw sequencing reads from transcriptome libraries were filtered with the following steps to obtain high-quality clean reads for de novo assembly. First, reads with adaptor sequences were removed. Then, low-quality reads, which are defined as low quality bases (Q < 20 or 'N') are more than 10%, were discarded. After filtering, the clean reads from three samples were pooled and then assembled into transcripts using Trinity (Grabherr et al., 2011) (https://github.com/trinityrnaseq/trinityrnaseq/wiki). Transcripts shorter than 300 bp were removed to eliminate redundant sequences, and the longest transcript in each cluster was selected and represented as the final unigenes.

Annotation of the Chinese goldthread unigenes
Putative coding sequences (CDS) were predicted using the GetORF function in the EMBOSS software packages (Rice, Longden & Bleasby, 2000), and these CDS were translated into amino acid sequences according to the Standard Amino Acid Codon table. Unigenes were aligned to functional databases, including non-redundant (NR), SwissProt (Boeckmann et al., 2003), the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa & Goto, 2000) and the Clusters of Orthologous Groups of proteins (COG) (Tatusov et al., 2003), by carrying out a BLAST (Camacho et al., 2009) analysis using an E-value cutoff of 1e-5. The resulting NR BLASTP hits were further processed by the Blast2GO software (Conesa et al., 2005) to retrieve associated GO terms (Ashburner et al., 2000). Bowtie (Langmead et al., 2009) was used to align clean reads of each sample to the unigene set and the RSEM program (Li & Dewey, 2011) was used to estimate gene expression Figure 1 The contents of six main alkaloids in the Chinese goldthread from three biotopes. For each kind of alkaloid, statistical analysis on one-way analysis of variance (ANOVA) was performed using Microcal Origin software (version 8.0). Significant differences (P < 0.05) between treatments are indicated by different letters. mg/g: the milligram of alkaloid per one gram dried tissue. levels. FPKM (Fragments Per Kilobase of transcript per Million mapped reads) was used to represent the expression levels of corresponding unigene (Trapnell et al., 2012). EBSeq (Leng et al., 2013) was used to perform differential expression analysis and to identify differential expression gene sets between samples. Benjamini-Hochberg method was used to correct p-value to avoid for occurrence of false positives. In our analysis, the threshold of FDR (false discovery rate) <0.01 and log2FC ≥ 2 (≥4 fold change between two samples) were set to identify differentially expressed genes (DEGs). To reveal the expression pattern of DEGs in each biotope, a hierarchical clustering, which could cluster the genes with same and similar expression level, was performed. To explore the function of DEGs, DEGs were annotated to COG (Cluster of Orthologous Groups of proteins) database (Tatusov et al., 2003) to classify reliable orthologs. DEGs were also annotated to GO databases and KEGG database and KEGG enrichment analysis of DEGs was conducted using fisher's exact test.

Chemical analysis of the main alkaloids in the Chinese goldthread from three biotopes
The developed HPLC method has been proven to be an efficient method in detecting the main alkaloids of Chinese goldthread, and was applied to simultaneously analyse main alkaloids. In this study, alkaloids in the samples from Shizhu (T03), Zunyi (T02) and Zhenping (T01) were determined (Fig. 1). Totally, six major alkaloids were quantified, i.e., berberine, palmatine, coptisine, epiberberine, columbamine, and jatrorrhizine, and subsequently evaluated by comparison with authentic standards. For each sample, berberine has the highest concentrations, followed by coptisine, palmatine, and epiberberine sequentially. More interestingly, we found that the concentrations of each alkaloid and total alkaloids in sample from Shizhu are all higher than those in samples from other two biotopes. Especially, for the most abundant berberine, the concentration is 82.44 ± 7.65 mg/g in Shizhu (T03), which is higher than the value of 77.17 ± 1.87 mg/g in Zunyi (p > 0.05) and 63.79 ± 5.76 mg/g in Zhenping (T01) (p < 0.05). As the effective constituent in the Chinese goldthread, berberine has been widely used as marker of the drug quality in China (Fan et al., 2012). Therefore, we suggest that the drug quality of the Chinese goldthread from Shizhu (T03) may be the highest among the three biotopes.

De novo transcriptome assembly captures high-quality transcripts and unigenes
To obtain a comprehensive reference unigenes of the Chinese goldthread, transcriptome of three accessions from three biotopes were sequenced using the Illumina HiSeq platform (Table 1). After the removal of adapter sequences contaminated reads, low-quality reads and duplicated reads, a total of 14.89 GB clean data was obtained, including 25,088,487, 23,905,895 and 24,723,011 pairs for Zhenping (T01), Zunyi (T02) and Shizhu (T03) respectively. These clean reads were then pooled and assembled by Trinity (k-mer length = 25) program, producing 155,710 transcripts and 56,071 unigenes ( Fig. 2A, Table S1). The N50 value was 1,864bp and 1,270 bp for transcripts and unigenes respectively. To assess the quality of assembly, clean reads were mapped to the assembled unigenes. More than 81.69%, 83.24%, and 80.29% of clean reads were mapped for three samples respectively (Table 1). We further used a core eukaryotic gene mapping approach [CEGMA (Parra, Bradnam & Korf, 2007)] to identify the core genes in the Chinese goldthread transcripts and unigenes; and more than 98.39% and 95.97% of core eukaryotic genes were found in the transcripts and unigenes respectively (Table 2). These results indicated that our assembled transcripts and unigenes were excellent for downstream analyses.

Characteristics of the Chinese goldthread unigenes
Simple sequence repeats (SSRs) are tandem sequences widely distributed across the entire genome and are of high genetic diversity. Using MISA (http://pgrc.ipkgatersleben.de/misa/), 6,224 SSRs were identified in 14,585 unigenes with length over 1kb (Fig. 2B). Approximately 31.6% (4,608 out of 14,585) of filtered unigenes have SSRs. The most prevalent SSRs type is mono-nucleotide SSRs, which is followed by tri-nucleotide SSRs. 529 SSRs presented in compound form. A total of 8.17% of the sequences have more than one SSR.
To further explore the genetic diversity in genic region in different biotopes, clean reads from three samples were mapped to unigenes to call variants (Table S2). Totally, 1,037,576 SNPs were identified in all three samples, and among of them, 970,763 are biallelic. For the three accessions from Zhenping (T01), Zunyi (T02), and Shizhu (T03), 598,441, 567,075, and 575,893 biallelic sites were identified respectively and 171,824 sites were shared by all the accessions (Fig. 2C). Only about 60% of total biallelic sites identified in each biotope and 18% of total biallelic sites identified in all the three biotope indicated a high variance among the biotopes. Moreover,520,462,493,718,and 507,247 heterozygous biallelic sites were identified in the three accessions from Zhenping (T01), Zunyi (T02), and Shizhu (T03) respectively (Table S2). These heterozygous sites contribute to as highly as ∼1% of total length of unigenes, indicating high polymorphism in coding regions of the Chinese goldthread. This heterozygosity rate in coding regions is comparable to the whole genome heterozygosity rate (∼1.1%) reported in a previous genome survey (Sun et al., 2014), and further demonstrated that the genome of the Chinese goldthread is highly heterozygous.

Annotation of the Chinese goldthread unigenes
Annotation provides important information of the gene repositories. Totally, 56,028 CDSs were predicted by GetORF (Rice, Longden & Bleasby, 2000), with the total length reaching 27,544,206 bases. The N50 of the predicted CDS for unigenes was 858 bp, while the mean length was about 492 bp (Table S3). Approximately 100% of assembled unigenes were annotated with CDS confirming the high quality of the de novo assembly. Functional annotation of the assembled unigenes was obtained according to the sequence similarity to functional databases including NR, Swiss-Prot, KEGG, COG and GO. 19,274 unigenes were categorized into 25 functional COG clusters (Fig. 2D), and 24,991 unigenes were annotated to 128,452 GO terms (Fig. 2E), with 59,470 (46.30%) for biological processes, 30,102 (23.43%) for cellular components, and 38,880 (30.27%) for molecular functions. To identify biological pathways of unigene, unigenes were annotated to KEGG database and a total of 11,573 unigenes were identified in 120 KEGG pathways (Table S4). Totally, 37,100 unigenes were functionally annotated in the five databases (Table S5), making up more than 66% of total unigenes.

Candidate genes involved in alkaloids biosynthesis pathways
The rhizome of the Chinese goldthread contains at least six effective medicinal ingredients, including jateorhizine, columbamine, epiberberine, coptisine, palmatine and berberine (Fig. 1). Four out of the six medicinal components, including the major component berberine, palmatine, epiberberine, and columbamine, belong to isoquinoline alkaloid. Therefore, we surveyed candidate genes in the Chinese goldthread involved in the isoquinoline alkaloid biosynthesis pathway (http://www.genome.jp/kegg-bin/show_ pathway?map00950). Surveying the KEGG annotation for all of the unigenes, 32 unigenes were mapped to the isoquinoline alkaloid biosynthesis pathway (Fig. S2, Table S15). Totally, 12 DEGs were identified in this pathway (Table S16). 8 DEGs were identified between Zhenping (T01) and Zunyi (T02), 8 DEGs were identified between Zhenping (T01) and Shizhu (T03), and 9 DEGs were identified between Zunyi (T02) and Shizhu     (Fig. 4). Isoquinoline alkaloids are tyrosine-derived plant alkaloids with an isoquinoline skeleton. The tyrosine was first transformed to reticuline, an important precursor of various benzylisoquinoline alkaloids, including analgesic compounds of morphine and codeine, and anti-infective agents of berberine, palmatine, and magnoflorine (Fig. S2). Interestingly, all the four differentially expressed enzymes located in the upstream of the biosynthesis of reticuline, indicating that the biosynthesis of reticuline may be the key regulatory steps for the active alkaloids biosynthesis. The aspartate aminotransferase (AAT, EC 2.6.1.1) catalyses the transamination between glutamate and oxaloacetate to generate 2-oxoglutarate and aspartate, the precursor for the biosynthesis of amino acids and derived metabolites. The expression levels of the five AAT genes are significantly different among the three accessions (Fig. 4), indicating possible different efficiency in the formation of aromatic tyrosine. The tyrosine decarboxylase (TYDC, EC: 4.1.1.25) catalyses the transformation between tyrosine and tyramine, and all the four TYDC genes have the highest expression level in accession Shizhu (T03) (Fig. 4), which also has the highest alkaloids concentration (Fig. 1). Therefore, the TDC is most likely to have an important function in the initial reaction of the active alkaloids biosynthesis pathway in the Chinese goldthread. Altogether, one possibility that the AAT control the formation of aromatic tyrosine and the TYDC exerts regulatory control over tyrosine through the alkaloids biosynthesis pathway.

CONCLUSIONS
In this study, we analyzed the active metabolites and the transcriptome of the Chinese goldthread from three biotopes, including Zhenping, Zunyi and Shizhu. Chemical analyses of the rhizome indicated that berberine is the most abundant, followed by coptisine, palmatine, and epiberberine sequentially for all the three accessions. As revealed by the concentrations of active alkaloids, the drug quality of the accession from Shizhu may be the highest among these accessions. We built the high-quality transcripts and unigenes for the Chinese goldthread, and more than 98.39% and 95.97% of core eukaryotic genes were found in the transcripts and unigenes respectively, indicating that this unigene set captures the majority of the coding genes. SNPs were called in the three accessions and the results revealed high polymorphism in coding regions of the Chinese goldthread (∼1%). Differential analyses of the transcriptome identified four pivotal candidate enzymes, including aspartate aminotransferaseprotein, polyphenol oxidase, primary-amine oxidase, and tyrosine decarboxylase, were significantly differentially expressed and may be responsible for the difference of alkaloids contents in the accessions from different biotopes. In total, our study promotes the understanding of the metabolic pathways of active alkaloids components and provides substantial genetic resources for further researches.