Molecular Cloning and Characterization of Two Key Enzymes involved in the Diterpenoid Biosynthesis Pathway of Isodon rubescens

Isodon rubescens, an important medicinal plant, contains various terpenoids. Its principal active compound is oridonin, a diterpenoid with antitumor properties. The precursors for oridonin biosynthesis are synthesized by the methylerythritol phosphate (MEP) pathway. On the basis of our earlier studies, we isolated and cloned two important genes that catalyze diterpenoid biosynthesis in the MEP pathway. 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (IspF) and 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (IspH) are the fifth enzyme and the final-step key enzyme of the MEP pathway, respectively, and both are important for the regulation of isoprenoid biosynthesis. Sequence analysis revealed that the DcIspF cDNA (accession no. KT948057) was 966 bp long and contained an open reading frame (ORF) of 708 bp belonging to the MECDP-synthase superfamily, and that DcIspH (accession no. KT948058) contained a 1389 bp ORF encoding a predicted 462 amino acid polypeptide that is a member of the lytB_ispH superfamily. The deduced DcIspF and DcIspH amino acid sequences shared high similarity with the IspF and IspH proteins of other plants, respectively, and each exhibited an N-terminal transit peptide and conserved amino acid sites. Quantitative real-time PCR analysis showed that the expression of DcIspF was highest in leaves and lowest in callus. These results suggest that DcIspF and DcIspH encode functional enzymes that may play a pivotal role in the biosynthesis of diterpenoids in I. rubescens.


Introduction
I. rubescens, a small shrub of the Lamiaceae family that is rich in diterpenoids, is a well-known traditional Chinese medicine also known as donglingcao, bingingcao, and poxuedan. It has long been an important herb in China for the treatment of various diseases, such as pharyngitis, acute tonsillitis, and chronic hepatitis [1]. In addition, this herb can relieve the symptoms and prolong the survival of patients with esophageal, cardiac, liver, breast, and rectal cancer [2]. These medicinal properties have been correlated with the diterpenoids present in the leaves and stems of I. rubescens, especially the leaves [3,4].
To date, numerous studies have shown that oridonin is the main active compound in I. rubescens. It not only has a marked inhibitory effect on tumor cells but also has marked anti-mutation and antioxidant effects [5]. It has therefore become a research hot spot for medical scholars at home and abroad. However, most studies of oridonin have focused on its pharmacological activities [6,7], and very limited information is available on its biosynthetic pathway. Few studies have addressed the key enzymes of oridonin biosynthesis in I. rubescens, and oridonin still cannot be produced by biosynthetic methods. Terpenoids are reported to be synthesized mainly through the mevalonate (MVA) and methylerythritol phosphate (MEP) pathways [8,9]. The MVA pathway is responsible for synthesizing sesquiterpenes, triterpenes, and monoterpenes, whereas diterpenes and tetraterpenes are derived mainly from the MEP pathway. Similar to other higher plants, I. rubescens synthesizes diterpenoids through the MEP pathway. So far, many findings have provided a relatively complete and convincing outline of the MEP pathway. All enzymes involved in the MEP pathway have been identified in plants and bacteria: DXP synthase (DXS), DXP reductoisomerase (DXR or IspC), CDP-ME synthase (CMS or IspD), CDP-ME kinase (CMK or IspE), 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MCS or IspF), HMBPP synthase (HDS or IspG), and HMBPP reductase (HDR or IspH). Together they produce IPP and DMAPP, the universal building blocks of isoprenoid compounds, from the precursors pyruvate and D-glyceraldehyde 3-phosphate [10].
Because the MEP pathway plays an important role in oridonin biosynthesis, a systematic analysis of the biosynthesis genes in I. rubescens was necessary. The availability of high-throughput transcriptomics has enabled a paradigm shift in pathway discovery and can be efficiently applied to non-model species [11,12]. In our previous study, we characterized the I. rubescens transcriptome and found that many unigenes were annotated as genes encoding enzymes of terpenoid skeleton synthesis [13]. On this basis, among the seven enzymatic transformations expected to function in I. rubescens, we have cloned DXP synthase (DXS) and DXP reductoisomerase (DXR or IspC), which are transcribed in I. rubescens leaves, stems, and roots [14,15]. The other five enzymes have yet to be described in I. rubescens. Among these five enzymes of the MEP pathway, IspF (MDS) is critical because it catalyzes the first cyclization step, converting 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol 2-phosphate (CDP-ME2P) into 2-C-methyl-D-erythritol 2,4-cyclodiphosphate (MECDP) [16]. 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (HDR), also known as isoprenoid synthesis H (IspH), lysis-tolerant B (LytB), or HMBPP reductase, is the key enzyme of the last step of the MEP pathway, catalyzing the reductive dehydroxylation of HMBPP to isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). The overexpression and suppression of AaHDR protein levels in plastids differentially affect artemisinin and other terpenoid biosynthesis in Artemisia annua L. [17]. However, few reports have described the effects of manipulating IspF and IspH expression on I. rubescens metabolism. In this study, we report the cloning and characterization of full-length cDNAs of the IspF and IspH genes, together with the expression patterns of IspF and IspH in various tissues of I. rubescens, including roots, stems, and leaves.
This is the first report on the isolation and characterization of the IspF and IspH genes from I. rubescens. Cloning and identifying these genes will help us learn more about the biosynthesis pathway of diterpenoids, especially oridonin, in I. rubescens. Our research could provide important information for the further regulation of oridonin biosynthesis in I. rubescens and lay the foundation for elucidating the biological mechanism of oridonin in this plant.

Materials and methods
Plant materials: The plant was identified as I. rubescens, a perennial herb of the Labiatae, by Prof. S.Q. Chen. Samples were collected from aseptic seedlings of I. rubescens cultured in the plant tissue culture room at Henan University of Traditional Chinese Medicine, China, at 25 ± 2°C with a 16-h light/8-h dark photoperiod provided by cool-white fluorescent lamps. Roots, stems, and leaves were collected separately, immediately frozen in liquid nitrogen, and stored at -80°C until further analysis.
Cloning of full-length cDNA of DcIspF and DcIspH: Total RNA was extracted from the leaves of I. rubescens using the RNeasy Plant Mini Kit (Kangwei, Beijing, China) following the manufacturer's procedure. After the RNA integrity was examined by 1% agarose gel electrophoresis, first-strand cDNA was synthesized by reverse transcription using the cDNA first strand reverse transcription Kit (Fermentas) according to the instructions. Primers for DcIspF and DcIspH were designed and synthesized based on the transcriptome sequence of I. rubescens. Premier 5.0 software was used to design specific primers to amplify the two sequences: DcIspF, Primer_F (GCAAGTTCAAGTTAGCAGATTGG) and Primer_R (CAGTCACTTCCAAACCAGCA); DcIspH, Primer_F (AGCGTCGCTTGAGTAGACTTC) and Primer_R (CAAATGCAATATTTCACCATGA). PCR was conducted in a total volume of 50 µL containing 5 µL reaction buffer, 3 µL Mg2+, 4 µL dNTP, 3 µL cDNA, 2 µL Primer_F, 2 µL Primer_R, 0.25 µL Taq DNA Polymerase, and 10.75 µL ddH2O. The program was as follows: 95°C for 5 min; 35 cycles of 94°C for 45 s, 55°C for 1 min, and 72°C for 1.5 min; and a final extension step at 72°C for 7 min. PCR products were TA cloned into the pMD19-T Vector (Takara, Japan), transformed into Escherichia coli DH5α, and then sequenced.
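As a quick sanity check on the four primers listed above, the following minimal sketch computes their length, GC content, and a rough melting temperature. The primer sequences are taken from the text (the space inside the DcIspH forward primer is treated as a typesetting artifact); the helper names and the use of the Wallace rule are illustrative assumptions, not the procedure used by Premier 5.0.

```python
# Primer sequences are copied from the text; everything else is illustrative.
PRIMERS = {
    "DcIspF_F": "GCAAGTTCAAGTTAGCAGATTGG",
    "DcIspF_R": "CAGTCACTTCCAAACCAGCA",
    "DcIspH_F": "AGCGTCGCTTGAGTAGACTTC",
    "DcIspH_R": "CAAATGCAATATTTCACCATGA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, useful for locating a reverse primer on the sense strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, Tm ~{wallace_tm(seq)} C")
```

The Wallace rule is only a coarse estimate for short oligonucleotides; nearest-neighbor models give more realistic values when choosing an annealing temperature.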
Bioinformatic analysis: Amino acid sequence alignment was performed using MUSCLE (Multiple Sequence Comparison by Log-Expectation) in Jalview [18]. Neighbor-joining trees and bootstrap analyses were computed using the Molecular Evolutionary Genetics Analysis (MEGA) 6.0 program. Homologues of the genes were identified using BLAST at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/Blast), the open reading frame (ORF) was analyzed, and the protein sequence was deduced from the gene sequence. Expasy software was used to analyze the physical and chemical properties. Phosphorylation sites were predicted by NetPhos 2.0. Secondary structure prediction was performed with SOPMA. Homology-based three-dimensional (3-D) structural modeling was carried out with Swiss-Model at its website.
Expression analysis and quantitative real-time PCR: Quantitative real-time PCR (qRT-PCR) was used to investigate the transcript accumulation levels in various tissues. With GAPDH as the reference gene, each qRT-PCR reaction contained 10 µL 2 × SYBR Green PCR Master Mix (QIAGEN), 2 µL reverse transcription reaction product, 0.4 µL of 10 µmol/L forward and reverse primers, and water to 20 µL. The reaction was performed in a CFX96™ Touch Real-Time PCR Detection System (BioRad, CA, USA), and the reaction conditions were 95°C for 5 min, followed by 40 cycles of 95°C for 10 s, 60°C for 30 s, and 72°C for 30 s. The primers are listed in Table 1. Three replicates of each cDNA sample were run for the qRT-PCR analysis.
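The text does not state how relative transcript levels were calculated from the qRT-PCR data. The sketch below assumes the widely used 2^-ΔΔCt approach, with GAPDH as the reference gene and one tissue chosen as the calibrator. The function name and all Ct values are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Fold change of a target gene versus a calibrator tissue (2^-ddCt).

    Each argument is a list of replicate Ct values. This is an assumed
    calculation scheme; the source does not say which method was used.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)            # dCt in the tissue of interest
    delta_calib = mean(calib_target_ct) - mean(calib_reference_ct)  # dCt in the calibrator tissue
    return 2.0 ** -(delta_sample - delta_calib)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values: leaf as sample, callus as calibrator.
    leaf_target, leaf_gapdh = [22.1, 22.0, 21.9], [18.0, 18.1, 17.9]
    callus_target, callus_gapdh = [25.0, 25.1, 24.9], [18.0, 18.0, 18.0]
    fold = relative_expression(leaf_target, leaf_gapdh, callus_target, callus_gapdh)
    print(f"Leaf versus callus fold change: {fold:.2f}")
```

This scheme assumes roughly equal amplification efficiencies for the target and reference primer pairs; if the efficiencies differ, an efficiency-corrected model is more appropriate.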

Molecular cloning of the full-length cDNA of DcIspF
Sequence analysis showed that the full-length cDNA, designated DcIspF (accession no. KT948057), was 966 bp long and contained an open reading frame (ORF) of 708 bp (from the ATG initiation codon to the TGA stop codon) located at positions 191-898, encoding a protein of 235 amino acids with a molecular weight of 24.8 kDa and a theoretical isoelectric point of 8.36. These parameters are very similar to those reported earlier for other plants [19,20]. The instability index of DcIspF was 40.97, indicating that it is an unstable protein. The grand average of hydropathicity (GRAVY) of the DcIspF protein predicted by ProtParam was 0.061, so DcIspF can be regarded as a hydrophobic protein (Table 2). TMHMM Server v.2.0 was used to predict the transmembrane domains of the DcIspF protein, and none was found. In addition, analysis with the SignalP program indicated that there was no signal peptide in the DcIspF protein sequence. The subcellular localization of a protein typically reflects its biological function. To explore the subcellular location of the DcIspF protein, we performed a prediction using ProtComp 9.0 (http://www.softberry.com/berry.phtml?topic=protcomppl&group=programs&subgroup=proloc), which indicated that DcIspF is localized in the chloroplast, consistent with the cellular localization of the MEP pathway.
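The ORF coordinates, ORF length, and protein length reported above are linked by simple arithmetic, which the following minimal sketch makes explicit. The function name is illustrative; the only assumptions are that the coordinates are 1-based and inclusive and that the ORF includes the stop codon.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (ORF length in bp, encoded protein length in amino acids).

    start and end are assumed to be 1-based inclusive nucleotide positions,
    with the stop codon counted as part of the ORF.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return length, codons - 1  # the stop codon encodes no residue


if __name__ == "__main__":
    orf_bp, protein_aa = orf_summary(191, 898)  # DcIspF coordinates from the text
    print(f"DcIspF ORF: {orf_bp} bp, {protein_aa} amino acids")
```

For DcIspF this reproduces the reported 708 bp ORF and 235 amino acid protein.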
Predicting the phosphorylation sites of the I. rubescens DcIspF protein helps in understanding its function, so potential phosphorylation sites were investigated using NetPhos 2.0. Twelve phosphorylation sites were detected in the DcIspF protein: eight serine sites (positions 8, 9, 10, 21, 74, 113, 145, and 178), three threonine sites (positions 4, 20, and 27), and one tyrosine site (position 128). InterProScan predicted that the encoded protein contains one conserved functional domain (IPR003526) from position 38 to 195, named 2-C-methyl erythritol-2,4-pyrophosphate synthetase, which is related to terpenoid biosynthesis. The presence and length of the putative domain were also confirmed with the Superfamily protein database. Sequence alignment and a BLASTP search of the GenBank database (http://www.ncbi.nih.gov) showed that DcIspF had high identity and similarity with the IspF proteins of other plants, sharing 89% identity with Salvia miltiorrhiza (AEZ55667.1; AFQ95411.1) and Arabidopsis thaliana (NP_850971.1), 87% identity with Erythranthe guttata (XP_012854324.1), and 78% identity with Nicotiana sylvestris (XP_009758081.1), Nicotiana tomentosiformis (XP_009590055.1), and Nicotiana tabacum (XP_016512172.1). Moreover, DcIspF exhibited 68-77% identity with the IspF proteins of other plants but only 32-49% identity with microbial IspF proteins, which indicates that the protein sequence is conserved in plants.
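To illustrate what a percent identity value summarizes, the sketch below computes identity between two pre-aligned sequences. It assumes one common convention, namely that columns containing a gap in either sequence are excluded from the denominator; BLASTP reports identity over the full alignment length, so its numbers can differ slightly. The function name and the toy sequences are hypothetical and are not taken from the DcIspF alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns under this convention
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only.
    print(f"{percent_identity('MKT-AL', 'MRTQAL'):.1f}% identity")
```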
As shown in Figure 1, the conserved active-site residues D86, H88, and H134, which putatively coordinate a Zn2+ ion tetrahedrally, were also conserved in I. rubescens. In addition, the conserved residue E234 (the Mg2+- or Mn2+-binding site) and the residues R236, H97, and H241, which form the top of the cavity, are highly conserved in plants, including I. rubescens [21]. D230, which appears to be a highly conserved feature among plant MCS (IspF) enzymes, was also found in I. rubescens, as were H226 and S231. These highly conserved active-site residues are likely to be critical for the function of I. rubescens IspF. These findings suggest that DcIspF shares the same active-site motif as A. thaliana IspF, meaning that the cloned DcIspF from I. rubescens is a new member of the IspF gene family in higher plants. In addition, sequence analysis indicated that the DcIspF amino acid sequence extends 76 residues beyond the N-terminus of the Escherichia coli sequence, so it can be speculated that a transit peptide exists in this region. The ChloroP program predicted that the deduced DcIspF has an N-terminal chloroplast transit peptide of 55 residues, similar to that of Ginkgo biloba (GbMECS) [19].
To evaluate the evolutionary relationships of DcIspF, a phylogenetic tree was constructed from the deduced amino acid sequences of DcIspF and IspF proteins from other organisms (Figure 2). The tree clearly showed that the IspF from Plasmodium falciparum 3D7 formed a branch separate from those of plants, bacteria, Microcystis aeruginosa, and Thermotoga maritima MSB8. All the plant IspF proteins clustered together, and DcIspF exhibited the highest homology with IspF from Salvia miltiorrhiza, which is involved in terpenoid biosynthesis in that species [20].
The secondary structure of DcIspF was analyzed with SOPMA [22] (Figure 3a). The results showed that the DcIspF peptide contains 29.79% alpha helix, 15.32% extended strand, 10.64% beta turn, and 44.26% random coil. Random coils and alpha helices are the most abundant structural elements in the DcIspF protein of I. rubescens, which is highly similar to the known IspF (MECS) structures. The three-dimensional structure of DcIspF was modeled using Swiss-Model (template: 2pmp.1.A, sequence identity: 91.19%) (Figure 3b).
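The reported secondary structure percentages can be converted back to residue counts as a consistency check against the 235 amino acid length of DcIspF. The sketch below is a minimal illustration of that arithmetic; the percentages and the protein length come from the text, while the function and variable names are illustrative.

```python
PROTEIN_LENGTH = 235  # DcIspF length in amino acids, from the text

# Secondary structure composition reported for DcIspF (percent of residues).
REPORTED = {
    "alpha helix": 29.79,
    "extended strand": 15.32,
    "beta turn": 10.64,
    "random coil": 44.26,
}


def residue_counts(percentages: dict, length: int) -> dict:
    """Convert percentages of a protein of the given length to whole residue counts."""
    return {state: round(pct * length / 100.0) for state, pct in percentages.items()}


if __name__ == "__main__":
    counts = residue_counts(REPORTED, PROTEIN_LENGTH)
    for state, count in counts.items():
        print(f"{state}: {count} residues")
    print(f"total: {sum(counts.values())} residues")
```

The counts come out as whole numbers that add up to the full protein length, so the four percentages are mutually consistent.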

Molecular cloning of the full-length cDNA of DcIspH
Specific primers were designed according to the sequence of the IspH gene in the transcriptome database of I. rubescens, and the gene was named DcIspH. Using RT-PCR, we isolated a DcIspH cDNA of 1445 bp (accession no. KT948058) containing a 1389 bp ORF (from the ATG initiation codon to the TAA stop codon) located at positions 28-1416, which encodes a predicted 462 amino acid polypeptide with a calculated molecular mass of 52.1 kDa and an isoelectric point of 5.82, similar to the HDS protein from Tripterygium wilfordii [23]. The instability index was 37.36, indicating that it is a stable protein.
The grand average of hydropathicity (GRAVY) was -0.455, which suggests that DcIspH is a hydrophilic protein (Table 1).
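The hydropathicity value quoted above is the kind of figure ProtParam reports as GRAVY: the mean of the Kyte-Doolittle hydropathy values over all residues, where a positive result suggests a hydrophobic protein and a negative one a hydrophilic protein. The following minimal sketch shows the calculation; the function name and the example peptides are hypothetical, and the full DcIspF and DcIspH sequences are not reproduced here.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[residue] for residue in protein) / len(protein)


if __name__ == "__main__":
    # Hypothetical peptides, for illustration only.
    for peptide in ("MALLVIAG", "MKRDEEQS"):
        score = gravy(peptide)
        label = "hydrophobic" if score > 0 else "hydrophilic"
        print(f"{peptide}: GRAVY {score:.3f} ({label})")
```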
The results showed that the DcIspH protein has no transmembrane structure, and no signal peptide was detected in the DcIspH protein sequence using the online SignalP program, which indicates that it is not a secreted protein. Potential phosphorylation sites were also predicted using NetPhos 2.0 with a defined threshold of 0. NCBI analysis predicted that the protein encoded by DcIspH is a multi-domain protein belonging to the lytB_ispH (4-hydroxy-3-methylbut-2-enyl diphosphate reductase) family. Members of this protein family have an iron-sulfur cluster structure.
In the multiple sequence alignment of DcIspH with IspH proteins from other species, we found that all of the plant IspH (HDR) proteins contain an extra N-terminal conserved domain (NCD) that is absent from bacterial IspH and is essential for their function [24,25] (Figure 4). Beyond the NCD, DcIspH had an extended N-terminal sequence of 110-112 residues, which suggests that this region may serve as a transit peptide that targets I. rubescens DcIspH to the chloroplast. The ChloroP 1.1 Server predicted a DcIspH transit peptide of 83 amino acid residues. We also found three conserved cysteine residues (C118, C209, and C346) in DcIspH, which are involved in iron-sulfur cluster formation. It is therefore likely that I. rubescens DcIspH is also an iron-sulfur protein with a reaction mechanism similar to that of A. thaliana and E. coli IspH [26,27]. The Y68 and G105 residues in the N-terminal region, which are critical for Arabidopsis HDR, were also conserved in I. rubescens. The conserved residues E239, T308, S375, H148, H237, E238, T240, and N374, which are related to substrate binding or catalysis, were also found in I. rubescens (Figure 4).
A phylogenetic tree was constructed from the deduced amino acid sequences of DcIspH and IspH (HDR) proteins from other organisms. As shown in Figure 5, all the plant IspH (HDR) proteins clustered together, whereas the IspH proteins from the bacteria Escherichia coli str. K-12 substr. MG1655, Shewanella oneidensis MR-1, and Deinococcus radiodurans clustered as a separate branch. DcIspH was closest to the IspH (HDR) proteins from Lavandula angustifolia and Salvia miltiorrhiza.
In the alignment (Figure 4), the NCD among the plants and bacteria is indicated at the top of the alignment, triangles indicate the critical Cys residues that are involved in iron-sulfur cluster formation, and round dots indicate the conserved amino acids near the substrate-binding site.
The secondary structure of DcIspH was also analyzed with SOPMA [22]. The analysis showed that the DcIspH protein contains 48.05% alpha helix, 14.94% extended strand, 11.04% beta turn, and 25.97% random coil (Figure 6a). In the DcIspH peptide, alpha helices and random coils are highly abundant, whereas extended strands and beta turns are distributed intermittently along the protein. The three-dimensional structure of DcIspH was modeled using Swiss-Model (template: 3dnf.1.A, sequence identity: 31.25%) (http://swissmodel.expasy.org/workspace/). The model showed that the DcIspH protein has a trefoil-like structure, similar to most plant IspH (HDR) proteins (Figure 6b).

Analysis of gene expression in different tissues
We measured the transcript accumulation of DcIspF and DcIspH by real-time PCR in various organs and tissues (roots, stems, leaves, and callus). As shown in Figure 7, the expression of both genes differed among the tissues examined. The DcIspF transcript was most abundant in leaves, where it was 1.24-fold, 1.82-fold, and 7.75-fold higher than in roots, stems, and callus, respectively. In contrast, the highest transcript level of DcIspH was found in the roots and the second highest in the leaves, which were 200-fold and 192-fold higher than in callus, respectively. Interestingly, the genes encoding DcIspF and DcIspH differ in their expression patterns across tissues, yet both were least expressed in callus.

Discussion
Oridonin, an important diterpene of I. rubescens, is used as an indicator of the quality of this medicinal plant according to the Chinese National Pharmacopeia [28]. Many studies have demonstrated that oridonin plays an important role in the antitumor activity of I. rubescens. Although oridonin has been extensively studied, very little information is available on the genes encoding the enzymes of its biosynthetic pathway. Research on the structure and function of the key enzymes in the synthetic pathway is the main route to regulating the synthesis of oridonin. The precursors for oridonin biosynthesis are synthesized by the MEP pathway. DXS is the first enzyme of this pathway; DXR catalyzes the first committed step of terpenoid biosynthesis; 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (IspF) is the fifth enzyme; and IspH (HDR), the last enzyme, is considered to be the rate-limiting enzyme of the MEP pathway. Several studies support an important role for each of them in terpene biosynthesis [29-32]. In our previous studies, we reported the DXS and DXR genes from I. rubescens. In the present study, we cloned cDNAs encoding IspF and IspH from I. rubescens.
2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase is the fifth enzyme in the MEP pathway and catalyzes the conversion of CDP-MEP to ME-cPP [33]. The bioinformatic analysis showed that the DcIspF protein has a typical plant IspF motif and domain, meaning that the cloned IspF is a new member of the IspF gene family in higher plants.
The structure and mechanisms of Arabidopsis thaliana IspF have been characterized [25]. The amino acid sequence alignment of DcIspF showed highly conserved residues such as D86, H88, and H134, as well as other conserved active sites. These results suggest that the I. rubescens IspF protein may have a similar enzymatic mechanism in catalyzing the formation of ME-cPP.
As the last enzyme in the MEP pathway for isoprenoid biosynthesis, the IspH (HDR) protein is considered to be the rate-limiting enzyme and plays a key role in the regulation of the MEP pathway [34]. Many studies have shown that most plant IspH (HDR) proteins contain a chloroplast transit peptide. In our study, we found that the DcIspH (HDR) protein is predicted to localize in chloroplasts and carries an N-terminal transit peptide of 83 amino acid residues, which is important for transporting the protein from the cytoplasm to the chloroplast. This is consistent with the localization of IspH (HDR) proteins in other plants and with the cellular localization of the MEP pathway, which suggests that the DcIspH protein is involved in terpenoid synthesis through the MEP pathway in I. rubescens. Previous studies indicated that all plant IspH proteins contain an extra N-terminal conserved domain (NCD), which is highly conserved in plants and may protect the enzyme from the high concentration of oxygen produced during photosynthesis [24]; this domain was also conserved in I. rubescens. In addition, many amino acid residues observed to be critical for E. coli and Arabidopsis thaliana IspH, such as three conserved cysteine residues that may be involved in substrate binding, were also conserved in I. rubescens. These results indicate that DcIspH might participate in the coordination of the iron-sulfur bridge and encode an active IspH (HDR) enzyme that uses a similar enzymatic mechanism in the biosynthesis of IPP and DMAPP.
In our study, expression analysis in different tissues showed high levels of DcIspF and DcIspH transcripts in leaves, which is consistent with the reported higher rates of oridonin biosynthesis in leaves than in stems and callus [35,36]. Our previous study showed a significant positive correlation between oridonin accumulation and chlorophyll content [36]. It can therefore be inferred that the higher biosynthesis of oridonin in leaves may be due to the availability of photosynthetic products and the complete utilization of reaction products for isoprenoid synthesis, in addition to high transcription levels. Although DcIspF and DcIspH transcripts were also abundant in roots, the absence of chloroplasts there may result in a lack of photosynthetic products, leading to no accumulation of oridonin. The lower oridonin accumulation in callus may be due to the low but detectable expression of DcIspF and DcIspH transcripts and the low chlorophyll content. We have made similar observations for the expression of other genes in the MEP pathway [14,15]. This indicates that callus may be a relatively less active tissue for isoprenoid biosynthesis from MEP pathway substrates, whereas leaves are the prime site of oridonin biosynthesis.
We still have little knowledge of the gene function and regulation of IspF and IspH, or of their enzyme activities, in I. rubescens. The diterpenoid biosynthesis pathway of I. rubescens is still unknown, and a transgenic regeneration system for I. rubescens has yet to be established. Further studies are needed to construct gene overexpression or knockout transgenic lines and to study their effects on diterpenoid synthesis, which may provide insights into the production of oridonin in I. rubescens.

Conclusion
In our study, two important genes involved in isoprenoid biosynthesis in I. rubescens, DcIspF and DcIspH, were characterized for the first time. Multiple sequence alignment and phylogenetic tree construction determined the classification of these two proteins. These findings might shed light on the mechanism of oridonin biosynthesis in I. rubescens and provide molecular resources for the biotechnological improvement of this medicinal plant.