The santalene synthase from Cinnamomum camphora: Reconstruction of a sesquiterpene synthase from a monoterpene synthase

Plant terpene synthases (TPSs) can mediate formation of a large variety of terpenes


Introduction
Plant species have highly specific chemical profiles, which are often determined by the presence of different terpenes and other secondary metabolites. These profiles help the plant to function in its ecological niche, but have also been widely employed in human applications, including pharmaceuticals, nutraceuticals, food and cosmetics [1]. In some cases, such applications lead to a high demand for the natural source of the metabolites and put increasing pressure on the conservation of the plant species from which they are derived [2][3][4]. An example of such an endangered species, which has been over-exploited for the extraction of its essential oil, is the sandalwood tree. Its heartwood oil is predominantly composed of sesquiterpenes with desirable odour characteristics, and has been used for fragrances and perfumes.
have different pathways responsible for the production of isoprene units.
In the cytosol of plants, the mevalonate (MVA) pathway is active, while the 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway operates in plant plastids [5][6][7]. Both these pathways produce isopentenyl diphosphate (IDP) and dimethylallyl diphosphate (DMADP), which are further condensed into allylic diphosphate substrates such as geranyl diphosphate (GDP) and farnesyl diphosphate (FDP) [8,9]. Subsequently, terpene synthases (TPSs) convert these substrates to monoterpenes or sesquiterpenes, respectively. Precursors for terpenes such as IDP, DMADP, FDP and GDP can be efficiently produced in industrial microorganisms, using similar pathways [10]. Formation of terpenoids from these precursors can also be achieved in microbes, and relies on TPSs, often derived from plants [11,12]. A wide variety of synthases have been described for mono- and sesquiterpenes [6,13,14]. Plant genomes often encode 10-50 different synthases, which determine the terpene profile in the essential oil of the species, and often different chemotypes arise by diversification of terpene synthases. However, it is still challenging to predict the specific terpene produced by a synthase, the roles of the residues present in the catalytic pocket of the synthase and their involvement in determining the product profile. Overall sequence identity is higher between enzymes with different product specificity from the same species than between enzymes from different species producing the same compound. A better understanding of the relationship between individual residues in the primary sequence of a TPS and its product profile will facilitate functional prediction for uncharacterized synthases [13].
In this study, we set out to isolate a santalene synthase. Sandalwood oil has high value for perfumery and is traditionally extracted by steam distillation of Santalum album trees older than 15 years [15]. S. album has been listed as a vulnerable species in the IUCN Red List of Threatened Species [16] and the use of this tree for sandalwood oil extraction has been strictly regulated. The four main compounds present in sandalwood oil are α-, β-, and epi-β-santalol and α-bergamotol, which are the hydroxylated analogues of α-, β-, and epi-β-santalene and α-bergamotene, respectively. Hydroxylation of these compounds to their alcohols is mediated by cytochrome P450s [17]. The TPSs responsible for the production of santalenes have been identified in two Santalum species: S. album and S. spicatum [18][19][20]. In the current work we isolate a santalene synthase from a completely unrelated tree, Cinnamomum camphora. The essential oil of one of the C. camphora chemotypes has been observed to contain santalenes [39]. The gene encoding the santalene synthase was isolated together with a highly related gene encoding a monoterpene synthase. These two enzymes, displaying different substrate and product specificity despite their high sequence identity, were used to study the role of individual residues in determining substrate and product specificity.
We demonstrate that only a few residue positions are responsible for substrate specificity, allowing a monoTPS to acquire sesquiTPS activity without losing its original function. These results provide new insights into functional residues, contributing to the larger framework of TPS substrate and product specificity prediction.

Identification of santalenes in C. camphora
A C. camphora plant of the cineole type was purchased from Planfor (France). Leaves, stems and roots were dissected, and 0.5 g of plant material was weighed in a pre-cooled glass tube and suspended in 2 mL dichloromethane. The samples were vortexed for 1 min, sonicated for 5 min in an ultrasonic bath and centrifuged at 1500 g at room temperature to separate the plant material from the supernatant. The supernatant was dried over 1 g Na2SO4 columns. About 2 μL was analysed by gas chromatography-mass spectrometry (GC-MS) as previously described [21,22]. All compounds were identified using the mass spectra deposited in the NIST library and confirmed using their retention index, or by comparison to an original standard, when available (Fig. 1). Santalenes were further confirmed by comparison of retention times and mass spectra with those of a sandalwood oil standard (Merck, Germany).
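Confirmation by retention index can be illustrated with a short calculation. The sketch below assumes the linear (temperature-programmed) retention index of van den Dool and Kratz, computed against an n-alkane ladder; the text does not state which index formula was used, and the function name and all retention times are hypothetical.

```python
from bisect import bisect_right


def linear_retention_index(t_x, alkane_times):
    """Linear retention index of a peak eluting at t_x (min).

    alkane_times maps n-alkane carbon number -> retention time (min).
    Assumption: van den Dool and Kratz interpolation between the two
    alkanes bracketing the peak.
    """
    ladder = sorted(alkane_times.items(), key=lambda item: item[1])
    times = [t for _, t in ladder]
    i = bisect_right(times, t_x) - 1
    if i < 0 or i >= len(ladder) - 1:
        raise ValueError("retention time lies outside the alkane ladder")
    (n, t_n), (n_next, t_next) = ladder[i], ladder[i + 1]
    return 100.0 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))


# Hypothetical alkane ladder (C13-C15) and peak retention time.
alkanes = {13: 11.0, 14: 12.0, 15: 13.0}
ri = linear_retention_index(12.5, alkanes)
```

The computed index would then be compared with a tabulated value for the candidate compound on a comparable column.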

RNA extraction from root tissue
To extract RNA from the root material of C. camphora, an extraction buffer was prepared (2% hexadecyl-trimethylammonium bromide, 2% polyvinylpyrrolidinone K 30, 100 mM Tris-HCl (pH 8.0), 25 mM EDTA, 2.0 M NaCl, 0.5 g/L spermidine and 2% β-mercaptoethanol). 3 g of ground tissue was mixed with 15 mL of pre-warmed (65 °C) extraction buffer. The mixture was extracted twice with an equal volume of chloroform:isoamylalcohol (1:24), and ¼ volume of 10 M LiCl was added to the supernatant. The RNA was precipitated overnight at 4 °C and harvested by centrifugation at 10000 g for 20 min. The pellet was dissolved in 500 μL of STE buffer [1.0 M NaCl, 0.5% SDS, 10 mM Tris-HCl (pH 8.0), 1 mM EDTA (pH 8.0)] and extracted once with an equal volume of chloroform:isoamylalcohol. Two volumes of ethanol were added to the supernatant, which was incubated for at least 2 h at −20 °C and centrifuged at 13000 rpm, after which the supernatant was removed. The pellet was air-dried and resuspended in water. Total RNA (60 μg) was shipped to Vertis Biotechnology AG (Freising, Germany). PolyA+ RNA was isolated, and cDNA was synthesized using a randomized N6 adapter primer and M-MLV H− reverse transcriptase (Sigma). TPS sequences were identified by a cDNA sequencing approach, as described in detail by Beekwilder et al. (2014) [14].

Isolation of CiCaMS and CiCaSSy
Full length open reading frames of putative TPSs were amplified from the cDNA of C. camphora. Specific primers (CCH_TS23_fw and CCH_TS23_re, see Table S1) were designed to amplify total ORFs with a 6-His tag fused at the N-terminus in the plasmid pCDF-Duet1 (Novagen, Merck Chemicals B.V., Amsterdam, the Netherlands). Two variants, namely pCDF-CiCaMS and pCDF-CiCaSSy, were cloned with the same primer pair, using BamHI and NotI restriction enzymes. Rapid amplification of cDNA ends (5' RACE; Clontech cDNA RACE kit) experiments were performed to isolate longer versions of both genes, but no cDNA ends with more upstream start codons could be identified. Sequences were deposited in GenBank under accession numbers MN756611 (CiCaMS) and MN756612 (CiCaSSy).

Cloning of CiCaMS/CiCaSSy hybrids and single mutants
To obtain the hybrid proteins of the parental enzymes CiCaMS and CiCaSSy, a library of fragments was designed. When possible, separate fragments were amplified using pCDF-CiCaMS or pCDF-CiCaSSy as template (minimum fragment length was 150 bp). Each region was designed to contain 3-6 amino acid substitutions. A schematic representation of the design is shown in Fig. 3A. Table S2 reports the fragment composition of each hybrid, while the primers used for amplification and sequencing are listed in Table S1A. All primers were supplied by IDT (Leuven, Belgium). For fragment amplification, including the vector backbone (derived from the Novagen commercial plasmid pACYC-Duet1), Q5 High Fidelity polymerase (NEB) was used, following the protocol provided by the supplier. For hybrid assembly, the Circular Polymerase Extension Cloning (CPEC) method was used [23]. For fragments smaller than 150 bp and for single mutants, the QuikChange site-directed mutagenesis protocol described by Xia et al. [24] was adapted for the use of Q5 HiFi polymerase, as described in the NEB protocol (Q5 site-directed mutagenesis kit protocol, E0554). All single mutants are listed in Table S3, including the primers used. Before transformation into DH5α, the products obtained from QuikChange were digested with DpnI to eliminate traces of template. All constructs were confirmed by sequencing at Macrogen. All plasmids used in this study are listed in Table S1B.

Heterologous expression of CiCaMS and CiCaSSy in E. coli BL21DE3
To analyse the sesquiterpene product profile of the enzymes, an E. coli expression strain BL21DE3 containing an additional plasmid expressing all genes necessary for the synthesis of FDP (pBbA5c-MevT-MBIS-NptII) was used. This plasmid is a variant of plasmid pBbA5c-MevT(CO)-MBIS(CO, IspA) [10,25] in which the chloramphenicol resistance marker has been exchanged for a kanamycin resistance marker (NptII). Another variant, with a different origin of replication (colE1), was also used in the experiments (pBbE5k-MevT(CO)-MBIS(CO)). Fermentations were performed using 20 mL of 2xYT medium (16 g/L tryptone, 10 g/L NaCl, 10 g/L yeast extract) in 100 mL glass flasks. Overnight cultures were diluted to an OD600 of 0.150 and incubated at 37 °C and 250 rpm until an OD600 of 0.4-0.5 was reached, then 1 mM IPTG and 2 mL dodecane were added, followed by 24 h incubation at 28 °C and 250 rpm. Kanamycin (50 μg/mL) and chloramphenicol (50 μg/mL) were used to maintain the plasmids in the system. The 2 mL dodecane was then recovered for GC-MS analysis by centrifugation at 3600 rpm for 15 min. For the GC-MS analysis, 20-80 mg dodecane was weighed and diluted in 2 mL ethyl acetate. This solution was dried over a Na2SO4 column before analysis.
To confirm the results obtained with the fermentation analysis and to assess the monoterpene activity of the enzymes, in vitro enzyme assays were performed. The BL21DE3 E. coli expression strain was used for protein production. Overnight cultures were diluted to an OD600 of 0.150 in 20 mL 2xYT and incubated at 37 °C and 250 rpm until an OD600 of 0.6-0.8 was reached. IPTG was added to a concentration of 1 mM and the cultures were grown at 18 °C and 250 rpm overnight. Cells were then harvested by centrifugation (10 min, 3600 rpm), medium was removed, and cells were resuspended in 1 mL Resuspension buffer (50 mM Tris-HCl pH = 7.5, 1.4 mM β-mercaptoethanol; 4 °C). Cells were disrupted by shaking 2 times for 10 s with 0.2 g zirconium sand in a FastPrep machine at speed 6.5. Insoluble particles were subsequently removed by centrifugation (10 min, 13000 rpm, 4 °C). Soluble protein was immediately used for enzyme assays or stored in a 12.5% glycerol solution.
For enzyme assays, both farnesyl diphosphate and geranyl diphosphate (10 mM, Sigma FDP dry-evaporated and dissolved in 50% ethanol) were used as substrates. In a glass tube, a mix was made of 800 μL of MOPSO buffer (15 mM MOPSO (3-[N-morpholino]-2-hydroxypropane sulphonic acid) pH = 7.0, 12.5% glycerol, 1 mM MgCl2, 0.1% Tween 20, 1 mM ascorbic acid, 1 mM dithiothreitol), 100 μL of crude enzyme extract, 5 μL of FDP or GDP and 20 μL of 250 mM Na-orthovanadate. 1 mL pentane was added to the mix to extract the terpenes. This mix was incubated at 30 °C with mild agitation for 2-4 h. Subsequently, the mix was centrifuged at 1200 g to recover the pentane, which was dried over a Na2SO4 column and analysed by GC-MS.
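The final concentrations in the assay mix follow from the listed volumes. The sketch below is an illustrative calculation only: it assumes that the volumes are additive and that the pentane overlay does not contribute to the aqueous phase, neither of which is stated in the protocol; the function and variable names are hypothetical.

```python
def final_concentration_mM(stock_mM, added_uL, total_uL):
    """Concentration after dilution (C1 * V1 = C2 * V2)."""
    return stock_mM * added_uL / total_uL


# Aqueous components of the assay mix (volumes in microlitres, from the text).
volumes_uL = {
    "MOPSO buffer": 800,
    "crude enzyme extract": 100,
    "FDP or GDP (10 mM stock)": 5,
    "Na-orthovanadate (250 mM stock)": 20,
}
total_uL = sum(volumes_uL.values())

# Assumption: the 1 mL pentane overlay is a separate phase and is excluded.
substrate_mM = final_concentration_mM(10, 5, total_uL)
vanadate_mM = final_concentration_mM(250, 20, total_uL)

print(f"aqueous volume: {total_uL} uL")
print(f"substrate: {substrate_mM * 1000:.0f} uM, orthovanadate: {vanadate_mM:.1f} mM")
```

Under these assumptions the substrate is present at roughly 54 μM and orthovanadate at roughly 5.4 mM in the aqueous phase.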

GCMS analysis
The GC-MS analysis was performed on an Agilent Technologies system, comprising a 7980A GC system, a 597C inert MSD detector (70 eV), a 7683 auto-sampler and injector and a Phenomenex Zebron ZB-5ms column of 30 m length × 0.25 mm internal diameter and 0.25 μm stationary phase, with a Guardian precolumn (5 m). In this system, 1 μL of the sample was injected. The injection chamber was at 250 °C, the injection was splitless, and the ZB5 column was maintained at 45 °C for 2 min, after which a gradient of 10 °C per minute was started, until 300 °C. Peaks were detected in chromatograms of the total ion count. Compounds were identified by their retention index and by comparison of their mass spectra to libraries (NIST and in-house). The data obtained with the GC-MS analysis were processed to obtain the average relative concentration of each product, with the summed area of all peaks representing 100%, and the Excel function STDEV.S was used to calculate the standard deviation. For concentrations, dodecane samples from three independent cultures were diluted 1:10 in acetone and analysed by GC-MS, using split 10 injection. Concentrations were calculated by comparison of peak areas of selected ions (m/z 69, 93, 94, 119, 122, 204) in samples and in standard curves of santalene oil ingredients (kindly provided by Celina Vossen). For total sesquiterpene concentrations, values for α-santalene, β-santalene and trans-α-bergamotene were added.
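The relative-concentration calculation can be sketched as follows. This is a minimal illustration, assuming one dictionary of peak areas per replicate; the function names and the peak areas are hypothetical. Python's statistics.stdev computes the sample standard deviation, which corresponds to Excel's STDEV.S.

```python
from statistics import mean, stdev


def relative_composition(peak_areas):
    """Express each peak as a percentage of the summed peak area (= 100%)."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def summarize(replicates):
    """Mean and sample standard deviation of the relative composition."""
    per_replicate = [relative_composition(r) for r in replicates]
    summary = {}
    for compound in per_replicate[0]:
        values = [p[compound] for p in per_replicate]
        summary[compound] = (mean(values), stdev(values))
    return summary


# Hypothetical peak areas from three independent cultures.
replicates = [
    {"alpha-santalene": 50.0, "trans-alpha-bergamotene": 30.0, "beta-santalene": 20.0},
    {"alpha-santalene": 60.0, "trans-alpha-bergamotene": 25.0, "beta-santalene": 15.0},
    {"alpha-santalene": 40.0, "trans-alpha-bergamotene": 35.0, "beta-santalene": 25.0},
]
summary = summarize(replicates)
```

Each entry of summary holds the mean percentage and its standard deviation for one compound, which is the form of data a bar chart with error bars requires.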

Expression of CiCaMS and CiCaSSy in plants
Transient expression in Nicotiana benthamiana was employed to determine the product spectrum of the enzymes in a plant heterologous system. CiCaSSy and CiCaMS coding regions were amplified from the pACYC-constructs, using primers CCHattB1-FW and CCHattb2-RE (Table S1). The genes were then cloned into pBINplus [27] using Gateway technology [26]. Empty pBINplus was taken along as a negative control. The obtained plasmids were confirmed by sequencing and transformed into Agrobacterium tumefaciens AGL0 via electroporation.
For agroinfiltration, the transformed A. tumefaciens AGL0 cultures were grown overnight in LB medium at 28 [...] acetosyringone and grown again for 16 h at 28 °C, 300 rpm. The cultures were then centrifuged at 4000 rpm for 10 min, and the pellets were resuspended in a 10 mM MgCl2 solution, at a final OD600 of 1.5. Acetosyringone was added at a final concentration of 200 μM. The suspension was left at room temperature with no shaking for at least 3 h before performing the agroinfiltration. Young leaves from 3 to 4 weeks old N. benthamiana were selected for agroinfiltration. The experiment was performed using biological and technical replicates. Each leaf was infiltrated with about 1 mL A. tumefaciens suspension.
GC-MS analysis of pentane extract of Cinnamomum camphora root tissue. The compounds were identified by their mass spectra using the NIST library and confirmed by comparing their retention indexes with the reference list provided by Adams [44]. The y-axis reports the GC-MS response units, while the x-axis reports the retention times. The compounds identified for each peak are listed.
Trapping of headspace volatiles was performed as described [22] with the following modifications: headspace sampling was performed in a climate room (20 ± 2 °C, 56% RH) with LED lighting (adjusted at 100% white, 10% deep red, 100% far red and 5% blue light). Volatiles were trapped by sucking air out of the jar at a rate of 100 mLn/min (inlet flow at 150 mLn/min) for 4 h.
Trapped headspace volatiles were analysed using a Thermo TraceGC Ultra connected to a Thermo TraceDSQ quadrupole mass spectrometer (Thermo Fisher Scientific, Waltham, USA). Settings were as described [21], with the following modification: volatiles were injected on the analytical column at split ratio 300. Products were identified using original standards (myrcene standard and sandalwood oil, Sigma, Amsterdam, Netherlands), according to their retention time and mass spectra.

Modelling of a 3D structure for CiCaSSy
A homology model of CiCaSSy was created using multi-template modelling. The templates used were the Hyoscyamus muticus premnaspirodiene synthase (PDB ID: 5JO7), Mentha spicata limonene synthase (PDB ID: 2ONH), and Citrus sinensis limonene synthase (PDB ID: 5UV2); these were selected based on their high sequence similarity with the two Cinnamomum synthases. MODELLER [29] was used to create 500 models using the default automodel approach, and the model with the best N-DOPE score was chosen for further analysis. Furthermore, the position of an analogue of FDP, trifluorofarnesyl diphosphate (FFF), in the model was obtained by superposing the 5-epi-aristolochene synthase from Nicotiana tabacum (PDB ID: 5EAU), using the align command of PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC. Residues differing between the two Cinnamomum synthases were visualized using PyMOL. To assess the quality of the model, the same command was also used to align the crystal structure of Santalum album santalene synthase (SaSSy, PDB ID: 5ZZJ) to the modelled CiCaSSy. The structure alignment of CiCaSSy with SaSSy reveals a high structural similarity between the two enzymes, despite their low sequence identity. The CiCaSSy model can be superposed onto SaSSy with a root-mean-square deviation (RMSD) of 1.09 Å over 420 residues.
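The reported RMSD summarizes how far paired atoms lie from each other after superposition. The sketch below shows only that final formula, assuming the two coordinate sets are already superposed and paired one-to-one (for example, one Cα atom per aligned residue); it does not reproduce the superposition or outlier rejection performed by the PyMOL align command, and the function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumption: both sets are already superposed and listed in the
    same residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates (in angstrom) for two paired atoms.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(model, reference)
```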

Phylogenetic tree construction
A phylogenetic tree was constructed to identify the TPS subfamily to which CiCaMS and CiCaSSy belong. 268 sesquiTPSs from our previously assembled database [13], along with 59 monoTPSs and 27 diTPSs from Swiss-Prot [30], were clustered into groups of up to 70% sequence identity using CD-HIT [31]. Representative sequences were taken from each cluster and aligned along with CiCaMS and CiCaSSy using Clustal Omega [32] with the Pfam [33] domains Terpene_synth (Pfam ID: PF01397) and Terpene_synth_C (Pfam ID: PF03936) as guides for the alignment. The alignment was preprocessed with trimAl [34] such that columns with over 50% gaps were discarded. The tree was constructed using the ETE3 Python library [35] with the pmodeltest-ultrafast and RAxML [36] options, and visualized using iTOL [37]. The TPS subfamily assignment was done as in Chen et al. [38]. The phylogenetic tree is represented in Fig. S4.
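The gap-filtering criterion applied to the alignment can be illustrated in a few lines. This is a plain-Python sketch of the rule "discard columns with over 50% gaps", not the trimAl implementation; the function name and the toy alignment are hypothetical.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove alignment columns whose gap fraction exceeds the threshold.

    alignment maps sequence name -> aligned sequence ('-' marks a gap).
    """
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        i
        for i in range(length)
        if sum(row[i] == "-" for row in rows) / len(rows) <= max_gap_fraction
    ]
    return {
        name: "".join(seq[i] for i in keep) for name, seq in alignment.items()
    }


# Toy alignment: column 3 has 75% gaps and is removed,
# column 2 has exactly 50% gaps and is kept.
toy = {"s1": "MK-L", "s2": "M--L", "s3": "MR-L", "s4": "M-AL"}
trimmed = drop_gappy_columns(toy)
```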

Isolation and characterization of CiCaMS and CiCaSSy
The cineole chemotype of C. camphora has been observed to contain santalenes [39,40], while other chemotypes (camphor, linalool) have not been reported to contain santalenes [41][42][43]. Extracts from different parts of a plant from the cineole chemotype were analysed by GC-MS. Roots, leaves and stem of C. camphora appeared to contain compounds that correspond to α-santalene, trans-α-bergamotene, and β-santalene, among other compounds, as shown in Fig. 1. The concentration of santalenes was highest in the roots, therefore this tissue was further selected for RNA extraction and cDNA sequencing.
Among the root cDNAs found to correspond to TPSs, one sequence was identified as a putative santalene synthase. When this synthase-encoding sequence was amplified from root cDNA, two different sequences were cloned, which were 97% identical at the DNA level, and encoded proteins which differed in 22 out of 553 amino acids (95% identity).
The enzyme activity of both variants was investigated in vitro, by testing product formation using GDP or FDP as substrate. One variant converted FDP to santalenes and was referred to as Cinnamomum camphora Santalene Synthase (CiCaSSy). Products were identified as α-santalene, trans-α-bergamotene, and β-santalene, by their mass spectra and by comparison to a sandalwood oil standard. The other variant produced a monoterpene when GDP was used as a substrate and was referred to as CiCaMS. The product of CiCaMS was identified as myrcene, by comparison to a myrcene standard and using the retention time and the retention index [44] (Fig. 2).
The product profile of the two synthases was confirmed by overexpression in a plant system. For expression in N. benthamiana, both CiCaSSy and CiCaMS full length coding regions were cloned into binary vectors and the effect of their transient expression on the headspace of N. benthamiana was investigated (Fig. S1). Expression of CiCaSSy led to the presence of α-santalene, trans-α-bergamotene, and β-santalene in the headspace, while no monoterpene formation was observed (Fig. S2E). Transient expression of CiCaMS led to emission of myrcene (Fig. S2C), while no sesquiterpene formation could be observed. Standards were used for identification of the compounds (Fig. S2B/D). Therefore, CiCaSSy was identified as a santalene synthase, while CiCaMS was identified as a myrcene synthase.
Thus, it appeared that CiCaSSy was most closely related to monoTPSs, rather than sesquiTPSs. An alignment of 175 synthases was generated to construct a phylogenetic tree (Fig. S4). By following the clades proposed by Chen et al. [38], we identified CiCaMS and CiCaSSy as belonging to the TPS-g clade.

Activity of CiCaMS/CiCaSSy hybrids for production of sesquiterpenes in E. coli
Mono- and sesquiTPSs in plants have a common structural fold. They consist of two domains, an N-terminal domain, which is not part of the catalytic site, and a C-terminal domain forming the hydrophobic active site cavity [12]. The C-terminal domain region contains two Mg2+ coordination sites at its opening (labelled DDxxD and NSE/DTE in Fig. 3B) and a loop containing the diphosphate binding site for the substrate (labelled RxR in Fig. 3B). A structure model for CiCaSSy was generated using existing TPS crystal structures as templates. Fig. 3 highlights the regions which display variation between CiCaSSy and CiCaMS. The genes encoding CiCaMS and CiCaSSy were divided into six regions based on the position of the substitutions in the linear protein sequence.
Regions R1, R2 and R3 are located in the N-terminal domain of the protein. In our 3D model of CiCaSSy (Fig. 3B), these three regions are coloured green, purple, and light blue, respectively. The residue substitutions in these regions are distant from the substrate binding cavity, and are therefore unlikely to affect product formation. Regions R4 and R5, and to a lesser extent R6, are close to the active site.
This information was used to investigate which regions of CiCaSSy and CiCaMS play a role in substrate and/or product specificity. Hybrids exchanging one or two regions between both enzymes were generated (Fig. 3A). Initially, hybrids were tested by in vitro enzyme assays, using cell-free extracts, and FDP or GDP as a substrate (Figs. S6A-B, Fig. 2C). Wild type CiCaMS hardly produces sesquiterpenes when FDP is supplied as a substrate. CiCaSSy produces some linalool when GDP is supplied as a substrate. Interestingly, we did not observe any linalool formation by CiCaSSy when expressed in N. benthamiana (Figure S1 B-C).
As a first step, substrate specificity of hybrids was investigated, by determining the ratio between monoterpene and sesquiterpene products (Fig. 4). While exchanging regions R1, R2, R3 or R6 did not have a strong effect on the ratio between monoterpenes and sesquiterpenes produced, exchange of region R4 or R5 had a profound effect on both the sesquiterpene and the monoterpene synthase, as it completely reversed the substrate specificity of the enzyme: MS_R4 and MS_R5 use FDP as a substrate, leading to sesquiterpene production, while SSy_R4 and SSy_R5 do not use FDP. Apart from having a strong effect on substrate preference (as observed for R4 and R5 in Fig. 4), exchange of these regions also had an effect on the product profile.
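The ratio used to compare substrate preference can be sketched as a log-transformed quotient of summed peak areas. The example below is illustrative only: it assumes a base-10 logarithm and strictly positive summed areas, neither of which is specified for Fig. 4, and the function name and peak areas are hypothetical.

```python
import math


def log_area_ratio(numerator_areas, denominator_areas):
    """log10 of (summed numerator peak areas / summed denominator peak areas).

    Assumption: base-10 logarithm; how undetected products (zero area)
    were handled is not stated, so zero sums are rejected here.
    """
    numerator = sum(numerator_areas)
    denominator = sum(denominator_areas)
    if numerator <= 0 or denominator <= 0:
        raise ValueError("summed peak areas must be positive")
    return math.log10(numerator / denominator)


# Hypothetical peak areas of the principal products of one enzyme.
monoterpene_areas = [900.0, 100.0]
sesquiterpene_areas = [10.0]
mono_vs_sesqui = log_area_ratio(monoterpene_areas, sesquiterpene_areas)
sesqui_vs_mono = log_area_ratio(sesquiterpene_areas, monoterpene_areas)
```

On this scale, swapping numerator and denominator only changes the sign, so a reversal of substrate preference appears as a change of sign of the ratio.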
Monoterpene profiles of hybrid enzymes were derived from in vitro assays supplying GDP as a substrate (Fig. 5A-B). All CiCaSSy-derived hybrids behaved like CiCaSSy, and showed linalool formation, with traces of myrcene. While quantification based on enzyme assays was difficult, in general linalool peaks in CiCaSSy-derived hybrids were very low, compared to myrcene production in CiCaMS-derived hybrids (see Fig. 2C). All CiCaMS-derived hybrids showed a predominance of myrcene production, as reported for CiCaMS. An exception was observed when both region R4 and R5 were exchanged: MS_R4R5 and SSy_R4R5 showed product profiles more similar to CiCaSSy and CiCaMS, respectively (Fig. 5A-B). Apparently, these regions define the identity of the formed monoterpene.
To exclude that the absence of sesquiTPS activity of some of the hybrids (e.g. SSy_R5) was related to poor solubility of the hybrid proteins, an SDS-PAGE protein gel analysis was performed on the cell-free extracts. This did not reveal obvious differences in the amount of soluble TPS protein between active and inactive synthases (Fig. S8). This supports the hypothesis that hybrids were not compromised in their overall protein folding, and that the lack of sesquiterpene production in some of the hybrids is due to the changes in the structure of the active site of the enzyme.
The sesquiterpene profile of different hybrids was more diverse, and was eventually addressed using an in vivo production method. To this end, the WT parent enzymes and the hybrids were expressed in E. coli, in combination with a plasmid which supplies FDP, and their performance was tested in a flask fermentation, using a dodecane overlay for collecting product. After fermentation, the dodecane layer was analysed by GC-MS, and the sesquiterpene profile was extracted. In the in vivo system, all hybrids displayed sesquiterpene product profiles similar to those observed in the in vitro experiments (Figs. 5-7; Figs. S6A-B). No change in sesquiterpene profile was observed for hybrids covering regions R1, R2 and R3. Hybrids covering regions R4, R5 and R6 displayed marked changes.
Region R4 localizes in the C-terminal domain (yellow in Fig. 3C). It contains three substitutions located in the active site and two in its proximity, suggesting that its exchange could have an impact on the resultant terpene profiles. When the R4 region of CiCaSSy was introduced in monoTPS CiCaMS, the hybrid protein (MS_R4) displayed sesquiTPS activity, producing predominantly trans-α-bergamotene, with some β-santalene (Fig. 5C). Conversely, when the R4 region of CiCaMS was introduced into CiCaSSy (SSy_R4), all sesquiterpene production was lost (Fig. 5D).
Region R5 (represented in red in Fig. 3C) is only 12 amino acids long but contains five substitutions, three of which are located very close to the substrate in the 3D model. Again, the exchange of R5 caused the complete loss of sesquiterpene synthase activity in CiCaSSy (SSy_R5, Fig. 5D). On the other hand, introducing region R5 from CiCaSSy into CiCaMS (MS_R5) resulted in production of trans-α-bergamotene, as was also observed for MS_R4 (Fig. 5C).
Region R6 (represented in blue in Fig. 3C) is located at the C-terminus of the protein. It comprises three substitutions between CiCaMS and CiCaSSy. Replacing region R6 from CiCaSSy by R6 from CiCaMS (SSy_R6) resulted in the production of all wild type sesquiterpene products, but with a relatively higher production of trans-α-bergamotene compared to CiCaSSy (Fig. 5D). Introduction of R6 from CiCaSSy into CiCaMS did not result in any sesquiterpene production (MS_R6) (Fig. 5C).
A set of double hybrids was generated for CiCaMS and CiCaSSy (Fig. 3A), in which regions R1 to R4 were each exchanged in combination with R5. All double hybrids carrying R1-R3 regions in combination with R5 showed the same product profile as R5 single hybrids (Fig. S6C), indicating that none of these regions contribute to product specificity. However, the double hybrid carrying R4 and R5 from CiCaSSy in the framework of CiCaMS (MS_R4R5) restored the production of all three main products of CiCaSSy. The relative peak ratio of the products of MS_R4R5 was comparable to the profile produced by SSy_R6, confirming that the residues essential for restoring CiCaSSy product profiles in CiCaMS are located in R4, R5 and R6 (Fig. 5C-D).

Single-residue mutants identify critical residues for sesquiterpene synthase activity
As a next step, the roles of 10 individual positions in the amino acid sequence for producing santalenes were investigated, by exchanging them between CiCaSSy and CiCaMS (Table S3). In addition to the profile, total sesquiterpene production in dodecane was analysed for the most relevant mutants.
The residues were grouped based on their position with respect to the active site cavity. From R4, residues 267, 291 and 294 appear to belong to the active site (Fig. 3C). These residues were substituted in CiCaMS and CiCaSSy, obtaining MS_S267N, MS_L291I and MS_F294M from CiCaMS and SSy_N267S, SSy_I291L and SSy_M294F from CiCaSSy. When testing the product profile of these mutant enzymes, only one residue appeared to be responsible for sesquiTPS activity. Mutant MS_F294M showed low but clearly detectable production of trans-α-bergamotene. Conversely, complementary mutant SSy_M294F had lost the ability to produce sesquiterpenes in this system. Mutation of residues N267 and I291 each resulted in a change of the product ratio in CiCaSSy (Fig. 6A-B), but no major change in total sesquiterpene production was observed. The more distant residues in R4, 273 and 308, were also probed for their role in terpene synthesis, by testing mutants MS_G273A and MS_E308D for CiCaMS and SSy_A273G and SSy_D308E for CiCaSSy. Among these, mutant SSy_A273G displayed an altered sesquiterpene profile and lower productivity, compared to CiCaSSy. SSy_D308E showed a product profile which was comparable to the wild type CiCaSSy.
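The mutant names above combine the parent enzyme, the original residue, the position and the introduced residue (for example MS_S267N). A minimal sketch of how such reciprocal names could be derived from two aligned, gap-free sequences is given below; the function name and the toy sequences are hypothetical and do not correspond to the real CiCaMS or CiCaSSy sequences.

```python
def substitution_names(parent, donor, prefix, first_position=1):
    """List single substitutions that convert parent residues into donor residues.

    parent and donor are aligned sequences of equal length without gaps;
    names follow the pattern prefix_<parent residue><position><donor residue>.
    """
    if len(parent) != len(donor):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{prefix}_{p}{position}{d}"
        for position, (p, d) in enumerate(zip(parent, donor), start=first_position)
        if p != d
    ]


# Toy five-residue sequences, for illustration only.
toy_ms = "MKSLF"
toy_ssy = "MKNLM"
ms_mutants = substitution_names(toy_ms, toy_ssy, "MS")
ssy_mutants = substitution_names(toy_ssy, toy_ms, "SSy")
```

Swapping the parent and donor arguments yields the complementary mutant set, mirroring the reciprocal design used for the two enzymes.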
The same approach was used for region R5, where positions 401, 403 and 404 participate in the active site, and positions 415 and 419 appear to be located further away, near the bottom of the cavity (Fig. 3C). Among the CiCaMS mutants, production of trans-α-bergamotene was observed for both MS_L403F and MS_L404V (Fig. 6C). Apparently, either of these mutations in CiCaMS is sufficient to confer sesquiTPS activity, although the product accumulation was lower than observed for CiCaSSy and hybrid MS_R5. Conversely, substitution V404L in CiCaSSy resulted in a complete loss of sesquiTPS activity. CiCaSSy mutants A401V and F403L displayed altered product ratios compared to wild type CiCaSSy, with trans-α-bergamotene being the dominant product (Fig. 6D). Mutations in the two residues further away from the active site did not alter the product spectrum of CiCaSSy (SSy_Q415H, SSy_E419A), nor did they confer sesquiterpene synthase activity on CiCaMS (MS_H415Q, MS_A419E).
Thus, the substitution analysis of regions R4 and R5 indicates that three single amino acid positions are crucial to introduce sesquiTPS activity in CiCaMS: substitutions F294M, L403F and L404V each result in sesquiTPS activity of the monoTPS. In CiCaSSy, substitutions M294F and V404L result in a complete loss of sesquiTPS activity.
Fig. 4. Calculated ratio for (A) monoterpene vs. sesquiterpene production in CiCaMS and CiCaMS-derived single hybrids and (B) sesquiterpene vs. monoterpene production in CiCaSSy and CiCaSSy-derived single hybrids. Data from in vitro enzyme assays were used. Each enzyme was tested in duplicate. For the ratios, the sum of the areas of the principal compounds was calculated and converted to logarithmic scale. Note that, as all enzymes showed monoterpene activity in vitro, graph (A) displays an overall higher ratio compared to graph (B).
Although R6 was not observed to be crucial for sesquiterpene production, its exchange resulted in a significant variation in product ratio (Fig. 7). Among the R6 variant residues, residue D442 maps close to the active site of CiCaSSy and is part of its NSE/DTE motif (Fig. 3B-C). Two mutations, MS_N442D and SSy_D442N, were tested in variant MS_R4R5 and in CiCaSSy, respectively. As shown in Fig. 7A, hybrid MS_R4R5 with the additional single mutation N442D recovers the product ratio of the wild type CiCaSSy. Conversely, SSy_D442N results in the same product ratio as SSy_R6 and MS_R4R5 (Fig. 7B). This confirms residue 442 as a major determinant of product ratio in CiCaSSy.
Based on these results, we hypothesised that the combination of the six residues identified above (267, 294, 401, 403, 404, 442) would be sufficient to effectively establish the CiCaSSy sesquiTPS profile in CiCaMS. To substantiate this, we generated a CiCaMS-derived mutant carrying all six amino acid substitutions. The resulting variant, referred to as MS_6S, was at least as active as CiCaSSy in producing sesquiterpenes and showed a product profile very similar to that of wild type CiCaSSy (Fig. S9), with all three compounds present and α-santalene as the major peak (Fig. 7B).
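The construction of a six-fold variant such as MS_6S can be illustrated in silico. The sketch below parses substitutions in the notation used in this paper (original residue, position, new residue) and applies them to a sequence. The substitution list is taken from the text (S267N, F294M, V401A, L403F, L404V, N442D); the helper name is hypothetical, and the sequence is a placeholder of 553 residues, not the actual CiCaMS sequence.

```python
import re

# One substitution: original residue, 1-based position, new residue (e.g. "F294M").
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence, substitutions):
    """Return a copy of `sequence` with all point substitutions applied."""
    residues = list(sequence)
    for substitution in substitutions:
        match = SUBSTITUTION.match(substitution)
        if match is None:
            raise ValueError(f"cannot parse substitution {substitution!r}")
        original, position, new = match.group(1), int(match.group(2)), match.group(3)
        if residues[position - 1] != original:
            # Guard against applying a mutation to the wrong parent enzyme.
            raise ValueError(f"{substitution}: found {residues[position - 1]!r} "
                             f"at position {position}")
        residues[position - 1] = new
    return "".join(residues)


# CiCaMS -> CiCaSSy substitutions at the six positions named in the text.
MS_6S_SUBSTITUTIONS = ["S267N", "F294M", "V401A", "L403F", "L404V", "N442D"]

# Placeholder parent: 553 glycines carrying the CiCaMS residues at the six
# positions only. This is NOT the real CiCaMS sequence.
placeholder = ["G"] * 553
for substitution in MS_6S_SUBSTITUTIONS:
    placeholder[int(substitution[1:-1]) - 1] = substitution[0]
toy_cicams = "".join(placeholder)

toy_ms_6s = apply_substitutions(toy_cicams, MS_6S_SUBSTITUTIONS)
```

Checking the original residue before each replacement makes the direction of a substitution explicit: applying M294F to the CiCaMS-like placeholder, for instance, raises an error because that position carries Phe, not Met.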

Discussion
In our study we characterized a novel santalene synthase from C. camphora (CiCaSSy), which shows low similarity to the previously characterized santalene synthases from Santalum spp. (~38% identity [20]). We also identified a closely related monoTPS (CiCaMS), which does not show any sesquiterpene activity despite differing from CiCaSSy in only 22 out of 553 amino acids (95% identity). Among these, three residue changes (F294M, L403F and L404V) were each able to convert the monoTPS into a sesquiTPS. Three more substitutions (S267N, V401A and N442D) appear to be involved in defining the product profile of CiCaSSy. Thus, six residues define the specific product properties of CiCaSSy, relative to its monoTPS counterpart. To examine the amino acid differences between these two enzymes in a functional context, we used a structural model of CiCaSSy, depicted in Fig. 3B.
Two of the most important residues addressed in this study are at positions 294 and 404. In both CiCaSSy and CiCaMS, the identity of the residues at these positions determines sesquiTPS activity. In the structure model, the side chains of these residues point into the active site cavity, although the precise topology of the side chains cannot be accurately inferred from the model. For residue 294, the size of the Phe side chain found in CiCaMS may hinder entry of FDP into the active site pocket, thus preventing sesquiTPS activity. The role of this region in TPS function was addressed earlier by Kampranis et al. (2007) [49], who showed that in the 1,8-cineole monoTPS from Salvia fruticosa, mutation of an Asn to Ala, at a position corresponding to 291 in CiCaSSy, enlarged the active site cavity so that it could accommodate the bulkier FDP substrate and induce sesquiterpene production. Both positions 291 and 294 carry different amino acids in the two Cinnamomum synthases, but only 294 appears to affect sesquiterpene production. An alignment of relevant synthases, highlighting the residues of interest, is presented in Fig. S7.
Residue 404, together with residues 401 and 403, lies around the kink in the G2 helix, which has been studied in many different contexts as being crucial for product specificity in TPSs [49][50][51][52]. These previous studies reveal that mutations at these positions can lead to changes in product specificity, consistent with our observation that substitutions at positions 401 and 403 have a strong impact on the product profile of CiCaSSy. However, a stronger effect is observed for position 404: substitution at this position induces sesquiTPS activity in CiCaMS (MS_L404V) and completely abolishes sesquiTPS activity in CiCaSSy (SSy_V404L). The results presented here indicate that conversion of CiCaMS into a bergamotene synthase can be mediated by a variety of mutations which affect the shape of the active site cavity. More subtle changes result from mutations at positions 267, 291 and 442, which predominantly affect the product profile of the sesquiTPS, leading to altered ratios of trans-α-bergamotene, α- and β-santalene. Position 267 (Ser in CiCaMS and Asn in CiCaSSy; Fig. 6B) has been implicated in the second cyclization required to produce bicyclic monoterpenes such as α-pinene [53], corresponding to the third cyclization in the production of tricyclic sesquiterpenes. Thus, this functional activity may explain the difference in product ratio of the bicyclic sesquiterpenes (trans-α-bergamotene and β-santalene) compared to the tricyclic α-santalene seen in SSy_N267S (Fig. 6B).
Residue 442 in R6 forms part of the catalytic NSE/DTE motif, which acts as the second Mg2+ binding motif in TPSs [54]. Our previous research [13] showed that this position is predominantly (65%) an Asp among 250 characterized plant sesquiTPSs. Possibly, the prominent role of this residue in determining the product profile of CiCaSSy is related to its involvement in orienting the magnesium ion.
The residues which determine myrcene synthase activity in CiCaMS are mostly located in region R4: when R4 from CiCaSSy is introduced in CiCaMS, the product ratio is affected at the expense of myrcene, while in other hybrids, the ratio between myrcene and linalool is maintained. Interestingly, the myrcene-dominated profile is maintained in MS_R5, which is nevertheless also able to produce the sesquiterpene trans-α-bergamotene. The role of individual residues in R4 and R5 in myrcene biosynthesis was not investigated further, as the main focus of this work is the identification of residues which determine the multiproduct sesquiterpene synthase profile.
The fact that CiCaSSy and CiCaMS are so closely related raises the question whether the common ancestor enzyme was a sesquiTPS or a monoTPS. Our results do not provide clear answers to this question. The highest sequence similarity of both enzymes was found with a terpineol synthase, suggesting that the santalene synthase CiCaSSy could have evolved from a more common monoTPS present in Cinnamomum spp. On the other hand, CiCaSSy produces a suite of bicyclic and tricyclic sesquiterpenes, which require a complex cascade of proton migrations, whereas the presumably simpler linear monoterpene myrcene requires hardly any proton transfer; this could lead one to consider CiCaMS as a sort of loss-of-function mutant of CiCaSSy [38,55,56]. This view would also be in agreement with the absence of a clear plastid transit peptide in both CiCaMS and CiCaSSy. However, it should be noted that CiCaMS is a functional monoterpene synthase, since it does mediate myrcene formation when expressed in N. benthamiana. Moreover, the chromatogram of the root tissue (Fig. 1) shows a myrcene peak consistent with the activity observed for CiCaMS. Thus, different hypotheses on the evolutionary order of myrcene and santalene synthase evolution find support.
TPSs with a mixed mono/sesquiterpene product profile have been described before. Several studies describe the ability of a few sesquiTPSs to also behave as monoTPSs in the presence of GDP [57]. The santalene synthases and the bisabolene synthases from Santalum spp. have also been observed to produce linalool, geraniol and terpineol when supplied with GDP [20]. Other examples are the trans-α-bergamotene synthase from Lavandula angustifolia [58] and the α-bisabolene synthase from Abies grandis [59]. Interestingly, all these enzymes yield products which belong to a specific subclass of sesquiterpenes, derived from the bisabolyl cation [60]. One could hypothesize that sesquiterpenes derived from the bisabolyl cation, which present a cyclized "head" and an uncyclized "tail", can be produced by synthases which are closely related to monoTPSs, and may have evolved from them. From this perspective, santalenes, bergamotenes and bisabolenes can also be seen as cyclized monoterpenes with an isopentenyl extension. As mentioned above, a change in residue 294, 403 or 404 seems to be sufficient to change the shape of the active site pocket of CiCaMS and to allow the accommodation of the larger FDP substrate. Thus, this hypothesis is supported both by the ability of CiCaSSy to produce monoterpenes in vitro and, more importantly, by the demonstration that several single residue substitutions in the active site of the monoTPS CiCaMS are each sufficient to trigger the production of the sesquiterpene trans-α-bergamotene.

Conclusions
With this study we characterized two novel TPSs from C. camphora, one monoTPS and one sesquiTPS. Residues essential for the conversion of the monoTPS into a sesquiTPS were identified, and we succeeded in converting CiCaMS into a santalene synthase showing the same product profile as CiCaSSy by substituting only six residues. This work provides new insights into the function of specific residues and their role in the catalytic site of TPSs, contributing to a better understanding of this class of enzymes.

Declaration of competing interest
Authors JB, AvH and DB are listed as inventors in a pending patent application (WO2018/160066; applicant, Isobionics) encompassing information presented here.

Fig. 1. GC-MS analysis of pentane extract of Cinnamomum camphora root tissue. The compounds were identified by their mass spectra using the NIST library and confirmed by comparing their retention indexes with the reference list provided by Adams [44]. The y-axis reports the GC-MS response units, while the x-axis reports the retention times. The compounds identified for each peak

Fig. 3. (A) Schematic representation of hybrid design. The monoTPS (CiCaMS) is represented in gray, while the sesquiTPS is represented in black. The region swaps are highlighted and depicted as for the parental enzyme (from R1 to R6, respectively: green, purple, light blue, orange, red, ocean blue). The asterisks indicate the regions where single residue exchanges were made. (B) 3D model of CiCaSSy, including Mg2+ ions [pink] and FDP analogue (trifluorofarnesyl diphosphate, FFF) [sticks]. The model was generated with Modeller [29], using three templates: one sesquiTPS (5JO7) and two monoTPSs (2ONH and 5UV2). The different regions are highlighted in different colors as described in (A). (C) Zoom into the active site of CiCaSSy. R4, R5 and R6 are visible. In the picture, all amino acid substitutions of the three regions are named, from CiCaMS to CiCaSSy. The asterisks point out the residues essential for the evolution of CiCaMS into a santalene synthase. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 5. (A-B) Monoterpene profiles of relevant hybrids derived from (A) CiCaMS and (B) CiCaSSy. Activity was tested in an in vitro system using cell-free extracts and GDP as substrate. (C-D) Profile of produced sesquiterpenes by enzyme hybrids in an E. coli fermentation system. (C) Sesquiterpene production profiles mediated by CiCaMS and its derived hybrids; (D) sesquiterpene production profiles mediated by CiCaSSy and its derived hybrids. (A-D) Values on the Y axis express the relative ratio of each compound, relative to the total sesquiterpenes. Each variant was tested in three independent experiments. Error bars indicate the standard deviation.

Fig. 6. Profile of produced sesquiterpenes by amino-acid mutants in an E. coli fermentation system. (A) Sesquiterpene production profiles mediated by CiCaMS R4 hybrid and single mutants; (B) sesquiterpene production profiles mediated by CiCaSSy R4 hybrid and single mutants; (C) sesquiterpene production profiles mediated by CiCaMS R5 hybrid and single mutants; (D) sesquiterpene production profiles mediated by CiCaSSy R5 hybrid and single mutants. Activity of WT enzymes has been included for comparison. Values on the Y axis express the relative ratio of each compound, relative to the total sesquiterpenes produced. Each variant was tested in three independent experiments. Error bars indicate the standard deviation, tables report the calculated mg/L of total sesquiterpenes produced. In the tables, N.D. stands for "not detected" while N.A. for "not analysed".

Fig. 7. Profile of produced sesquiterpenes by (A) CiCaMS and (B) CiCaSSy variants in an E. coli fermentation system. (A) Sesquiterpene production profiles mediated by CiCaMS double hybrid MS_R4R5 in comparison with the single mutant MS_R4R5_N442D. (B) Sesquiterpene production profiles mediated by SSy_R6 hybrid and R6 single mutant (SSy_D442N). Graphic (A) also reports MS_6S, and graphic (B) reports SSy_I291L for comparison. Error bars indicate standard deviation, tables report the calculated mg/L of total sesquiterpenes produced. In the tables, N.D.: not detected; N.A.: not analysed.