Directed Biosynthesis of Mitragynine Stereoisomers

Mitragyna speciosa (“kratom”) is used as a natural remedy for pain and management of opioid dependence. The pharmacological properties of kratom have been linked to a complex mixture of monoterpene indole alkaloids, most notably mitragynine. Here, we report the central biosynthetic steps responsible for the scaffold formation of mitragynine and related corynanthe-type alkaloids. We illuminate the mechanistic basis by which the key stereogenic center of this scaffold is formed. These discoveries were leveraged for the enzymatic production of mitragynine, the C-20 epimer speciogynine, and fluorinated analogues.

In-Fusion kit (Clontech Takara). Plasmid DNA was isolated from bacterial cultures using the Wizard® Plus SV Minipreps DNA Purification System kit (Promega).

Metabolomics on M. speciosa tissue
The same plant samples used for RNA-Seq were subjected to targeted and untargeted metabolomics. Plant material from mature leaves, young leaves, stems, bark and roots was snap-frozen in liquid nitrogen and ground to a fine powder using a mortar and pestle. Tissue (100 mg fresh weight) was mixed with 300 µL MeOH (supplemented with 0.1% formic acid) and vortexed vigorously for 1 min. The samples were then sonicated for 15 min at room temperature. Cell debris was removed by centrifugation (15000 x g; 20 min). The supernatant was filtered through polytetrafluoroethylene (PTFE) syringe filters (0.22 µm), diluted 1:15 with MeOH (supplemented with 0.1% formic acid) and analysed by UPLC/MS (method 1).

RNA purification and sequencing
Total RNA of M. speciosa (roots, stem, bark, young leaves, mature leaves) was extracted using the RNeasy Mini Kit (Qiagen) according to the manufacturer's instructions. The TURBO DNA-free™ Kit (Thermo Fisher) was used to remove contaminating DNA. Optionally, the RNeasy Mini Kit (Qiagen) was used again to improve purification. Triplicates were prepared for each tissue. The quality of the obtained RNA was analysed using an Implen NanoPhotometer® N60. All samples satisfied the requirements for total RNA sequencing (≥ 400 ng; A260/280 = 1.8-2.2; A260/230 ≥ 1.8) and were submitted to Novogene (https://en.novogene.com/) for total RNA sequencing using the company's standard protocols for library preparation and RNA-Seq. At least 30 M raw sequencing reads (Illumina, 150 bp paired-end) were acquired per sample.
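The acceptance criteria quoted above (≥ 400 ng; A260/280 = 1.8-2.2; A260/230 ≥ 1.8) can be expressed as a simple check. The following is a minimal sketch only; the class name, function name and example readings are hypothetical and are not part of the original workflow.

```python
from dataclasses import dataclass


@dataclass
class RnaSample:
    """One RNA preparation with its spectrophotometer readings (hypothetical container)."""
    name: str
    amount_ng: float
    a260_280: float
    a260_230: float


def passes_rna_qc(sample: RnaSample) -> bool:
    """Apply the thresholds stated in the text: >= 400 ng, A260/280 in 1.8-2.2, A260/230 >= 1.8."""
    return (
        sample.amount_ng >= 400
        and 1.8 <= sample.a260_280 <= 2.2
        and sample.a260_230 >= 1.8
    )


# Hypothetical readings, for illustration only
good = RnaSample("root_rep1", amount_ng=650.0, a260_280=2.05, a260_230=2.10)
low_yield = RnaSample("bark_rep2", amount_ng=310.0, a260_280=2.00, a260_230=1.95)

print(passes_rna_qc(good), passes_rna_qc(low_yield))
```

A sample failing any single criterion is rejected, which mirrors the statement that all submitted samples satisfied every requirement.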

Coexpression analysis
For gene coexpression analysis, the transcriptome provided by the sequencing company (Novogene) was used. Additionally, the raw data were assembled in-house using a standard RNA-Seq bioinformatics pipeline. In brief, raw read quality was assessed using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [2]. Trimmomatic was used to remove adapter sequences from the raw sequencing data [3]. Next, Trinity was used to assemble the M. speciosa transcriptome [4]. The transcriptome assembly was refined using the CD-HIT Suite to group transcripts with greater than 90% identity, and only the longest transcript of each group was retained [5]. TransDecoder (https://github.com/TransDecoder/TransDecoder) was deployed to identify candidate coding regions within transcript sequences. Functional annotation was then performed by running BLAST against the UniProt and Pfam databases [6][7][8]. Finally, Salmon was used to quantify the expression of transcripts [9]. Both the commercially obtained and the in-house generated transcriptomes were used to identify candidate genes. Pearson correlation coefficients were calculated in Microsoft Excel using the expression profiles of MsSTR, MsDCS1 or MsEnolMT as 'bait'. MsSTR was putatively identified based on homology to Catharanthus roseus strictosidine synthase and used as bait to identify possible reductase gene candidates. Additionally, a self-organising map (SOM) was used to group transcripts by spatiotemporal expression pattern, as recently reported [10]. Genes that grouped with MsDCS1 and MsEnolMT and were functionally annotated as oxidases were considered candidate genes for hydroxylase activity towards corynantheidine (5ab).
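The bait-based ranking described above was carried out in Microsoft Excel; the same calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the study: the function names, the candidate names and all expression values are hypothetical, and transcripts with a constant profile are simply skipped because their correlation is undefined.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # constant profile: correlation is undefined
    return cov / sqrt(var_x * var_y)


def rank_against_bait(expression, bait):
    """Return (gene, r) pairs sorted from most to least correlated with the bait."""
    bait_profile = expression[bait]
    scored = []
    for gene, profile in expression.items():
        if gene == bait:
            continue
        r = pearson(bait_profile, profile)
        if r is not None:
            scored.append((gene, r))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical expression values for roots, stem, bark, young leaves, mature leaves
expression = {
    "MsSTR": [1.0, 2.0, 3.0, 4.0, 5.0],
    "candidate_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "candidate_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "candidate_C": [7.0, 7.0, 7.0, 7.0, 7.0],
}

ranked = rank_against_bait(expression, "MsSTR")
for gene, r in ranked:
    print(f"{gene}\t{r:.3f}")
```

Candidates at the top of such a ranking would then be inspected for their functional annotation, as described for the reductase candidates.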

Cloning of gene candidates
Full-length gene sequences of candidates were amplified by PCR from cDNA of M. speciosa using the primers indicated in Supplementary Tables S3 and S4 and further purified using agarose gel electrophoresis. Each fragment was amplified with suitable overhangs at the 5′ and 3′ ends to facilitate In-Fusion cloning into suitable plant or bacterial expression plasmids. Where synthetic gene sequences were subcloned into expression vectors, the commercially obtained oligonucleotide was used as the template for the PCR reaction.
For transient gene expression in Nicotiana benthamiana, genes of interest were subcloned into a modified binary 3Ω1 vector [11]. 3Ω1 was linearized with the restriction enzyme BsaI (Thermo Fisher), purified by kit (DNA Clean & Concentrator™-5, Zymo) and fused with the gene of interest by In-Fusion cloning (Clontech Takara, manufacturer's instructions). The In-Fusion reaction mixture was transformed into chemically competent E. coli TOP10 cells (Thermo Fisher) and selected on LB agar supplemented with spectinomycin (200 µg/mL). After 1 d, positive transformants were identified by colony PCR using universal sequencing primers (#33, #34) for 3Ω1 (Supplementary Table S4). Plasmids of positive transformants were isolated from overnight cultures (37 °C, 225 rpm, LB + spectinomycin) and correct cloning was confirmed by Sanger sequencing (Azenta Life Sciences).
For gene expression in Escherichia coli, genes of interest were ligated in frame downstream of a His6-coding sequence of pOPINF vectors linearized with HindIII and KpnI. Alternatively, a pOPINM vector (encoding an N-terminal MBP-tag) linearized with HindIII and KpnI was used. pOPINF and pOPINM were kindly provided by Ray Owens (Addgene plasmid #26042 & #26044).

Mutagenesis
Site-directed point mutations were introduced into the MsDCS1 or CpDCS gene by PCR. The mutagenesis strategy is illustrated in Supplementary Fig. S27. In brief, the respective ADH gene was amplified in two fragments that carried the mutation(s) in complementary overhangs. Both fragments were purified by agarose gel electrophoresis (1%, 120 V, 45 min) and ligated into linearized 3Ω1 vector by In-Fusion cloning (Clontech Takara). Correct cloning was assessed by Sanger sequencing and only sequence-verified plasmids were used in downstream applications.

Transformation of Agrobacterium tumefaciens GV3101
Electrocompetent cells of Agrobacterium tumefaciens GV3101 (GoldBio) were thawed on ice and mixed with plasmid DNA (300-600 ng) that had been verified by Sanger sequencing. After incubation on ice for 30 min, the cell suspension was transferred to an electroporation cuvette and cells were electroporated using a MicroPulser™ (Bio-Rad). Cells were mixed with 1 mL LB medium and incubated at 28 °C/225 rpm for 3 h prior to plating on LB agar plates (supplemented with 20 µg/mL rifampicin, 50 µg/mL gentamycin and 200 µg/mL spectinomycin). Plates were kept at 28 °C for 2 d. Single colonies were used to inoculate liquid cultures (10-20 mL; supplemented with 20 µg/mL rifampicin, 50 µg/mL gentamycin and 200 µg/mL spectinomycin), which were cultivated at 28 °C and 250 rpm for up to 24 h. Glycerol stocks (50 %) were prepared from these cultures, snap-frozen in liquid nitrogen and stored at -80 °C indefinitely.

Transient expression of gene candidates in Nicotiana benthamiana
Transient expression of gene candidates in N. benthamiana was performed as previously reported by Hawes et al. [12]. In brief, transformed Agrobacterium GV3101 strains containing the gene construct of interest were cultivated in 10 mL LB medium (supplemented with 20 µg/mL rifampicin, 50 µg/mL gentamycin and 200 µg/mL spectinomycin) for 16 h at 28 °C and 250 rpm. The cells were then collected by centrifugation (3000 x g, 30 min) and washed with 1-3 mL infiltration buffer (50 mM MES, 2 mM Na3PO4, 27.8 mM glucose, 100 µM acetosyringone). After centrifugation (3000 x g, 10 min) the wash step was repeated. Finally, cells were resuspended in 10-15 mL infiltration buffer and the optical density (OD600) was measured. For infiltration of a single Agrobacterium strain, the suspension was diluted to a final OD600 of 0.3 in a total volume of 15 mL. For infiltration of multiple Agrobacterium strains, the strains were diluted so that the final OD600 was < 1 (equal concentration for each strain). The resulting suspensions were incubated at RT for 1 h and then infiltrated into the underside of 3-4 week old N. benthamiana leaves using a needleless 1 mL syringe. After 3 days, the substrate(s) (usually 700 µM tryptamine [or methoxylated/fluorinated tryptamine analogues] and 700 µM secologanin dissolved in 1 mL ddH2O) were infiltrated into the underside of the same leaves previously infiltrated with the Agrobacterium strains of choice. At 2 days post-infiltration, leaves were harvested (ca. 100-150 mg fresh weight) and snap-frozen in liquid nitrogen. Each individual infiltration experiment was performed at least twice, with biological replicates consisting of at least two leaves from two different tobacco plants.
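The dilution to a defined OD600 follows the usual relation C1·V1 = C2·V2. The sketch below illustrates the arithmetic under two assumptions that are not stated in the protocol: OD600 is treated as proportional to cell density over the measured range, and for co-infiltrations each strain is brought to the same per-strain OD600, with the combined value taken as the sum. The function name and the measured stock values are hypothetical.

```python
def dilution_volumes(stock_od600, per_strain_od600, final_volume_ml):
    """Volumes of each resuspended culture, and of buffer, for a mixed suspension.

    stock_od600: measured OD600 of each resuspended strain (hypothetical inputs)
    per_strain_od600: target OD600 contributed by each strain in the final mix
    final_volume_ml: total volume of the infiltration suspension
    """
    volumes = []
    for od in stock_od600:
        if od <= 0:
            raise ValueError("measured OD600 must be positive")
        # C1 * V1 = C2 * V2  ->  V1 = C2 * V2 / C1
        volumes.append(per_strain_od600 * final_volume_ml / od)
    buffer_ml = final_volume_ml - sum(volumes)
    if buffer_ml < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    return volumes, buffer_ml


# Single strain: final OD600 = 0.3 in 15 mL, as in the protocol
single, buffer_single = dilution_volumes([1.5], 0.3, 15.0)

# Three strains at an assumed 0.3 each, giving a combined OD600 of 0.9 (< 1)
mixed, buffer_mixed = dilution_volumes([1.5, 1.0, 3.0], 0.3, 15.0)

print(single, buffer_single)
print(mixed, buffer_mixed)
```

With a hypothetical stock at OD600 = 1.5, the single-strain case needs 3 mL of culture and 12 mL of infiltration buffer.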
To assess the ratios of corynantheidine formation [(5a) vs (5b)] in different mutants of MsDCS1 and CpDCS, each Agrobacterium tumefaciens strain harbouring a mutant ADH construct was co-infiltrated with CrSTR, CrSGD and MsEnolMT into N. benthamiana leaves. For each mutated ADH construct, three biological replicates were prepared, with each replicate consisting of two tobacco leaves infiltrated with Agrobacteria and substrate. All wild-type and mutant ADH genes were tested in parallel, using the same batch of N. benthamiana to minimize batch effects.

Sample harvest and analysis
Harvested, snap-frozen N. benthamiana leaf tissue (100 mg) was homogenized on a TissueLyser II (Qiagen) using 2 x 2-mm-diameter tungsten beads while shaking vigorously at 22 Hz for 2 min. MeOH (350 µL supplemented with 0.1 % formic acid) was added to each sample, prior to vigorous vortexing for 1 min. After sonication (RT, 15 min) the samples were centrifuged at full speed (>13000 x g, 20 min) and filtered through 0.22 µm PTFE syringe filters. Filtered samples were directly analysed by high-resolution LC-MS and individual metabolites were identified based on comparison of retention times and MS2 spectra with authentic standards. DataAnalysis Version 5.3 (Bruker) was used to analyse LCMS data.

Analysis of corynantheidine formation in ADH mutants
To assess the ratios of corynantheidine formation [(5a) vs (5b)] in different mutants of MsDCS1 and CpDCS, extracted ion chromatograms corresponding to m/z = 369 of corynantheidine (5ab) were analysed. Peak areas of the peaks corresponding to (5a) and (5b) were determined using the DataAnalysis software, and relative percentages of (5a) and (5b) were calculated. The relative percentages of (5a) and (5b) formation, with corresponding standard deviations, indicated in Supplementary Figure S14 represent the means calculated from the three biological replicates performed for each mutated ADH construct.
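The calculation of relative percentages from peak areas can be sketched as follows. This is an illustrative example only: the peak areas are hypothetical, the function names are not from the original analysis, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def relative_percentages(area_5a, area_5b):
    """Relative share (%) of each epimer from its extracted-ion peak area."""
    total = area_5a + area_5b
    if total <= 0:
        raise ValueError("no corynantheidine signal detected")
    return 100.0 * area_5a / total, 100.0 * area_5b / total


def summarise(replicates):
    """Mean and sample standard deviation of the relative percentages across replicates."""
    pct_5a = [relative_percentages(a, b)[0] for a, b in replicates]
    pct_5b = [relative_percentages(a, b)[1] for a, b in replicates]
    return {
        "5a": (mean(pct_5a), stdev(pct_5a)),
        "5b": (mean(pct_5b), stdev(pct_5b)),
    }


# Hypothetical peak areas (5a, 5b) for three biological replicates of one construct
replicates = [(80.0, 20.0), (75.0, 25.0), (85.0, 15.0)]
summary = summarise(replicates)
print(summary)
```

Because the two percentages of a replicate always sum to 100, their standard deviations are identical by construction.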

Heterologous gene expression in Escherichia coli
For production of recombinant enzymes in E. coli, expression plasmids containing the gene of interest were transformed into chemically competent E. coli BL21(DE3) cells using a standard heat-shock protocol. For the expression of Catharanthus roseus strictosidine synthase (CrSTR) and Catharanthus roseus strictosidine glucosidase (CrSGD), previously reported expression constructs were used [13][14][15]. Single colonies from these transformations were inoculated into a 10 mL seed culture (LB medium; supplemented with either 100 µg/mL carbenicillin or 50 µg/mL kanamycin) and cultivated overnight at 37 °C and 200 rpm. An aliquot of the seed culture (8 mL) was used to inoculate 1 L 2TY medium (supplemented with either 100 µg/mL carbenicillin or 50 µg/mL kanamycin). The resulting expression culture was incubated at 37 °C and 200 rpm until an optical density (OD600) of 0.4-0.6 was reached. The culture was then moved to an 18 °C shaker set to 200 rpm and protein expression was induced by adding isopropyl β-D-1-thiogalactopyranoside (IPTG) to a final concentration of 200 µM. The culture was incubated overnight (16-24 h). The cells were then harvested by centrifugation (4000 rpm, 4 °C, 20 min), frozen in liquid nitrogen and stored indefinitely at -80 °C.

Purification of recombinant proteins
Cell pellets were thawed on ice and resuspended in 80-100 mL buffer A (50 mM Tris base, 50 mM glycine, 500 mM NaCl, 20 mM imidazole, 5% glycerol (v/v), pH 8.0; a fresh 100 mL aliquot of this was prepared on the day of protein purification and mixed with 10 mg lysozyme and 1x protease inhibitor cocktail tablet [cOmplete, EDTA-free, Roche]). Resuspended cells were lysed using an ultrasonic liquid processor (Vibra-Cell™, Sonics®; 40 % amplitude; 2 s on/3 s off; total 'on' time: 3 min). Cell debris was removed by centrifugation (4 °C, 35 min, 35000 x g). The protein of interest was then purified on an ÄKTA pure FPLC system (GE Healthcare) connected to a HisTrap™ column (Cytiva, column volume = 5 mL). The

Preparation of strictosidine (8)
Strictosidine (8) was produced as reported recently by Caputi et al. [1]. In brief, 6 mM tryptamine hydrochloride and 4 mM secologanin were combined in a total volume of 15 mL HEPES buffer (50 mM, pH 7.5). CrSTR was added to a final concentration of 5 µM and the reaction was stirred at 30 °C for 18 h. Strictosidine was first pre-purified on a reverse-phase solid-phase extraction (SPE) cartridge (Discovery DSC-18, 1 g, Supelco). To do so, the cartridge was activated with 4 mL of MeOH and equilibrated with 4 mL of water. The sample was then loaded onto the cartridge, which was washed with 4 mL of water. Strictosidine was eluted with 8 mL of MeOH and further purified by preparative HPLC (Method 2; vide infra). Strictosidine (2.0 mg) was obtained.

Method 1
All compounds and extracts used in this study were analysed using method 1. Method 1 has been previously reported by Kamileen et al. [16]. In brief, for LC-MS data acquisition an UltiMate 3000 ultrahigh performance liquid chromatography system (UHPLC; Thermo Fisher) connected to an Impact II UHR-Q-ToF (Ultra-High Resolution Quadrupole-Time-of-Flight) mass spectrometer (Bruker) was used. Compound separation was achieved using reverse-phase liquid chromatography on a Phenomenex Kinetex XB-C18 (100 x 2.1 mm, 2.6 µm; 100 Å) column operated at 40 °C. Mobile phases: (A) water with 0.1 % formic acid; (B) acetonitrile; flow rate = 0.6 mL/min. A 2 µL sample was injected in each run; authentic standards were prepared as methanol solutions at concentrations of 20-100 µM. Chromatography conditions: 10 % B for 1 min; 10 % B to 30 % B in 6 min; 90 % B for 1.5 min; 10 % B for 2.5 min. Mass spectrometry conditions: mass spectrometry was performed in positive electrospray ionization mode (capillary voltage = 3500 V; end plate offset = 500 V; nebulizer pressure = 2.5 bar; drying gas: nitrogen at 250 °C and 11 L/min). Mass spectrometry data were recorded at 12 Hz over a range of 80 to 1000 m/z using data-dependent MS2 and an active exclusion window of 0.2 min. Tandem mass spectrometry settings: fragmentation was triggered on an absolute threshold of 400 and restricted to a total cycle time range of 0.5 s; collision energy was deployed in a stepping option model (20-50 eV). To calibrate MS spectrum recording, each run was initiated with the direct source infusion of a sodium formate-isopropanol calibration solution (operated by an external syringe pump at 0.18 mL/min using a 5 mL syringe with an ID of 10.3 mm). The initial 1 min of the chromatographic gradient was directed to waste.

Metabolite Analysis by Ultra-Performance Liquid Chromatography-Tandem Mass Spectrometry
For the analysis of samples containing fluorinated analogues of corynantheidine (5ab), we additionally performed metabolite analysis by Ultra-Performance Liquid Chromatography-Tandem Mass Spectrometry (data depicted in Supplementary Figures S22, S24 and S26). For this, a UHPLC system (Ultimate 3000 RS; Thermo Scientific) connected to a triple quadrupole (EVOQ Elite; Bruker) mass spectrometer was used. Chromatography was performed using a Phenomenex Kinetex XB-C18 column (2.1 x 100 mm, 2.6 µm) kept at 40 °C. Water containing 0.1% formic acid and acetonitrile were used as mobile phases A and B, respectively, with a flow rate of 0.6 mL/min. The gradient was 10% B from 0.0 min to 1.0 min; 10% to 30% B from 1.0 min to 6.0 min; 30% to 100% B from 6.0 min to 6.1 min; 100% B from 6.1 min to 7.5 min; 100% to 10% B from 7.5 min to 7.6 min; 10% B from 7.6 min to 10 min. The analysis was carried out in ES+ mode and the samples were kept at 10 °C. The injection volume of both the standard solutions and the samples was 2 µL. Spray voltage was 3500 V; the heated probe was kept at 450 °C; cone temperature was 350 °C; cone gas flow, 20 (arbitrary units); nebulizer gas flow, 50 (arbitrary units); probe gas flow, 45 (arbitrary units). A resolution of 1.5 Da was applied to quadrupole 1 and 2 Da to quadrupole 3. Argon was used as collision gas (1.5 mTorr). Flow injections of (20S)-corynantheidine (5a) were used to optimize the multiple reaction monitoring (MRM) conditions. The spray voltage was experimentally determined and the collision energies were automatically adjusted by MS Workstation software 8.2.1 (Bruker). A dwell time of 167 ms was applied to each MRM transition. For the detection of fluorinated analogues of corynantheidine (5ab), for which no authentic standards were available, MRM signals were predicted based on the observed MSMS fragmentation pattern and by comparison to (20S)-corynantheidine (5a) (Supplementary Table S5).
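The gradient programme given above can be written as a table of breakpoints, from which the mobile phase composition at any time point follows by interpolation. The sketch below assumes linear ramps between breakpoints, which is the usual behaviour of a binary pump but is not stated explicitly in the text; the function and variable names are hypothetical.

```python
# (time in min, % B) breakpoints taken from the gradient described in the text
GRADIENT = [
    (0.0, 10.0),
    (1.0, 10.0),
    (6.0, 30.0),
    (6.1, 100.0),
    (7.5, 100.0),
    (7.6, 10.0),
    (10.0, 10.0),
]


def percent_b(t_min, gradient=GRADIENT):
    """Percentage of mobile phase B at time t_min, assuming linear ramps."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time is outside the gradient programme")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable for a sorted breakpoint table")


for t in (0.5, 3.5, 7.0, 9.0):
    print(f"{t:4.1f} min: {percent_b(t):5.1f} % B")
```

For example, midway through the 10% to 30% B ramp (3.5 min) the composition is 20% B.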

Method 2
All compounds isolated in this study were purified using method 2. For this purpose, a preparative HPLC system (Agilent 1260 Infinity II) equipped with a Phenomenex LC column (Luna® 5 µm C18(2) 100 Å, 250 x 30 mm, AXIA™ Packed) and coupled to a multiple wavelength detector and fraction collector was used. Mobile phases A (water + 0.1 % formic acid) and B (acetonitrile) were used. The flow rate was set to 30 mL/min and the gradient was as follows: 10-50 % B in 33 min, 50 % B for 2 min, 10 % B for 5 min. Samples were prepared in MeOH as concentrated solutions (1-5 mg/mL), filtered through a 0.22 µm PTFE syringe filter and injected successively (injection volume: 800 µL). All fractions were assessed by LC-MS (Method 1) and fractions containing the desired product were pooled and dried using a Genevac EZ-2 Plus (not HCl compatible) evaporation system.

Molecular Docking
Protein models of MsDCS1 and MsDCS2 were generated using RoseTTAFold [17] and/or ColabFold [18]. In both cases, standard parameters were used for modelling. For molecular docking of the NADPH cofactor, dehydrogeissoschizine (15) and/or intermediate (16) (compare main text, Scheme 2) into the active site of MsDCS1 and MsDCS2, AutoDock Vina on the Webina webserver was used [19]. Default parameters were selected. Protein, cofactor and ligand coordinates were converted into PDBQT format using AutoDock Tools v1.5.7 [20]. Docking results were assessed manually and ligand orientations were selected so that the 4-pro-R-hydride of the NADPH cofactor was in reasonable proximity to C-21 of dehydrogeissoschizine, the site of initial ligand reduction. Note that the depicted orientation does not necessarily correspond to the lowest-energy solution. Docking results were visualized using PyMOL.
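The docking poses were assessed manually; one way to make the proximity criterion explicit is to compute the distance between the hydride and C-21 for each pose. The following sketch is purely illustrative: the 4.0 Å cutoff, the function name and all coordinates are hypothetical and were not used or reported in this study.

```python
from math import dist

# Hypothetical distance cutoff in angstrom; the text only says "reasonable proximity"
CUTOFF_ANGSTROM = 4.0


def poses_near_hydride(hydride_xyz, poses, cutoff=CUTOFF_ANGSTROM):
    """Return (pose_id, distance) for poses whose C-21 lies within the cutoff, closest first."""
    hits = []
    for pose_id, c21_xyz in poses.items():
        d = dist(hydride_xyz, c21_xyz)  # Euclidean distance between the two atoms
        if d <= cutoff:
            hits.append((pose_id, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical coordinates (angstrom) of the NADPH 4-pro-R hydride and of C-21 in three poses
hydride = (0.0, 0.0, 0.0)
c21_by_pose = {
    "pose_1": (3.0, 0.0, 0.0),
    "pose_2": (0.0, 6.0, 0.0),
    "pose_3": (1.0, 1.0, 1.0),
}

selected = poses_near_hydride(hydride, c21_by_pose)
for pose_id, d in selected:
    print(f"{pose_id}: {d:.2f} angstrom")
```

A geometric filter of this kind ranks poses by catalytic plausibility rather than by docking score, which is consistent with the note that the depicted orientation need not be the lowest-energy solution.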

Supplementary Figures
Supplementary Figure S1

Supplementary Figure S12 | Identification of (20S)-/(20R)-corynantheidine (5ab). (a) Extracted ion chromatogram and MSMS data corresponding to m/z of 5ab of assay with MsDCS1; (b) Extracted ion chromatogram and MSMS data corresponding to m/z of 5ab of assay with MsDCS2; (c) Extracted ion chromatogram and MSMS data corresponding to m/z of 5ab of assay with CpDCS.

Supplementary Figure S13 | Amino acid alignment of alcohol dehydrogenases (ADH) used in this study. (a) Protein sequence alignment of MsDCS1, MsDCS2 and CpDCS was created with Clustal Omega [24]; the alignment was visualized using ESPript V3 [25]; (b) Amino acid sequence identity matrix of ADH enzymes used in this study; Muscle 3.8.425 was used for the calculation of sequence identities [26].

Supplementary Tables
Supplementary Table S1 | Nucleotide sequences for genes cloned and described in this study; start codons are highlighted in bold; stop codons are underlined