The structure of the monobactam-producing thioesterase domain of SulM forms a unique complex with the upstream carrier protein domain

Nonribosomal peptide synthetases (NRPSs) are responsible for the production of important biologically active peptides. The large, multidomain NRPSs operate through an assembly line strategy in which the growing peptide is tethered to carrier domains that deliver the intermediates to neighboring catalytic domains. While most NRPS domains catalyze standard chemistry of amino acid activation, peptide bond formation, and product release, some canonical NRPS catalytic domains promote unexpected chemistry. The paradigm monobactam antibiotic sulfazecin is produced through the activity of a terminal thioesterase domain of SulM, which catalyzes an unusual β-lactam–forming reaction in which the nitrogen of the C-terminal N-sulfo-2,3-diaminopropionate residue attacks its thioester tether to release the monobactam product. We have determined the structure of the thioesterase domain as both a free-standing domain and a didomain complex with the upstream holo peptidyl-carrier domain. The position of variant lid helices results in an active site pocket that is quite constrained, a feature that is likely necessary to orient the substrate properly for β-lactam formation. Modeling of a sulfazecin tripeptide into the active site identifies a plausible binding mode in which the sulfamate and the peptide backbone can interact with Arg2849 and Asn2819, respectively. The overall structure is similar to the β-lactone–forming thioesterase domain that is responsible for similar ring closure in the production of obafluorin. We further use these insights to enable bioinformatic analysis to identify additional, uncharacterized β-lactam–forming biosynthetic gene clusters by genome mining.


General Synthetic Methods
All chemicals and reagents were purchased from Sigma Aldrich (St. Louis, MO), AK Scientific (Union City, CA), or Fisher Scientific (Hampton, NH), unless otherwise indicated, and used without further purification. All solvents were distilled before use, including THF (Na/benzophenone), DCM (CaH2), and MeCN (4 Å MS). Anhydrous DMF was purchased from Sigma Aldrich and used without further purification.

Reagent abbreviations are as follows:
PyBOP: (benzotriazol-1-yloxy)tripyrrolidinophosphonium hexafluorophosphate
DCM: dichloromethane
DMF: N,N-dimethylformamide
DIPEA: N,N-diisopropylethylamine
MeCN: acetonitrile
TFA: trifluoroacetic acid

Thin layer chromatography (TLC) was carried out on silica gel coated glass plates with the elution conditions indicated. CV refers to column volumes of mobile phase used in silica gel chromatography, and tR indicates retention time.

Synthesis of γ-D-Glu-D-Ala-L-Glu-CoA Tripeptide mimic

Protected γ-D-Glu-D-Ala-L-Glu tripeptide benzyl ester 3. Protected D-Glu-D-Ala dipeptide 1 (0.45 mmol) was dissolved in 5 mL of DCM. To the solution was added doubly protected D-glutamate 2 (1.35 mmol) followed by PyBOP (0.54 mmol) and DIPEA (1.39 mmol), and the solution was stirred overnight at room temperature. The organic phase was washed with aqueous NaHCO3 and NH4Cl, and the combined organics were dried over anhydrous MgSO4. Protected γ-D-Glu-D-Ala-L-Glu tripeptide benzyl ester 3 (0.23 mmol, 51% yield) was isolated by silica gel column chromatography eluting with EtOAc/hexanes. 1H-NMR (400 MHz, CDCl3): δ 7.38-7.28 (m, 5H), 7.03 (d, J = 7.8 Hz, 0.5H), 6.49 (d, J = 6.5 Hz, 0.5H), 5.23 (d, J = 7.8 Hz, 0.5H), 5.18-5.08 (m, 2H), 4.61-4.53 (m, 0.5H), 4.54-4.45 (m, 0.5H), 4.16-4.08 (m, 0.5H), 3.49 (dd, J = 8.3, 5.2 Hz, 0.4H), 2.33 (t, J = 7.5 Hz, 1H), 2.29-2.20 (m, 2H), 2.20-1.74 (m, 4H), 1.56 (s, 11H), 1.44 (s, 5H), 1.42-1.39 (m, 13.5H), 1.36 (d, J = 7.0 Hz, 1.5H). HRMS (ESI) m/z: [M+H]+ C33H52N3O10 calculated 650.3647, found 650.3466.

γ-D-Glu-D-Ala-L-Glu-CoA Tripeptide mimic 4. Protected γ-D-Glu-D-Ala-L-Glu tripeptide benzyl ester 3 (0.088 mmol) was dissolved in 10 mL anhydrous THF. Pd/C (5.7 mg, 10% wt/wt) was added and the solution was stirred under balloon pressure of hydrogen gas for 3 h at room temperature. The solution was then filtered through a 0.5 µm filter and the volatiles were removed in vacuo. The resulting free acid peptide (0.063 mmol) was dissolved in 2.5 mL of anhydrous DMF. Coenzyme A sodium salt hydrate (0.069 mmol) and DIPEA (0.14 mmol) were added, followed by PyBOP (0.075 mmol). The reaction was stirred under an inert atmosphere of argon for 2 h at room temperature and then purified by HPLC (HPLC method A) to yield the protected γ-D-Glu-D-Ala-L-Glu-CoA tripeptide thioester. The lyophilized material was subsequently deprotected by stirring in 2 mL of a 3:1 TFA:DCM solution at room temperature for 1 h. The volatiles were then removed in vacuo and the residue was resuspended in 3 mL of MeCN and purified by HPLC method B to yield γ-D-Glu-D-Ala-L-Glu-CoA tripeptide mimic 4 (0.0098 mmol, 11%).

Supplementary

Figure S1. Sequence and structure alignment of the SulM thioesterase domain compared with prior NRPS thioesterase domain structures. Values above the diagonal represent pairwise sequence identity of the thioesterase domains, calculated with Clustal Omega. Values below the diagonal represent the rms displacement obtained by superimposing the core thioesterase domains, lacking the dynamic lid loops, with the PyMOL super algorithm.

Supplemental Figure S3. Chemical structures of final products released from thioesterase domains of structurally characterized proteins. Release products of NRPS systems for which the TE domain structure has been solved. In some cases, the product is further modified following release to yield the ultimate natural product. The structures and molecular weights do not include any chemical modifications that occur following release of the NRPS peptide.

SulM PCP-thioesterase didomain structure illustrates the phosphopantetheine cofactor. Omit map electron density of the phosphopantetheine cofactor was produced by removing the phosphopantetheine group and submitting the final structure to a single round of simulated annealing refinement. The map is calculated with coefficients of the form Fo-Fc and contoured at 2.5σ.

arm orientations and interaction interfaces of PCP with NRPS domains. A, orientations of the phosphopantetheine arm as it approaches the neighboring active sites in the adenylation domain (pink, PDB 3RG2), condensation domain (green, PDB 4ZXI), and SulM thioesterase (yellow, PDB 8W2C). Interfaces of the carrier domain (red) while interacting with B, the adenylation domain (pink, PDB 3RG2); C, the condensation domain (green, PDB 4ZXI); and D, the SulM thioesterase (yellow, PDB 8W2C), highlighting the orientation the cofactor adopts to approach the neighboring active site. The PCP interactions with the three catalytic domains employ different regions of the carrier domain.

Supplemental Figure S6. Molecular dynamics analysis of the SulTE protein. A) RMSD (root mean square displacement) plot of protein atoms, B) radius of gyration (Rg) plot, and C) Cα RMSF (root mean square fluctuation) plot for each residue, calculated from the 500 ns simulation. Cylinder, arrow, and line represent the secondary structure elements of α-helix, strand, and loop, respectively. Pink cylinders represent lid helices 1 and 2 (LH1 and LH2) of the thioesterase (TE) domain.

Supplemental Figure S7. Molecular dynamics analysis of the SulPCP-TE protein. A) RMSD (root mean square displacement) plot of protein atoms, B) radius of gyration (Rg) plot, and C) Cα RMSF (root mean square fluctuation) plot for each residue, calculated from the 500 ns simulation. Cylinder, arrow, and line represent the secondary structure elements of α-helix, strand, and loop, respectively. Green cylinders represent the PCP domain α-helices, while pink cylinders represent lid helices 1 and 2 (LH1 and LH2) of the thioesterase (TE) domain.

Supplemental Figure S8. Structure-based sequence alignment of thioesterase domains from NRPS and PKS proteins. The green box shows the alignment of the "QCN" residues of the SulTE domain with other TE domain sequences. The "QCN" residues were found only in the SulTE sequence. The PROMALS3D server was used for the alignment (http://prodata.swmed.edu/promals3d/results/phpONqyVB.result.html).

Supplemental Figure S9. Electrostatic potential of the SulTE and SulM_PCP-TE structures. The positively charged substrate-binding pocket at the center of SulM_PCP-TE (left) likely favors binding of the negatively charged tripeptide. The phosphopantetheine arm approaches the positively charged substrate-binding pocket through a tunnel in the SulTE domain (right).

Supplemental Figure S11. Sequence alignment with SulTE domain homologous proteins. The SulTE protein sequence was used as the query for protein BLAST, and sequences showing less than 70% sequence similarity were selected. Further screening was done using structural features of the SulTE domain. Sequences were selected by identifying the presence of 1) "QCN" residues (highlighted yellow) at the active site, 2) two out of three arginine residues (highlighted blue) for interacting with substrate residues, and 3) the catalytic aspartate residue at position II (highlighted red).

S3. Potential uncharacterized β-lactam-producing BGCs. Three BGCs identified from antiSMASH that may produce uncharacterized β-lactam products. The Stachelhaus code residues in the adenylation domain of the terminal module with high similarity to the sulfazecin cluster are also included.
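The residue-based screen described for Figure S11 (a "QCN" motif at the active site, at least two of three arginine residues, and a catalytic aspartate) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the alignment column indices, and the toy aligned sequences below are all hypothetical and serve only to show how the three criteria combine.

```python
# Minimal sketch of a three-criterion residue screen on aligned sequences.
# All column indices and sequences are hypothetical, not from the paper.

def passes_screen(aligned, qcn_start, arg_cols, asp_col):
    """Return True if one aligned sequence meets all three criteria."""
    # 1) "QCN" residues at the (assumed) active-site columns
    has_qcn = aligned[qcn_start:qcn_start + 3] == "QCN"
    # 2) at least two of the three (assumed) arginine columns hold R
    arg_count = sum(1 for col in arg_cols if aligned[col] == "R")
    # 3) aspartate at the (assumed) catalytic column
    has_asp = aligned[asp_col] == "D"
    return has_qcn and arg_count >= 2 and has_asp


# Toy alignment rows, each 12 columns long (hypothetical).
TOY_ALIGNMENT = {
    "all_criteria": "GQCNARLRDRKA",
    "two_of_three_arg": "GQCNAKLRDRKA",
    "no_qcn": "GGSNARLRDRKA",
    "one_arg": "GQCNAKLKDRKA",
    "no_asp": "GQCNARLRERKA",
}

# Hypothetical column assignments for the toy alignment.
QCN_START, ARG_COLS, ASP_COL = 1, (5, 7, 9), 8

hits = [
    name
    for name, seq in TOY_ALIGNMENT.items()
    if passes_screen(seq, QCN_START, ARG_COLS, ASP_COL)
]
print(hits)
```

In practice the column indices would come from the alignment to SulTE rather than being fixed by hand, and gapped columns would need explicit handling.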

-D-Orn-DAP DIWEVNTDDK
* Stachelhaus code residues in the adenylation domain of the last modules
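One simple way to quantify how similar two Stachelhaus codes are is to count identical residues position by position. The sketch below assumes 10-residue codes of equal length; the code DIWEVNTDDK is the one listed above, while the comparison code and the function name are hypothetical and used only for illustration.

```python
# Minimal sketch: position-wise comparison of two Stachelhaus codes.
# COMPARISON_CODE is a hypothetical example, not a value from the paper.

def stachelhaus_matches(code_a, code_b):
    """Count positions at which two equal-length codes are identical."""
    if len(code_a) != len(code_b):
        raise ValueError("Stachelhaus codes must have the same length")
    return sum(1 for a, b in zip(code_a, code_b) if a == b)


LISTED_CODE = "DIWEVNTDDK"      # code shown in the table fragment above
COMPARISON_CODE = "DIWEVNTDAK"  # hypothetical code with one substitution

n_match = stachelhaus_matches(LISTED_CODE, COMPARISON_CODE)
print(f"{n_match}/{len(LISTED_CODE)} positions identical")
```

A plain match count treats all substitutions equally; a substitution matrix could be used instead if conservative changes should be scored more leniently.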

Table S2: Crystallographic Diffraction and Refinement Data.
a Values in parentheses are for the high-resolution shell.