Characterization of a dual function macrocyclase enables design and use of efficient macrocyclization substrates

Peptide macrocycles are promising therapeutic molecules because they are protease resistant, structurally rigid, membrane permeable, and capable of modulating protein–protein interactions. Here, we report the characterization of the dual function macrocyclase-peptidase enzyme involved in the biosynthesis of the highly toxic amanitin toxin family of macrocycles. The enzyme first removes 10 residues from the N-terminus of a 35-residue substrate. Conformational trapping of the 25 amino-acid peptide forces the enzyme to release this intermediate rather than proceed to macrocyclization. The enzyme rebinds the 25 amino-acid peptide in a different conformation and catalyzes macrocyclization of the N-terminal eight residues. Structures of the enzyme bound to both substrates and biophysical analysis characterize the different binding modes, rationalizing the mechanism. Using these insights, simpler substrates with only five C-terminal residues were designed, allowing the enzyme to be more effectively exploited in biotechnology.

Cyclic peptide macrocycles hold promise in pursuing challenging targets involved in protein-protein interactions implicated in diseases as diverse as cancer and antimicrobial infections 1 . Due to their constrained, pre-organized, and protease-resistant structures, these molecules can modulate key complex macromolecular interactions in a manner that has proven extremely difficult for conventional small molecules 1,2 . In contrast to most linear peptides, many peptide macrocycles are highly cell permeable 3 . Ribosomally synthesized and posttranslationally modified peptides (RiPPs) are a particularly attractive class of macrocycles because their enzymatic synthesis is driven by enzymes working in cascade to process a genetically encoded and highly variable peptide precursor 4 . The peptide precursor can be modified by macrocyclization, oxidation, heterocyclization, hydroxylation, and prenylation in a predictable and scalable manner 5 . The patellamide pathway is a paradigm in this system, in which catalysis and recognition are physically separated in many of the enzymatic steps, leading to a unique combination of specificity and promiscuity 6 . The macrocyclase in the patellamide biosynthetic pathway (PatGmac) belongs to the subtilisin class of proteases, requiring a substrate with a C-terminal AYD motif, preceded by a heterocyclized cysteine or a proline residue 7,8 . The enzyme is otherwise almost insensitive to the core sequence that becomes the macrocycle, and only the thiazoline (or proline) is part of the final product, as the AYD is cleaved off during the reaction. This combination of specificity through the use of disposable tags (leader and/or tail sequences) and promiscuity in the core sequence produces a system that is almost infinitely variable. This has made RiPPs appealing for exploitation in biotechnology. In some RiPP systems, a linker that can also be varied in both length and composition separates the recognition tag and core peptide 9,10 .
Despite the appeal of their promiscuity, the PatG family of macrocyclases has a severe drawback: its members are slow 11,12 , although in vitro addition of reductant does increase catalytic efficiency 13 .
In addition to PatG, there are four other broad classes of peptide macrocyclases 12, 14-16 that operate through an acyl enzyme intermediate: the sortase class of enzymes, which catalyze transpeptidation by recognizing a C-terminal LXPTG motif 17 ; the butelase enzyme, which is an asparagine/aspartate (Asx) peptide ligase 18 ; the NRPS thioesterases 19 ; and the prolyl oligopeptidase (POP) class of enzymes. A further important class of macrocyclases is that of the ATP-grasp superfamily, which, as the name suggests, relies on ATP hydrolysis to drive macrocyclization 20 . The enzymes that catalyze close to traceless peptide bond formation regardless of the peptide sequence (i.e., only one residue from the precursor peptide recognition tag is carried over to the final cyclic product) are PatGmac family members, butelase, and POP macrocyclases. The POPB enzymes from Basidiomycete fungi such as Amanita bisporigera and Galerina marginata (GmPOPB) have been reported as having k cat values comparable to butelase, the fastest rate observed for peptide macrocyclization 15,21 . GmPOPB is the macrocyclase responsible for macrocyclization of amatoxins, eight amino-acid ribosomal peptides with the core sequence IWGIGC(N/D)P. Amatoxins are cyclic peptides further modified by a characteristic sulfoxide cross-link between tryptophan and a cysteine (Fig. 1a), and hydroxylation (the extent of which varies). The genomes of G. marginata and other amatoxin-producing Amanita species possess more than 50 gene sequences annotated as AMA1 (amatoxin precursors) in which there is considerable diversity in the long C-terminal tail that follows the core sequence 15,22 . Amatoxins are the cause of the toxicity of Amanita and Galerina mushrooms. They are readily absorbed through the gut, and a lethal dose in adults is <10 mg 23 . Amatoxins are stable to inactivation by either the mammalian digestive tract or cooking; thus consumption of even small numbers of such mushrooms is often fatal.
Amatoxin toxicity arises from its accumulation in the liver, where it inhibits RNA polymerase II, leading to irreparable liver failure 23 . The highly stable and potent toxicity of amatoxins has led to their exploration as warheads for targeted cancer therapy 24,25 . The amatoxin peptide precursor is produced as a 35 amino-acid linear substrate (Fig. 1b), which is first processed to a 25 residue peptide (25mer) by removal of the highly conserved 10 N-terminal amino-acid leader 26 that is discarded. The newly exposed N-terminal eight residues of the 25mer product are then macrocyclized and the tail, which is necessary for macrocyclization, is discarded (Fig. 1c). Remarkably, both proteolysis and macrocyclization steps are carried out by the same enzyme, GmPOPB 15 .
We report the functional and structural characterization of GmPOPB and establish the molecular features that determine whether the enzyme catalyzes proteolysis or macrocyclization. Informed by structural and biophysical studies, we have designed a much simpler substrate with fewer C-terminal residues that can be macrocyclized by GmPOPB at synthetically useful rates. The shorter substrate is more cost effective to produce by solid-phase chemical synthesis allowing the generation of more chemically diverse macrocycles, a valuable biotechnological tool.

Results
Structural biology of Apo and substrate-bound GmPOPB. The apo protein crystals belong to space group P2₁2₁2₁, with one monomer in the asymmetric unit. The structure was determined at 2.4 Å resolution by molecular replacement using the β-propeller domain of the proline oligopeptidase from porcine brain (residues 82-450, PDB:1h2z) as search model. The refined apo model (PDB:5N4F) includes residues 7-222, 230-695, and 704-726, and the missing regions are presumed to be disordered portions of the protein. The protein contains two domains as observed in other POP enzymes 27 . The domain containing the putative catalytic residues (Ser577, Asp661, His698) comprises residues 1-81 and 450-728, and the other domain is a seven bladed β-propeller, comprising residues 82-449 (Fig. 1d). In the apo structure, the two domains are in an "open" conformation, in an arrangement reminiscent of a hinged lid on a bottle. This open conformation has been observed in other POPs in crystal form when free of ligand 28,29 . The catalytic serine sits at the tip of a loop and points toward the β-propeller domain (Supplementary Fig. 7). The side chain oxygen of Ser577 is 5.6 Å from the side chain carboxylate of Asp661; His698 is on a loop that is disordered. Ser577 and Asp661 of GmPOPB occupy the same position as Ser554 and Asp641 in the porcine proline oligopeptidase structure 30 .
In all 35mer and 25mer complexes, the enzyme adopts the same "closed" conformation in which the lid (the propeller domain) sits on top of the catalytic domain (Fig. 1e). In both 25mer complexes the N-terminal residues of the substrate (IWGIGCN) are disordered; in the S577A-35mer complex the N-terminal two residues (MF) are missing, while in the H698A complex only the first N-terminal residue is absent. There are no large differences between the protein backbone positions in the complexes with the 35mer and 25mer substrates (root mean square deviation (rmsd) of 0.48 Å over 720 Cα positions for the S577A structures). There are also no major differences between the H698A-35mer and S577A-35mer complexes, or between the S577A-25mer and D661A-25mer complexes. Table 1 shows the data collection and refinement statistics for all structures. The recognition tail adopts an identical distorted 3₁₀ helix conformation inserted into the middle of the β-propeller domain in both the 25mer and 35mer complexes (Fig. 1f, g and 2). The carboxyl terminus sits in a pocket where it makes water-mediated hydrogen bonds to the protein. To our surprise, there are only a few hydrogen bonds between the protein and the tail (Fig. 2c for the S577A-25mer complex and Fig. 2d for the S577A-35mer complex). Comparison to the apo structure reveals that binding of substrate induces no significant changes in the core of the propeller domain (rmsd of 0.75 Å over 360 Cα positions for the S577A structures). The changes that occur (relative to the apo structure) are in loops around the catalytic site.
Fig. 1 […] showing electron density (2Fo-Fc contoured at 1σ level in gray mesh) for peptide. g 25mer peptide bound to GmPOPB-S577A showing electron density for peptide (2Fo-Fc contoured at 1σ level in gray mesh). Supplementary Fig. 11 contains a stereo image depicting electron density maps for the 25mer and 35mer peptide complexes.
Significant differences between the two substrates are observed for the residues of the linker region, which occupies distinct conformations in the 25mer and 35mer complexes (Fig. 2b). In the 35mer complex V24 faces Arg79, whereas the equivalent residue in the 25mer complex, V14, points toward Phe506. H23 does not form any hydrogen bonds, while H13 hydrogen bonds with Glu601. Tyr494 is within hydrogen bonding distance of E22 in the 35mer complex, while in the 25mer complex Tyr494 interacts with W9. This peptide twisting causes a tryptophan present in the peptides from both complexes (residue 19 in the 35mer, 9 in the 25mer), C-terminal to the site of cleavage, to occupy a binding pocket close to the active site. The oxyanion-stabilizing Tyr496 is in close proximity to the core proline (P8) in the 25mer structure (D661A mutant), while Tyr496 hydrogen bonds with P18 in the 35mer complex (Fig. 2d).
Substrate residues I11-P18 (the core peptide is not seen in the 25mer, apart from weak density for P8 in the D661A-25mer complex) form a twisted loop, which makes contacts with the protein and within itself; atoms that will ultimately form the macrocycle are 7.6 Å apart (Fig. 2d). The core peptide interposes between substrate P18 and enzyme Ser577, which is over 8 Å away. Substrate P10, the site of proteolytic cleavage, is positioned for attack by Ser577 (the Cβ of the mutated residue is 3.3 Å from the carbonyl with plausible geometry). In the crystal structure of the H698A-35mer complex, the hydroxyl of Ser577 is within hydrogen bonding distance of P10, in a position suited for nucleophilic attack, but the structure is less ordered, notably the loop containing the mutation H698A. Tyr496 is on the opposite face of the carbonyl, 2.8 Å from the oxygen, and positioned to stabilize the tetrahedral intermediate from attack of Ser577 (Fig. 2d). Both the interaction and the role of Tyr496 are conserved in other POPs. In the 35mer complex structures, residues 2-9 adopt a helical arrangement that ends up exposed to solvent at the N-terminus and makes few contacts with the protein (Fig. 2d). In none of the GmPOPB structures that we have obtained are the catalytic residues arranged in the traditional manner; the closest approach of His698 and Ser577 is 13 Å and residues block simple movement (Supplementary Fig. 7). To confirm the importance of the putative catalytic triad for GmPOPB, the mutants S577A, D661A, and H698A were evaluated for activity, and were inactive with both 25mer and 35mer substrates, using 5 μM GmPOPB and monitoring the reaction progress after 24 h.
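The structural description above switches between 35mer numbering (P10, P18, W19, E22, H23, V24) and 25mer numbering (P8, W9, H13, V14). The following minimal sketch shows how the two schemes relate. It assembles the substrate from the leader, core, and recognition sequences quoted in the text, assuming the asparagine variant of the core (IWGIGCNP); the helper names are illustrative only.

```python
# Sequences quoted in the text; the N variant of the core is an assumption.
LEADER = "MFDTNATRLP"              # residues 1-10 of the 35mer
CORE = "IWGIGCNP"                  # residues 11-18 of the 35mer, 1-8 of the 25mer
RECOGNITION = "WTAEHVDQTLASGNDIC"  # linker plus tail, 17 residues

SUBSTRATE_35MER = LEADER + CORE + RECOGNITION
SUBSTRATE_25MER = CORE + RECOGNITION


def to_25mer_numbering(pos_35mer: int) -> int:
    """Convert a 1-based 35mer position into 25mer numbering."""
    if pos_35mer <= len(LEADER):
        raise ValueError("leader residues are absent from the 25mer")
    return pos_35mer - len(LEADER)


def label(seq: str, pos: int) -> str:
    """Return a residue label such as 'W19' for a 1-based position."""
    return f"{seq[pos - 1]}{pos}"


if __name__ == "__main__":
    # Residues discussed in the text, given in 35mer numbering.
    for pos in (18, 19, 22, 23, 24):
        pos_25 = to_25mer_numbering(pos)
        print(label(SUBSTRATE_35MER, pos), "->", label(SUBSTRATE_25MER, pos_25))
```

Under these assumptions, each residue of the 25mer carries a number 10 lower than the same residue in the 35mer, so V24 and V14 (and H23 and H13) refer to the same amino acid in the two complexes.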
Mutations in the histidine loop decrease enzyme activity. To study other residues involved in hydrolysis and macrocyclization, additional mutants H698N, R663A, R663Q, R663K, and W695Δ (deletion) were generated. These mutants were designed based on comparison of sequence alignments between other POP enzymes and the very similar POPA enzyme from G. marginata, enzymes that solely act as proteases. Arg663 is highly conserved in POPs and thought to play a role in catalysis or substrate binding, since it makes hydrogen bonds with the peptide substrate 31 . […] depicts the position of these residues in GmPOPB. H698N was insoluble and not evaluated. The other mutants possessed diminished activities for both peptide bond hydrolysis and macrocyclization. The amount of cyclic peptide present after incubation for 16 h with the 25mer substrate was R663Q > R663A > W695Δ > R663K. When the 35mer was used as substrate, the mutants demonstrated diminished activity for peptide bond hydrolysis and almost undetectable activity for macrocyclization (Supplementary Fig. 9c).
Kinetic characterization and substrate scope of GmPOPB. Previous analysis employed GmPOPB isolated from the G. marginata mushroom after transformation with Agrobacterium tumefaciens 15 . We examined kinetic parameters and performed a substrate specificity study on the enzyme isolated from a bacterial overexpression system. Substrates tested are shown in Supplementary Fig. 1. Our results on the native overexpressed enzyme confirm the previous findings 15 obtained for protein purified from mushroom that the full-length 35mer substrate is cleaved and the resulting 25mer is released (Supplementary Fig. 3). The kinetic data for expressed protein with the 25mer but not 35mer have been previously reported 15,21 . The 25mer then rebinds (in competition with the 35mer) for macrocyclization. Cleavage and macrocyclization do not occur in a single binding event 15 . The 25mer accumulates as an intermediate although the proteolysis reaction is slower than cyclic peptide formation (Supplementary Fig. 2). Very similar values for K m and k cat were obtained for all full-length substrates evaluated, with K m values ranging from 8 to 51 μM, while k cat was between 3.2 and 35 min −1 (Supplementary Fig. 2). […] effect either on kinetic parameters or yield of cyclic product. Less conservative substitutions, such as mutation to alanine or a 9mer core (IWGIGCANP, in which the A is the inserted residue), led to reduced macrocyclization and increased linear peptide, the product of peptide hydrolysis instead of macrocyclization (Supplementary Fig. 8).
Equilibrium binding of substrates and products. Binding of the inactive mutant S577A to the 25mer, 35mer, a series of truncated substrates (10mer-14mer), the recognition sequence (17mer) WTAEHVDQTLASGNDIC, the truncated recognition sequences VDQTLASGNDIC and TLASGNDIC, and the leader peptide MFDTNATRLP was evaluated by isothermal titration calorimetry (ITC). The results of S577A with both the 25mer and recognition sequence have previously been reported 21 .
Binding of H698A to the 25mer was also measured. The only peptide showing no detectable binding at concentrations up to 1 mM was the 10-residue leader peptide MFDTNATRLP. The full-length substrates and products displayed tight binding (K d-25mer-S577A = 67 ± 14 nM 21 , K d-25mer-H698A = 47 ± 11 nM, K d-35mer = 120 ± 30 nM, K d-recognition = 430 ± 10 nM 21 ); binding is dominated by enthalpic contributions (Supplementary Fig. 6b). Fig. 3a shows representative ITC traces for the 13mer and 14mer substrates, Fig. 3b shows K d values for all peptides evaluated, and Supplementary Fig. 10 shows raw data for all binding curves. The inactive mutant H698A has a K d-25mer within error of that of the S577A mutant, suggesting the lack of activity results from catalytic incompetence rather than disruption of substrate binding. Interestingly, despite being longer and having the potential for more interactions with the protein, the 35mer peptide shows slightly weaker binding compared to the 25mer, mostly due to decreased ΔH. A comparison of the complex structures shows that in the 35mer complex there is disorder of side chains in the segment TAEHVD (linker region) but not in the 25mer. To investigate the role of the recognition tag, peptides corresponding to the entire recognition sequence (linker plus tail, WTAEHVDQTLASGNDIC, 17 residues), the recognition tail plus the valine from the linker (VDQTLASGNDIC, 12 residues), and the highly conserved portion of the tail (TLASGNDIC, 9 residues) were evaluated ([…] Table 3; Supplementary Figs. 6 and 10). Previously, we showed that the recognition sequence dominates binding, with a difference in ΔG of only 1.34 kcal mol −1 between the 17mer recognition sequence and the 25mer peptide 21 . To explore how much contribution to the binding energy comes from the linker region, we evaluated the binding of truncated recognition sequences.
Our data show that the linker region is important in binding as the loss of the linker (shrinking the recognition sequence from 17 to 12 amino acids) reduces binding affinity 20-fold. On its own, the highly conserved nine-residue tail bound rather weakly, consistent with the few interactions observed with the protein. Following from this finding, a series of truncated peptides (core plus parts of the linker) were tested and revealed a trend in which binding affinity increased from 10mer to 13mer (K d-10mer = 83 ± 17 μM, K d-11mer = 39 ± 18 μM, K d-12mer = 21 ± 5 μM, K d-13mer = 2.4 ± 0.1 μM) peptides, but decreased slightly with the 14mer peptide (K d-14mer = 9.5 ± 1.1 μM) (Fig. 3b); the 9mer was not sufficiently soluble for analysis. We noted that the difference in affinity between the 35mer and the core plus linker (13mer) was ~20-fold.
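The dissociation constants above can be placed on a common free-energy scale with the standard relation ΔG = RT ln(K d). The sketch below is an illustration rather than the authors' analysis: it assumes a 1 M standard state and the 20°C titration temperature given in the Methods, and it uses only the rounded K d values quoted in the text, so the results may differ slightly from published thermodynamic parameters.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T_KELVIN = 293.15     # 20 degrees C, the titration temperature in the Methods

# Dissociation constants quoted in the text, converted to mol/L.
KD = {
    "25mer (S577A)": 67e-9,
    "35mer": 120e-9,
    "17mer recognition": 430e-9,
    "13mer": 2.4e-6,
    "14mer": 9.5e-6,
}


def delta_g(kd_molar: float, temperature: float = T_KELVIN) -> float:
    """Binding free energy in kcal/mol from a Kd in mol/L (1 M standard state)."""
    return R_KCAL * temperature * math.log(kd_molar)


def fold_change(kd_weaker: float, kd_tighter: float) -> float:
    """Ratio of two dissociation constants (how many times weaker the binding is)."""
    return kd_weaker / kd_tighter


if __name__ == "__main__":
    for name, kd in KD.items():
        print(f"{name}: dG = {delta_g(kd):.2f} kcal/mol")
    ratio = fold_change(KD["13mer"], KD["35mer"])
    print(f"13mer vs 35mer: {ratio:.0f}-fold weaker binding")
```

More negative ΔG values correspond to tighter binding, so the 13mer, although it binds about 20-fold more weakly than the 35mer, still retains most of the binding free energy.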

Discussion
GmPOPB can form and hydrolyze peptide bonds depending on the substrate length and structure. Both reactions proceed by similar chemical mechanisms, passing through an acyl enzyme intermediate. Typical POPs catalyze peptide bond cleavage following a proline, and less efficiently an alanine, showing strong preference for substrates shorter than 30 amino acids 32 . GmPOPB is unusual in that it processes a 35 amino-acid substrate, the longest observed for a POP. POP enzymes possess an aspartate, histidine, and serine catalytic triad. Molecular dynamics simulation studies comparing the porcine and bacterial POPs have proposed a mechanism in which inter-domain "breathing" is required for catalysis 33 . Although structures of apo POPs and of POPs bound to short peptide substrates and inhibitors (ranging from 2 30 to 7 amino acids 34 ) are available, no substrate complex with a long peptide has previously been determined. Interestingly, in GmPOPB complexes the residues of the presumed catalytic triad are not aligned. His698 is 12 Å away (much further than in any other POP, Supplementary Fig. 7) from Ser577 but is essential for catalysis rather than binding. The enzyme-substrate complex shows that, apart from His698, there is no other residue in proximity to the active site capable of acting as general acid/general base with a pK a value near the 8.0 measured from kinetic analysis 21 . A domain breathing motion similar to, but larger than, that of other POPs could correctly position His698. Mutants of the highly conserved Arg663 and the deletion of Trp695 (an insertion relative to other POPs, Supplementary Fig. 9a), which we predicted would affect loop structure and dynamics, were severely compromised in activity with both 25mer and 35mer substrates, consistent with our prediction. We cannot, however, exclude the possibility that His698 may be required for positioning toward a productive conformation of the enzyme-substrate complex rather than acting as a base per se.
The histidine loop is disordered in the H698A mutant structure and the H698N mutant is insoluble, hinting at a stabilizing role for His698. In that case, analogous to lipid acyl hydrolases 35 , the enzyme would function with a catalytic dyad in which Ser577 is activated by a water molecule bridged to Asp661. Comparison of the three-dimensional structures of the enzyme in complex with 25mer and 35mer substrates reveals only minor rearrangements of the protein, mostly in loops that accommodate the longer substrate. Both 35mer (proteolysis) and 25mer (macrocyclization) substrates bind to GmPOPB with high affinity driven mainly by enthalpy (Fig. 3b, Supplementary Fig. 7; Supplementary Table 3). Consistent with the observation of similar binding affinities of the 25mer and 35mer (K d 67 and 120 nM, respectively) 21 , the 10-residue leader (only present in the 35mer) does not bind. In both the 25mer and 35mer complex structures, the recognition tail (C-terminal 11 residues) is embedded deeply into the β-propeller domain in an essentially identical arrangement. The linker region, however, adopts very different arrangements in the two complexes, thus its interactions with the protein are quite distinct in the two structures (Fig. 2). ITC measurements show that the linker region, particularly the portion following the core peptide, makes a substantial contribution to the binding energy. This is in contrast to the heterocyclase class of RiPP enzymes, where the linker plays no role and can be varied 36 . Our data show that the structure of the linker is important in binding and determines the orientation of the substrate at the active site (thus its fate).
Previous kinetic assays and ours reveal that after removal of the leader, the remaining 25mer is released from GmPOPB, then it rebinds and undergoes the macrocyclization reaction 15 . In the 35mer complex, the core and linker adopt a tightly packed arrangement that is wedged between the active site loops. We conclude that the linker and/or core are unable to refold to the conformational arrangement seen in the 25mer complex (required for macrocyclization) in situ on a timescale comparable to dissociation. We propose that the arrangement of and interactions between the C-terminal 25 residues and the protein that are seen in the 35mer complex act as a kinetic trap, which can only be escaped by dissociation (Fig. 4a). Conformationally trapped peptide reaction intermediates have been identified in other systems. For example, inhibition of proteases by serpins is accomplished by a suicide substrate mechanism, in which the complex is trapped in an inactive arrangement 37 .
Fig. 4 GmPOPB catalyzes peptide bond hydrolysis and macrocyclization. a The sequence of events catalyzed by GmPOPB starting with the 35mer peptide substrate. Peptide bond hydrolysis yields the leader sequence (which dissociates) and the 25mer peptide. The enzyme-bound 25mer is conformationally trapped and macrocyclization cannot proceed. The 25mer must dissociate to rearrange and rebind; only then does macrocyclization occur. Apo enzyme is shown in gray, the enzyme functioning as a protease is colored green, and functioning as a macrocyclase in blue. The peptide is colored as in Fig. 1b. b A 25mer substrate yields a 17 amino-acid by-product, whereas the 13mer substrate generates a 5 amino-acid by-product, a much more economically efficient reaction. Substrates are colored as in Fig. 1b.
Having identified the key role of the linker, we predicted that it should be possible to design simpler macrocyclization substrates that lacked the recognition tail. This would be valuable since the use of 25mer substrates to make eight residue macrocycles is not economical. Although ITC shows a ~20-fold reduction in binding from the 35mer to the 13mer, kinetic analysis reveals that the 13mer substrate has a K m (25 μM) within error of that of the 25mer substrate (50 μM), while the 14mer possesses a higher K m (380 μM). Similar k cat values were observed with both shorter substrates (0.49 and 0.58 min −1 for the 13mer and 14mer, respectively, Fig. 3c; Supplementary Table 2) but these are smaller than that of the 25mer (18 min −1 ). Linear peptide (product of hydrolysis instead of macrocyclization) was observed when shorter peptides were utilized as substrates (Fig. 3d), consistent with the linker playing a key role in substrate positioning. After 16 h of reaction both 13mer and 14mer substrates produce similar amounts of macrocycle, but the 14mer generates less linear product. The linear peptide produced may not be a significant drawback, as purification of macrocycles from linear peptides is straightforward 12,38 . Compared to PatG this represents a significant improvement, since biocatalytic reactions with PatG in vitro can require over 7 days and utilize up to stoichiometric amounts of enzyme 12,38 .
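One way to compare the substrates discussed above is through the specificity constant k cat / K m and the Michaelis-Menten rate at a chosen substrate concentration. The sketch below uses only the k cat and K m values quoted in the text; the comparison itself and the function names are illustrative and do not appear in the original analysis.

```python
# (kcat in min^-1, Km in micromolar), as quoted in the text.
PARAMS = {
    "25mer": (18.0, 50.0),
    "13mer": (0.49, 25.0),
    "14mer": (0.58, 380.0),
}


def efficiency(kcat: float, km: float) -> float:
    """Specificity constant kcat/Km in micromolar^-1 min^-1."""
    return kcat / km


def rate_per_enzyme(kcat: float, km: float, substrate: float) -> float:
    """Michaelis-Menten rate v/Et (min^-1) at a substrate concentration in micromolar."""
    return kcat * substrate / (km + substrate)


if __name__ == "__main__":
    for name, (kcat, km) in PARAMS.items():
        eff = efficiency(kcat, km)
        v100 = rate_per_enzyme(kcat, km, 100.0)
        print(f"{name}: kcat/Km = {eff:.4f} per uM per min, v/Et at 100 uM = {v100:.2f} per min")
```

On these numbers the 13mer retains a low K m, so its reduced efficiency relative to the 25mer comes almost entirely from the lower k cat, whereas the 14mer is penalized in both parameters.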
GmPOPB is an unusual enzyme that catalyzes, depending on substrate length, either proteolysis or macrocyclization using the same catalytic machinery (Fig. 4a). Further complexity comes from the fact that GmPOPB itself generates its 25 residue macrocyclization substrate. The internal structure of the substrate is critical to how the enzyme binds the substrate and to which reaction is catalyzed. As a consequence of this requirement for a specific substrate structure, the enzyme must release the 25mer peptide, allowing it to refold rather than simply moving to the second reaction in a processive manner. Previous work had identified the crucial nature of the recognition sequence in the substrate, but suggested its full length was a requirement for macrocyclization. Our structural work, supported by calorimetry and kinetics, reveals that shorter peptides are suitable substrates if their design preserves important interactions with the protein and maintains the peptide structural recognition. GmPOPB recognizes residues within the linker connecting the core and the recognition tail, and this recognition is critical to position the substrate for macrocyclization. A substrate with five or six C-terminal residues (as opposed to 17) chosen to mimic the linker can be efficiently macrocyclized at synthetically useful rates (Fig. 4b). This work highlights the power of structural and mechanistic studies to redesign substrates or enzymes for use in biotechnology.
Note added in proof: Since the submission of this manuscript two papers were published studying POPs, further demonstrating the importance of this class of macrocyclase enzymes. One reports the structure of the related PCY1 enzyme 55 and the other discusses broadening of the substrate profile of GmPOPB 56 .

Methods
Materials. Peptides were purchased from Biosynthesis, as free amine and free carboxylic acids, at a purity >90%. Buffers and chemicals, unless specified, were from Sigma.
Expression of recombinant proteins. The plasmid pJExpress414 encoding the codon optimized G. marginata POPB gene was purchased from DNA 2.0. Plasmids were transformed into BL21(DE3) cells (Agilent). Cultures (50 mL) were grown overnight at 37°C in the presence of 100 μg/mL ampicillin, then diluted 100-fold into 6 L Terrific Broth (TB) media. These cultures were grown at 37°C with shaking (200 rpm) until the optical density at 600 nm (OD600) reached 0.6. Cells were cooled down for 1 h to 16°C, and protein expression was then induced by the addition of 0.5 mM isopropyl β-D-thiogalactopyranoside (IPTG, Generon). Cultures were incubated for an additional 16 h and centrifuged at 6000×g for 15 min. Cell pellets were resuspended in 250 mL Ni-NTA lysis/wash buffer A (50 mM HEPES pH 8.0, 300 mM NaCl, 10 mM imidazole, 10% glycerol, and 2 mM β-mercaptoethanol) supplemented with complete EDTA-free protease inhibitor tablets (Roche Applied Science). The resulting suspension was lysed by two passages through a cell homogenizer at 30,000 psi, and purified by nickel chromatography. Each desired protein was eluted using a step elution with lysis buffer supplemented with 250 mM imidazole (buffer B). Eluted protein was dialyzed overnight against buffer C (50 mM HEPES (pH 8.0), 50 mM NaCl, 10% glycerol, and 2 mM β-mercaptoethanol) while simultaneously the His-tag was cleaved by TEV protease (prepared in house 39 ). This dialyzed TEV-cleaved mixture was loaded onto a Histrap column connected in tandem to a Hitrap Q-FF column. Both columns were washed with buffer C, and GmPOPB was eluted during this wash. Fractions were pooled and concentrated to <8 mL (at 10 mg/mL approximately). Protein was loaded onto a Superdex S200 gel filtration column (GE Healthcare) pre-equilibrated with storage buffer D (50 mM HEPES (pH 8.0), 50 mM NaCl, 10% glycerol, and 2 mM β-mercaptoethanol). 
Fractions containing pure protein were combined, concentrated, divided in aliquots, flash frozen, and stored at −80°C. Protein concentrations were determined by absorbance at 280 nm 40 .
Site-directed mutagenesis. Mutants S577A, D661A, H698N, R663A, R663Q, R663K, and W695Δ were generated by a published mutagenesis protocol 41 . Oligonucleotides for mutagenesis were purchased from IDT. Sequences of primers used for mutagenesis and sequencing are given in Supplementary Table 1. Sequencing was performed using at least three primers to cover the entire gene sequence (Eurofins).
General procedure for kinetic assays. Comparison between distinct substrates was performed in 50 mM Tris pH 8.0, 50 mM NaCl, 10 mM DTT with varying concentrations of substrates at room temperature. All reactions were performed in duplicate. Reactions were started by adding GmPOPB (50 nM for GmAMA1_C6S, 1 μM for the 13mer and 14mer, and 20 nM for other substrates) to the assay mixture containing buffer and peptide. Reactions were quenched at several time points by adding 50 μL reaction mixture to 20 μL 6% TFA. Reactants were separated from products for quantification by injecting 50 μL of each quenched time point mix onto a ZORBAX SB-C18, 5 µm, 9.4 × 50 mm (Agilent) column connected to an Agilent LC-MS (G6130B Single Quad, Agilent Technologies). Reactants were separated from products using a gradient from H 2 O containing 0.1% TFA or 0.1% formic acid and 5% acetonitrile to 50% acetonitrile, at 1.5 mL/min for 8 min. Peaks with ultraviolet (UV) absorbance at 220 and 280 nm were integrated, and the area of peaks corresponding to reactant and products was used to calculate the percentage of product formed after a correction for differences in the extinction coefficient of each peptide was applied (ε 280-25mer = 11,000 M −1 cm −1 , ε 220-25mer = 46,000 M −1 cm −1 , ε 280-cyclic = 5500 M −1 cm −1 , ε 220-cyclic = 34,000 M −1 cm −1 , and ε 280-tail = 5500 M −1 cm −1 ). The sum of product + substrate was assumed equal to the total initial amount of substrate, and the product was converted from % to concentration. This value was divided by the concentration of enzyme present to yield v/E t (min −1 ). When enzyme mutants and peptides containing alanine in the core sequence were tested for activity, higher concentrations (5 μM enzyme and 100 μM substrate) were incubated for 1 and 18 h at room temperature. For progress curves with the 35mer substrate, measurements were made in triplicate and quantification relied on ion counts from mass spectrometry.
Mass signals corresponding to the 35mer (1282.9 Da, M+3H), 25mer (900.7 Da, M+3H), leader peptide (1165.5 Da, M+H), recognition sequence (930.4 Da, M+2H), cyclic peptide (841.3 Da, M+H), and linear peptide (859.4 Da, M+H) were monitored; the area of each was integrated and quantified using a calibration curve performed with the 25mer, 35mer, cyclic, and linear peptides as standards. Authentic cyclic peptide was quantified by UV absorbance. Experiments showing products formed after 1 and 16 h with truncated peptides were performed twice. UV and ion count approaches gave similar results for the 25mer. Kinetic data were fitted to the Michaelis-Menten equation using GraphPad Prism, and values reported are the average and standard error of the mean.
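The monitored mass signals can be cross-checked against the reaction stoichiometry: hydrolysis of the 35mer adds one water and yields leader plus 25mer, macrocyclization splits the 25mer into cyclic peptide plus recognition sequence with no net change in mass, and the linear peptide is the cyclic peptide plus one water. The sketch below performs this bookkeeping as an illustrative consistency check, not as part of the reported method. It treats the 25mer signal as a triply protonated ion, which is an inference: only that charge state reproduces the mass balance with the other listed values.

```python
PROTON = 1.00728  # mass of a proton, Da
WATER = 18.0106   # mass of H2O, Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass (Da) from the m/z of an [M + zH] ion."""
    return charge * (mz - PROTON)


# m/z values quoted in the text, paired with a charge state.
SIGNALS = {
    "35mer": (1282.9, 3),
    "25mer": (900.7, 3),        # assumption: triply protonated (see lead-in)
    "leader": (1165.5, 1),
    "recognition": (930.4, 2),
    "cyclic": (841.3, 1),
    "linear": (859.4, 1),
}

MASS = {name: neutral_mass(mz, z) for name, (mz, z) in SIGNALS.items()}

# Residuals of the three mass balances, in Da.
RESIDUALS = {
    "hydrolysis": MASS["35mer"] + WATER - MASS["leader"] - MASS["25mer"],
    "macrocyclization": MASS["25mer"] - MASS["cyclic"] - MASS["recognition"],
    "linear vs cyclic": MASS["linear"] - MASS["cyclic"] - WATER,
}

if __name__ == "__main__":
    for name, residual in RESIDUALS.items():
        print(f"{name}: residual = {residual:+.2f} Da")
```

With m/z values quoted to one decimal place, residuals of a few tenths of a dalton are the most that can be expected, so small nonzero values here do not indicate a problem.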
Isothermal titration calorimetry. All titrations were performed on a MicroCal PEAQ-ITC instrument (MicroCal, Malvern Instruments, Northampton, MA, USA) and the results were fitted with PEAQ-ITC analysis software (MicroCal, Malvern Instruments, Northampton, MA, USA). Peptide ligand solutions were prepared in 20 mM Tris pH 8.0 containing 1 mM DTT, prior to buffer exchange by three cycles of dilution in 50 mM Tris pH 8.0 with 50 mM NaCl, 10 mM DTT followed by concentration using a Microsep Advance centrifugal device equipped with a 1 kDa cut off membrane (Pall Corporation). The same three cycles of dilution in 50 mM Tris pH 8.0 with 50 mM NaCl and 10 mM DTT followed by concentration were performed with the protein to be used in the titration using a Vivaspin protein concentrator spin column with a 30 kDa cut off (GE Healthcare). A final dilution to the concentration to be used for titration was performed using the buffer that passed through during the protein buffer exchange, both for the protein and peptide to be used, to avoid any possible buffer mismatch. The stirred cell contained 300 μL of protein (the inactive mutant GmPOPB_S577A at 20 μM for 35mer, 36 μM for 10mer, 36 μM for 11mer, 29 μM for 12mer, 42 μM for 13mer, 29 μM for 14mer, 37 μM for 9mer recognition sequence, 21 μM for 12mer recognition sequence), and the injection syringe contained 75 μL of peptide ligand (200 μM for 35mer, 924 μM for 10mer, 761 μM for 11mer, 484 μM for 12mer, 442 μM for 13mer, 582 μM for 14mer, 1 mM for 9mer recognition sequence, 677 μM for 12mer recognition sequence). Titrations of peptide into protein solutions were conducted at 20°C. For all the titration experiments, a total of 19 injections of 2 μL were made at 120 s intervals. The heat released due to the first injection (0.4 μL) was excluded from data analyses. Binding data with the H698A mutant were obtained by titrating enzyme (319 μM stock) into 25mer peptide (27 μM).
Blank runs in which peptide (or H698A) was titrated into buffer were performed to correct for the heats of dilution and mixing, and the dilution isotherm for each peptide ligand was subtracted from the respective binding isotherm prior to curve fitting. Equilibrium dissociation constants (K d ) as well as ΔH and ΔS values for binding of each peptide to protein were obtained by fitting the calorimetric data with a single-site model using the stoichiometry parameter n fixed at 1.0 using Malvern PEAQ-ITC data analysis software. The ITC data for S577A with both the 25mer and recognition sequence (17mer) have previously been published 21 . We performed all ITC binding experiments at least in duplicate, and calculations of average and standard error of the mean were performed with GraphPad Prism.