The structural biology of patellamide biosynthesis

The biosynthetic pathways for patellamide and related natural products have recently been studied by structural biology. These pathways produce molecules that have a complex framework and exhibit a diverse array of activity due to the variability of the amino acids that are found in them. As these molecules are difficult to synthesize chemically, exploitation of their properties has been modest. The patellamide pathway involves amino acid heterocyclization, peptide cleavage, peptide macrocyclization, heterocycle oxidation and epimerization; closely related products are also prenylated. Enzyme activities have been identified for all these transformations except epimerization, which may be spontaneous. This review highlights the recent structural and mechanistic work on amino acid heterocyclization, peptide cleavage and peptide macrocyclization. This work should help in using the enzymes to produce novel analogs of the natural products, enabling an exploitation of their properties.


Introduction
Small molecules with potent biological properties are commonly isolated from bacteria. Analysis of well-established rules for anthropogenic pharmaceuticals [1] indicates that, in comparison, natural products tend to have more stereocentres and nitrogen atoms but are, in general, not wildly different [2,3]. Some natural products have undesirable properties as medicines, but these can be improved by chemical modification. Natural products, including modified variants, are a major component in the pharmaceutical armory for treating diseases ranging from bacterial infection to cancer to immune suppression [4]. Synthetic biology promises, amongst other deliverables, the ability to tailor enzymes in biosynthetic pathways to create natural product variants with desirable properties in the quantities required for drug development [5].
Peptide-based natural products are particularly attractive from a chemical point of view; amino acids share a common standard in connectivity (the amide bond) with an almost infinitely configurable element (the side chain). One can thus 'dial' chemical and structural properties into a shared basic design. Crucially, in the same way as no two proteins need to share any biological property, peptides by virtue of their different side chains can have divergent properties. The cyanobactin family of ribosomally synthesized and post-translationally modified peptides (RiPPs) are peptide macrocycles that exemplify this diversity, ranging from six to over 20 residues, with divergent sequences and very different biological properties including P-gp inhibition, cytotoxicity, immunomodulation, antifungal, antibacterial and antiviral properties [6]. Macrocyclic peptides are particularly appealing as they are intrinsically resistant to protease degradation and several can cross membranes [7]. There is significant interest in their biosynthesis with a view to exploiting this class of molecule for novel drugs [8,9,10]. In this review, we use the example of the patellamides (eight-residue macrocyclic cyanobactins) [11] to structure our discussion. In this pathway a single ribosomal seventy-one-residue precursor peptide, PatE, gives rise to two eight-residue macrocycles, patellamide A and patellamide C [11] (Figure 1a). PatE contains two eight-residue core peptides and each is converted to the corresponding patellamide (Figure 1b) by enzyme action. Within PatE each core peptide is flanked by a conserved five-residue protease signature (N-terminal) and a conserved three-residue macrocyclization signature (C-terminal); in addition, PatE has a thirty-seven-residue leader peptide at the N-terminus [11] (Figure 1a). During synthesis the peptide bonds between the core peptide and its flanking regions must be cut and the ends of the core peptide joined (Figure 1b).
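The precursor layout described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: every sequence below is a made-up placeholder (not the real PatE sequence), and the names `Cassette`, `build_precursor`, `excise_cores` and `same_macrocycle` are hypothetical helpers introduced here. Only the segment lengths (37, 5, 8, 3 and 71 residues) come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cassette:
    protease_sig: str  # five residues N-terminal to the core
    core: str          # eight-residue core peptide
    macro_sig: str     # three residues C-terminal to the core


def build_precursor(leader: str, cassettes: List[Cassette]) -> str:
    """Concatenate the leader and each (signature + core + signature) unit."""
    return leader + "".join(c.protease_sig + c.core + c.macro_sig for c in cassettes)


def excise_cores(precursor: str, cassettes: List[Cassette]) -> List[str]:
    """Locate each core via its flanking signatures and cut it out."""
    cores, pos = [], 0
    for c in cassettes:
        unit = c.protease_sig + c.core + c.macro_sig
        start = precursor.index(unit, pos) + len(c.protease_sig)
        end = start + len(c.core)
        cores.append(precursor[start:end])
        pos = end + len(c.macro_sig)
    return cores


def same_macrocycle(a: str, b: str) -> bool:
    """Head-to-tail cycles are equal if one linear reading is a rotation of the other."""
    return len(a) == len(b) and a in (b + b)


# Placeholder sequences only (assumption): lower-case letters mark flanking regions.
leader = "l" * 37
cassettes = [Cassette("p" * 5, "ABCDEFGH", "m" * 3),
             Cassette("p" * 5, "STUVWXYZ", "m" * 3)]
precursor = build_precursor(leader, cassettes)

PATE_LENGTH = 71  # stated length of PatE
unassigned = PATE_LENGTH - len(precursor)  # residues not covered by the listed segments
cores = excise_cores(precursor, cassettes)
```

In this model the listed segments account for 69 of the 71 residues, so `unassigned` is 2; the text does not say where those remaining residues lie. The rotation check reflects that, once the ends are joined, a macrocycle has no unique first residue.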
The final products contain oxazolines (derived from the heterocyclization of serine/threonine) and thiazoles (the oxidized form of thiazoline, which is derived from the heterocyclization of cysteine) (Figure 1b). The two residues adjacent to the thiazoles are epimerized from an L-configuration to a D-configuration (Figure 1b). The functions of the enzymes that convert PatE into patellamides were first assigned by sequence analysis (Figure 1c) [11]. There are many closely related pathways in other marine organisms, such as the trunkamide and patellin (Figure 1b) biosynthetic pathways, that utilise biosynthetic enzymes which are very similar to those of the patellamide pathway [6,9,11]. Moreover, many of the chemical reactions that are catalyzed by enzymes in the patellamide pathway occur in the biosynthesis of other natural products that are unrelated to cyclic peptides. We herein discuss insights from these other pathways under the corresponding chemical transformations observed in the patellamide system.

Heterocyclization
The heterocyclase (or cyclodehydratase) class of enzymes modifies cysteine residues (and in some cases serine and threonine residues, too) within the context of the core peptide(s), to create thiazolines (or oxazolines) (Figure 1b) and eliminate water [12]. Patellamides A and C contain four heterocycles each and these five-membered rings profoundly change the chemistry and flexibility of the peptide [13]. Their selective introduction is a powerful tool to tune the molecular shape and activity of not only peptide macrocycles, but also linear peptide natural product families such as the antibacterial microcin peptides [14]. The microcin 'Trojan horse' antibiotic peptide MccC7 has a C-terminal phosphoramidate that is linked to adenosine. During the biosynthesis of the peptide an intermediate with a C-terminal five-membered succinimide ring is synthesized by heterocyclization of the C-terminal Asn residue [15]. The structure of the enzyme responsible, MccB, was revealed to have two domains. The larger of the two (the C-terminal) adopts a classic adenylase superfamily fold whilst the fold of the smaller (N-terminal) was novel [15]. MccB activates the substrate peptide carboxy terminus by first converting it to an adenylate; the amide of the Asn side chain then displaces AMP to give the five-membered succinimide ring. MccB catalyzes a second adenylation reaction that adds adenosine to the peptide. Domain 1 of MccB acts as a clamp holding the peptide substrate for processing. More recently thiazole/thiazoline-containing (modified) microcins [16], collectively known as TOMMs, have been studied [17,18]. Enzymatic characterization revealed that the catalytic activity required ATP and that full activity was only observed with two proteins, BalhC and BalhD, although BalhD was shown to catalyze the reaction on its own, albeit very slowly [17]. Biochemical data established that the protein functioned as a kinase, phosphorylating the hemiorthoamide intermediate formed by cysteine attack at the preceding carbonyl [17]. The formation of hemiorthoamide intermediates is commonplace in food chemistry, where thiazolines and other five-membered rings occur in heated proteins [19]. By analogy then, it could be that the irreversible step is actually the removal of water (either by heat or by formation of a phosphorus-oxygen bond followed by elimination of phosphate) and that the reversible formation of hemiorthoamides in a protein may in energetic terms be a relatively low barrier process.
Figure 1. (a) The patellamide gene cluster contains genes patA to patG, coding for proteins PatA to PatG. The PatE protein contains (1) a 37-residue leader and (2) two core peptides, which are processed to give patellamides C and A [11]. The core peptides are flanked at their N and C termini; (b) the final natural product patellamides C and A, showing the chemical transformations during post-translational processing. The closely related patellin, which is prenylated, is also shown; (c) the enzymes which tailor the core peptide. PatG encodes two functions and in addition has a DUF. PatA encodes the protease and also possesses the same DUF.
In the patellamide pathway, the enzyme PatD catalyzes the formation of multiple thiazolines (from cysteine) and oxazolines (from serine and threonine); the closely related enzyme TruD (from the trunkamide pathway) only processes cysteines. Both enzymes operate in an ATP-dependent and processive manner [20,21]. It has been shown that selenocysteine (a rare but known amino acid) is processed to a selenazoline by both PatD and TruD [22], making it possible that natural products could be discovered that contain selenazoline or selenazole modifications. The crystal structure of TruD [23] shows it to be a three-domain protein (Figure 2a). Domains 1 and 2 are, in structural terms, essentially identical to the MccB protein [15,23], with both proteins sharing a structural zinc bound in the second domain. BalhC has sequence homology to these two domains, whilst BalhD shares homology with the third domain of TruD, which is structurally novel [23]. Thus PatD-like enzymes can be considered fusions of BalhC and BalhD.
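The different residue preferences of PatD and TruD can be summarized in a small lookup, together with the mass change expected when each heterocyclization eliminates one water. The sketch below is illustrative only: the function names and the eight-residue example sequence are assumptions introduced here, and the list it returns gives candidate sites, not a statement that every candidate is modified in practice. The monoisotopic mass of water (18.010565 Da) is the only numerical constant used.

```python
WATER_DA = 18.010565  # monoisotopic mass of H2O in Da

# Residues each enzyme is described as processing, and the ring formed.
TARGETS = {
    "PatD": {"C": "thiazoline", "S": "oxazoline", "T": "oxazoline"},
    "TruD": {"C": "thiazoline"},
}


def candidate_heterocycles(core: str, enzyme: str):
    """Return (position, residue, ring) for every residue the enzyme could process."""
    table = TARGETS[enzyme]
    return [(i + 1, res, table[res]) for i, res in enumerate(core) if res in table]


def dehydration_shift(n_heterocycles: int) -> float:
    """Mass change (Da) for n heterocyclizations, one water lost per ring."""
    return -n_heterocycles * WATER_DA


# Illustrative eight-residue core (assumption, used only to exercise the code).
example_core = "VTACITFC"
patd_sites = candidate_heterocycles(example_core, "PatD")
trud_sites = candidate_heterocycles(example_core, "TruD")
patd_shift = dehydration_shift(len(patd_sites))
```

A mass shift computed this way is the kind of quantity one would compare against mass spectrometry data when checking how many rings have been introduced; subsequent oxidation or macrocyclization would change the mass further and is not modelled here.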
NMR and assay data show that TruD produces AMP and pyrophosphate (PPi), the hallmark of an adenylase or, less commonly, a pyrophosphorylase mechanism [23] (Figure 2b). However, the loops that form the ATP binding site in domain 2 of MccB [15] are absent in the TruD structure (and in PatD-like sequences more generally) [23], precluding ATP binding in the manner seen in other adenylating enzymes [23]. An adenylase rather than a kinase mechanism was proposed as the most likely for TruD with the data in hand [23].
TruD was shown to have a preferred order of reactivity, proceeding from the C-terminus [23]. The processivity of the enzyme required the leader peptide to be attached to the core peptide, and similar observations had previously been made for the BalhC/D system [24,25]. All the heterocyclase enzymes are promiscuous and tolerate diversity in the chemical and structural nature of the amino acids that flank the target cysteine (or threonine, serine) [20,21,24,25].
A very recent paper has reported the structure of the YcaO domain from Escherichia coli with a bound nucleotide [18] (Figure 2c). The product and substrate of YcaO are unknown but the YcaO domain has the same structure as the third domain of TruD and shares sequence homology with BalhD [18]. The residues that ligate the nucleotide are also conserved, essentially ruling out the adenylase domain of TruD as the nucleotide binding site [18] and disfavouring an adenylase mechanism for TruD. The key chemical step in both TruD and BalhD is the irreversible modification of the hemiorthoamide and it is possible, but perhaps unlikely, that one operates by a pyrophosphorylation and the other by a kinase mechanism (Figure 2b). It has been pointed out that adenylation and phosphorylation share chemical similarities [26] and in one case similar enzyme folds [27]. It is worth noting that in the absence of substrate, YcaO produced AMP and PPi [18], suggesting more work is needed to resolve the mechanism(s) in detail.

Peptide cleavage and macrocyclization
In the patellamide pathway, two domains, one from the PatG protein and one from the PatA protein, possess sequences matching the classic subtilisin fold [11]. Despite this shared fold, the two domains have different functions. The PatA domain catalyzes the cleavage of the precursor peptide at the N-terminal protease site, removing the leader peptide (Figure 1a). The PatG domain catalyzes the formation of the macrocyclic peptide (simultaneously removing the C-terminal macrocyclization signature) [11]. The structure of the protease domain from PatA without substrate (Figure 3ai) was determined independently by two groups [28,29]. The apo structure of a close PatA homolog, PagA, was also reported [29]. PatA possesses a classic serine protease fold but in both native PatA structures, the catalytic triad is perturbed from the classic arrangement [28,29]. In a PatA variant (catalytic serine mutated to alanine) [28] and in PagA [29] the key histidine of the triad adopts its conventional conformation. It would seem unlikely that mis-setting of the catalytic triad in PatA is a consequence of the lack of the C-terminal domain, as similar protease activity is observed in both forms [29]. This leaves two explanations for the misalignment: either crystal packing or missing protein-protein interactions (from the substrate). Unfortunately the PagA structure, which has the correct constellation of active site residues, does not resolve the argument, as it has extensive crystal contacts around its active site which could mimic some protein-protein interactions [29]. Cleavage of PatE by PatA occurs at a very slow rate (orders of magnitude slower than subtilisin) [28,29] and it has been suggested that a slow protease might be important in helping regulate the timing of the multiple chemical steps during processing [29].
Figure 2. (a) The crystal structure of TruD [23]. Domains 1 and 2 are very similar to a known adenylase MccB [15] and to BalhC from the TOMM pathway. Domain 3 has a novel structure and is related in sequence to BalhD in the TOMM pathway [17]; (b) heterocyclization of amino acids (cysteine X = S, selenocysteine X = Se, serine/threonine X = O) to yield azoline rings. TOMMs were first predicted to go through a hemiorthoamide intermediate (circled). It is likely that in the patellamide biosynthetic pathway, heterocyclization goes through the same central intermediate. In [...] [23]. Either mechanism would be irreversible and would promote elimination yielding the azoline. There is a third possibility, pyrophosphorylation (Y = P₂O₇⁴⁻, Z = AMP), which although rare, has precedent and is consistent with much of the published data; (c) a cartoon representation of the structure of the YcaO domain [18] from E. coli, in which the nucleotide binding site has been located. This domain is boxed and is structurally similar to domain 3 of TruD [23], shown in Figure 2a. The binding of ATP to domain 3 disfavors the adenylase mechanism previously proposed [23]. The protein is colored as in Figure 2a; the nucleotide analog AMP-PNP is shown in space fill with carbon atoms in green, nitrogen atoms in blue and oxygen atoms in red.
The crystal structure of the macrocyclase domain of PatG (Figure 3aii), determined by two laboratories [29,30], shows it does indeed possess a subtilisin fold (and is thus structurally similar to PatA [28,29]). PatG does however possess a unique and additional structural feature: two helices that are inserted between a helix and a strand of the subtilisin fold (Figure 3b). These helices, which have been termed the 'capping' or 'macrocyclization' insert, sit directly above the catalytic site. Sequence alignment shows that the entire macrocyclase superfamily has an insertion in this region but neither the length nor the sequence of the insertions shows any obvious conservation. For example, the insertion in one ortholog (TenG) is eight residues longer whilst another (AnaB90) is four residues shorter than PatG [30]. As yet, no relationship between the insertion and the size or composition of the macrocycle has been reported. Protease substrate residues are conventionally labelled from the N to C terminus as P5 P4 P3 P2 P1-P1′ P2′ P3′ P4′ P5′. In general, subtilisins bind their substrates as a β-strand with the protein-protein recognition occurring N-terminal to the scissile bond (P1-P1′), that is, residues P5 to P1 of the substrate peptide; this is exemplified by PatA (Figure 1a). This P5 to P1 binding cleft is blocked in PatG by changes in the loop structure and insertion of large hydrophobic residues; these changes are conserved in PatG homologs [30]. As a result PatG actively prevents substrates from binding with an extended β-strand conformation for residues P5 to P1 [30]. The crystal structure of a complex between an inactive PatG mutant (Figure 3c) and a peptide substrate (P8-VPAPIPF-PAYDG-P4′) has been determined [30]. The N-terminal region of the substrate curves away from the protein active site (P8-P6 (VPA) are disordered). The P5 (P) and P4 (I) residues do not make any contact with the protein, the P3 (P) residue makes only a few contacts and the aromatic side chain of residue P2 (F) sits in a shallow pocket. The lack of contact is consistent with the promiscuity of the macrocyclase domain, since it is largely insensitive to the identity of residues within the core peptide [31]. The complex reveals a number of very specific interactions between the C-terminal region of the substrate, P1′ P2′ P3′ (AYD), and residues from the macrocyclization/capping insert [30]. PatG stands as a mirror opposite of subtilisin, where recognition is N-terminal and promiscuity C-terminal. For a PatG substrate, there is one additional requirement: a proline residue or a thiazoline at P1 [31]. The PatG substrate complex shows that the proline of the substrate adopts a cis configuration of the P2-P1 substrate bond [30]. A trans peptide (β-strand) configuration would result in clashes with the macrocyclase due to the conserved insertions and thus only substrates capable of adopting a cis (proline) or cis-like (thiazoline) configuration will bind to PatG, rationalizing the requirement at P1. PatG does form an acyl enzyme intermediate with its peptide substrate [30] and a mechanism in which this intermediate is decomposed by attack of the free amino terminus of the core peptide has been proposed (Figure 3d). An elegant study has shown that the PatG acyl enzyme intermediate can be decomposed with short peptides [29], in essence demonstrating that PatG can function as a true transpeptidase. The rate of macrocyclization by PatG is slow [29,30], but this does not necessarily imply the enzyme is a poor catalyst; the uncatalyzed rate for peptide macrocyclization has not been reported and in our hands no spontaneous macrocyclization was observed [30].
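The protease site nomenclature used in this section is easy to get wrong by one position, so a short sketch may help. The function below assigns labels to the twelve-residue substrate from the PatG complex, taking the scissile bond to lie after the eighth residue (P1-P1′). The function names are hypothetical helpers introduced here, primes are written with an ASCII apostrophe, and the P1 check simply encodes the stated requirement (proline, or a cysteine that has been heterocyclized to a thiazoline).

```python
def label_positions(substrate: str, n_unprimed: int) -> dict:
    """Map protease-site labels (P8 ... P1, P1' ... P4') to residues.

    n_unprimed is the number of residues N-terminal to the scissile bond.
    """
    labels = {}
    for i, res in enumerate(substrate):
        if i < n_unprimed:
            labels[f"P{n_unprimed - i}"] = res      # counts down to P1
        else:
            labels[f"P{i - n_unprimed + 1}'"] = res  # counts up from P1'
    return labels


def meets_p1_requirement(residue: str, heterocyclized: bool = False) -> bool:
    """Proline, or a cysteine converted to a thiazoline, is accepted at P1."""
    return residue == "P" or (residue == "C" and heterocyclized)


substrate = "VPAPIPFPAYDG"  # P8 to P4' of the peptide in the PatG complex
labels = label_positions(substrate, n_unprimed=8)
recognition = labels["P1'"] + labels["P2'"] + labels["P3'"]
```

With this labelling the residues described in the text fall out directly: P2 is the phenylalanine in the shallow pocket, P1 is the proline that adopts the cis configuration, and P1′ to P3′ spell the AYD signature recognized by the capping insert.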

Prenyl transferases, DUF and oxidase
Although patellamides A and C are not prenylated, the related patellin and trunkamide natural products are (Figure 1b) [32-35]. The prenyl transferases from patellamide-like pathways in Lyngbya aestuarii (LynF) and Prochloron spp. (TruF1) have been characterized biochemically [35,36]. LynF prenylates a tyrosine residue on the phenol group to generate the corresponding allyl phenyl ether intermediate, which then spontaneously undergoes a Claisen rearrangement to yield the ortho-substituted phenol [36]. On the other hand, TruF prenylates serine and threonine residues on the hydroxyl side chain [35] (Figure 1b). The structure of PatF from the patellamide pathway reveals that it does indeed adopt the classic TIM barrel fold (Figure 4a) seen in other prenyl transferases but no enzymatic activity was detected [37], consistent with the lack of prenylation of patellamides A and C. A detailed analysis of PatF reveals that two residues, His125 and Met136, are located at positions that are usually occupied by conserved and catalytically active Asp and Lys residues in a known prenyl transferase (dimethylallyl tryptophan synthase from Aspergillus fumigatus) [38], rationalizing the inactivity of PatF. Other cyanobactin prenyl transferases possess the Asp and either a Lys or Arg at these key positions [37]. Whether PatF has another role in the synthesis of patellamides A and C, and if not why PatF is conserved in the gene cluster, remains unknown.
Both PatA and PatG are multi-domain proteins and both contain a C-terminal domain of unknown function (DUF) [11] (Figure 1c). The sequences of the two DUF domains are homologous and are found in PatA and PatG homologs of other patellamide-like biosynthetic clusters [39]. A number of roles for the DUF domain have been considered, including epimerization of the two residues in patellamides A and C (Figure 1b). The pKa of the Cα proton will be lowered by the adjacent thiazoline due to resonance stabilization (Figure 4b); the loss of aromaticity in the thiazole that would occur upon stabilization of the negative charge would be expected to raise the pKa. For this reason, it is commonly assumed that epimerization follows heterocyclization and precedes oxidation, but this has not been established. Epimerization has been proposed to be chemically spontaneous [40]; if true, DUF domains may have no role in patellamide biosynthesis. The question would then be why the DUF domains are so well conserved in patellamide-like biosynthetic gene clusters. The structure of the DUF domain from PatG has been determined and the protein is found as a dimer with two zinc ions at the interface [39] (Figure 4c). The zinc ion binding residues are not conserved in other DUF domains and since the dimerization depends on the zinc, it is unclear whether other domains are dimers [39]. The fold of the DUF domain is novel but gave no clue as to a potential function [39]. It was established that the DUF domain binds neither linear precursor peptides nor linear precursor peptides with four heterocycles [39]. Further experiments are needed to establish whether the DUF domain binds the macrocycle (or perhaps the core peptide alone). In addition, an evaluation of the chemical spontaneity or otherwise of epimerization in patellamide C is needed before a final assessment of the DUF domain activity can be made.
The only domain in the patellamide biosynthetic pathway without a structure is the oxidase (dehydrogenase) domain of PatG (Figure 1c). The domain is conserved in PatG homologs from patellamide-like pathways (Figure 4d). Sequence analysis identifies the domain as FMN dependent and a recent biochemical analysis of the related thiazoline oxidase (which is a separate protein) in the microcin pathway has been reported [41]. This study showed FMN was bound and identified residues crucial for activity [41]. The domain belongs to the class of FMN oxidases most closely resembling the structure 3EO7 (determined by www.jcsg.org). Although the TOMM and patellamide enzymes are related in sequence, the basis of substrate recognition remains unclear since microcin is linear and patellamides are macrocycles. A recent study has shown that one homolog of the PatG oxidase domain can oxidize both linear and macrocyclic thiazoline-containing substrates [42], whereas another homolog appeared only to operate on a macrocyclic substrate [42].

Future prospects
The patellamide biosynthetic pathway comprises multiple different chemical steps (and thus enzymes) and this has made for particularly interesting mechanistic and structural studies. The interest in this specific biosynthetic pathway is driven by the potential utility of the compounds that can be made. Several studies have reported the in vitro and in vivo biotechnological application of the pathway to make novel compounds [42,43,44]. In one in vitro approach the slow protease step catalyzed by PatA has been replaced by engineering PatE so that trypsin can be used instead; this change has greatly accelerated the production and increased the yield of patellamide analogs [42]. Despite all these efforts, the order of epimerization, macrocyclization and oxidation remains to be experimentally verified. Important questions also remain to be answered about the relationship between the macrocyclase sequence and macrocycle ring size, the mechanism of heterocyclization, the precise substrate for oxidation and the nature of epimerization. Answering these questions will not only enhance our knowledge of the chemistry of the enzymatic reactions found in the patellamide pathway (and other natural product biosynthetic pathways), but should allow a greater diversity of analogs to be created both in vivo and in vitro. Two principal bottlenecks in the in vitro production of analogs remain [42]. Firstly, the macrocyclase is slow and is the rate-limiting step in the whole process. Secondly, the requirement for the leader peptide for introduction of heterocycles means that PatE can only be economically produced by expression in a heterologous host, which limits the chemical diversity within the final patellamide analog. Resolving these issues would transform the prospects for large-scale production of very diverse analogs. PatD has been shown to be activated in trans by addition of exogenous leader [43] and this suggests there is scope to overcome at least one of the bottlenecks.

Conflict of interest
Nothing declared.