Structural basis of the amidase ClbL central to the biosynthesis of the genotoxin colibactin

Insight into biosynthetic pathways in the human microbiome can provide details of host–microbe interactions and beneficial symbiosis. In this report, the structure and function of a key bond-forming enzyme in the biosynthesis of the human genotoxin colibactin is presented.


Introduction
The human body hosts a complex community of microorganisms that are increasingly implicated in health (Silpe & Balskus, 2021; Chang, 2020). Growing evidence links dysbiosis of the microbiome to a variety of diseases and disorders, including inflammatory bowel disease and cancers (Mohseni et al., 2020; Wilson et al., 2019; Xue et al., 2019; Bossuet-Greif et al., 2018; Morgan et al., 2022). A detailed understanding of microbe–microbe and host–microbe interactions would provide general functional insights and inform approaches to diagnosis, prevention and treatment. Certain strains of gut commensal bacteria are known to produce a toxin, colibactin, that causes double-strand DNA breaks (Xue et al., 2019; Dougherty & Jobin, 2021; Li et al., 2019) and promotes tumor formation in mouse models of colitis. Colibactin-producing strains are also frequently found in patients with colorectal cancer (CRC; Dubinsky et al., 2020). Beyond its interactions with mammalian cells, colibactin has recently been shown to target the gut microbiome through a prophage-inducing mechanism that leads to microbial cell lysis (Silpe et al., 2022).
The colibactin biosynthetic pathway is encoded by a 54 kb gene cluster termed clb (or pks), centered on a hybrid nonribosomal peptide synthetase-polyketide synthase (NRPS-PKS) machinery. Colibactin biosynthesis (Supplementary Fig. S1) follows a prodrug-like mechanism (Brotherton & Balskus, 2013; Bian et al., 2013; Volpe et al., 2019). Precolibactins are assembled in the cytoplasm and then transported to the periplasm by ClbM, a member of the MATE family of transporters (Mousa et al., 2016, 2017). N-Deacylation of precolibactins by the periplasmic peptidase ClbP yields the active genotoxin colibactin (Velilla et al., 2023; Volpe et al., 2023). Isolating colibactin and determining its structure has been challenging because the product is unstable. Nevertheless, several precolibactins have been isolated and the bioactive product has been characterized (Williams et al., 2020; Tang et al., 2022; Wernke et al., 2020; Hirayama et al., 2022). Complementing this work, clb+ Escherichia coli have been shown to generate DNA interstrand crosslinks, both in vivo and in vitro, via N7 alkylation of adenine by the electrophilic cyclopropane ring of colibactin (Wilson et al., 2019; Xue et al., 2019).
clbL is one of five genes in the colibactin gene cluster found to be upregulated in CRC mouse models (Arthur et al., 2012). In addition, gene-deletion studies have shown that clbL is required for the cytopathic effects of colibactin (Nougayrède et al., 2006). ClbL has also been shown to act as an amide bond-forming enzyme, and it has been proposed to catalyze the final coupling step in precolibactin biosynthesis (Fig. 1; Jiang et al., 2019). This unusual enzymatic transformation forms an amide bond between ACP-bound α-aminoketone and β-ketothioester intermediates (ACP, acyl carrier protein), as demonstrated both in vivo and in vitro (Jiang et al., 2019). ClbL transacylation produces the pseudodimeric precolibactin, which is further elaborated to the mature genotoxin. Another distinctive aspect of ClbL is that both of its substrates are ACP-linked phosphopantetheinyl thioesters of two distinct intermediates along the NRPS-PKS biosynthetic assembly line.
ClbL is a member of the diverse amidase signature (AS) superfamily of enzymes. These enzymes are characterized by a highly conserved Ser-cis-Ser-Lys catalytic triad (Supplementary Fig. S2) that is key to amide hydrolysis (Shin et al., 2002; Valiña et al., 2004; Patricelli & Cravatt, 2000; Labahn et al., 2002). The general catalytic mechanism proceeds through an acyl-enzyme intermediate followed by nucleophilic substitution, commonly by water in AS enzymes. In addition to the triad, AS enzymes contain a conserved stretch of approximately 130 amino acids, termed the AS sequence, that comprises a core catalytic motif surrounded by α-helices. Enzymes of this family are widespread in both prokaryotes and eukaryotes and exhibit a wide variety of functions and substrate specificities. They include peptide amidase (Neumann et al., 2002), fatty-acid amide hydrolase (Bracey et al., 2002; Cravatt et al., 1996), malonamidase E2 (Shin et al., 2002) and glutamyl-tRNA amidotransferase subunit A (Nakamura et al., 2006; Curnow et al., 1997).
ClbL departs from canonical AS family chemistry by linking an α-aminoketone to a β-ketothioester to form an amide bond. The α-aminoketone nucleophile is tethered to the carrier domain of ClbI, and the β-ketothioester is tethered to that of ClbO. This acyl-transfer chemistry was supported by assays with various thioester- and amine-based substrates (Jiang et al., 2019). The heterodimeric product of ClbL is subsequently hydrolyzed by ClbP, generating the bis-electrophilic active product (Brotherton & Balskus, 2013; Bian et al., 2013; Volpe et al., 2019).

Cloning, expression and purification of ClbL
The clbL gene was cloned from a bacterial artificial chromosome harboring the colibactin pks island (Nougayrède et al., 2006) into pET-28a (NdeI/XhoI sites; Supplementary Table S1). clbL-pET-28a was transformed into E. coli C43(DE3). Cells were grown in LB-kanamycin medium at 37 °C to an OD600 of ~0.3 and then at 25 °C to an OD600 of ~0.6. Isopropyl β-D-1-thiogalactopyranoside was added to 100 µM, and growth was continued at 25 °C for 16 h. The cells were harvested by centrifugation and resuspended in 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 1 µg ml⁻¹ aprotinin, 1 µg ml⁻¹ pepstatin and 1 mM phenylmethylsulfonyl fluoride (PMSF). They were then lysed using a microfluidizer. Insoluble material was removed by centrifugation at 11 000 rev min⁻¹, and the supernatant was incubated with 0.5 ml Ni-NTA resin for 1 h at 4 °C. The resin was washed with 5 × 10 ml 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 20 mM imidazole. The protein was eluted with 3 × 1.5 ml 20 mM Tris-HCl pH 8.0, 500 mM NaCl, 250 mM imidazole. The eluate was dialyzed against 20 mM Tris-HCl pH 8.0, 100 mM NaCl, 1 mM β-mercaptoethanol, 10% glycerol. It was then further purified by ion-exchange (HiTrap Q, GE Biosciences) and size-exclusion (HiLoad Superdex 75, GE Biosciences) chromatography.

Crystallization
ClbL was concentrated to 4.5 mg ml⁻¹ and screened for crystallization in 96-well sitting-drop plates using commercial sparse-matrix screens. The initial crystallization conditions were optimized to 0.1 M Tris-HCl pH 8.5, 30% PEG 3000. Crystals were harvested and flash-cooled in liquid nitrogen, and data were collected on beamline 23-ID-D at the Advanced Photon Source, Argonne National Laboratory. The 1.9 Å resolution diffraction data were indexed and scaled using the XDS package (Kabsch, 2010). The structure was solved by molecular replacement using PDB entry 5h6s as the search model (27% sequence identity, 93% coverage; Akiyama et al., 2017). Manual and automated model building were performed iteratively using Coot (Emsley et al., 2010), alternating with real-space refinement in Phenix (Liebschner et al., 2019). The final model was refined to an Rwork of 0.20 and an Rfree of 0.25, and overall refinement statistics are given in Table 1. Graphical representations were generated with the PyMOL molecular-graphics system (version 2.0; Schrödinger).
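The Rwork and Rfree values quoted above follow the standard crystallographic R factor, R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|. Rfree is the same quantity computed over a small test set of reflections that is excluded from refinement. The Python sketch below illustrates only that definition. It assumes a single overall scale factor k, fitted on the working reflections, whereas refinement programs such as Phenix also model bulk solvent and anisotropic scaling. The function names and amplitudes are hypothetical.

```python
"""Sketch: R-work and R-free from observed and calculated amplitudes.

Assumes a single overall scale factor k (a simplification; real refinement
also models bulk solvent and anisotropy). All names and numbers are
hypothetical illustrations.
"""
from typing import List, Sequence, Tuple


def scale_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    # Least-squares k minimising sum((Fobs - k * Fcalc)^2).
    num = sum(o * c for o, c in zip(f_obs, f_calc))
    den = sum(c * c for c in f_calc)
    return num / den


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], k: float) -> float:
    # R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def r_work_r_free(
    f_obs: Sequence[float], f_calc: Sequence[float], is_free: Sequence[bool]
) -> Tuple[float, float]:
    work: List[Tuple[float, float]] = []
    test: List[Tuple[float, float]] = []
    for o, c, free in zip(f_obs, f_calc, is_free):
        (test if free else work).append((o, c))
    # k is fitted on working reflections only, so the test set stays unbiased.
    k = scale_factor([o for o, _ in work], [c for _, c in work])
    r_work = r_factor([o for o, _ in work], [c for _, c in work], k)
    r_free = r_factor([o for o, _ in test], [c for _, c in test], k)
    return r_work, r_free


if __name__ == "__main__":
    f_obs = [120.0, 85.0, 60.0, 44.0, 30.0, 95.0]
    f_calc = [110.0, 90.0, 55.0, 47.0, 26.0, 80.0]
    is_free = [False, False, False, False, False, True]
    r_work, r_free = r_work_r_free(f_obs, f_calc, is_free)
    print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the test reflections never influence the fitted model (or, here, k), a large gap between Rfree and Rwork is the usual warning sign of overfitting.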

Molecular modeling
Protein docking was performed using the LZerD protein-docking webserver (https://lzerd.kiharalab.org/about/). AlphaFold-generated models of ClbL and of the ACP domain of ClbO (residues 742-819) were used as input, with the default clustering cutoff of 4 Å (Venkatraman et al., 2009; Senior et al., 2020). Per-residue confidence scores (pLDDT) for both AlphaFold models were >90 for the majority of residues. The exception was residues 324-362 in ClbL, which had pLDDT values of 50-70. The top ten LZerD models all clustered above the ClbL active site, with ranksum scores ranging from 87 to 407. The second-best docked model (ranksum score 110) was used for the further analysis shown in Supplementary Fig. S6.
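Ranksum scores like those reported above combine several scoring functions by rank rather than by raw value. The sketch below is a minimal illustration that rests on three assumptions: each scoring function ranks every model independently (lower score = better), a model's ranksum is the sum of its ranks, and lower ranksums are therefore better. It is not the LZerD implementation, and the model names and scores are hypothetical.

```python
"""Sketch: combine several docking scores into a rank sum (lower is better).

Assumption: each scoring function ranks every model 1..N (rank 1 = best,
i.e. lowest score), and a model's ranksum is the sum of its ranks.
Model names and scores are hypothetical.
"""
from typing import Dict, List, Tuple


def rank_models(scores: Dict[str, float]) -> Dict[str, int]:
    # Rank 1 = lowest (best) score; ties broken by model name for determinism.
    ordered = sorted(scores, key=lambda m: (scores[m], m))
    return {model: i + 1 for i, model in enumerate(ordered)}


def ranksum(score_tables: List[Dict[str, float]]) -> List[Tuple[str, int]]:
    totals: Dict[str, int] = {}
    for table in score_tables:
        for model, rank in rank_models(table).items():
            totals[model] = totals.get(model, 0) + rank
    # Best (smallest ranksum) first.
    return sorted(totals.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    score_a = {"m1": -50.0, "m2": -60.0, "m3": -40.0}
    score_b = {"m1": -30.0, "m2": -20.0, "m3": -10.0}
    score_c = {"m1": -100.0, "m2": -120.0, "m3": -90.0}
    for model, total in ranksum([score_a, score_b, score_c]):
        print(model, total)
```

Summing ranks rather than raw scores makes the combination insensitive to the very different numeric scales of individual scoring functions.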

Overall structure of ClbL
To gain insight into its unusual catalytic properties and substrate specificity, we determined the structure of ClbL at 1.9 Å resolution (Fig. 2a). Based on comparative sequence analysis, a hydrazidase from Microbacterium (PDB entry 5h6s, 27% sequence identity; Akiyama et al., 2017) was used as the molecular-replacement search model. ClbL crystallized as a homodimer (space group C121), and each polypeptide chain comprises 487 residues. Interpretable electron density is missing for residues 208-214 and 318-367, suggesting flexible loop regions adjacent to, and covering, the active site. The overall structure adopts a compact mixed α/β fold consisting of 12 α-helices and 12 β-strands. This general fold resembles those of other members of the AS superfamily. A structural homology search showed the highest similarity to glutamyl-tRNA amidotransferase (Schmitt et al., 2005), malonamidase E2 (Shin et al., 2002) and fatty-acid amide hydrolase (Cravatt et al., 1996).
Compared with other members of the AS superfamily, ClbL has a notable structural feature: several disordered regions surround the active site. Electron density for these regions is missing in both monomers of the asymmetric unit, suggesting that the loops are also flexible in solution. AS family members that process relatively small-molecule substrates commonly have an ordered active-site region, as exemplified by bacterial aryl acylamidase (PDB entry 4yj6, 28% sequence identity; Lee et al., 2015) (Fig. 2b, Supplementary Fig. S3).
To explore the disordered active-site loops and possible ClbL-carrier domain interactions, we examined an AlphaFold model of ClbL (Senior et al., 2020). The model closely matches our experimental structure (Supplementary Fig. S4), with an all-atom r.m.s.d. of 1.3 Å. It also includes the regions adjacent to the active site that are disordered in our experimental model. In the model, residues 318-367 form a four-helix bundle that only partly covers access to the active site. Such an extended helical structure could help prevent hydrolysis chemistry and could also participate in carrier-domain interactions. This arrangement of α-helices is unlike that observed in canonical AS members such as aryl acylamidase (Fig. 2).
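An r.m.s.d. such as the 1.3 Å value reported above is computed after optimally superposing matched atoms. The sketch below shows one standard way to do this, the Kabsch algorithm implemented with NumPy. It assumes that the two coordinate sets have already been paired atom by atom (for example, by residue number and atom name), a step not shown here. The function name is hypothetical, and this is not necessarily the procedure used to obtain the reported value.

```python
"""Sketch: r.m.s.d. after optimal superposition (Kabsch algorithm)."""
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between (N, 3) arrays p and q whose rows are corresponding atoms."""
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("p and q must both be (N, 3) arrays of matched atoms")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    h = p_c.T @ q_c
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # Apply the rotation to every row of p_c and compare with q_c.
    p_rot = p_c @ rot.T
    return float(np.sqrt(((p_rot - q_c) ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = rng.normal(size=(50, 3))
    # Experimental-like copy: small coordinate noise plus a rigid-body shift.
    experimental = model + rng.normal(scale=0.1, size=model.shape) + 3.0
    print(f"r.m.s.d. = {kabsch_rmsd(model, experimental):.3f}")
```

Superposing before measuring ensures that the r.m.s.d. reflects differences in conformation rather than the arbitrary placement of the two models in space.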

Active-site structure and substrate interactions
The active-site catalytic triad of ClbL consists of Ser179-cis-Ser155-Lys80 (Fig. 3a). The protein appears to have crystallized in an active conformation, as evidenced by an unanticipated small molecule covalently attached to Ser179 in the active site. PMSF, which was included in the lysis buffer, fits this orphan electron density well (Fig. 3b). Electron density for the phenyl group is poorly defined, suggesting disorder or nonspecific interactions. The two sulfonyl O atoms form hydrogen bonds with the side-chain hydroxyl and the cis-amide N atom of Ser155. These interactions could mimic oxyanion-hole stabilization of the tetrahedral intermediate of the reaction. Guided by the Ser179 adduct, we modeled the β-ketothioester substrate (Fig. 1, boxed ClbL-bound intermediate) into the active site (Fig. 3c, Supplementary Fig. S5). In the modeled extended conformation, the substrate carbonyl group is positioned to hydrogen bond with the backbone amide of Phe177. In addition, Trp132 and Leu176 are positioned to form a hydrophobic pocket that accommodates the aliphatic portion of the substrate. Overall, the modeled substrate extends to the surface of the enzyme.
The substrates of ClbL are unusual compared with those of other AS superfamily enzymes. The observed substrate-binding pocket is largely hydrophobic and large enough to accommodate the two predicted substrates. The hydrophobicity of the pocket, together with an ~50-residue flexible lid, could exclude water from the catalytic site and thereby favor conjugation over amidase (hydrolysis) chemistry. Based on previous reports and our structural data, we hypothesize that Lys80 acts as a general base that abstracts a proton from Ser155. Ser155 in turn activates Ser179 for nucleophilic attack on the β-ketothioester intermediate (Jiang et al., 2019). The resulting tetrahedral intermediate collapses to an acyl-enzyme complex that is stabilized by hydrogen bonds to the backbone amides of Gly154 and Ser155 (Fig. 3b, Supplementary Fig. S5). In the general chemistry of the AS superfamily, water reacts with the acyl-enzyme intermediate to give a hydrolysis product. In the absence of water, however, a properly oriented nucleophile can readily react with the acyl-enzyme complex (Goswami & Van Lanen, 2015). ClbL instead catalyzes addition of the α-aminoketone to the serine-bound acyl-enzyme intermediate, forming an amide bond (Fig. 3a) in a thermodynamically favorable reaction.

Predicted interactions with partner carrier domains
ACP domains are small proteins consisting of four α-helices, generally with a low isoelectric point (pI). Interactions between ACPs and their partner enzymes are commonly electrostatic (Moretto et al., 2017; Keatinge-Clay, 2016). We examined the specificity of in trans interactions of ClbL with the ACP domains of ClbI and ClbO using sequence analysis. In our structure, the disordered and partially disordered loop regions of ClbL are predominantly basic, with an overall positive charge. Examples include a loop comprising residues 405-412 (N-QQPVRKRK) and the unmodeled loop (residues 318-367), which has an estimated pI of 9.5. ACP domains commonly interact with partner enzymes through helix 2 and the preceding loop region, although no single interaction mode is shared among representative examples (Gulick & Aldrich, 2018). The post-translationally modified serine residue is located at the N-terminus of helix 2. Both ClbI and ClbO carry an overall negative charge, with pIs of ~4.0 and ~4.9, respectively. ClbO has prominent negatively charged patches, for example EHSEFISECVD. The spacing of the acidic side chains in this patch suggests that they lie on the same face of helix 2.
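The charge arguments above can be illustrated with a Henderson-Hasselbalch estimate of net charge and pI. The sketch below relies on approximate, textbook-style pKa values, which is an assumption: other pKa sets shift the results. It applies the calculation only to the two short motifs quoted in the text and is not meant to reproduce the pI values reported for whole proteins or for the unmodeled loop. The function names are hypothetical.

```python
"""Sketch: Henderson-Hasselbalch net charge and pI of a peptide segment.

The pKa values are approximate textbook-style numbers (an assumption);
other pKa sets give somewhat different results. Function names are hypothetical.
"""
SIDE_CHAIN_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1, "H": 6.5, "K": 10.8, "R": 12.5}
BASIC = {"H", "K", "R"}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(seq: str, ph: float, termini: bool = True) -> float:
    charge = 0.0
    for aa in seq.upper():
        pka = SIDE_CHAIN_PKA.get(aa)
        if pka is None:
            continue  # residue treated as uncharged
        if aa in BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - pka))  # fraction protonated (+1)
        else:
            charge -= 1.0 / (1.0 + 10 ** (pka - ph))  # fraction deprotonated (-1)
    if termini:  # set termini=False for loops internal to a protein
        charge += 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
        charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    return charge


def isoelectric_point(seq: str, lo: float = 0.0, hi: float = 14.0, tol: float = 1e-4) -> float:
    # Net charge falls monotonically with pH, so bisect for the zero crossing.
    # Termini are included so that the charge changes sign on [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for motif in ("QQPVRKRK", "EHSEFISECVD"):
        print(motif, round(net_charge(motif, 7.0, termini=False), 2), round(isoelectric_point(motif), 2))
```

With these assumed pKa values, the side chains of the ClbL loop QQPVRKRK carry about +4 at pH 7, and those of the ClbO motif EHSEFISECVD carry about −3.8. This is consistent with the complementary charges discussed above.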
Amide-bond formation by ClbL on the carrier domain of ClbO is noncanonical compared with typical enzymatic transformations on PKS carrier domains. Based on a sequence comparison of the eight carrier domains in the colibactin assembly line, the ClbO carrier domain more closely resembles peptidyl carrier proteins, despite residing in a PKS module. To provide supporting evidence for ClbL-carrier domain interactions, we docked the ClbO carrier domain onto the AlphaFold model of ClbL described above (a second-order analysis; Supplementary Figs. S6 and S7). In the docked complex, the ACP adopts an orientation that would allow it to deliver a phosphopantetheinyl-linked substrate into the active site. The distance between the modified ClbO serine residue and the active site of ClbL is 22 Å, which is within the range observed for established carrier domain-enzyme interactions. The modeled didomain structure also supports the electrostatic interactions predicted from sequence analysis. Residues 405-412 of ClbL (QQPVRKRK) lie close to helix 2 of ClbO (EHSEFISECVD), and the acidic residues on one face of that helix lie near the basic loop of ClbL.
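The argument that the acidic residues of EHSEFISECVD share one face of helix 2 can be checked with a helical-wheel projection. The sketch below assumes an ideal α-helix (3.6 residues per turn, i.e. 100° of rotation per residue) that spans the whole motif. It reports the smallest arc containing all Asp and Glu positions. The function names are hypothetical.

```python
"""Sketch: helical-wheel check for residues on one face of an alpha-helix.

Assumes an ideal alpha-helix (3.6 residues per turn, i.e. 100 degrees per
residue) spanning the whole motif. Function names are hypothetical.
"""
from typing import List, Sequence, Tuple

DEGREES_PER_RESIDUE = 100.0


def wheel_angles(seq: str) -> List[Tuple[str, float]]:
    # Angular position of each residue around the helix axis, in degrees.
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0) for i, aa in enumerate(seq)]


def angular_spread(angles: Sequence[float]) -> float:
    # Smallest arc (degrees) on the circle that contains all the angles.
    a = sorted(x % 360.0 for x in angles)
    if len(a) < 2:
        return 0.0
    gaps = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    gaps.append(360.0 - (a[-1] - a[0]))  # wrap-around gap
    # The largest empty gap is the part of the circle the residues avoid.
    return 360.0 - max(gaps)


def acidic_face_spread(seq: str) -> float:
    return angular_spread([angle for aa, angle in wheel_angles(seq) if aa in "DE"])


if __name__ == "__main__":
    motif = "EHSEFISECVD"
    for aa, angle in wheel_angles(motif):
        print(aa, angle)
    print("acidic residues span", acidic_face_spread(motif), "degrees")
```

For EHSEFISECVD, the four acidic residues (positions 1, 4, 8 and 11 of the motif) fall within an 80° arc. Under the ideal-helix assumption, this places them on a single face of the helix.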
Amidases are ubiquitous enzymes with a wide variety of functions. They hydrolyze a broad range of amide substrates, including short-chain aliphatic amides, mid-chain amides, arylamides, α-aminoamides and α-hydroxyamides. To investigate the unusual transacylation activity of ClbL, we generated a sequence-similarity network (SSN; Gerlt et al., 2015) for Pfam family PF01425 (Supplementary Fig. S8). In the SSN, ClbL sequences from different pks+ species cluster in a clade distinct from other representative amidases, further supporting a unique catalytic role for ClbL.
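SSNs of the kind described above connect sequences whose pairwise alignment score meets a chosen threshold, and clusters are the connected components of the resulting graph. The sketch below illustrates that idea with a union-find structure. The sequence identifiers, scores and threshold are hypothetical, and the sketch does not reproduce the EFI-EST pipeline (Gerlt et al., 2015).

```python
"""Sketch: clustering a sequence-similarity network (SSN) at a score threshold.

Nodes are sequences; an edge joins two sequences whose pairwise alignment
score meets the threshold; clusters are connected components. IDs and scores
are hypothetical.
"""
from typing import Dict, Iterable, List, Tuple


def ssn_clusters(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str, float]], threshold: float
) -> List[List[str]]:
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:  # keep only sufficiently similar pairs
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra

    groups: Dict[str, List[str]] = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    # Deterministic output: members sorted; largest clusters first.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g[0]))


if __name__ == "__main__":
    nodes = ["ClbL_A", "ClbL_B", "ClbL_C", "amidase_X", "amidase_Y"]
    edges = [
        ("ClbL_A", "ClbL_B", 180.0),
        ("ClbL_B", "ClbL_C", 150.0),
        ("amidase_X", "amidase_Y", 120.0),
        ("ClbL_A", "amidase_X", 40.0),
    ]
    for threshold in (100.0, 30.0):
        print(threshold, ssn_clusters(nodes, edges, threshold))
```

Raising the threshold splits the network into tighter, more functionally coherent clusters. This is why the separation of ClbL sequences from other amidases depends on the chosen alignment-score cutoff.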

Figure 1
Figure 1 ClbL-mediated transacylation of colibactin NRPS-PKS intermediates. Phosphopantetheinyl thioesters of the ACP domains of ClbO and ClbI are substrates for ClbL transamidation. The bond-forming step (blue square) and product amide (blue circle) are highlighted.

Figure 2
Figure 2 Structure of ClbL compared with a representative member of the AS superfamily. (a) Overall structure of ClbL highlighting the active-site Ser179 (green), the two disordered loop regions and the basic loop. (b) Structure of a bacterial aryl acylamidase (Lee et al., 2015) shown in the same orientation, with the corresponding Ser174 highlighted.

Figure 3
Figure 3 Substrate-ClbL active-site interactions. (a) Schematic of the overall ClbL reaction and substrate interactions. The overall pathway is shown along with the interactions of the modeled PMSF adduct (boxed). R1 and R2 represent the extended end chains of precolibactin (Fig. 1). (b) Composite omit electron-density map, contoured at 1.0σ, of the modeled phenylmethylsulfonyl-Ser179 adduct. Hydrogen-bonding distances are 2.7 Å (Ser155/sulfonyl) and 3.2 Å (Ser155 cis-amide/sulfonyl). (c) Surface representation of ClbL with a modeled bound substrate (yellow; Supplementary Fig. S5).

Table 1
Structure-refinement statistics for ClbL (PDB entry 8es6). Values in parentheses are for the highest-resolution shell.