ClbR Is the Key Transcriptional Activator of Colibactin Gene Expression in Escherichia coli

The nonribosomal peptide/polyketide hybrid colibactin can be considered a bacterial virulence factor involved in extraintestinal infection and also a procarcinogen. Nevertheless, despite its genotoxic effect, colibactin expression can also inhibit bacterial or tumor growth and correlates with probiotic, anti-inflammatory, and analgesic properties. Although the biological function of this natural compound has been studied extensively, our understanding of the regulation of colibactin expression is still far from complete. We investigated in detail the role of regulatory elements involved in colibactin expression and the growth conditions that promote it. Our data shed light on the regulatory mechanisms involved in colibactin expression and may support the expression and purification of this interesting nonribosomal peptide/polyketide hybrid for further molecular characterization.

is oriented in the opposite direction, is separated from clbB by an intergenic region of approximately 400 bp, and codes for the phosphopantetheinyl transferase ClbA and the designated transcriptional activator ClbR (1,33). Both gene products are necessary for the activation of colibactin production, either by switching on the synthesis proteins or by regulating colibactin gene transcription. Upstream of these initiating genes lies a specific structural element, a region with variable numbers of tandem repeats (VNTR) (Fig. 1). This region consists of the octanucleotide 5′-ACAGATAC-3′ repeated 2 to 20 times, depending on the individual bacterial isolate (2). So far, only differences in repeat number between strains have been described (2), and it is not yet clear whether variations in the size of the VNTR region affect colibactin expression. Because the VNTR region lies in the intergenic region between the regulatory gene cluster (clbR-A) and the biosynthesis gene cluster (clbB-S), we hypothesize that this genetic element may affect regulation of colibactin expression.
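The VNTR size referred to above is simply the number of uninterrupted copies of the octanucleotide unit. As a minimal sketch, and not necessarily how repeat numbers were determined in reference 2 or in this study, the Python function below counts the longest perfect tandem run of 5′-ACAGATAC-3′ in a sequence. The function name, the example sequences, and the choice to ignore partial or degenerate repeat units are our assumptions. It tries every start position instead of a single non-overlapping regex search because the unit ends with "AC", the same two bases it starts with, so a longer run can begin inside an earlier, shorter match.

```python
# Repeat unit of the VNTR region upstream of clbR, as given in the text.
VNTR_UNIT = "ACAGATAC"


def longest_tandem_run(seq: str, unit: str = VNTR_UNIT) -> int:
    """Return the largest number of directly adjacent, perfect copies of `unit`.

    Hypothetical helper: partial or degenerate repeat units are not counted.
    Every start position is tried, so overlapping candidate runs are handled.
    """
    seq = seq.upper()
    unit = unit.upper()
    step = len(unit)
    best = 0
    for start in range(len(seq)):
        copies = 0
        # Extend the run one whole unit at a time while the next copy matches.
        while seq.startswith(unit, start + copies * step):
            copies += 1
        best = max(best, copies)
    return best


if __name__ == "__main__":
    # Hypothetical test sequences, not real clbR upstream regions.
    five_repeats = "GGTT" + VNTR_UNIT * 5 + "CCAA"
    twenty_repeats = "GGTT" + VNTR_UNIT * 20 + "CCAA"
    print(longest_tandem_run(five_repeats))    # 5
    print(longest_tandem_run(twenty_repeats))  # 20
```

Applied to the 5-repeat and 20-repeat constructs discussed later, such a check would simply confirm the intended repeat number before a construct is used.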
Most previous research on colibactin focused on elucidating the molecular structure and the mode of action of the active compound or of its functional intermediates. In contrast, the present work aimed at a better understanding of how colibactin expression is regulated and what this implies for the biological role of colibactin. The observation that the colibactin biosynthetic pathway can produce different small compounds, such as an analgesic peptide, whose effects can differ completely from those of colibactin itself (14,15,17,22), strongly motivates efforts to elucidate the factors and processes that contribute to the regulation of colibactin expression. Our understanding of the regulation of pks island expression revolves around the activity of its proposed key regulator, ClbR. Until now, ClbR has been described as a LuxR-like protein with a helix-turn-helix DNA-binding motif (33), suggesting that it is involved in regulating pks island transcription. Therefore, our aim in this study was to characterize the function of ClbR as a transcriptional regulator and to investigate further details of the regulation of colibactin expression in E. coli.
(Data reported in this study appeared in part in the diploma thesis of M. Selle, the M.Sc. thesis of M. Brinkmann, and the Ph.D. thesis of A. Wallenstein.)

RESULTS
Genome sequence analysis of fecal E. coli isolate M1/5. For the detailed investigation of regulatory aspects of colibactin expression, we selected E. coli strain M1/5 as our main model organism. This fecal isolate from a healthy volunteer is a reliable colibactin producer and represents the highly relevant group of colibactin-positive E. coli strains of phylogroup B2, which colonize the intestinal tract of many humans. On the basis of its complete genome sequence, E. coli M1/5 was assigned to sequence type 550 (ST550)/clonal complex 14 (CC14) and serotype O75:K5:H5. In addition to the 5,138,587-bp chromosome, the M1/5 genome includes two plasmids, pM1/5-120 (119,964 bp) and pM1/5-30 (29,585 bp) (see Fig. S1 in the supplemental material). Although E. coli M1/5 is colibactin positive, its genome lacks many E. coli virulence-associated genes, such as those encoding the characteristic toxins and adhesins of intestinal or extraintestinal pathogenic E. coli. A group II capsule (serotype K5) determinant, multiple fimbrial adhesin operons, several autotransporter-encoding genes, two type VI secretion system determinants, and different siderophore system genes may contribute to the fitness and competitiveness of E. coli M1/5. Further characteristics of the E. coli M1/5 genome content are provided in Table S1 in the supplemental material. No determinants coding for common antibiotic resistance phenotypes in E. coli were detected in the genome sequence of strain M1/5, which is sensitive to colistin, β-lactams, aminoglycosides, sulfonamide/trimethoprim, phenicols, glycopeptides, tetracyclines, quinolones, rifampin, nitroimidazole, and macrolides.
ClbR is a transcriptional activator of colibactin gene expression. The clbR gene was originally annotated as a "putative transcriptional regulator" based on the significant similarity of the deduced ClbR amino acid sequence to sequences of transcription regulators of the LuxR/FixJ family (1,33). The ClbR protein exhibits a high level of similarity to the transcription regulator GerE of Bacillus subtilis. Both GerE and ClbR contain a C-terminal helix-turn-helix (HTH) DNA-binding motif but lack an N-terminal regulatory receiver (REC) domain (34) (Fig. S2). Accordingly, and in contrast to many other LuxR/FixJ family members, both proteins are autonomous effector domain regulators and not response regulators. To gain first insights into the role of ClbR in regulation of colibactin gene expression, we deleted clbR in E. coli M1/5 and compared the mutant with the wild type with respect to cytopathic effect (CPE) and DNA damage in infected HeLa cells. In contrast to wild-type strain M1/5, deletion mutant M1/5 ΔclbR neither caused cell cycle arrest, as shown by microscopic analysis and flow cytometry (Fig. 2A and B), nor increased levels of phosphorylated histone H2AX in HeLa cells (Fig. 2C). Complementation of E. coli M1/5 ΔclbR with pBAD-clbR restored the ability to block the cell cycle in HeLa cells as well as to induce the DNA damage cascade (Fig. 2).
Furthermore, we transformed previously described E. coli Nissle 1917 (EcN) reporter strains, which carry a transcriptional fusion of the clbR, clbA, clbB, or clbQ promoter to the promoterless luciferase (lux) operon (33), with a plasmid that allows clbR expression under the control of the tetracycline-inducible promoter tetp/o. These strains were cultivated in lysogeny broth (LB), and clbR, clbA, and clbB promoter activities in response to increased ClbR levels were compared by luminescence measurements. Increased ClbR levels markedly increased clbR and clbB promoter activities, whereas clbA promoter activity did not respond strongly to increased ClbR concentrations (Fig. S3). These results indicate that ClbR is a transcriptional (auto)activator of colibactin gene expression.
Promoter activity of clbR and colibactin expression depend on medium composition. To search for factors and conditions that affect clbR expression, we employed a reporter gene fusion of the clbR promoter to the promoterless lux operon in E. coli M1/5. We tested lysogeny broth (LB), terrific broth (TB), M9 medium with and without Casamino Acids, interaction medium (IM), brain heart infusion (BHI), and Todd Hewitt broth (THB) (Fig. 3). Each growth medium led to a characteristic pattern of clbR promoter activity, but luminescence peaked during the transition from exponential growth to stationary phase in all media except IM and BHI, in which clbR promoter activity peaked during mid-exponential growth. Expression levels of clbR were higher in poorer than in richer media. The highest values of relative light units (RLU) per optical density at 600 nm (OD600) were observed upon cultivation in M9 media, and growth in M9 medium with Casamino Acids (M9+CAS) produced the most sharply defined peak of clbR promoter activity. Cultivation in TB, THB, LB, and BHI resulted in much lower expression levels than growth in defined media such as IM and M9 minimal medium. We also constructed a reporter module based on the frr promoter as a "housekeeping reference" in E. coli M1/5 and measured clbR and frr promoter activities upon growth in LB and in M9+CAS (Fig. S4). Under both growth conditions, the curve shapes were similar for clbR and frr promoter activities. However, only clbR promoter activity, and not frr promoter activity, increased significantly in M9+CAS compared to LB. This supports our observation that clbR promoter activity is specifically induced in poorer media relative to LB. On the basis of these findings, we performed all further analyses of colibactin or ClbR expression in E. coli M1/5 with bacterial samples harvested in the late exponential growth phase in M9 medium with Casamino Acids.
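Because promoter activity is reported as RLU/OD600 and compared at its peak, the normalization and peak-finding step can be sketched as below. This is an illustrative Python sketch only: the `Reading` record, the OD600 cutoff of 0.05 (used here to avoid dividing by near-zero densities early in growth), and all numbers in the example are hypothetical assumptions, not the plate-reader settings or data of this study.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reading:
    """One hypothetical plate-reader time point for a lux reporter culture."""
    time_h: float
    od600: float
    rlu: float  # raw relative light units


def normalized_activity(readings: List[Reading], min_od: float = 0.05) -> List[Tuple[float, float]]:
    """Return (time, RLU/OD600) pairs, skipping points below an assumed OD600 cutoff."""
    return [(r.time_h, r.rlu / r.od600) for r in readings if r.od600 >= min_od]


def peak_activity(readings: List[Reading], min_od: float = 0.05) -> Tuple[float, float]:
    """Return the (time, RLU/OD600) point with the highest normalized activity."""
    series = normalized_activity(readings, min_od)
    if not series:
        raise ValueError("no readings above the OD600 cutoff")
    return max(series, key=lambda point: point[1])


def peak_ratio(test: List[Reading], reference: List[Reading]) -> float:
    """Ratio of peak normalized activities, e.g., M9+CAS versus LB, or clbR versus frr."""
    return peak_activity(test)[1] / peak_activity(reference)[1]


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the arithmetic.
    m9_cas = [Reading(1.0, 0.03125, 100.0), Reading(2.0, 0.25, 1000.0),
              Reading(3.0, 0.5, 5000.0), Reading(4.0, 1.0, 6000.0)]
    lb = [Reading(2.0, 0.5, 500.0), Reading(3.0, 1.0, 2500.0), Reading(4.0, 2.0, 4000.0)]
    print(peak_activity(m9_cas))   # (3.0, 10000.0)
    print(peak_ratio(m9_cas, lb))  # 4.0
```

Comparing peak values rather than a fixed time point is one way to cope with the medium-dependent shift in peak timing described above; the same `peak_ratio` call covers both the medium comparison and the clbR-versus-frr reference comparison.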
To see whether clbR promoter activity responds to growth medium and growth phase to the same extent in other E. coli isolates, we integrated the same clbRp-lux reporter module used in E. coli M1/5 into the chromosomal λ-attB site of different model strains in which colibactin expression has been studied previously, including probiotic strain Nissle 1917, uropathogenic strain UTI89, and newborn meningitis isolates IHE3034 and SP15. We then compared clbR promoter activity upon cultivation in M9+CAS or in LB in these strain backgrounds. In principle, the clbR promoter expression profiles in these strains were very similar to those in E. coli M1/5; i.e., promoter activity reached its maximum in the (late) logarithmic growth phase (Fig. S5). In general, promoter activity was also significantly lower in LB than in M9+CAS. Interestingly, clbR promoter activity also differed among the strains: it was always higher in E. coli isolates Nissle 1917 and IHE3034 than in E. coli strains UTI89 and SP15. In particular, the levels of promoter activity of probiotic strain Nissle 1917 and fecal isolate M1/5 in M9+CAS were very similar (Fig. S5). The results obtained in different strain backgrounds showed that clbR promoter activity, and thus presumably colibactin expression, reached its maximum in the late exponential growth phase and was generally higher in poor than in rich media. Despite basically uniform expression profiles, the strength of clbR promoter activity may differ between strain backgrounds.
Expression of clbR responds to iron availability. Previous studies have shown that colibactin production responds to iron availability via Fur-dependent and RyhB-dependent regulation of clbA transcription (Tronnet et al. [35]). To find out whether expression of the main transcriptional activator of the colibactin genes is also regulated in response to iron availability, we employed the E. coli M1/5 λ-attB::5VNTR-clbRp-lux reporter strain described above to study clbR promoter activity under iron limitation or at an increased Fe(III) ion concentration. Neither increasing ferric iron availability in LB by adding 100 µM FeCl3 nor limiting iron by adding 0.2 µM deferoxamine significantly affected growth of the E. coli M1/5 reporter strain or clbR promoter activity (Fig. 4A). Interestingly, addition of 100 µM FeCl3 to M9+CAS strongly reduced clbR promoter activity, whereas the presence of 0.2 µM deferoxamine did not affect reporter gene expression during growth (Fig. 4B). These results indicate that ClbR expression is also directly altered by iron availability.
ClbR interacts with the clbR-to-clbB intergenic region of the colibactin island. To investigate whether ClbR interacts with DNA and to identify putative ClbR binding sites within the colibactin island, we performed a series of electrophoretic mobility shift assays (EMSAs) with purified ClbR, focusing on the intergenic region between clbB and clbR. Using DNA probes of decreasing size, we scanned the clbB-R intergenic region for parts that interact with the ClbR protein. We identified a ClbR binding site close to clbB and located the binding motif between positions −40 and −107 relative to the clbB translational start, since no interaction of ClbR with probes 7, 8, 11, and 12 was detected (Fig. 5A and B). We also studied the interaction of ClbR with the region immediately upstream of its own coding sequence by scanning a 123-bp region upstream of clbR, including the VNTR region, with DNA probes of differing size. ClbR interacted with probes 13, 14, and 15 generated from the clbR upstream region but no longer with probe 16 (Fig. 5C). Accordingly, the DNA stretch upstream of position −2 relative to the clbR translational start is required for ClbR binding. In contrast to the clean shift observed in EMSAs with the clbB upstream fragments, the region close to clbR exhibited more complex interactions with ClbR (Fig. 5C). The clbR gene is preceded by a VNTR region (Fig. 1), which can differ in size; between 2 and 20 repeats have been described so far (2). We assume that the VNTR region or the overall tertiary structure of this DNA stretch prevented a clearer assessment of the migration behavior of the probes designed for this part of the intergenic region between clbB and clbR. Purified ClbR protein did not interact with the probe representing the lacZ promoter region, which served as a negative control (Fig. 5D).
Accordingly, we have demonstrated that ClbR can interact with the clbB-R intergenic region, and we narrowed down the DNA stretches upstream of clbB and of clbR to which ClbR binds. Our results corroborate the predicted function of ClbR as a transcriptional regulator and (auto)activator.
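The deletion-mapping logic used to localize ClbR binding can be made explicit with simple interval arithmetic. In this minimal Python sketch, which assumes a single contiguous binding site and uses entirely hypothetical probe coordinates (bp relative to a translational start, negative values upstream), the site must lie within the intersection of all shifted probes, and any non-shifted probe that covers that whole intersection would contradict the single-site model. It illustrates the reasoning only and is not the analysis procedure of this study.

```python
from typing import Dict, List, Tuple

# (upstream end, downstream end) in bp relative to the translational start; upstream is negative.
Interval = Tuple[int, int]


def candidate_region(shifted: List[Interval]) -> Interval:
    """Intersect all probes that produced a mobility shift."""
    if not shifted:
        raise ValueError("at least one shifted probe is required")
    start = max(lo for lo, _ in shifted)
    end = min(hi for _, hi in shifted)
    if start > end:
        raise ValueError("shifted probes share no common region")
    return start, end


def conflicting_probes(region: Interval, unshifted: Dict[str, Interval]) -> List[str]:
    """Names of non-shifted probes that nevertheless contain the whole candidate region."""
    lo, hi = region
    return sorted(name for name, (a, b) in unshifted.items() if a <= lo and b >= hi)


if __name__ == "__main__":
    # Hypothetical probes, not the probes shown in Fig. 5.
    shifted = [(-200, -1), (-130, -20), (-115, -35)]
    unshifted = {"P_a": (-60, -1), "P_b": (-300, -125)}
    region = candidate_region(shifted)
    print(region)                                 # (-115, -35)
    print(conflicting_probes(region, unshifted))  # []
```

Non-shifted probes could narrow the interval further only under extra assumptions (for example, that a probe truncated from one side lost part of the site), which is why this sketch merely flags inconsistencies rather than trimming the region.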
ClbR binding regions and overlapping of clbR and clbB promoter regions. To further characterize the putative role of the VNTR region as a regulatory element located in the clbR-B intergenic region (Fig. 6A), we determined the exact transcription start sites (TSSs) of clbR and clbB by differential RNA sequencing (dRNA-seq), comparing mapped sequencing reads of untreated RNA samples and of samples treated with Terminator 5′-phosphate-dependent exonuclease (TEX), which are enriched for primary transcripts, isolated from E. coli strain M1/5 (Fig. 6B and C). Judging by the number of sequence reads mapped to the chromosomal region close to clbR, this gene is only weakly transcribed. We identified the clbR TSS 16 bp upstream of the VNTR region, suggesting that this stretch of repeats belongs to the 5′ untranslated region of clbR (Fig. 6D). In contrast, clbB is much more strongly transcribed than clbR (Fig. 6B and C), and the clbB TSS was mapped to position −24 relative to the clbB translational start (Fig. 6E). The EMSA and dRNA-seq data demonstrate that the clbR and clbB transcription start sites overlap the ClbR binding regions within the intergenic region between these two genes (Fig. 6).
The VNTR region affects clbR expression at the transcriptional level. To analyze whether a VNTR region of a different size would affect clbR expression, we inserted luciferase-based reporter constructs, fused to the clbR upstream region with a VNTR region of either 5 or 20 repeats, into the chromosomal attachment site of bacteriophage λ (λ-attB) and tested for luciferase expression. We measured 2-fold higher luminescence with the 20-repeat VNTR region than with the 5-repeat VNTR region (Fig. 7A and B), suggesting a regulatory impact associated with this particular DNA stretch. We conclude from our reporter gene studies that clbR transcript levels depend on growth phase, resource availability, and length of the VNTR region.
FIG 5 ClbR binds to clbR and clbB upstream regions. To demonstrate ClbR-DNA interactions using EMSA, PCR-generated, digoxigenin-labeled DNA fragments (300 pM) obtained from the upstream regions of clbR and clbB, respectively, were incubated with increasing amounts of purified ClbR protein (for probes 1 to 12, 0, 50, or 100 nM ClbR per lane; for probes 13 to 16, 0, 50, 100, or 150 nM ClbR per lane). The size and position of each probe are given relative to the translational start of clbR or clbB, respectively. (A to C) Probes 1 to 12 were used to narrow down the ClbR binding site upstream of clbB (A and B), and probes 13 to 16 were used to analyze ClbR binding to the clbR upstream region (C). Panels A and B show different subsets of the probes tested for the clbB promoter region. (D) To confirm specific binding of ClbR, a negative control, i.e., a promoter fragment that lacks the ClbR binding motif, was included. For this purpose, a lacZ promoter-based probe (1 nM) was incubated with increasing amounts of purified ClbR protein (0, 50, 100, and 200 nM ClbR per lane). ClbR concentrations that produced clear shifts with probes representing the clbR or clbB promoter regions did not retard migration of the lacZ probe.
FIG 6 To further characterize the role of ClbR as a transcriptional activator of clbR and clbB, we determined the transcriptional start sites of both genes in E. coli M1/5 rpsLK42R by differential RNA-seq. By comparing mapped sequencing reads of TEX-treated (blue) and untreated (red) RNA samples, the transcriptional start site (TSS) of clbR was identified upstream of the VNTR region. (C) Using the same method as that described for panel B, we determined the clbB transcriptional start site upstream of the clbB translational start site. (D and E) The corresponding nucleotide sequences and predicted promoter elements of the transcriptional start sites of clbR (D) and clbB (E) are indicated.
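The dRNA-seq comparison rests on the fact that TEX degrades 5′-monophosphorylated (processed) RNAs, whereas primary transcripts carrying a 5′-triphosphate are protected, so read 5′ ends at genuine TSSs are enriched in the TEX-treated library. A minimal Python sketch of that enrichment test follows; the read-count cutoff, enrichment ratio, pseudocount, and example counts are hypothetical assumptions, not the parameters or software used in this study.

```python
from typing import Dict, List


def call_tss(
    tex_plus: Dict[int, int],
    tex_minus: Dict[int, int],
    min_reads: int = 10,
    min_ratio: float = 2.0,
    pseudocount: float = 1.0,
) -> List[int]:
    """Return positions whose read 5' ends are enriched after TEX treatment.

    tex_plus / tex_minus map a genomic position to the number of reads whose
    5' end maps there in the TEX-treated and untreated libraries (assumed to
    be normalized to comparable sequencing depth beforehand).
    """
    calls = []
    for pos, plus in tex_plus.items():
        minus = tex_minus.get(pos, 0)
        # The pseudocount avoids division by zero where the untreated library has no reads.
        ratio = (plus + pseudocount) / (minus + pseudocount)
        if plus >= min_reads and ratio >= min_ratio:
            calls.append(pos)
    return sorted(calls)


if __name__ == "__main__":
    # Hypothetical per-position 5'-end read counts.
    treated = {1000: 50, 1100: 30, 1200: 5}
    untreated = {1000: 10, 1100: 40}
    print(call_tss(treated, untreated))  # [1000]
```

For a weakly transcribed gene such as clbR, the read-count cutoff is likely the parameter that most strongly decides whether a TSS is called at all.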
As colibactin expression correlates with the level of available ClbR, we also assessed changes in colibactin production in response to the growth medium or to the size of the VNTR region by quantifying C14-Asn, a by-product of colibactin biosynthesis. For this purpose, the repeat number of the VNTR region was modified by scarless mutagenesis, generating variants of the native VNTR region comprising either 5 or 19 repeats. The C14-Asn levels produced by isogenic E. coli M1/5 variants carrying a VNTR region with 5 or 19 repeats upstream of clbR supported our observations made with the λ-attB-inserted luciferase reporter fusions comprising 5 or 20 VNTRs. The concentration of C14-Asn increased with VNTR region size and was also higher upon cultivation in M9 medium supplemented with Casamino Acids than in LB (Fig. 7C and D). Thus, our data show that clbR expression is modulated by the composition of the growth medium and by the size of the VNTR region located in the 5′ untranslated region of clbR.
Modulation of colibactin expression via transcriptional activation or altered performance of the production machinery. To further investigate the role of ClbR as a key regulator of colibactin production as well as general regulatory aspects of the pks island, we compared colibactin production upon deletion or overexpression of clbR, i.e., in the absence of transcriptional activation or upon full induction of transcription of the colibactin determinant. Additionally, to test whether colibactin production is subject to feedback regulation, we analyzed colibactin production upon deletion or overexpression of clbQ. The clbQ gene encodes a type II editing thioesterase that controls the flux of substrates and intermediates during colibactin biosynthesis as well as the overall performance of the production machinery (26). Luminescence measurements with different chromosomally inserted clbR promoter reporter constructs supported our finding that overexpression of clbR in trans increased clbR promoter activity, whereas clbQ overexpression in trans had no drastic effect on clbR promoter activity. These luciferase assays also confirmed that the size of the VNTR region upstream of the clbR coding sequence affects the clbR transcription level (Fig. 8A). Infection of HeLa cells followed by indirect assessment of colibactin expression via quantification of phosphorylated histone γ-H2AX indicated that clbR overexpression in E. coli strain M1/5 led to a strong increase in γ-H2AX levels, whereas γ-H2AX levels were markedly decreased with the clbR deletion mutant relative to the wild-type strain (Fig. 8B). In contrast, both overexpression and deletion of clbQ in E. coli M1/5 reduced the detectable amount of γ-H2AX in infected HeLa cells.
Complementation of E. coli M1/5 ΔclbR with pBAD24-tetAp-clbR-rrnBt and of the clbQ deletion mutant of E. coli M1/5 with pBAD24-tetAp-clbQ-rrnBt resulted in γ-H2AX levels in infected HeLa cells that corresponded to those observed upon infection with M1/5 derivatives overexpressing clbR and clbQ, respectively (Fig. 8B). UPLC-HRMS (ultra-high-performance liquid chromatography-high-resolution mass spectrometry) measurements of the colibactin biosynthetic by-product C14-Asn confirmed that colibactin production was significantly increased upon overexpression of clbR, whereas clbR deletion as well as clbQ overexpression and clbQ deletion abolished colibactin production (Fig. 8C). These results do not support feedback regulation of colibactin gene expression but rather suggest that modulating the ClbQ protein level reduced the overall performance of the colibactin production machinery or the levels of colibactin biosynthesis intermediates.
General impact of colibactin production on the E. coli M1/5 transcriptome and proteome. To analyze expression of the colibactin determinant at the transcriptomic or proteomic level, and to find out to what extent expression of the colibactin island is integrated into regulatory and metabolic networks, we compared the transcriptome and proteome of E. coli strain M1/5 with those of M1/5 mutants lacking or overexpressing clbR as well as those of the clbQ deletion and overexpressing mutants. In this way, we also aimed to identify ClbR-dependent determinants located outside the pks island as well as candidate genes whose expression might be affected by the activity of the colibactin production machinery or by the availability of metabolites and intermediates related to colibactin production.
To screen for candidate genes that are markedly deregulated in E. coli M1/5 in response to different levels of available ClbR or ClbQ protein, we pooled three biological replicates of E. coli strain M1/5 or of its corresponding mutants and either isolated total RNA for differential RNA-seq analysis or performed gel-free proteomics to analyze the protein content of whole bacterial cells. Transcript levels of only 62 genes were deregulated in at least one of the clbR or clbQ mutants relative to wild-type strain M1/5, with a log2 fold change of ≤ −2 or ≥ +2 (Fig. 9A; see also Table S2). Clustering of deregulated genes identified five groups of genes with different expression profiles in the four strains (Fig. 9A). Transcription of the individual genes of the colibactin gene cluster (group 1) was downregulated in the clbR deletion mutant and upregulated upon overexpression of clbR (Fig. 9A; see also Fig. S6A and B). The transcriptomic data for individual genes of the colibactin determinant are in good agreement with quantitative reverse transcription-PCR (qRT-PCR) results for clbA, clbR, clbB, and clbQ (Fig. S6A). Furthermore, transcript levels of the group 2 genes involved in histidine biosynthesis were more strongly repressed upon overexpression of clbR and deregulation of clbQ expression than in the clbR mutant. In contrast, transcription of the group 3 genes, which comprise the two iron-regulated genes sodB and fhuF, was specifically reduced in response to clbR overexpression. Deletion and overexpression of clbQ had only a weak effect on transcript levels of the genes of the colibactin gene cluster, except for clbQ and clbS (Fig. S6A and D), but markedly affected, among others, transcript levels of genes involved in amino acid and secondary metabolite biosynthesis (Fig. 9A, gene groups 2, 4, and 5).
Although the transcriptome profiles of strains overexpressing clbR or clbQ could be distinguished from those of the clbR or clbQ deletion mutants, modulating the availability of either ClbR or ClbQ in E. coli M1/5 had similar overall impacts on gene expression at the transcriptional level. Whereas genes involved in histidine biosynthesis were markedly repressed under all four tested conditions, genes required for biosynthesis of secondary metabolites and for biosynthesis or metabolism of aromatic amino acids (tryptophan, tyrosine, phenylalanine) were upregulated (Fig. 9A).
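The selection criterion used above (a log2 fold change of ≤ −2 or ≥ +2 relative to the wild type in at least one of the four mutants) and a crude grouping of the selected genes can be sketched in Python as follows. Grouping by the sign pattern of the fold changes is a deliberately simple stand-in for the cluster analysis reported here, not the clustering method that was used; the pseudocount, the assumed order of strains, and the gene names and values in the example are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed order of values in each profile: ΔclbR, clbR overexpression, ΔclbQ, clbQ overexpression.


def log2_fold_change(mutant: float, wild_type: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of normalized abundances, with a pseudocount to tolerate zeros."""
    return math.log2((mutant + pseudocount) / (wild_type + pseudocount))


def deregulated(profiles: Dict[str, List[float]], threshold: float = 2.0) -> Dict[str, List[float]]:
    """Keep features with |log2 FC| >= threshold in at least one strain."""
    return {name: fcs for name, fcs in profiles.items() if any(abs(fc) >= threshold for fc in fcs)}


def sign_pattern_groups(
    profiles: Dict[str, List[float]], threshold: float = 2.0
) -> Dict[Tuple[str, ...], List[str]]:
    """Group deregulated features by their up (+), down (-), or unchanged (0) pattern."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for name, fcs in deregulated(profiles, threshold).items():
        key = tuple("+" if fc >= threshold else "-" if fc <= -threshold else "0" for fc in fcs)
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical log2 fold changes, not data from Table S2.
    example = {
        "geneA": [-3.0, 4.0, 0.5, 0.1],
        "geneB": [-2.5, -2.0, -3.0, -2.2],
        "geneC": [0.5, -1.0, 1.9, 0.0],
        "geneD": [-2.1, -2.4, -2.6, -2.0],
    }
    print(log2_fold_change(31, 7))  # 2.0
    print(sign_pattern_groups(example))
```

In this toy example, geneC never reaches the threshold and is dropped, while geneB and geneD share the "down in all four strains" pattern, analogous to genes that respond similarly to every perturbation.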
In addition, we compared the proteomes of these mutants and identified 145 proteins with log2 fold changes of ≤ −2 or ≥ +2 (Fig. 9B; see also Table S2). For six of these proteins, the corresponding genes were also found to be deregulated by RNA-seq. In five cases (ClbR, ClbC, ClbD, ClbF, and ClbQ), these were proteins/genes involved in colibactin biosynthesis; only one of the six (the ibpB gene and its product, IbpB) was not directly linked to the colibactin production machinery. As expected, ClbR and ClbQ protein levels were markedly increased upon overexpression of clbR and clbQ, respectively. While the levels of most detected Clb proteins did not respond strongly to the availability of ClbQ, some Clb proteins changed in opposite directions depending on whether clbR was deleted or overexpressed (Fig. S6C and E). At the protein level, too, the global expression profiles of the clbR and clbQ deletion mutants and of the strains overexpressing clbR or clbQ were very similar. Cluster analysis indicated that a small number of proteins (group I) were repressed in the four mutants relative to the wild type, whereas the majority of deregulated proteins (groups III and IV) were upregulated in the four mutants (Fig. 9B). Gene Ontology (GO) analysis using E. coli strain CFT073 as reference allowed classification of 38 of 62 deregulated genes and 112 of 145 deregulated proteins detected in the E. coli M1/5 background into functional networks (Table 1). Interestingly, modulating colibactin production by deleting or overexpressing either clbR or clbQ had similar effects on gene expression at the transcriptional and protein levels.
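Comparing the transcriptome and proteome hit lists, as done above to find the six shared genes/gene products, requires mapping protein names onto gene names. A minimal Python sketch follows, assuming the usual E. coli naming convention that a protein carries its gene name with an initial capital (e.g., ClbR for clbR); the example hit lists are hypothetical and do not reproduce Table S2.

```python
from typing import Iterable, List


def protein_to_gene(protein: str) -> str:
    """Map an E. coli-style protein name (e.g., 'IbpB') to its gene name ('ibpB')."""
    return protein[:1].lower() + protein[1:]


def shared_hits(genes: Iterable[str], proteins: Iterable[str]) -> List[str]:
    """Genes deregulated at the transcript level whose products are also deregulated."""
    protein_genes = {protein_to_gene(p) for p in proteins}
    return sorted(set(genes) & protein_genes)


if __name__ == "__main__":
    # Hypothetical hit lists for illustration only.
    transcript_hits = ["clbR", "geneX", "geneY"]
    protein_hits = ["ClbR", "GeneX", "ProtZ"]
    print(shared_hits(transcript_hits, protein_hits))  # ['clbR', 'geneX']
```

With real data, matching on locus tags would likely be more robust than names, since the naming convention does not hold for every uncharacterized open reading frame.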
Although the results of the transcriptome and proteome analyses overlap in only six genes/gene products, they show that secondary metabolite biosynthesis is influenced by the activity of the colibactin biosynthesis machinery. Several related groups of genes or gene products involved in amino acid (histidine, phenylalanine, tyrosine, and tryptophan) biosynthesis or in organonitrogen compound and primary metabolic processes, which are in principle also relevant for polyketide biosynthesis, were deregulated at the transcriptome or the proteome level. In summary, clearly opposite regulation upon clbR deletion and overexpression was detectable only for the colibactin gene cluster, which supports our view that ClbR is the specific key transcriptional regulator of the colibactin determinant. Our global gene expression data also indicate that colibactin production is functionally connected to pathways involved in biosynthesis or metabolism of particular amino acids and secondary metabolites.
FIG 9 We compared the impact of clbR or clbQ overexpression as well as of clbR or clbQ deletion on global gene expression of E. coli M1/5 rpsLK42R at the transcriptional and translational levels by RNA-seq and proteome analysis, respectively. Three biological replicates were pooled before aliquots were used for RNA-seq or mass spectrometry-based proteome analysis. The expression profiles of genes (A) and proteins (B) with a log2 fold change of ≤ −2 or ≥ +2 in the different E. coli M1/5 variants relative to the wild type were subjected to cluster analysis. Gene/protein designations are indicated, as are groups of genes/proteins with similar expression patterns. Groups of genes or proteins enriched for functionally related proteins are marked in color, and the corresponding Gene Ontology (GO) term is given.

DISCUSSION
Although our knowledge of the colibactin biosynthetic mechanism, of the molecular structure of this nonribosomal peptide/polyketide, and of its mode of action is continuously increasing (11, 19, 20, 22-24, 26, 27, 36, 37), we still have little information on the mechanisms of its regulation. So far, colibactin production and the regulation of colibactin gene expression have been analyzed mainly in the context of DNA damage and genotoxicity (1, 8, 9) and of extraintestinal pathogenic E. coli (ExPEC) pathogenesis (6, 7), but also as factors contributing to the probiotic character of E. coli strain Nissle 1917 (3, 4). Model strains used for the functional characterization of factors involved in colibactin expression are human clinical isolates of ST95 (E. coli O18:K1 newborn meningitis strains IHE3034 and SP15) (1, 35) and of ST73 (uropathogenic E. coli isolate CFT073) (22), probiotic E. coli strain Nissle 1917 (ST73) (33, 36), and laboratory strain E. coli DH10B (pBeloBAC11-pks) (24, 38, 39). Many human or murine E. coli isolates used in in vivo models of colibactin function have not been characterized in detail at the genomic level (12, 13, 40). To extend the spectrum of well-characterized model strains, we present here the complete genome sequence of E. coli M1/5, a fecal isolate from a healthy human individual. This strain represents the large group of colibactin-positive intestinal colonizers of phylogroup B2 that lack the increased pathogenic potential of ExPEC. [...] (43), indicated that these strains differ with respect to the presence of determinants for chaperone-usher fimbriae and other adhesins and of factors involved in serum resistance and iron uptake, as well as in gene clusters coding for type 3, type 5, and type 6 secretion systems. E. coli M1/5 is the only isolate among these commensals that carries the colibactin determinant as well as two different flagellar determinants (Flag-1 and Flag-2). Compared to the aforementioned fecal isolates from healthy individuals, the E. coli M1/5 genome comprises the highest number of detected genes that may contribute to the fitness of extraintestinal pathogenic E. coli (see Table S1 in the supplemental material).
Our analyses indicate that ClbR is the main transcriptional activator specifically regulating colibactin biosynthesis. ClbR expression directly correlates with the production of functional colibactin. The clbR transcriptional profiles in five fecal and clinical E. coli model isolates were in principle very similar but also exhibited strain-specific differences. The molecular basis of these different levels of clbR transcription, in particular how inducing or repressing stimuli are perceived and transmitted through the interplay of different regulatory elements, is still not understood. It was reported previously that colibactin expression is affected by the bacterial growth state and the composition of the growth medium (33). We also know that spermidine and other polyamines are required for colibactin production via an as-yet-unknown regulatory mechanism (45). Our reporter gene-based analysis of clbR promoter activity suggests resource-dependent and growth phase-dependent regulation, reflected in distinct clbR expression patterns with varying promoter activities. In most cases, the highest peaks of clbR promoter activity were detected at the transition from late exponential phase to early stationary phase and in less complex media (Fig. 3; see also Fig. S3 in the supplemental material). Shifts of the colibactin gene expression peak were also observed in our previous study analyzing the impact of different carbon sources on transcription of the colibactin determinant in E. coli strain Nissle 1917 (33). This suggests that transcription of the colibactin determinant is regulated at least in part in response to the availability of metabolites and/or the activity of central carbon metabolism. The fact that growth in TB, in contrast to other rich media, led to a significantly longer phase of clbR transcription extending far into the stationary growth phase (Fig. 3) further illustrates that multiple parameters, such as the supply of nutrients and energy, are integrated into the regulation of colibactin expression. This integration is probably achieved by regulators acting within regulatory networks. Screenings of random transposon insertion libraries of colibactin-producing strains have not yet identified any regulatory protein that could adjust colibactin expression in response to changing growth conditions. Importantly, iron has been reported to affect colibactin production directly via regulation of clbA expression by the ferric uptake regulator (Fur) protein and the small regulatory RNA RyhB (35, 46). Here, we show that expression of the main transcriptional activator of the colibactin determinant is also directly affected by iron availability (Fig. 4). The exact mechanism responsible for iron-dependent regulation of clbR remains to be elucidated. While Fur binding sites have been detected upstream of clbA (35, 46), we did not identify such regions upstream of clbR. The fact that clbR promoter activity was unaffected by increased or decreased iron availability in LB, whereas addition of ferric chloride led to a drastic decrease in clbR promoter activity in minimal medium, may suggest that iron availability rather than nutrient availability determines the level of clbR expression. Iron and nutrients are highly abundant in rich LB medium, so a decrease or increase in the iron concentration has no effect on clbR expression. In contrast, addition of ferric iron to the minimal medium, which is poorer in iron (and nutrients), strongly reduced clbR promoter activity. A further decrease of iron availability in minimal medium upon addition of the chelator deferoxamine had no effect.
This finding suggests that, at least in the LB and M9+CAS media investigated here, iron availability plays a more important role in the regulation of clbR transcription than nutrient supply. Our results are in accordance with other published data: transcriptomic analyses indicate that colibactin expression is (at least at the transcriptional level) increased in E. coli upon growth in (iron-limited) urine relative to LB, that it is detectable during colonization of the intestinal tract, and that intestinal inflammation promotes colibactin expression (47–49). Colibactin is considered a virulence factor of neonatal meningitis-causing E. coli during sepsis (7) and seems to be important for long-term intestinal colonization (50). Regulation of colibactin expression in response to iron availability in different body niches may support bacterial fitness in the blood or along the Fe(III) ion gradient between the intestinal lumen and the intestinal epithelium, because it ensures fine-tuned colibactin expression under appropriate conditions. The iron-dependent regulation of colibactin expression clearly underlines the close connection between the metallophore yersiniabactin and colibactin at both the genomic and regulatory levels (51). Deeper insights into the structural diversity of molecules derived from the colibactin pathway, relevant growth conditions, and regulatory mechanisms will help us to better understand the biological role of this interesting and controversial secondary metabolite, whose production has been described to promote cancer but also to be associated with the probiotic character of E. coli (3,4,13,16,52–57).
ClbR contains a LuxR-type DNA-binding helix-turn-helix (HTH) domain in the C-terminal region, which is usually found in response regulators of the LuxR/FixJ family. Classical LuxR/FixJ-type regulators possess an N-terminal regulatory domain, such as a receiver (REC) domain. This N-terminal domain is responsible for activation of the regulator (i) upon phosphorylation by a transmembrane sensor kinase (58), (ii) upon binding of N-acyl homoserine lactones (59,60), or (iii) upon binding of multiple ligands, as in MalT (61). A LuxR-like response regulator lacking a REC domain has also been described previously: transcription factor GerE regulates transcription of spore coat genes in the late sporulation stage in B. subtilis (34, 62). LuxR-type regulators are usually transcriptional activators, although some can act as repressors or, like GerE, as both activators and repressors (58). On the basis of its amino acid sequence and predicted structure, ClbR resembles the GerE protein (Fig. S2). As ClbR and GerE lack an N-terminal regulatory domain, it is unlikely that either protein is activated by ligand binding. Expression of GerE is controlled by a hierarchical cascade involving two different sigma factors and several levels of regulation, including transcription, DNA recombination, and proprotein processing (63). The molecular mechanism responsible for the growth phase-dependent and metabolite-dependent regulation of clbR expression may be as complex as that for GerE and remains to be characterized.
We demonstrated that ClbR binds to the intergenic region that separates the two divergently oriented gene clusters involved either in regulation and activation of colibactin expression (clbR and clbA) or in biosynthesis and delivery of the polyketide (clbB to clbS) (Fig. 1, top panel). This intergenic region comprises the promoters of clbR and clbB as well as an additional regulatory element, the VNTR region, which is located upstream of the clbR translational start site (Fig. 1, bottom panel; see also Fig. 5A). Our results indicate that the size of the VNTR region affects clbR promoter activity and thus colibactin production (Figs. 7 and 8). We assume that the number of repeats, and thus the size and secondary structure of the clbR 5′ region, can affect clbR transcription as well as transcript stability and translation efficiency.
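The repeat number that distinguishes VNTR variants (for example, the 5- and 20-repeat versions used in the reporter constructs) can be read directly from a sequence. The Python sketch below is illustrative only: it assumes the octanucleotide unit 5′-ACAGATAC-3′ described earlier in the article, counts only perfect, consecutive copies on the strand supplied, and the function name `longest_tandem_run` is hypothetical.

```python
REPEAT_UNIT = "ACAGATAC"  # VNTR octanucleotide (5'->3') assumed from the article


def longest_tandem_run(seq: str, unit: str = REPEAT_UNIT) -> int:
    """Return the largest number of perfect, consecutive copies of `unit` in `seq`.

    Only the strand given is scanned; any mismatch breaks a run.
    """
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    # A run starting at position p is found when stepping by k from offset p % k.
    for offset in range(k):
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Example: a 5-repeat VNTR flanked by arbitrary bases
print(longest_tandem_run("GG" + REPEAT_UNIT * 5 + "TT"))  # 5
```

If the region is supplied in the opposite orientation, the same function can be called with the reverse-complement unit 5′-GTATCTGT-3′.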
ClbR seems to be a pks island-specific regulator, because our transcriptome analysis of clbR deletion and overexpression mutants did not indicate that transcription of any gene located outside the pks island directly depended on ClbR availability. Only transcription of the clb gene cluster (group 1 genes) changed in accordance with the deletion or overexpression of clbR (Fig. 8A; see also Fig. S6A and B). The few other genes that were deregulated in response to deletion or overexpression of clbR (Fig. 8A, groups 2 to 5) responded in a similar manner to clbQ deletion and overexpression. Only a small set of genes (comprising groups 1 and 4), including clbP and clbQ, was upregulated upon clbQ overexpression (Fig. 8A). Where genes or gene products could be clustered by function, these functionally associated groups are highlighted in Fig. 8. Deletion and overexpression of clbR and clbQ led to very similar global expression profiles at the transcriptome and proteome levels (Fig. 8). This applies in particular to genes whose products can be functionally associated with amino acid (histidine, phenylalanine, tyrosine, and tryptophan) and secondary metabolite biosynthesis but also with organonitrogen compound and primary metabolism (gene groups 2 and 5, protein group IV). This similarity suggests that these processes are affected indirectly by ClbR and ClbQ and thus respond to the colibactin biosynthetic process.
We detected a ClbQ-dependent effect on colibactin production. Whereas expression of the colibactin determinant at the transcript and protein levels was only weakly affected (Fig. 6; see also Fig. S6D and E), colibactin production, as assessed by the extent of DNA damage and the concentration of the precolibactin cleavage product C14-Asn, was significantly reduced (Fig. 8C). This finding suggests that colibactin production responds to the availability of metabolites, intermediates, or end products of the colibactin biosynthetic process. Both deletion and overexpression of clbQ in E. coli M1/5 resulted in reduced levels of γ-H2AX upon bacterial infection of HeLa cells as well as in a significant reduction of C14-Asn levels (Fig. 8B and C). This indicates that interfering with the biosynthetic flow, either by increased unloading of intermediates or by clogging of the colibactin biosynthesis pipeline, reduces the efficacy of colibactin production. Our results therefore corroborate the previously reported in vitro observation that ClbQ facilitates additional unloading of colibactin synthesis intermediates (26). Whereas the absence of ClbQ results in stalled biosynthesis and reduced mature colibactin levels, increased clbQ expression may enhance the release of intermediates from the polyketide assembly line and thus reduce the release of the final colibactin product(s).
Our transcriptome and proteome analyses of the clbQ deletion and overexpression mutants suggest that a regulatory element affecting clbS expression is located within clbQ. While the impact of clbQ deletion or overexpression on clbP transcript levels can be explained by reads mapping to the partially overlapping genes clbP and clbQ, increased clbS expression may result from a promoter region within clbQ that may be affected by clbQ deletion or overexpression (Fig. S6D). Experimental confirmation of transcriptional start sites within the colibactin determinant will be key to understanding the regulation of colibactin expression in detail.

MATERIALS AND METHODS
Genome sequencing, assembly, annotation, and gene content analysis. The genome of E. coli M1/5 was sequenced by combining PacBio and Illumina sequencing technologies. For details on the genome sequencing method, see Text S1 in the supplemental material. Genome assembly was performed with the RS_HGAP_Assembly.3 protocol included in SMRT Portal version 2.2.0. For error correction of PacBio HGAP assembly, Illumina short reads were mapped to the assembled chromosome and plasmid sequences using the Burrows-Wheeler Aligner (BWA) (64). A final quality score of QV60 was confirmed using the RS_BridgeMapper.1 protocol. Automated genome annotation was carried out using PGAP (65). Identification of plasmids, serotypes, and acquired resistance genes was performed with the Web-based tools PlasmidFinder (v1.3) (66), SerotypeFinder (v1.1) (67), and ResFinder (v2.1) (68), respectively. We used a stringent identity threshold of 95% to determine plasmids based on replicon sequences. To examine serotypes and acquired resistance genes, sequence identity levels of 85% and 90% were used, respectively. The length requirement was set to a minimum of 60% sequence coverage for both serotyping and identification of resistance genes. For the determination of virulence factors (VFs), we used the E. coli VF collection (v0.1), which comprises 12 distinct VF groups containing 1,154 deduced protein sequences of virulence-associated genes (69).
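The identity and coverage cut-offs listed above amount to a simple filter over alignment hits. The following Python sketch is a hypothetical illustration of that filter, not the internal logic of PlasmidFinder, SerotypeFinder, or ResFinder; the `Hit` record, its field names, and the example reference names are assumptions. Because no coverage cut-off is stated for plasmid replicons, the sketch applies none to them.

```python
from dataclasses import dataclass

# Minimum percent identity per analysis, as stated in the Methods
MIN_IDENTITY = {"plasmid": 95.0, "serotype": 85.0, "resistance": 90.0}
# Minimum percent coverage; stated only for serotyping and resistance genes
MIN_COVERAGE = {"serotype": 60.0, "resistance": 60.0}


@dataclass
class Hit:
    """Hypothetical alignment hit record."""
    analysis: str    # "plasmid", "serotype", or "resistance"
    reference: str   # name of the matched reference sequence
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference length covered


def passes(hit: Hit) -> bool:
    """True if the hit meets the identity and (where defined) coverage thresholds."""
    if hit.identity < MIN_IDENTITY[hit.analysis]:
        return False
    return hit.coverage >= MIN_COVERAGE.get(hit.analysis, 0.0)


hits = [
    Hit("plasmid", "replicon_A", 96.5, 100.0),
    Hit("serotype", "wzx_example", 84.0, 100.0),
    Hit("resistance", "gene_example", 99.0, 55.0),
]
print([h.reference for h in hits if passes(h)])  # ['replicon_A']
```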
Bacterial strains, plasmids, genetic manipulations, and media. Information about the strains and plasmids used in this study is provided in Table 2. All E. coli M1/5 mutants generated and used in this study are based on the streptomycin-resistant mutant strain E. coli M1/5 rpsLK42R (51). For the sake of simplicity, we use the shorter description "M1/5" instead of "M1/5 rpsLK42R" in all corresponding mutant designations. Bacteria were usually cultivated in lysogeny broth (LB) (10 g liter⁻¹ tryptone, 5 g liter⁻¹ yeast extract, 5 g liter⁻¹ sodium chloride) with shaking at 37°C. If necessary, antibiotics were used at the following concentrations: ampicillin, 100 µg ml⁻¹; chloramphenicol, 15 µg ml⁻¹ and 25 µg ml⁻¹ for low-copy-number and medium-copy-number resistance cassettes, respectively; kanamycin, 50 µg ml⁻¹. L-Arabinose was used at a concentration of 3% (wt/vol) to induce clbR expression from pBAD-clbR. Agar plates were prepared by adding 16 g liter⁻¹ agar.
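The working concentrations above translate into stock volumes by the relation C1V1 = C2V2. The Python sketch below illustrates that arithmetic; the dictionary keys and function name are hypothetical, and the stock concentrations in the example (100 mg ml⁻¹ ampicillin and a 20% [wt/vol] L-arabinose stock) are assumptions that the Methods do not specify.

```python
# Working concentrations from the Methods, in µg/ml (3% wt/vol L-arabinose = 30,000 µg/ml)
WORKING_UG_PER_ML = {
    "ampicillin": 100.0,
    "chloramphenicol_low_copy": 15.0,
    "chloramphenicol_medium_copy": 25.0,
    "kanamycin": 50.0,
    "L-arabinose": 30_000.0,
}


def stock_volume_ul(component: str, stock_ug_per_ml: float, final_volume_ml: float) -> float:
    """Stock volume (µl) that gives the working concentration in `final_volume_ml`.

    Uses C1 * V1 = C2 * V2, i.e., V1 = C2 * V2 / C1, with V2 the final (total) volume.
    """
    target = WORKING_UG_PER_ML[component]
    if stock_ug_per_ml <= target:
        raise ValueError("stock must be more concentrated than the working concentration")
    return target * final_volume_ml * 1000.0 / stock_ug_per_ml  # ml -> µl


print(stock_volume_ul("ampicillin", 100_000.0, 10.0))   # 10.0
print(stock_volume_ul("L-arabinose", 200_000.0, 10.0))  # 1500.0
```

With a 20% stock, 3% arabinose takes up 15% of the final volume, which is why V2 is treated as the total volume after addition rather than the volume of medium before it.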
For genetic modifications of the bacterial chromosome, the bacteriophage Lambda Red recombinase-dependent approach was used (71), partially refined for scarless mutagenesis (72) or followed by an integrated FRT (FLP recombination target) site/FLP-recombinase-dependent step to generate luciferase-reporter strains (73). The construction of plasmids and mutants is described in detail in Text S1. Oligonucleotides used in this study are given in Table S3 in the supplemental material.
ClbR purification. The ClbR protein was heterologously expressed as a ClbR-intein-chitin binding domain fusion using the IMPACT system (New England Biolabs). This allowed chitin affinity chromatography followed by thiol-mediated self-cleavage of the intein during elution, yielding tagless ClbR protein. Expression and purification of ClbR are explained in detail in Text S1.
Electrophoretic mobility shift assays (EMSA). Specific interactions of ClbR with DNA were detected using a digoxigenin (DIG) gel shift kit (second generation; Roche Diagnostics, Mannheim, Germany). Regions of interest containing a potential ClbR binding site were amplified by PCR and DIG labeled. To probe for specific DNA-ClbR interactions, labeled probes were incubated with increasing concentrations of ClbR in the presence of poly[d(I·C)] as a nonspecific competitor. Bound and unbound probes were separated by native polyacrylamide gel electrophoresis, blotted, and detected via chemiluminescence according to the manufacturer's protocol. For details, see Text S1.
Growth-dependent reporter gene assays. To measure promoter activity with a luminescence reporter (33), we generated reporter fusions by cloning the native clbB-to-clbR intergenic region with a VNTR region comprising either 5 or 20 repeats into the attB locus of E. coli strain M1/5, thereby replacing clbR with luxABCDE. Luciferase expression in the resulting reporter strains was under the control of the clbR promoter. Strains containing the reporter fusion were inoculated 1:100 from overnight cultures into 150 µl of medium in 96-well flat-bottom white polystyrene plates (Greiner Bio-One, Frickenhausen, Germany). OD600 and luminescence were measured for 23 h in a Tecan Infinite 200 reader (Tecan …).

Colibactin cytotoxicity assays. HeLa cell infection assays were used to analyze the cytotoxic effect of colibactin on mammalian cells by demonstrating megalocytosis and DNA damage (1). The protocols are described in detail in Text S1.
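The growth-dependent reporter gene assays described above yield paired OD600 and luminescence time courses. One common way to express promoter activity, assumed here rather than taken from the Methods, is luminescence normalized to cell density (light units per OD600 unit), from which the time of peak activity can be read. The Python sketch below assumes blank-corrected readings; the function names and the `min_od` cut-off that excludes very low-density time points are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def normalized_activity(lum: Sequence[float], od600: Sequence[float],
                        min_od: float = 0.05) -> List[Optional[float]]:
    """Luminescence per OD600 unit for each time point (None where OD600 < min_od)."""
    if len(lum) != len(od600):
        raise ValueError("lum and od600 must have the same length")
    return [signal / od if od >= min_od else None for signal, od in zip(lum, od600)]


def activity_peak(times_h: Sequence[float], lum: Sequence[float],
                  od600: Sequence[float], min_od: float = 0.05) -> Tuple[float, float]:
    """Return (time in h, luminescence/OD600) at the maximum normalized activity."""
    activity = normalized_activity(lum, od600, min_od)
    valid = [(t, a) for t, a in zip(times_h, activity) if a is not None]
    if not valid:
        raise ValueError("no time point reaches min_od")
    return max(valid, key=lambda pair: pair[1])


# Hypothetical readings from one well
times = [0.0, 1.0, 2.0, 3.0]
print(activity_peak(times, [10.0, 500.0, 900.0, 600.0], [0.01, 0.25, 0.5, 1.0]))  # (1.0, 2000.0)
```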
RNA sequencing. Bacterial cultures were grown as described for the luminescence assays until an OD595 of 0.4 was reached. After biological replicates were pooled, 0.125 volumes of an ethanol-phenol mix (95%:5%) were added, and the suspension was incubated for 5 min on ice before the bacterial cells were harvested by centrifugation. The pellet was frozen at −80°C; the cells were then thawed and treated with lysozyme by resuspending the pellet in 35 µl Tris-EDTA (TE) buffer containing 85 mg ml⁻¹ lysozyme and incubating the samples for 10 min at room temperature, with vortex mixing for 10 s every minute. Total RNA was extracted using TRIzol (Invitrogen, Karlsruhe, Germany) according to the manufacturer's protocol. After DNase treatment and PCR-based quality control, RNA quality was further assessed by RNA electrophoresis using a 2200 TapeStation system (Agilent). Strand-specific cDNA libraries, prepared with and without terminator exonuclease (TEX) treatment to enrich primary transcripts, were sequenced (Illumina NextSeq 500, 1× 75-bp single reads) by Vertis Biotechnologie AG (Freising, Germany). Sequencing data were processed using BWA (64) for read mapping and ReadXplorer 2 (74) for visualization and for differential gene expression analysis with DESeq2 (75). Differential RNA-seq, i.e., the comparison of samples treated with TEX with untreated samples, allowed detection of primary transcripts and determination of transcription start sites (76).
Whole-protein content analysis of RNA sequencing samples. An aliquot of the bacterial cultures grown for RNA sequencing was also harvested for the analysis of differential protein expression by mass spectrometry. A detailed description of the mass spectrometry-based expression analysis can be found in Text S1.

Quantification of colibactin intermediate N-myristoyl-D-asparagine. We compared the ability of E. coli strains to produce colibactin upon growth in M9 medium supplemented with Casamino Acids by quantifying the precolibactin cleavage product N-myristoyl-D-asparagine (C14-Asn) (36,37). Details are provided in Text S1.
Statistical analysis. Statistical analyses were performed using GraphPad Prism software (version 6.0). The figures show mean values with standard deviations (STDEV.P). Unpaired t tests were used as indicated. A P value of <0.05 was considered statistically significant and is indicated by one asterisk (*). A P value of <0.01 is denoted by two asterisks (**), a P value of <0.001 by three asterisks (***), and a P value of <0.0001 by four asterisks (****).
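The asterisk convention and the dispersion measure above can be stated compactly. In the Python sketch below, STDEV.P (the population standard deviation, dividing by n rather than n − 1) corresponds to `statistics.pstdev` in the Python standard library. The function names are hypothetical, and the t tests themselves were run in GraphPad Prism and are not reimplemented here.

```python
import statistics
from typing import Sequence, Tuple

# (upper bound for P, label), checked from the most to the least stringent threshold
_THRESHOLDS = ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*"))


def significance_stars(p: float) -> str:
    """Asterisk label for a P value; an empty string means not significant (P >= 0.05)."""
    for bound, stars in _THRESHOLDS:
        if p < bound:
            return stars
    return ""


def mean_and_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation (equivalent of Excel STDEV.P)."""
    return statistics.mean(values), statistics.pstdev(values)


print(significance_stars(0.003))                                 # **
print(mean_and_sd([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))     # (5.0, 2.0)
```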
Data availability. The complete genome sequence of E. coli M1/5 (the chromosome and plasmids pM1-5_30 and pM1-5_120) has been deposited at NCBI GenBank under accession numbers CP053296 to CP053298. All RNA-seq data files are available from the Gene Expression Omnibus database (accession no. GSE143807).