Widespread Strain-Specific Distinctions in Chromosomal Binding Dynamics of a Highly Conserved Escherichia coli Transcription Factor

In bacterial cells, hundreds of transcription factors coordinate gene regulation and thus are a major driver of cellular processes. However, the immense diversity in bacterial genome structure and content makes deciphering regulatory networks challenging. This is particularly apparent for the model organism Escherichia coli as evolution has driven the emergence of species members with highly distinct genomes, which occupy extremely different niches in nature. While it is well-known that transcription factors must integrate horizontally acquired DNA into the regulatory network of the cell, the extent of regulatory diversity beyond single model strains is unclear. We have explored this concept in four evolutionarily distinct E. coli strains and show that a highly conserved transcription factor displays unprecedented diversity in chromosomal binding sites. Importantly, this diversity is not restricted to strain-specific DNA or mutation in binding sites. This observation suggests that strain-specific regulatory networks are potentially widespread within individual bacterial species.

Gene regulation is at the core of all cellular processes, and its tailoring can drive new phenotypes that benefit bacterial cells (1,2). Bacterial species carry genes that encode hundreds of transcription factors (TFs) that coordinate gene regulation, often in response to environmental stimuli (3-5). This process has been well studied for pathogens, as virulence factors are usually encoded on horizontally acquired genetic elements that require integration into the regulatory network of the cell. Variation in genomic content extends far beyond genes encoding virulence factors, and while diversity in regulatory networks is well accepted for TF orthologues present in different species, the possibility that TFs can be tailored to individual members of the same species is largely unexplored (5,6). Regulatory networks are often studied in Escherichia coli as a model organism (usually the nonpathogenic commensal K-12), but the vast genomic diversity within this species results in ecologically distinct strains that occupy extremely different niches (7-11). This is particularly prominent in pathotypes such as enterohemorrhagic E. coli (EHEC), uropathogenic E. coli (UPEC), and neonatal-meningitis E. coli (NMEC) that thrive in the terminal colon, urinary tract, and brain, respectively (12). The highly specific mechanisms that drive pathogenesis, as well as basic survival, in such distinct environments require gene regulation to be controlled at the level of the individual strain.
We recently discovered that a highly conserved E. coli LysR-type TF (named YhaJ) has been repurposed to perform drastically different roles in EHEC and UPEC (13,14). YhaJ was found to regulate no common genes but activated virulence factors unique to each strain (type 3 secretion in EHEC and type 1 fimbriae in UPEC). We also observed distinctions in binding to conserved chromosomal targets (most strikingly the acid tolerance regulator gadX) and their subsequent regulation, but the reasons driving this were unknown. We noticed that YhaJ expression was dramatically higher in EHEC than in UPEC when grown under identical conditions and hypothesized that this was a driver of the strain-specific gene regulation observed. This prompted us to examine the phenomenon using the divergent yhaJ-yhaK regulatory region as a model system. This region contains a YhaJ binding site and overlapping promoters that are 100% conserved in four evolutionarily distinct E. coli strains: EHEC, UPEC, NMEC, and K-12 (Fig. 1A and B). Note that the protein-coding sequence of YhaJ is completely identical in these strains except for an amino acid substitution in UPEC, which we previously confirmed does not impact its apparent functionality (14). Despite this commonality, testing YhaJ expression revealed that YhaJ dosage varied drastically between strains grown in minimal essential medium (MEM), with UPEC, for example, displaying significantly (P = 0.036) lower YhaJ expression than EHEC. In contrast, growth in rich medium (LB) yielded almost identical expression levels of YhaJ in all strains (Fig. 1C). TF dosage can impact specific stress responses and even offer an evolutionary advantage for individual strains, as has been described for the E. coli sigma factor RpoS (15-17). We reasoned that the natural variation in TF expression would correlate with binding levels to a common target.
Surprisingly, chromatin immunoprecipitation (ChIP)-PCR analysis revealed that YhaJ enrichment at the yhaK promoter region did not vary with TF dosage. This was particularly prominent for UPEC in minimal medium, which displayed the highest enrichment of YhaJ signal at this region despite YhaJ expression being comparably lower (Fig. 1D). This result was corroborated by finding that naturally enhancing YhaJ expression levels in LB had no significant effect on YhaJ enrichment at this binding site. To confirm this phenomenon, we analyzed a known YhaJ target gene, yqjF, and similarly found that occupancy was neither condition dependent nor driven by YhaJ expression (see Fig. S1 in the supplemental material) (18). These results collectively indicate that differences in YhaJ enrichment at conserved sites are not exclusively driven by unexpected variations in TF dosage between members of the same species.
We reasoned that variation in YhaJ expression levels between strains would likely result in global binding distinctions and that growth in LB, which normalizes YhaJ dosage, would alleviate these differences. Using ChIP-sequencing (ChIP-seq) of natively expressed YhaJ in each strain's genetic background, we mapped the global binding profile in vivo under the two aforementioned conditions, revealing a total of 78 significantly enriched peaks (P ≤ 0.01; two biological replicates) across all strains, including binding sites intragenic in origin (Fig. 2A; see Fig. S2 and Data Set S1 in the supplemental material) (19). Three major observations were made in light of this. First, increased YhaJ expression levels between conditions correlated with an increase in the number of global YhaJ binding sites relative to each strain (EHEC, 23 to 39; UPEC, 7 to 46; NMEC, 12 to 22; K-12, 12 to 34). Second, only ~15% of all binding sites (5/33 in MEM; 12/73 in LB) were occupied in all four strains, regardless of the conditions (Fig. 2B). Third, the majority of strain-specific binding sites identified were not restricted to chromosomal loci unique to each genetic background. While condition-dependent binding sites were not unexpected, these data collectively reveal that the regulatory network of YhaJ is surprisingly heterogeneous despite its highly conserved nature across the E. coli phylogeny. This suggests that strain-specific regulatory roles for YhaJ are potentially widespread in E. coli (5,14).
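The shared-site proportion quoted above is a simple set calculation: the sites bound in every strain divided by all sites bound in at least one strain. The following Python sketch illustrates that arithmetic under stated assumptions. The function name `shared_fraction` and the toy locus labels are hypothetical and are not taken from Data Set S1; only the counts 5/33 and 12/73 come from the text.

```python
def shared_fraction(sites_by_strain):
    """Return (n_shared, n_total, fraction) for sites bound in every strain.

    sites_by_strain maps a strain name to the set of loci bound in it.
    """
    all_sites = set().union(*sites_by_strain.values())      # bound in >= 1 strain
    shared = set.intersection(*sites_by_strain.values())    # bound in all strains
    return len(shared), len(all_sites), len(shared) / len(all_sites)


# Toy input with placeholder locus labels (hypothetical, NOT the real peaks).
toy_sites = {
    "EHEC": {"locus_1", "locus_2", "locus_3", "locus_4"},
    "UPEC": {"locus_1", "locus_2", "locus_5"},
    "NMEC": {"locus_1", "locus_2"},
    "K-12": {"locus_1", "locus_2", "locus_3"},
}

# Counts reported in the text: (sites shared by all four strains, all sites).
reported_counts = {"MEM": (5, 33), "LB": (12, 73)}
reported_percent = {
    condition: round(100 * shared / total, 1)
    for condition, (shared, total) in reported_counts.items()
}

if __name__ == "__main__":
    print(shared_fraction(toy_sites))
    print(reported_percent)
```

With the reported counts, both conditions give a shared proportion close to 15%, consistent with the figure stated in the text.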
Regulatory adaptations in strain-specific loci represent logical repurposing of a TF, particularly for pathogens encoding horizontally acquired virulence factors. We previously demonstrated that this was the case for YhaJ, directly regulating pathogenicity island- and prophage-encoded type 3 secretion system components in EHEC, as well as type 1 fimbriae in UPEC (13,14). Here, we identified highly significant (P = 4.9 × 10⁻⁵²) conditional YhaJ binding in the regulatory region of the EHEC type 6 secretion system (T6SS) cluster, exclusively in LB (Fig. 2C) (20). This system plays a role in EHEC virulence and macrophage survival, and this result highlights important flexibility in YhaJ for controlling several virulence factors in a single pathotype (21). Interestingly, UPEC encodes a distinct T6SS, but no YhaJ binding was evident in vivo, suggesting pathotype-specific requirements for T6SS regulation (Fig. S3) (20).
Figure legend (fragment): Binding sites that match the YhaJ consensus motif are highlighted in red on the right. Specific mutations in binding site sequences associated with a lack of YhaJ enrichment in the particular strain indicated are highlighted by the arrow (black to red sequences). All read tracks were scaled to be comparable to each other for individual gene regions.

While binding to strain-specific loci (particularly virulence-associated loci) is not uncommon for TFs, we were more intrigued by the surprising heterogeneity in global binding profiles for conserved genes. While YhaJ binding could be driven by growth conditions across all strains (for instance, the known target yceP; Fig. S4), we also identified conditional YhaJ binding to conserved gene regions in specific strains. For example, YhaJ bound (P = 1.35 × 10⁻⁷) upstream of the EHEC yecI gene (encoding ferritin) exclusively in MEM. LysR-type TFs such as YhaJ recognize partial-dyadic T-N11-A sequences in promoter regions (22). Importantly, analysis of the yecI DNA region revealed that while the YhaJ binding sequence in UPEC and NMEC contained a mutation that affects its partial-dyadic symmetry and possibly functionality, the E. coli K-12 motif was identical to the EHEC motif (Fig. 2D). This suggests that strain-specific binding is not exclusively driven by such mutations. We further examined this hypothesis in all cases where binding to a conserved region was absent for one strain. YhaJ motif mutations were present in only three of the nine cases identified (pstB, tdcE, and yedL), revealing that the majority of strain-specific binding distinctions identified are driven by factors independent of mutations to the YhaJ recognition sequence. These factors may include competitive or cooperative binding of other TFs to similar regions in a strain-specific manner (Fig. 2E) (14).
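To make the T-N11-A description concrete, the sketch below scans a DNA string for 13-bp windows that start with T and end with A, then counts how many mirrored base pairs across the window are complementary, as a rough proxy for dyad symmetry. It is an illustration only and not the motif analysis used in this study: the function names, the scoring scheme, and the example sequences are all hypothetical, and the sequences are not the actual yecI motifs.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Lookahead so that overlapping T-N11-A windows are all reported.
T_N11_A = re.compile(r"(?=(T[ACGT]{11}A))")


def find_t_n11_a(seq):
    """Return (start, 13-mer) for every T-N11-A window in seq (0-based start)."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in T_N11_A.finditer(seq)]


def dyad_score(site):
    """Count mirrored positions whose bases are complementary.

    For a 13-mer the maximum is 6 (the central base has no partner).
    This score is an illustrative assumption, not a published metric.
    """
    half = len(site) // 2
    return sum(COMPLEMENT[site[i]] == site[-1 - i] for i in range(half))


# Made-up sequences: a fully symmetric site and a single-base variant of it.
symmetric_site = "TTAGCACTGCTAA"
variant_site = "TTAGCACTGCGAA"  # T-to-G change at position 11 of the window

if __name__ == "__main__":
    print(find_t_n11_a("CC" + symmetric_site + "GG"))
    print(dyad_score(symmetric_site), dyad_score(variant_site))
```

In this toy case the variant still matches the T-N11-A pattern but loses one symmetric pair. A substitution can therefore weaken partial-dyadic symmetry without removing the anchoring T and A.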
Conclusion. We have observed that a highly conserved TF has adapted its genetic behavior drastically on an individual level to create strain-specific chromosomal interactions in E. coli. These distinctions are amplified according to TF dosage and are not driven purely by binding site mutations or attraction to strain-specific genetic loci. The resulting binding profiles represent a previously underappreciated diversity in intraspecies regulatory potential and highlight that global gene regulation studies should not rely on single model strains. Given the ecological diversity of E. coli as a species and the fact that it dedicates a large proportion of its genome to regulation (~6% in E. coli K-12 [6]), we anticipate that this is a widespread phenomenon allowing the emergence of strain-specific regulatory networks.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, DOCX file, 0.02 MB.