A Systems Biology Approach To Disentangle the Direct and Indirect Effects of Global Transcription Factors on Gene Expression in Escherichia coli

ABSTRACT Delineating the pleiotropic effects of global transcriptional factors (TFs) is critical for understanding the system-wide regulatory response in a particular environment. Currently, with the availability of genome-wide TF binding and gene expression data for Escherichia coli, several gene targets can be assigned to the global TFs, albeit inconsistently. Here, using a systematic integrated approach with emphasis on metabolism, we characterized and quantified the direct effects as well as the growth rate-mediated indirect effects of global TFs using deletion mutants of FNR, ArcA, and IHF regulators (focal TFs) under glucose fermentative conditions. This categorization enabled us to disentangle the dense connections seen within the transcriptional regulatory network (TRN) and determine the exact nature of focal TF-driven epistatic interactions with other global and pathway-specific local regulators (iTFs). We extended our analysis to combinatorial deletions of these focal TFs to determine their cross talk effects as well as conserved patterns of regulatory interactions. Moreover, we predicted with high confidence several novel metabolite-iTF interactions using inferred iTF activity changes arising from the allosteric effects of the intracellular metabolites perturbed as a result of the absence of focal TFs. Further, using compendium level computational analyses, we revealed not only the coexpressed genes regulated by these focal TFs but also the coordination of the direct and indirect target expression in the context of the economy of intracellular metabolites. Overall, this study leverages the fundamentals of TF-driven regulation, which could serve as a better template for deciphering mechanisms underlying complex phenotypes. IMPORTANCE Understanding the pleiotropic effects of global TFs on gene expression and their relevance underlying a specific response in a particular environment has been challenging. Here, we distinguish the TF-driven direct effects and growth rate-mediated indirect effects on gene expression using single- and double-deletion mutants of FNR, ArcA, and IHF regulators under anaerobic glucose fermentation. Such dissection assists us in unraveling the precise nature of interactions existing between the focal TF(s) and several other TFs, including those altered by allosteric effects of intracellular metabolites. We were able to recapitulate the previously known metabolite-TF interactions and predict novel interactions with high confidence. Furthermore, we determined that the direct and indirect gene expression have a strong connection with each other when analyzed using the coexpressed- or coregulated-gene approach. Deciphering such regulatory patterns explicitly from the metabolism point of view would be valuable in understanding other unpredicted complex regulation existing in nature.

G lobal transcriptional factors (TFs) are the cornerstone in formulating an appropriate response to any environmental cue (1)(2)(3)(4). While the importance of global TFs is well appreciated, the true direct targets and indirect effects on gene expression together with their role in particular environment remain unclear. The direct targets represent genes whose promoters have TF binding sites and respond consecutively to that specific TF gene expression or activity under a particular condition. In contrast, the indirect targets are genes whose expression is not directly influenced by the specific TF but which respond to the changes in TF in an unknown manner. Seminal studies have shown that these genome-wide indirect effects are primarily mediated by alterations in growth rates or the physiological state of the organism that encompass regulation by RNA polymerase-associated sigma factors, ribosomes, and intracellular metabolites as well as by other interacting global or local TFs (5)(6)(7)(8)(9)(10)(11)(12). In addition, previous studies have revealed the extent of TF and growth rate-dependent effects on gene expression and metabolic pathways during exponential growth as well as in the event of adaptive evolution (5,8). Therefore, to deepen our understanding of regulation prevalent within a transcriptional regulatory network (TRN), it is fundamental to investigate these indirect or growth rate-mediated genes in conjunction with the known functions of the direct targets of the interrogated global TF for a particular environment.
The global TFs which regulate a wide array of genes are known to work in concert with local or pathway-specific TFs that regulate fewer genes, albeit in a less characterized fashion. In Escherichia coli, it is estimated that around 51% of the gene expression changes are modulated by the global TFs (CRP, FNR, ArcA, IHF, HNS, Fis, and Lrp), while the remaining 49% are driven by their interaction with other pathway-specific local TFs (3,13). As gene expression can be controlled by more than two regulators, it becomes crucial to determine the mode of interaction between the global TFs and other interacting TFs within the TRN. These interactions could quite often represent epistatic interactions rather than direct protein-protein interactions. Deciphering whether such epistatic interactions in the TRN are due to changes in gene expression of their interacting TFs or metabolite-driven changes in their TF activities remains fundamental.
In a canonical sense, the transcript levels majorly dictate the abundance of enzymes followed by changes in metabolic flux and metabolite levels (14)(15)(16)(17). In vivo studies concerning the allosteric effects of metabolites on TFs are now gaining importance, as they further enhance our understanding of metabolism (15,18). Identifying such metabolite-TF interactions during batch exponential growth based on steady-state physiological concentrations of the intracellular metabolites and activities of the TFs has been difficult. However, even a simple correlation analysis of the metabolite concentrations with the inferred TF activities based on gene expression data will provide insights into the plausible metabolite-TF interactions, which can then be verified experimentally. Apart from the metabolite-TF interactions, other modes of growth rate-dependent regulation can control the expression of the indirect gene targets. Further, questions concerning the expression of specific sets of indirect genes and their connection with direct targets remain unanswered. Therefore, deconvoluting the complexity of gene expression in terms of coexpression as well as identifying the clusters of directly coregulated genes becomes imperative (19). Given the availability of enormous gene expression data sets for E. coli K-12 under various nutritional or stress conditions (20), it would be a valuable resource to derive the conserved coexpressed genes. However, the analysis of coexpressed genes in the context of maintaining the demandand-supply equilibrium of intracellular metabolites remains largely unexplored.
Here, we sought to investigate the complex layers of regulation driven by the global TFs using single and combinatorial knockouts of FNR, ArcA, and IHF regulators under glucose-fermentative conditions. FNR and ArcA are best studied for their roles during anaerobiosis and energy metabolism (21)(22)(23)(24)(25)(26). Similarly, IHF was been well characterized for its functions as a nucleoid protein as well as its role in central carbon metabolism (26)(27)(28). In this study, we examined these global regulators with a focus on direct and indirect targets, epistatic interactions with other TFs, and their preferences for specific metabolic genes in order to formulate an appropriate response. To this end, we utilized an integrated framework that disentangled the pleiotropic regulatory responses for a particular environment (Fig. 1). First, we determined the gene expression profiles and metabolite levels in the strains lacking these global TFs, followed by extensive computational analyses. We delineated the direct and indirect target genes and inferred the regulator activities using network component analysis (NCA) (29) and the corresponding metabolite-TF interactions, which together gave us insights into the regulator-driven epistatic interactions within the TRN. Moreover, we elucidated the coordination between the direct and indirectly coregulated genes by employing weighted gene coexpression network analysis (WGCNA) (30-32) on E. coli K-12 compendium gene expression data. Furthermore, we elucidated the cross talk between these global TFs using their combinatorial deletions to decipher whether they exhibit additive or nonadditive effects on the genes that they regulate. Overall, our study can serve as a template for understanding systems-level regulatory events and may even be applicable for deciphering the mechanistic strategies developed during pathogenesis, gut microbiome interactions, and cancer metabolism.

RESULTS
Cross talk and nature of interaction within FNR, ArcA, and IHF regulatory networks. We constructed three double mutants, FA (Dfnr DarcA), FI (Dfnr Dihf), and AI (DarcA Dihf), by deleting arcA, fnr, and ihf (both ihfA and ihfB) genes in combination, using homologous recombination (33). We monitored the gene expression profiles for the double mutants compared to the wild type (WT) in the mid-exponential phase under glucose-fermentative conditions. The gene expression data for the FA mutant were found to be consistent with a previous study (34) (see Data Set S1 in the supplemental material). However, gene expression in the mutants FI and AI had not previously been characterized. The gene expression data for the single mutants were obtained from our previous study (26). We investigated the mode of interaction between each of the regulators in the double mutant-specifically, whether the expression patterns could be explained by either of the deleted TFs or by combined pleiotropic effects of both the regulators, defined here as cross talk. Preliminary analysis was performed for both the double and single mutants using the total number of differentially expressed genes (DEGs) (P , 0.05, Benjamini-Hochberg [BH] adjusted, log 2 fold change $ j1j). We observed ;250, ;510, and ;1,250 DEGs in FA, FI, and AI, respectively, compared to the WT. In FA and FI, the number of DEGs was lower than in their single mutants combined ( Fig. 2A). This was in contrast to AI, wherein the number of DEGs was greater than the sum of the DEGs in its single mutants. This indicated a FIG 1 Overview of the integrated workflow to dissect the complex regulation of gene expression within the TRN. Analyses were carried out for the global regulators FNR, ArcA, and IHF by employing single or combinatorial gene deletions under anaerobic glucose-fermentative conditions. Yellow ovals represent global TFs, blue ovals represent other TFs (global or local), orange ovals represent metabolites, gray boxes represent direct and indirect metabolic gene targets, and arrows represent regulation (positive or negative).

Deciphering Direct and Indirect Effects of Global TFs
Microbiology Spectrum greater extent of cross talk between ArcA and IHF regulators than in combinations involving FNR.
To understand the exact nature of this cross talk as well as to gain quantitative insights, we first assessed whether the DEGs in the single mutants reproduced their gene expression patterns in the double mutants. We found that the genes whose expression was significantly downregulated or upregulated (DEGs) in the single mutants were also downregulated or upregulated in the double mutant (log 2 fold change , 0 or . 0) ( Fig. S1 and S2). Next, we plotted the genes which were differentially expressed in the double mutant but showed a .2-fold change in gene expression in both single knockouts added together (Fig. 2B). The results indicated that either of the single mutants could explain the observed changes in the double mutants, which provided direct evidence of widespread additive effects between the regulators. We quantified the effects of each of the regulator deletions in the double mutants and performed a two-proportion Z-test with Yates' continuity correction for assessing significance (P , 0.01) (Data Set S1). In the DEGs of the double mutants, upregulated or downregulated gene expression changes could be attributed to specific regulator effects rather than a collective effect. In addition, the genes which could not be assigned (;20 to 38%) to either regulator deletion effect were significant in the case of AI followed by FA and FI mutants, depicting the extent of cross talk. Among DEGs of the double mutants, we observed only ;11 to 20% overlapping genes of both the single mutants with a .2-fold change in gene expression. Overall, these data indicated that in the double mutants, each of the regulators showed independent but additive cross talk effects due to their deletion.
Regulatory effects on key metabolic pathways. In our previous work, we demonstrated the system-wide effects of the transcriptional regulators FNR, ArcA, and IHF on key metabolic pathways (26). To decipher whether those metabolic pathways were perturbed in the double mutants as well, we carried out a KEGG pathway enrichment analysis (Proteomaps-based gene classification [35]) on DEGs obtained in double mutants with statistical significance analysis. We found that 69% and 74% of downregulated and upregulated DEGs, respectively, were enriched for metabolic pathways in FA compared to the WT (Data Set S1). In these KEGG pathway-enriched (KPE) upregulated DEGs, we observed significant enrichment of amino acid metabolism related to arginine degradation, which generates internal ammonia, costly tricarboxylic acid (TCA) cycle genes that are unnecessary during anaerobic fermentation, and unneeded lipid and steroid degradation genes (Fig. 3A). Additionally, in KPE-downregulated DEGs, we observed significant enrichment of amino acid metabolism and transport-related genes (Fig. 3B). The amino acid metabolism genes were related to arginine and asparagine biosynthesis, whereas the transporters were related to alternate carbon sources. Among the DEGs in FI, with respect to the WT, 57% and 58% of downregulated and upregulated DEGs, respectively, showed enrichment for metabolic pathways (Data Set S1). Only transport-encoding genes were upregulated in KEGG pathway-enriched DEGs (Fig. S3A). Alternatively, the KEGG pathway-downregulated DEGs (Fig. S3B) showed enrichment for metabolism related to branched-chain-amino-acid biosynthesis, glycolysis, and chaperone-mediated protein folding pathways. On the other hand, 53% and 66% of downregulated and upregulated DEGs, respectively, showed enrichment of metabolic pathways in AI compared to the WT (Data Set S1). Amino acid metabolism genes related to putrescine and arginine degradation, aerobic oxidative phosphorylation genes and TCA cycle genes, transporters, and several transcription factors were upregulated in KEGG pathway-enriched DEGs in AI compared to the WT (Fig. S3C to E). We obtained enrichment for amino acid metabolism related to branched-chain amino acid biosynthesis and bacterial motility proteins in downregulated KEGG pathway-enriched DEGs in AI compared to the WT (Fig. S3F). These data, which indicate significant perturbations of metabolic pathways in the double mutants, were consistent with the observations in single mutants. For instance, the metabolic pathway analysis in FA showed dominant effects of ArcA regulation, whereas both FI and AI were strongly inclined toward IHF-based regulation. In addition, the pattern of enrichment was also consistent with previous observations in the case of single mutants, wherein the necessary metabolic pathways of amino acid and nucleotide metabolism were downregulated and unnecessary metabolic pathways such as the aerobic TCA cycle, oxidative phosphorylation, and alternate carbon transporters were upregulated.
Loss of TFs results in changes in growth rates and metabolite levels. We monitored the physiological effects-namely, glucose uptake, growth rate, and secretion profiles-of mixed acid fermentation products as a result of the deletion of TFs. We observed detrimental effects on growth rate in all the double mutants relative to the WT (Table S1). The mutants FA, FI, and AI showed reductions of 40%, 30%, and 37% in growth rate, respectively. This reduction in the growth rates was consistent with the severe perturbation seen in the metabolic pathways. The glucose uptake reduction in FA and FI mutants strongly correlated with their respective growth rate changes. Interestingly, the reduction in glucose uptake in AI was only 13% compared to the WT, as opposed to the DarcA and Dihf mutants, despite severe reduction in its growth rate. A severe reduction in biomass yield was also observed in the case of AI (23%), indicating reduced carbon partitioning toward biomass (Table S1). The secretion profiles of exometabolites, namely, ethanol, acetate, formate, lactate, and succinate, revealed distinct changes in their yields. Toward a detailed understanding of metabolism, we monitored the intracellular metabolite concentrations for the glucose fermentative metabolism in all the mutants using 13 C-labeled targeted metabolomics (8,26). Among the measured metabolites, the intracellular concentrations of glycolytic and TCA cycle intermediates and proteinogenic amino acids were significantly different compared to those in the WT (Student's t test, FDR , 0.05). In FI and AI, we observed increased accumulation of the metabolic precursors and amino acids consistent with the accumulations seen in the single mutants, as reported previously (26) (Fig. S4). These accumulations reflected the inability of the mutants to utilize the metabolites for biomass synthesis, in agreement with their reduced growth rates. However, we observed lower pool sizes of these metabolites in the FA mutant than in the WT. This might indicate an adverse carbon limitation (36) as well as the reduced availability of active ribosome pools (37)(38)(39)(40)(41)(42) prevalent in slow-growth conditions, which needs to be investigated in the future. Dissecting direct and indirect targets of the TF from their pleiotropic effects on metabolism. We sought to disentangle the growth rate-mediated indirect effects from the direct effects arising from the absence of FNR, ArcA, and IHF regulators and their pairwise combinations, referred to here as focal TFs. Here, we employed a multipronged approach using the growth rate-dependent genes obtained from chemostat experiments, genome-wide studies to enrich for the binding region of the regulators and comparison with available gene expression data to account for all the genes modulated purely by the focal TFs. First, we performed glucose-limited chemostat cultivations of the WT strain under aerobic respiratory and anaerobic fermentative conditions at a fixed dilution rate of 0.21 h 21 to identify the genes whose expression changes were growth rate independent. A comparison of DEGs in the WT in anaerobic fermentation relative to the WT under aerobic conditions highlighted the genes that are specific to regulation or genes that are not altered due to slow-growth effects during anaerobic fermentation. It was shown previously that the gene expression changes in response to growth rate effects are consistent, despite glucose-limited or excess-glucose conditions (5,8,9). Thus, the DEGs identified under the chemostat conditions were then utilized to characterize the indirect targets as growth rate-mediated effects and the direct targets as regulation-specific effects by the focal TF(s) under batch exponential growth conditions. The DEGs enriched for direct targets of the focal TFs in the double mutants were further validated by determining the binding motifs using FIMO analysis (43) together with the available in vivo chromatin immunoprecipitation (ChIP) binding or other in vitro binding data (22-25, 27, 44, 45). Further, a few regulation-specific genes (15 to 20% in double mutants and 20 to 32% in single mutants) (Data Set S1) found in our analysis that were not characterized as direct targets could potentially represent uncharacterized novel targets (46) of the focal TF(s).
We delineated the direct regulation-specific or indirect growth-rate mediated effects on metabolism by taking into consideration the KPE DEGs. Overall, we found that 55 to 60% of upregulated and downregulated KPE DEGs were regulation specific in FA, compared to 30 to 33% in FI and AI (Fig. 4A). The slow-growth-mediated changes in gene expression were dominant in the FI and AI mutants. In the case of single mutants, the proportions of indirect genes were most prominent in the Dihf mutant, followed by the Dfnr and DarcA mutants. The significantly enriched pathways within the upregulated KPE DEGs in the double mutants (except FI) as well as the single mutants (except the Dihf mutant) showed a greater extent of regulation-specific effects. However, the data did not ascertain whether the reduction in growth rate was proportional to the extent of growth rate-mediated or indirect targets. On the other hand, the significantly enriched pathways within the downregulated KPE DEGs of the strains (except FA) aligned well with the reduced growth rate ( Exploring the effects of regulator-driven epistatic interactions on indirect target expression. To understand the epistatic interactions of the focal TF with other global or local TFs, we considered that the enrichment of direct targets majorly responded to the focal TF(s). In addition, we assumed that the probability of regulating the indirect targets by the focal TF is low, as they might represent silent targets (23, 46) during exponential growth under glucose-fermentative condition. Silent targets represent genes which, despite having upstream binding sites for the focal TF, are not regulated under this specific condition. This could be due to either a masking effect by other TFs or a cooperative effect of multiple TFs or condition-specific regulation. Hence, we analyzed the indirect gene targets for any TF enrichment other than the focal TF. These regulated TFs are epistatically interacting TFs (iTFs) whose gene expression or activity is modulated by the focal TF(s). The enrichment of genes for the presence of binding of iTFs was done using EcoCyc or RegulonDB (44,45) and further assessed for statistical significance (Fisher's exact test, FDR , 0.05). The KPE DEGs of each of the mutants were used as an initial set to characterize all the possible iTF enrichments ( Table 1). Classification of the KPE DEGs based on our approach indicated (i) a drastic reduction in spurious interactions as observed from iTF enrichment on indirect targets (Table 1) and (ii) an increased number of direct targets for each of the focal TFs (Data Set S1). The activity of the iTFs was then determined based on the manual interpretation of the changes in their target gene expression (Data Set S1). We found that in most cases, changes in inferred activity of iTFs were either not correspondent or antagonistically correspondent with the changes in their gene expression. Only  (Table 1). Overall, assessment of the changes in gene expressions of the iTFs itself revealed that the inferred activity of the iTFs was perturbed for reasons not involving its gene expression. The other possible mechanisms involve changes at the posttranslational level or the protein level. As the transcript abundances majorly dictate the protein concentrations (47,48), we further investigated the focal TF-iTF epistatic interaction at the posttranslational level. The metabolic precursors and amino acids, apart from being building blocks for protein biomass, can also act as signaling molecules that can allosterically modulate the activities of the TFs to bring about changes in the gene expression of the organism (14,15,17,49,50). The concept used to define focal TF-iTF epistatic interaction was that metabolites perturbed due to changes in direct targets by focal TF(s) allosterically regulate the iTFs, which then modulated the indirect or growth rate-mediated gene targets.
To address the putative allosteric metabolite-iTF interactions, we carried out a linear correlation analysis (Pearson correlation coefficient [r] . 0.75, P , 0.05) between the KPE direct-target-specific metabolite concentrations and the iTF activities. As determination of metabolite-iTF interactions by linear correlation is prone to high numbers of false positives, to increase the confidence of such putative metabolite-iTF interactions, two stringent criteria were applied. First, only metabolites that were perturbed due to the direct targets of focal TFs were considered. To obtain the list of metabolites, a joint pathway analysis was performed using the significantly perturbed metabolites with the combined upregulated and downregulated KPE direct targets for each strain separately and the metabolite-gene pair which satisfied the minimum threshold (see Materials and Methods). The metabolites that responded to changes in direct targets were mainly glutamate, aspartate, phosphoenolpyruvate (PEP), a-ketoglutaric acid (aKG), g -aminobutyric acid (GABA), citrate, glycine, and to a lesser extent glutamine, arginine, and lysine (Data Set S1). The other metabolites which were significantly (FDR , 0.05) perturbed in each of the strains could be ascribable to the concomitant changes in its precursor metabolites. For instance, changes in methionine levels were The iTFs in bold are those that show correlation with metabolite, do not show dual regulation, or are successfully reproduced from NCA analysis, which were further utilized for determining conserved metabolite-iTF interactions seen across all the mutants (Fig. 5A). The underlined iTFs are those whose gene expression matches their activity changes. The iTF in italics represents a new iTF not found in the complete list mentioned in the third column. c Complete list of iTFs (iTF enrichment on upregulated and downregulated genes) without any categorization. d Final iTFs (iTF enrichment for indirect targets) after categorization using our approach. concomitant with the changes in its precursor aspartate (Fig. S4). Second, the metabolite-iTF pair was reanalyzed using the pathway information available in EcoCyc to identify whether the metabolite (Pearson correlation coefficient [r] . 0.9, P , 0.05) is related to the KPE indirect gene targets for that particular iTF. This work is similar to the distance criterion (15) approach, wherein the iTF regulating the gene (that encodes the enzyme) and metabolite are part of the same metabolic subsystem. Additionally, we compared the metabolite-iTF interactions with published studies reporting similar putative metabolite-TF interactions (15).
To quantify the iTF activities, we used the network component analysis (NCA) algorithm (29), which derives the regulatory activity from the experimentally measured gene expression profiles. For each of the strains, NCA was performed separately on the direct or indirect KPE DEGs. With NCA, we were able to reproduce 80% of the iTF activities for each of the strains, including the known interactions as well as novel putative metabolite-iTF interactions (Fig. S6A to F). For instance, we were able to reproduce the known interactions of PuuR with putrescine (inferred from GABA [26]) as well as CRP with cAMP (inferred from PEP [8,51]) ( Fig. 5A and B). The metabolite cAMP was not detectable in our liquid chromatography-mass spectrometry (LC-MS) run, probably due to the limit of detection and anaerobic fermentation conditions with high glucose uptake rate (it was detectable under aerobic conditions). As cAMP and PEP levels are interdependent (52,53), we assumed the association of CRP with PEP, similar to its association with cAMP. For FA, we inferred increased activity of CRP, NtrC, and Fur and decreased activity of PuuR along with their allosteric effectors, such as PEP, aKG, citrate, GABA, glycine, and arginine (Data Set S1). This inference was consistent with increased activity of CRP with its interacting metabolites PEP, aKG, and citrate in the DarcA mutant and NtrC in the Dfnr mutant as well as decreased activity of PuuR with its allosteric metabolites GABA, PEP, and aKG in both single mutants. In the FI mutant, we inferred increased activity of Fur, CysB, and Nac and decreased activity of Lrp. The inferred activity changes of CysB and Lrp along with their interactions with metabolites GABA, arginine, glutamate, aKG, and PEP were consistent with their inferred activity profile observed in the Dihf mutant. For AI, we inferred a large number of regulators with changes in their activity profiles, concomitant with the widespread changes in the gene expression profiles, which reinforces the greater cross talk effects between ArcA and IHF regulators. The decreased activity of Lrp, PuuR, TrpR, and UlaR and the increased activity of CysB along with their interactions with glutamine, glutamate, aspartate, PEP, arginine, and GABA in the AI mutant was consistent with the inferred activity profiles of Lrp, CysB, TrpR, and UlaR in the Dihf mutant and PuuR in the DarcA mutant. Next, we examined across the strains the metabolite-iTF interactions as a function of growth rate (Fig. 5A). This analysis revealed not only the shared patterns of interactions seen in the double and single mutants but also the cross talk effects. For instance, metabolite-ArcA interactions were more prominent at higher growth rates, as opposed to metabolite-Fur interactions, which were seen with lower growth rates. AI showed the highest number of metabolite-TF interactions compared to its single mutants, which reiterated the cross talk effects, in contrast to the interactions seen in FA and FI mutants. Based on all the metabolite-iTF interactions across the strains, we wanted to determine the conserved metabolite-iTF pairs. In total, we observed 119 connections between 10 metabolites and 16 iTFs (Fig. 5B). We found that 12 metabolite-iTF connections which showed positive as well as negative associations (glutamate-ArcA, glutamate/GABA/arginine-PuuR and aKG/ glutamine-Fur) accounted for only 10% of the metabolite-iTF connections, whereas the remaining 90% were unique. We observed that glutamate, aspartate, PEP, and GABA had the greatest number of allosteric connections with iTFs. These were followed by arginine, a-KG, and glutamine with medium connectivity and glycine, lysine, and citrate with low connectivity with their respective iTFs. A total of 29 connections with 9 iTFs and 8 metabolites were observed (Fig. 5B). Among these, ;45% of the interactions/associations were reported in a previous study (15). Overall, we were able to determine with high confidence novel metabolite-TF interactions that provided insights into the regulatory interactions within the TRN (Data Set S1).
Coexpression analysis of the direct and indirect KPE DEGs. Though a significant proportion (30%) of the indirect genes reflected the metabolite-driven allosteric regulation, a higher proportion (70%) of the genes did not show any significant (Fisher's exact test, P , 0.05, BH adjusted) enrichment for any iTF binding (Data Set S1). This remaining set of genes was annotated as being predominantly regulated by RNA polymerase-associated sigma factors either by using EcoCyc or based on promoter prediction using iPromoter-2L (54, 55) (Data Set S1). RNA polymerase-associated sigma factors were used because they are known to constitute an indirect readout of growth rate-dependent global machinery (6,7,11). Further, to gain a systematic understanding of the coordinated changes in direct targets and indirect targets, including those driven by the underlying metabolite-iTF interaction, we utilized WGCNA (31), an unsupervised technique to cluster genes using correlation of their expression profiles. Such analyses can also reveal the coregulated genes directly modulated by the focal TF(s), which has been a major challenge in understanding the regulatory principles (19,56). We employed a large-scale compendium of gene expression data (COLOMBOS) (20) comprising microarray and RNA sequencing data as an input for generating the network of coexpressed genes in E. coli (Fig. S7A and B). Among 4,077 contrasts and 4,189 genes used for the analysis, we observed occurrences of 7 coexpression merged modules (Data Set S1). Among these modules, blue (1,274) and turquoise (1,572) covered the majority (;68%) of the genes. We performed gene ontology (GO) analysis to identify the functions of the KPE genes falling in a particular module for both the direct and indirect upregulated and downregulated genes in each of the mutants.
We identified the coregulated genes and their functions controlled by the regulators FNR, ArcA, and IHF, as indicated by the module colors. Further, the GO categories enriched for each of the modules in the direct targets were found to be similar to the enrichments seen in the indirect targets ( Fig. 6 and Fig. S8A to D). Across the mutants, we found enrichment for generation of precursor metabolites, deoxyribonucleotide metabolism, or protein transport in the blue module in the downregulated direct targets that showed coexpression with negative regulation of macromolecule biosynthesis in the blue module of downregulated indirect targets. Such coexpression patterns between the direct and indirect genes underscore the constraints on precursor or metabolite usage. Similarly, we found enrichment for the a-amino acid metabolic process, the arginine-biosynthetic process, or the glutamate metabolic process in the turquoise module in the downregulated direct targets that showed coexpression with the cellular compound nitrogen biosynthetic process in the turquoise module of downregulated indirect targets. Moreover, in the case of the red module of upregulated direct and indirect targets, we found coexpression of genes enriched for the TCA cycle, fatty acid metabolism, or aerobic respiration with the putrescine catabolic process in most of the mutants. This reflected the regeneration of aKG through putrescine degradation, as the upregulation of the aerobic TCA cycle genes can result in its futile cycling. Overall, if the direct effects of the focal TFs resulted in changes in the biosynthetic process, the metabolite-driven iTFs concomitantly regulated alternative biosynthetic genes (Data Set S1), thereby regulating the economy of the metabolite.

DISCUSSION
Bacteria like E. coli have developed regulatory strategies coordinated by TFs and growth rate-dependent global machinery to sense and respond to environmental cues. To gain a clear understanding of the regulatory logic, delineating the TF-driven pleiotropic effects that exist within the TRN becomes imperative. Using our systemlevel framework, we deciphered (i) direct and indirect pleiotropic effects of the focal TFs, (ii) epistatic interactions between the focal TFs and iTFs, (iii) high-confidence putative metabolite-iTF interactions, and (iv) coordination between the direct and indirect target expression that is related to the conditional regulation by specific iTFs of downstream genes. We evaluated these objectives using multi-omics data derived from E. coli strains with combinatorial deletions of FNR, ArcA, and IHF regulators under glucose-fermentative conditions. Consistent with our previous work that explained the individual effects of each of the focal TFs at the gene level (26), we found that each double-deletion mutant of the focal TFs resulted in downregulation of necessary metabolic pathways and upregulation of unnecessary ones. Despite the similarity at the pathway level, individual genes were modulated by specific focal TFs in each double mutant rather than collectively. Such results indicated the independent yet additive mode of epistatic interactions between the global regulators, with more cross talk effects seen in the combination involving ArcA and IHF. Overall, by employing a combinatorial deletion approach, this study provides a paradigm for identifying the conserved regulatory pattern across the mutants, the amount of overlap in genes and functions, and target expression specificity arising out of the mode of epistatic interactions of the TFs.
Using our integrated approach, we found that across the strains lacking FNR, ArcA, or IHF and their combinations, a greater percentage (.45%) of DEGs were associated with the direct effects of the focal TF than was estimated earlier (;17%) (26). This was determined from the available genome-wide TF binding data, from the detection of novel genes wherein the upstream DNA binding sequence was predicted with good confidence, and from chemostat-based gene expression analyses. Further, we identified the indirect gene targets under batch fermentative conditions for each of the TF deletion strains. These indirect targets of the focal TFs were determined from the gene sets modulated by growth rate in terms of RNA polymerase sigma factor distribution as well as the interactions with other iTFs based on metabolite regulation. Functionally, both the direct and indirect targets of the focal TFs represented not only the metabolic bottleneck genes but also hedging or alternate carbon metabolism genes that can facilitate growth under new environmental conditions. Last, we inferred that the distribution of the sigma factors across the indirect genes could also explain their gene expression changes in response to nutrient limitations (largely carbon and nitrogen and to a lesser extent phosphate or sulfur) as a result of reduced growth rates (12,57,58).
Despite the overriding complexity within the TRN, our findings demonstrated the nature of focal TF-iTF epistatic interactions and their effect on regulation of gene expression. We found that a majority of the gene expression changes modulated by the enriched iTFs could be ascribable to their posttranslational effects, highlighting the metabolite-driven regulation. Among the total putative metabolite-iTF interactions identified with high confidence in this study, we not only validated the metabolite-iTF interactions (;45%) by comparing the results with published experimental data but also predicted many novel potential metabolite-iTF interactions (;55%). Although we report the putative metabolite-TF interactions using correlation analysis, the stringent criteria used to screen metabolites that (i) responded to changes in direct targets of the focal TF, (ii) were retained after pathway classification, (iii) crossed the threshold correlation value of 0.9, and (iv) were frequently observed across different mutants made these overall high-confidence metabolite-TF associations. Apart from metabolite-based focal TF-iTF interactions, we considered that indirect gene targets that could not be assigned to known iTFs were regulated primarily by RNA polymerase sigma factor. However, it is of particular importance to understand the association between the indirect and direct targets which influence growth physiology. Coexpression analysis using WGCNA revealed the complementary nature of indirect gene targets with direct gene targets in accordance with the metabolite adjustments. For instance, to mitigate the competition for available cellular resources such as precursors and proteinogenic amino acids, if the pathways synthesizing the metabolite is perturbed due to the focal TF, the organism will either reduce the usage of the metabolite in other pathways or promote its degradation if the concentration is found to exceed the demand for the metabolite. Moreover, the coexpressed targets of the focal TF elucidate the preference for interaction with specific iTFs as well as their downstream cognate genes. Hence, we were able to dissect the complexity of regulation on gene expression changes relevant to metabolism. Further, these coexpressed metabolic genes can be considered coregulated genes, given the similar expression profiles and upstream consensus motif sequences for any TF. Overall, the insights obtained from the gene coexpression patterns for E. coli K-12 can now be employed to explore other gene-regulatory mechanisms that enable the organism to cope with stress or changes in carbon sources.
Our previous work showed the allocation of necessary and unnecessary proteome sector and metabolite adjustments, driven by the global transcriptional factors FNR, ArcA, and IHF (26). Here, using an integrated flow of analysis (Fig. 1), we elucidated the plausible regulatory interaction and cross talk effects in formulating an appropriate biological response for a particular environment. However, we note that the correlation analysis explores only the possibility of occurrence of putative metabolite-TF interactions and does not imply direct causation. Thus, additional work involving the experimental validation of metabolite-TF interactions (15,59) as well as characterizing other modes of regulation, such as small RNA (sRNA) (60) or protein-protein interaction (61), would further contribute to our comprehensive understanding. In addition, the demarcation of the indirect and direct targets of global TFs is condition dependent, meaning that these indirect targets have binding sites for the TF but represent silent targets for that particular environment (23,46). These sites could also represent relaxed-specificity sites that could enable adaptation to newer environments (62). Additional experiments supporting TF binding to promoters of uncharacterized or putative direct targets will further strengthen our predictions. Nevertheless, this study serves as a valuable approach to accurately deconvolve the complex layers of TF-driven regulation and systems-level regulation, in general.

MATERIALS AND METHODS
Strain construction. The E. coli K-12 strain MG1655 was used to construct the double deletions Dfnr DarcA (FA), Dfnr Dihf (FI), and DarcA Dihf (AI). Dihf represents a deletion of the genes encoding both subunits (Table S2). The strains were generated using lRed-mediated homologous recombination using plasmid pKD46, with pKD13 as the template plasmid and pCP20 for removal of markers (33). The gene deletions were verified by PCR as well as Sanger sequencing using primers against the gene of interest. After verifications, stocks were prepared in 25% glycerol (final concentration) and stored at 280°C.
Media and strain cultivation. The cells were cultivated anaerobically in a 500-mL bioreactor (Applikon) containing 400 mL of M9 minimal medium (6 g/L anhydrous Na 2 HPO 4 , 3 g/L KH 2 PO 4 , 1 g/L NH 4 Cl, 0.5 g/L NaCl plus 2 mM MgSO 4 and 0.1 mM CaCl 2 ) with 2 g/L glucose as the carbon source. The cultivations and bioreactor settings were done as reported previously (26). Briefly, the cells were used to inoculate the 400 mL of M9 minimal medium with 2 g/L glucose in a bioreactor, and the starting optical density (OD) of the culture was set at approximately 0.07. The temperature in the bioreactor was maintained at 37°C, stirrer speed was 150 rpm, and pH was maintained at 7.2. The bioreactor was sparged with nitrogen to maintain the dissolved oxygen levels at zero. The slope of the linear regression line fit the natural logarithm of the OD 600 (OD at 600 nm)-versus-time plot; ln(OD 600 ) versus time (in hours) was used to calculate the exponential growth rate of the organism. All the physiological and multi-omics characterizations were done in the exponential phase of the organism. The dry cell weight (DCW) for each strain was experimentally derived during its exponential growth phase, where an OD 600 of 1.0 corresponds to 0.44 g DCW h 21 .
All phenotypic characterizations were performed in three biological replicates (n = 3) as described previously (26). Extracellular supernatants collected during the exponential growth phase were used to determine the rates of glucose uptake as well as the rates of secretion of extracellular mixed acid fermentation metabolites, such as acetate, lactate, pyruvate, succinate, formate, and ethanol, and the yields thereafter. All physiological characterizations were tested for significance by using an unpaired twotailed Student's t test.
Cell cultivations were performed in a chemostat (5,8), wherein the feed medium composition was the same as that of the batch medium. The WT strain was grown under anaerobic batch growth until 80% of the maximum biomass was produced (biomass formation was monitored by OD measurements) Deciphering Direct and Indirect Effects of Global TFs Microbiology Spectrum or until 80% of the glucose present in the medium was consumed (glucose levels were monitored by high-performance liquid chromatography [HPLC]), after which addition of the feed medium was started. The dilution rate in the chemostat was maintained at 0.21 h 21 . Once the cells reached the steady state, the chemostat cultivation was continued for 3 to 5 residence times, after which the cells were harvested for RNA extraction and sequencing. The RNA sequencing data for aerobic batch growth performed under identical chemostat conditions were obtained from reference (8). RNA extraction and mRNA enrichment. The RNA extraction for each strain was done in the mid-exponential phase for two biological replicates as described previously (26). Library preparation for RNA sequencing was done following a paired-end strand-specific protocol using a NEBNext Ultra Directional RNA library kit. RNA sequencing was carried out on an Illumina HiSeq 4000 system in rapid run mode using the 2 Â 150-bp format at Genotypic Technologies, Bangalore, India. For chemostat cultivations (aerobic and anaerobic), the harvested RNA was extracted and sent to C-CAMP NCBS Bangalore for RNA sequencing with a NEBNext Ultra II Directional RNA library kit (single strand specific) and run on an Illumina HiSeq 2500 system (rapid run mode) using the 1 Â 50-bp format.
Transcriptome data analysis. The raw RNA sequencing files from the mid-exponential phase of batch growth and the chemostat cultivations were first trimmed with CUTADAPT (63) to eliminate the adapter sequences and the low-quality reads. The trimmed reads were mapped to the E. coli K-12 MG1655 genome (GenBank accession number NC_00913.3) using Burrows-Wheeler Aligner (BWA) (64), and the .sam file was then converted to a .bam file using SAMtools (65). Counts at each gene level were assigned with FeatureCounts (66) using the reference genome provided in gene transfer format. The 4,466 genes used for the analysis were obtained from the EcoCyc database (v. 21.5) (44). All rRNA, tRNA, and sRNA genes were excluded from the analysis. Differentially expressed genes were analyzed using EdgeR (67) after eliminating the genes having fewer than 10 reads. The genes that had a $2-fold change in gene expression (in both directions) and had a BH-adjusted P value of ,0.05 were considered DEGs and were used for subsequent analysis. The DEGs were then enriched for metabolic pathways using KEGG pathway classification, and the significance of each pathway was checked using a hypergeometric test (P , 0.05, BH adjusted) in R (4.0.3). The criterion for a pathway to be considered significant was at least nine upregulated or downregulated genes. DEGs without any accession ID (EcoCyc), such as phantom genes, were not included in the analysis.
In the comparison of transcriptomes between the WT anaerobic fermentation versus the WT aerobic respiration from glucose-limited chemostat cultivations, the DEGs were annotated as regulation-specific (Data Set S1) and represent genes that are not altered due to changes in growth rate or low growth rate. These DEGs were filtered with the threshold of a (Benjamini-Hochberg-adjusted) P value of ,0.05 and at least a 2-fold change in expression. These genes were then used to identify the regulation-specific and growth rate-mediated genes in the transcriptome changes of single or double mutants compared to the WT in batch mid-exponential phase. These regulation-specific genes were further validated using previously reported direct gene targets of these regulators from in vivo ChIP binding data as well as binding motif prediction analysis using FIMO from the MEME suite (43). The global and pathway-specific local regulators together with their cognate gene targets were obtained from RegulonDB (v 10.6.3) (45). The KPE DEGs were independently enriched for global and local TFs, and significance was assessed by Fisher's exact test (BH adjusted, P , 0.05) with an arbitrary set of 5 genes. It should be noted that even though the deleted TFs can be enriched in the case of indirect gene sets, they were not considered for analysis based on the assumption that these genes have other TF binding sites that overlap or simply that they are bound yet represent silent targets (23,46) for this particular condition. Further, enriched TFs which have dual regulation were considered neither for NCA nor for correlation analysis. The inferred activity of the iTFs was determined based on the manual interpretation of the changes in its target gene expression. For instance, we defined the activity of an iTF repressor as decreased when there was activation of its gene targets.
Gene ontology analysis was performed using the AmiGO 2 database (68) or topGO package (69). The top categories with either an FDR of ,0.05 or a P value of ,0.05 that accounts for a larger set of genes were considered for further analysis.
Metabolomics. (i) Extraction of metabolites. The metabolite extraction for each of the strains was performed in the mid-exponential phase. Metabolite extraction was performed for 3 biological and 2 technical replicates (n = 3). A fast-cooling method was employed to quench the harvested cells as described previously (8,26). 13 C-labeled extracts were used as internal standards generated by growing WT E. coli K-12 MG1655 cells aerobically and were then used to quantify the key metabolite pool sizes using an isotopebased dilution method as done previously. During the earliest stage of extraction, each of the samples was spiked with a fixed volume of internal standard. The volume of the pooled internal standard was added to each sample such that the external and internal standard peak heights differed less than 5-fold. Following the sample preparation steps (8,26), the samples were completely dried in a vacuum concentrator and stored at 280°C. Before analysis in an LC-MS/MS instrument, samples were reconstituted in 100 mL of chilled (220°C) acetonitrile-buffered water (60:40 [vol/vol]) and centrifuged at 4°C for 10 min, and the supernatant was transferred to prechilled glass vials. Buffered water contained 10 mM ammonium acetate (pH 9.23). pH was adjusted using ammonium hydroxide prepared in HPLC-grade water.
(ii) LC-MS/MS settings. The samples for metabolomic analysis were run on a high-resolution Orbitrap Q Exactive Plus mass spectrophotometer (Thermo) equipped with a SeQuant ZIC-pHILIC column (Merck; dimensions, 150 mm by 2.1 mm by 5 mm) and in sequence with a ZIC-pHILIC guard column (Merck; dimensions, 20 mm by 2.1 mm by 5 mm) with the LC-MS/MS settings and calibrations as described previously (8,26). An alkaline mobile phase was used with an electron spray ionization (ESI) ion source operated in positive (M 1 H) 1 and negative (M 2 H) 2 modes separately. A full scan range of 66.7 to 1000 m/z was applied for positive as well as negative modes, and the spectrum data type was set to profile mode. The mobile phase used for chromatographic separation consisted of a nonpolar phase A (acetonitrile-water mixed at a ratio of 9:1, 10 mM ammonium acetate [pH 9.23] using ammonium hydroxide) and polar phase B (acetonitrile-water mixed at a ratio of 1:9, 10 mM ammonium acetate [pH 9.23] using ammonium hydroxide). A linear gradient with a flow rate of 200 mL/min was set as follows: 0 to 1 min, 0% B; 1 to 32 min, 77.5% B; 32 to 36 min, 77.5% B to 100% B; 36 to 40 min, hold at 100% B; 40 to 50 min, 100% B to 0% B; 50 to 65 min, re-equilibration with 0% B. An injection volume of 5 mL was used for all the samples and standards.
(iii) Metabolomics data analysis. The raw data from the machine were assessed using the Xcalibur 4.3 (Thermo Fisher Scientific) Quan Browser software package as done previously (8,26). A semiquantitative analysis was performed using peak heights of precursor ions with a signal/noise (S/N) ratio of more than 3, a retention time window of less than 60 s, and less than 5 ppm mass error. Height ratios of metabolites were obtained after normalizing the peak heights of the samples to the peak heights of the internal standards used. The absolute concentrations of metabolites were determined using 12 C-labeled chemical standards (mix of 40 metabolites) as described previously (8,26). Metabolite concentrations were imputed for missing values and biomass normalized. These values were g-log transformed for identification of the statistically significant metabolites using MetaboAnalyst. Metabolites with an FDR of ,0.05 (2-tailed unpaired Student's t test) were considered for further analysis. The absolute metabolite concentrations were expressed as micromoles per gram of DCW or height ratio per gram of DCW.
NCA. NCA was performed using the MATLAB scripts and codes made available at GitHub from reference 15 with minor modifications. Only the network component analysis and bootstrap code files were used in this study. NCA predicts TF activity by least-square optimization between the log 10 -transformed gene expression data (in transcripts per million) and the product of connectivity (a priori-known interactions between the TF and cognate genes) and TF activity (29). NCA was performed on a smaller data set, especially on the growth-rate specific or indirect gene sets that belonged to the KPE upregulated and downregulated DEGs. As we used a smaller data set, only regulators which were able to reproduce the inferred activity profiles similar to manual interpretation were retained for the analysis. For instance, if a TF positively regulated 4 to 5 genes and is primarily an activator based on the a priori knowledge and if this TF showed inferred activity values that indicate a repressor or were inconclusive, then the TF was excluded from further analyses. We were able to successfully reproduce the TF interactions in more than 80% of the TFs across all the growth-rate specific genes in the up-and downregulated KPE DEGs for all the strains. An RNA polymerase that connected all the genes within a set was retained. Analysis of sigma factors such as sigma-70 or sigma-38 along with ppGpp was excluded. The simulation was performed in triplicate for each condition along with bootstrapping to ensure confidence in smaller data sets (70) as well as to keep the data symmetric for correlation with metabolite concentrations. The bootstrap was performed on each TF with 95% confidence intervals across all randomized iterations in each replicate and condition.
Metabolite-TF interactions. We analyzed the target enrichments of iTFs available from RegulonDB and EcoCyc in the KPE DEGs restricted to indirect gene targets. The activity of transcriptional regulators was predicted based on the mode of interaction with its target genes. For instance, if the TF is a repressor and the gene targets of the regulator showed a reduction in gene expression, then we inferred that the activity of the TF had increased. A similar analogy was applied in the case of the activator. However, if the TF had less than 70% of its targets positively or negatively regulated, then the regulator was assigned to be a dual regulator. TFs whose interaction with their complete gene lists was not conclusive (,70%) were excluded from the correlation analysis. For instance, if a TF regulated 10 genes, with expression of 7 of them being repressed and expression of 3 being activated, then the TF is regarded as a repressor and retained for analysis. However, if expression of 5 genes was activated and that of the other 5 was repressed, then the TF supposedly functions as a dual regulator, and such regulators were excluded from the correlation analysis. Further, the TFs whose activities were not reproducible in NCA in terms of their manual interpretation were also excluded from the correlation analysis. Next, the metabolites that fell under the direct targets of the focal TFs were retained for correlation analysis. The logic is that the deletion of TF results in changes in gene expression of the direct targets followed by metabolite changes associated with these genes, and these metabolites allosterically affect the inferred activity of other iTFs that regulate part of the growth rate-mediated or indirect gene sets. To identify the direct target-specific metabolites, we performed joint pathway analysis (Metaboanalyst [71]) using both the upregulated and downregulated direct target genes with statistically significant metabolite changes for each of the mutants. Pairs (at least 1 metabolite with many genes or 1 gene with many metabolites) which had an impact value of .0.1 (Metaboanalyst) were retained for the correlation analysis (cutoff arbitrarily set). A Pearson correlation analysis was performed to identify the metabolite-TF interactions. This correlation analysis represents the possibility of occurrence of metabolite-TF interaction and require experimental confirmation. An initial screening of metabolite-TF interactions involved selecting metabolite-TF pairs that had a Pearson correlation coefficient of .0.75 but were statistically significant (P , 0.05). Later, those that had a correlation coefficient [r] of .0.9 and were statistically significant (P , 0.01) and those obtained from literature but found to be significant (r . 0.8, P , 0.05) in our data set were retained for further analysis.
WGCNA analysis on compendium gene expression data. The compendium gene expression data for E. coli K-12 were obtained from the COLOMBOS database (20). COLOMBOS has gene expression data from microarray and transcriptome sequencing (RNA-seq) experiments for various environmental conditions and is further normalized to make it comparable across these conditions. To make the data suitable for coexpression network analysis, the data without centering or scaling were used to impute missing values for the large data set using the global SVD impute function (72,73). This was performed using pcaMethods (74) with the following parameters: no. of principle components = 5, threshold = 0.01, maxSteps = 1,000, cross-validation cv = "q2". Further, the expression data were assessed by the goodSamplegenes function from the WGCNA (31) package in R. Finally, the total numbers of genes and samples considered for each organism were 4,189 genes and 4,077 samples. For construction of coexpression networks, the scale-free topology was computed using the pickSoftThreshold function and biweight midcorrelation with default parameters to obtain robust modules from a larger data set. This was followed by construction of a signed topology overlap matrix (TOM). The coexpressed genes were clustered into modules using the flashClust function with the minimum module size set to 30. The cuttreeDynamic function was used to merge the highly similar modules based on eigengenes and a correlation value of 0.75. Each module was represented using a color, and if not mentioned otherwise, the color gray corresponds to uncorrelated genes. In the figures and for all downstream analysis, the genes that belong to a module color and are regulated by iTF (whether enriched or not) are shown/categorized separately. For coexpression analysis, the module colors which had at least 5 genes were considered. Gene ontology analysis was performed on the gene list assigned to a module color considering the above-mentioned criteria.
Regulator prediction for promoters of direct gene targets. The potential direct targets which were obtained from chemostat analysis and which did not have reported literature (EcoCyc) or experimental backing were reanalyzed using the FIMO tool (5.4.1) from the MEME suite. The consensus motif used for the analysis for FNR (59-TTGATNNNNATCAA-39 or 59-TTGATYWNNATCAA-39), ArcA (59-GTTAATTAAATGTTA-39 or 59-GTTAAWWAAATGTTA-39), and IHF (59-DTWYYYNGYNNATTTW-39 or 59-WWWNWWNWTNTTW-39) regulators was obtained from the literature (22-25, 27, 44, 45). Statistically significant motifs (P , 0.01) were retained such that the binding sites were strictly within 250 bp upstream of the transcription start site of the target gene. Any motif found within the coding region of any gene was excluded from the analysis.
Sigma factor prediction for promoters of indirect gene targets. Indirect genes for which no specific global or local regulator could be assigned by either database or for which, despite the presence of binding sites, the regulators were not significantly enriched were considered to be just under the control of RNA polymerase sigma factors. Promoter prediction was done either using iPromoter2L (54,55) for genes that did not have an annotated RNA polymerase sigma binding site or predictions from EcoCyc. An upstream region 200 bp from the start codon of the gene was chosen for the analysis. The region size (200 to 81 bp) was accordingly adjusted such that the regions did not overlap any coding or promoter regions of other genes. If there was no annotation of any of the sigma factors or if genes were arranged in an operon, then the region upstream of the first gene or the nearest gene was used for the sigma factor identification. The choice and order of arrangement of the predicted sigma factors were done based on the frequency and closeness to the transcriptional start site.
Data availability. The RNA sequencing data and the processed files from this study are available at NCBI Geo (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc) with accession number GSE195954. The metabolomics data presented in this study are available at the NIH Common Fund's National Metabolomics Data Repository (NMDR) website, the Metabolomics Workbench (https://www.metabolomicsworkbench.org), where it has been assigned project ID PR001316.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, XLSX file, 0.4 MB. SUPPLEMENTAL FILE 2, PDF file, 2.7 MB.