A system-wide network reconstruction of gene regulation and metabolism in Escherichia coli

Genome-scale metabolic models have become a fundamental tool for examining metabolic principles. However, metabolism is not solely characterized by the underlying biochemical reactions and catalyzing enzymes, but also affected by regulatory events. Since the pioneering work of Covert and co-workers as well as Shlomi and co-workers it is debated, how regulation and metabolism synergistically characterize a coherent cellular state. The first approaches started from metabolic models, which were extended by the regulation of the encoding genes of the catalyzing enzymes. By now, bioinformatics databases in principle allow addressing the challenge of integrating regulation and metabolism on a system-wide level. Collecting information from several databases we provide a network representation of the integrated gene regulatory and metabolic system for Escherichia coli, including major cellular processes, from metabolic processes via protein modification to a variety of regulatory events. Besides transcriptional regulation, we also take into account regulation of translation, enzyme activities and reactions. Our network model provides novel topological characterizations of system components based on their positions in the network. We show that network characteristics suggest a representation of the integrated system as three network domains (regulatory, metabolic and interface networks) instead of two. This new three-domain representation reveals the structural centrality of components with known high functional relevance. This integrated network can serve as a platform for understanding coherent cellular states as active subnetworks and to elucidate crossover effects between metabolism and gene regulation.

Introduction
So far, metabolic processes and gene regulatory events are typically considered individually in system-level investigations. However, ample evidence exists that the majority of cellular processes involve both metabolism and gene regulation and thus require their joint examination [1]. One of the best-investigated individual examples in Escherichia coli (E. coli) is the phosphoenolpyruvate-carbohydrate phosphotransferase system (PTS), which is responsible for the import and phosphorylation of sugars [2]. Additionally, the PTS is involved in the regulation of the import process depending on the available carbohydrate mixtures in the growth medium. Through carbon catabolite repression and inducer exclusion, the uptake of a preferred carbon source, such as glucose, is prioritized over other carbohydrates present in the growth medium. In order to understand the underlying principles, not only the effects of both 'layers', metabolism and regulation, need to be taken into account, but also their interface [3].
On a more qualitative level, the importance of the interface of metabolism and gene regulation can be illustrated by having a closer look at their most prominent representatives, namely, enzymes and metabolic transcriptional regulators. Both examples are proteins and can be thought of as a component type organizing the interplay of genes and metabolic reactions (Fig 1). For enzymes the connection is straightforward: The majority of metabolic reactions can only take place if the corresponding genes of the catalyzing enzymes are expressed. These genes, in turn, are often involved in regulatory processes, especially if they are associated with central biochemical reactions. In contrast, metabolic transcriptional regulators can be illustrated by looking at transcription factors, probably the best-investigated transcriptional regulators. Some of them require the binding of a metabolite to be active and are therefore called metabolic transcriptional regulators. In the context of the integrative view discussed here, it is noteworthy that only the interaction with a metabolic component enables their functionality as gene expression regulators.
Conventional reconstructions of E. coli's metabolism as well as of its gene regulation thoroughly describe the process itself but usually lack information on interacting elements of the other biological system. While there are numerous genome-scale metabolic reconstructions available [4][5][6][7][8][9], only a few large-scale transcriptional regulatory networks exist that are mainly based on the information from RegulonDB [10]. First attempts to integrate both cellular processes started from metabolic reconstructions which were expanded by regulatory genes and stimuli of the associated encoding metabolic genes [11,12]. Both studies started from the number of genes downstream of the transcription factor receiving the feedback from metabolism), their possible function (e.g., in the processing of environmental information) and other properties [19]. In this way, the authors obtain insight in the algorithmic features of the interface between gene regulation and metabolism.
Understanding the interplay of metabolism and gene regulation will help to gain insight in cellular, system-wide responses such as to changing environmental conditions. Here, we present the database-assisted reconstruction of an integrative E. coli network capturing metabolic as well as regulatory processes. The attribution of network components (in terms of individual vertices) to the metabolic and regulatory domains, as well as the protein interface enables the further characterization of the network in terms of its modular organization, its path statistics and the vertex centrality.
In particular, we formulate a new measure by evaluating domain-traversing paths, in order to quantitatively assess the role of components in the interface domain and thus identify cross-systemic key elements contributing to both regulatory and metabolic processes. In all cases, these topological assessments highlight system components and functional subsystems, which are well known for their biological relevance, thus emphasizing the predictive power of network topology. Employing observations on the topological (structural, network-architectural) level in order to identify components in the system of particular functional relevance has a long tradition in network biology (and in network science in general).
The main results of our investigation are: We present an integrated network representation of gene regulation and metabolism of E. coli and illustrate how it is a promising starting point for the structural investigation of system-wide phenomena. In particular, the network perspective suggests the explicit consideration of a protein interface between the genetic and metabolic realms of the cell. Employing network metrics we (1) argue that a three-domain partitioning is architecturally and functionally plausible, and (2) show that prominent components of the network according to the structural investigation tend to be of evident biological importance. Especially, the evaluation of possible paths through the interface domain of the network reconstruction yields well-known functional subsystems. The overlap of structural and biological relevance here suggests that a careful analysis of such a structural model can guide biological investigations by focusing on a limited number of structurally outstanding components. This network model can also serve as a starting point for a range of topological analyses with methods developed in statistical physics (see, e.g., [20] for a recent review).
Summarizing, in contrast to the separate analyses of (e.g., the metabolic or gene regulatory) subsystems, we expect that the integrative network model shown here will draw the attention to system-wide feedback loops not contained in the individual subsystems and to different roles of individual components, which become only visible from the perspective of interdependent networks.

Database-assisted network reconstruction
By now, the dramatic growth of bioinformatics databases [21], both in content and in diversity, allows addressing the challenge of integrating regulation and metabolism on a system-wide level. We devised a semi-automated framework to integrate information from the EcoCyc database [22] and RegulonDB [10] into a network for E. coli including major cellular processes, from metabolic processes via protein modifications to a variety of regulatory events (see Methods). Networks are an efficient data structure for integrating this wealth of information [23][24][25]. In this way, the vast amount of data contained in the bioinformatics databases provides an 'architectural embedding' for metabolic-regulatory networks and guides subsequent steps of model refinement and validation. We augmented and validated the resulting network based on existing reconstructions of metabolic [6,8,26,27,28] as well as of gene regulatory processes [10].
The integrative E. coli network constructed here comprises the three major biological components, genes, proteins, and metabolites, as well as the metabolizing reactions summing up to more than 12,000 components. Represented as a graph, the network has seven types of vertices depicting the major biological components (Fig 2, Table A in S1 Text) and seven different types of edges including two types of encoding associations, i.e., transcription and translation processes, four types describing the associations within biochemical reactions, and one type summarizing regulatory relations (Table B in S1 Text). Two small annotated biological examples are shown in Fig B in S1 Text. The graph representation facilitates the mapping of reactions and their catalyzing enzymes, as both are depicted as vertices. In contrast, metabolic systems are often represented as hypergraphs to illustrate the Boolean 'AND' association of reaction educts and the fixed stoichiometric ratio of the involved metabolites. Those aspects are assigned explicitly as edge properties in the graph representation. For the purpose of measuring the propagation of perturbations through the network, for example, the following logical assignments are helpful (see [29] for details on these definitions): Besides the associations of reaction educts, the encoding relations of protein complexes are of Boolean 'AND' type, termed conjunct links. On the contrary, associations representing isoforms of protein subunits, isoenzymes as well as reaction products are implemented by Boolean 'OR' links, called disjunct. The third linkage type, regulation, covers approximately 7,300 regulatory associations, i.e., transcriptional, translational as well as metabolic ones (Table C in S1 Text).
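The typed graph described above can be sketched as a small data structure. The following is a minimal illustration only: the class name `IntegratedNetwork`, the helper `is_supported`, and the toy vertices are hypothetical, and the rule used to combine conjunct and disjunct inputs is one plausible reading of the Boolean semantics, not the definition given in [29].

```python
from dataclasses import dataclass, field

# The three linkage types named in the text.
LOGIC_TYPES = {"conjunct", "disjunct", "regulation"}


@dataclass
class IntegratedNetwork:
    """Directed graph with typed vertices and logically typed edges (hypothetical sketch)."""
    vertex_type: dict = field(default_factory=dict)  # vertex name -> biological type
    edges: list = field(default_factory=list)        # (source, target, logic)

    def add_vertex(self, name, vtype):
        self.vertex_type[name] = vtype

    def add_edge(self, source, target, logic):
        if logic not in LOGIC_TYPES:
            raise ValueError(f"unknown linkage type: {logic}")
        self.edges.append((source, target, logic))

    def inputs(self, target, logic):
        """Sources of all edges of the given linkage type that point at `target`."""
        return sorted(s for s, t, l in self.edges if t == target and l == logic)


def is_supported(net, target, active):
    """Assumed rule: all conjunct inputs AND (if any exist) at least one disjunct input."""
    conj = net.inputs(target, "conjunct")
    disj = net.inputs(target, "disjunct")
    return all(v in active for v in conj) and (not disj or any(v in active for v in disj))


# Toy example: reaction r1 needs both educts; compound c3 is produced by r1 OR r2.
net = IntegratedNetwork()
for name, vtype in [("c1", "compound"), ("c2", "compound"), ("c3", "compound"),
                    ("r1", "reaction"), ("r2", "reaction")]:
    net.add_vertex(name, vtype)
net.add_edge("c1", "r1", "conjunct")
net.add_edge("c2", "r1", "conjunct")
net.add_edge("r1", "c3", "disjunct")
net.add_edge("r2", "c3", "disjunct")

print(is_supported(net, "r1", {"c1"}))        # False: one educt is missing
print(is_supported(net, "r1", {"c1", "c2"}))  # True: both educts present
print(is_supported(net, "c3", {"r2"}))        # True: one producing reaction suffices
```

Storing the logic as an edge property keeps reactions and enzymes as ordinary vertices, which is the advantage over a hypergraph representation noted in the text.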

The metabolic and regulatory processes
The comparison with existing models reveals that the presented integrative network is a comprehensive representation of the metabolic and regulatory processes in E. coli. The very first approach to embedding metabolic processes in their regulatory context [11], the iMC1010 model, started from a metabolic model which was extended by the regulation of the encoding genes of the catalyzing enzymes. For the purpose of determining the overlap of the integrative metabolic-regulatory network and the iMC1010 model, transport reactions as well as the artificial biomass reaction have been disregarded and, moreover, only unique metabolites (neglecting compartmentation) have been taken into account. Otherwise, the different levels of detail of the transport systems such as PTS as well as of the compound compartmentation would render a
Figure: Spring-block graph representation and vertex composition of the integrative E. coli network. A scalable force-directed placement algorithm has been used. The coverage of the pioneer model from [11] is provided in column iMC1010.
To assess the coverage of E. coli's metabolic processes, the embedded metabolic processes of the integrative E. coli network have been associated to the ones of an established E. coli metabolic reconstruction, namely the iAF1260 model from [6]. About 67% of the involved biochemical reactions, compounds and genes could be mapped directly (see Table D in S1 Text, column 4). Particularly, these two thirds capture almost all biologically relevant components in terms of in silico viability. Using flux balance analysis for simulating the biomass production capacity of the iAF1260 model and taking the overlap with mapped components of the integrative E. coli network revealed that for the default medium setup approximately 75% of the essential reactions (to yield 1% biomass) are covered by the integrative E. coli network.
Analogous to the metabolic processes, the coverage of E. coli's gene regulation has been determined using the transcriptional regulatory network from RegulonDB [10]. This model has been assembled in a similar fashion but is accounting only for transcription factors and their regulated genes. With a coverage of more than 98%, the transcription-related regulatory processes are considered as completely recorded in the integrative E. coli network (see Table D in S1 Text, column 5). Apart from that, for this assessment of overlap a comparison of regulatory processes associated with RNA translation as well as metabolic regulatory events is not possible since the RegulonDB transcriptional regulatory network does not consider protein and metabolic interaction processes.

The interface of metabolic and regulatory processes
The most conspicuous links between metabolic and gene regulatory processes are metabolic transcription factors, i.e., gene expression regulators binding metabolites, and metabolic genes, i.e., genes with a significant and coordinated response on the metabolic level such as those encoding enzymes. Intuitively, the interface has so far been considered as the set of direct interactions of metabolic elements and gene regulatory elements, and the integrative E. coli network can be partitioned into a metabolic and a regulatory domain (MD-RD).
However, by examining those interactions in more detail the topological role of proteins becomes apparent. Regarding the metabolic transcription factors, the respective metabolite binds to a protein and this metabolite-protein complex then subsequently regulates the gene expression. In the case of metabolic genes, ultimately the respective gene encodes a protein which either by itself or as a complex serves as an enzyme. In line with this, the interface of metabolic and gene regulatory processes should be considered as the series of interactions of metabolites and genes, respectively, with proteins and subsequent protein modifications. Thus, the interface does not only comprise interactions (edges) but also components (vertices), and the integrative E. coli network will in the following be divided into a metabolic domain, a protein interface and a regulatory domain (MD-PI-RD).
In the next section, the plausibility of the three-domain partition (and the set of biologically motivated rules devised to create it) will be assessed in comparison to the likewise proposed two-domain (MD-RD) representation.
The interface structure: A matter of network partition. In order to assess the large-scale structure of the reconstructed network we apply a set of rules that assign each vertex of the network to one of two and three domains, respectively, by considering the biological types of the vertices themselves as well as those of their neighbors (as outlined in the Methods section). Since these rules have been designed to group together vertices connected to the same biological processes we expect them to result in biologically plausible network partitions.
To complement the two functional partitions, MD-RD and MD-PI-RD, two partitions that solely take into account the vertex types have been analyzed, also representing a metabolic-regulatory division into two and three domains, respectively. For the vertex-driven two-domain partition, the sets of gene and protein vertices denote the regulatory processes, while in the three-domain partition regulation is given by the set of genes and the interface domain only consists of the protein vertices. In both cases, metabolism is represented by the sets of reactions and compounds. In the vertex-driven three-domain partition, the vertex set of proteins forms an interface similar to the MD-PI-RD partition (Fig 3). The functional and vertex-driven three-domain partitions are of roughly similar size in terms of vertex count, while the respective two-domain partitions have a metabolic-regulatory vertex ratio of 5:1 and 4:3, respectively (see Fig 4). First, the two three-domain partitions will be compared, i.e., the functional partition, MD-PI-RD, and the vertex-driven partition. In the following, we will argue that the additional third domain acts as an interface between the regulatory and metabolic domains in the functional partition, while we will see that the vertex-driven partition fails to give a coherent picture of the domain-level organization of the biological system.
Especially, it will become clear, also in later sections, that the interface domain in the functional partition contains processes that are known to play prominent roles in system-scale communication within the cell, and may therefore be considered an important component of the large-scale organizational structure of the combined regulation and metabolism of E. coli.
A simple quantity to illustrate the domain-level picture is the fraction of inter-module edges (linking to a vertex of a different domain) over all edges connected to vertices of a specific domain (i.e., external and internal edges). Of course, there is no objectively 'correct' partition the result of our procedure could be measured against, but there are a number of fundamental properties that a biologically plausible partition in the given context should possess. On the one hand, a proper interface provides the main means of communication between the regulatory and the metabolic processes, i.e., the majority of paths between the outer two domains should run through the interface. Indeed, the interface of the functional partition shows a considerably larger inter-module edge fraction than the remaining domains (0.7 compared to 0.5 and 0.1, Fig 4), stressing its special character as a bridging module. A high inter-module edge fraction of the interface is also found in the vertex-driven partition; however, its regulatory domain shows an even higher inter-module edge fraction, which indicates an entanglement between the two groups rather than one domain acting as a bridging module to another domain. This gives rise to the second criterion, namely that the domains should capture actual processes (here, structures on the level of several vertices). Regulatory or metabolic processes should be contained unambiguously within the respective domain so that system-wide interaction takes place between processes. In the following section, Interface characterization, it will be shown that this is also the case for the interface in the functional partition. In contrast, in the vertex-driven partition already the regulatory domain shows deficiencies with
Next, we compare the three-domain partitions with the two-domain partitions.
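As an illustration of the inter-module edge fraction defined above, the following minimal sketch computes it per domain for a toy edge list. The function name, the toy vertices and the convention of counting each edge once for every domain it touches are assumptions made for this example; the exact counting used for Fig 4 may differ.

```python
def inter_module_edge_fraction(edges, domain_of):
    """Per domain: edges leaving the domain divided by all edges touching it."""
    total = {}
    external = {}
    for u, v in edges:
        du, dv = domain_of[u], domain_of[v]
        # An edge is counted once for each distinct domain it is attached to.
        for d in {du, dv}:
            total[d] = total.get(d, 0) + 1
            if du != dv:
                external[d] = external.get(d, 0) + 1
    return {d: external.get(d, 0) / total[d] for d in total}


# Hypothetical toy network: two genes (RD), one protein (PI), two compounds (MD).
domain_of = {"g1": "RD", "g2": "RD", "p1": "PI", "m1": "MD", "m2": "MD"}
edges = [("g1", "g2"), ("g1", "p1"), ("g2", "p1"), ("p1", "m1"), ("m1", "m2")]

fractions = inter_module_edge_fraction(edges, domain_of)
print(fractions["PI"])  # 1.0: every edge of the interface vertex leaves the interface
print(fractions["MD"])  # 0.5
```

In this toy case the interface has the largest fraction, mirroring the qualitative pattern reported for the functional partition.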
While the introduction of a third domain allows studying the system in terms of an explicit interface, the partition into two domains is much closer to common biological intuition. The question which needs to be answered is whether metabolism and gene regulation are solely interfaced by the linking processes such as gene expression, and activation or inhibition of transcription factors and genes, so that the system can appropriately be described with two domains, or whether there is an actual interface that comprises entire processes, additionally including protein modifications and the like. Here, this question will be assessed from a topological perspective.
A relevant topological quantity is the network modularity [30] of a given network partition. For a biologically meaningful classification, one would expect on the network level that the regulatory and the metabolic domains show high intra-module connectivity (a large number of links are within a domain) and sparse inter-module linkages (a small number of links are between domains). Accordingly, the network modularity should be high for a successful partition. The results for the modularity are listed in Fig 4. The functional partitions clearly outperform the vertex type-driven partitions. Also, when going from MD-RD to MD-PI-RD there is a notable increase in the modularity of the network (M_2 = 0.157, M_3 = 0.287). Note that here we consider specific candidates for biologically plausible partitions, while a purely topological analysis of the module structure of this large network yields a much larger set of significant modules. Here, a detailed biological interpretation is still missing and will be discussed elsewhere.
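To make the modularity of a given partition concrete, the following minimal sketch evaluates it for a fixed assignment of vertices to domains. It assumes the standard undirected Newman form, Q = sum over domains of (intra-domain edges / all edges) minus (domain degree sum / twice all edges) squared; whether the variant from [30] used here treats edge direction differently is not stated in this section, so the numbers are illustrative only. The function name and the toy graph are hypothetical.

```python
def modularity(edges, domain_of):
    """Newman modularity of a fixed partition, treating edges as undirected (assumption)."""
    m = len(edges)
    intra = {}       # domain -> number of edges with both ends inside the domain
    degree_sum = {}  # domain -> sum of degrees of its vertices
    for u, v in edges:
        du, dv = domain_of[u], domain_of[v]
        degree_sum[du] = degree_sum.get(du, 0) + 1
        degree_sum[dv] = degree_sum.get(dv, 0) + 1
        if du == dv:
            intra[du] = intra.get(du, 0) + 1
    return sum(intra.get(d, 0) / m - (degree_sum[d] / (2 * m)) ** 2 for d in degree_sum)


# Toy graph: two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
two_domains = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_domain = {v: 1 for v in two_domains}

q_two = modularity(edges, two_domains)
q_one = modularity(edges, one_domain)
print(q_two)  # 5/14, approximately 0.357
print(q_one)  # 0.0
```

A partition that follows the two triangles scores higher than the trivial single-domain partition, which is the sense in which a higher value indicates a more plausible domain assignment.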
Altogether, the functional partition into metabolic domain, protein interface and regulatory domain reflects a biologically reliable classification in two delimited domains linked by a bridging module. Reinforced by the topological properties, the interface structure including full protein modification processes will be used subsequently.
Interface characterization. The interface of metabolic and gene regulatory processes of the integrative E. coli network comprises, as expected, predominantly proteins, i.e., monomers and complexes (Table A in S1 Text), and mainly protein modification processes such as protein translation, protein complex formation and biochemical protein conversion (Table B in S1 Text). On closer examination, the covered processes can be divided into internal and peripheral ones. In line with the bridging role of the interface, the majority of these are peripheral processes (Fig 3, Table B in S1 Text). The peripheral processes, in turn, can be subdivided according to their directionality, meaning from regulatory to metabolic domain (subsequently termed 'downwards') and from metabolic to regulatory domain ('upwards'), respectively. To enumerate the portion of peripheral processes forming complete paths across the interface, direct downwards and upwards links and the new topological concept of domain-traversing paths (or short: traversing paths) have to be considered. A traversing path connects regulatory and metabolic domain via the protein interface, whereby only the start and end vertices are not affiliated to the bridging domain and the path direction is considered carefully (see Methods).
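The notion of a traversing path can be illustrated with a short enumeration routine. This is a minimal sketch under stated assumptions: paths are taken to be simple directed paths (no repeated vertices) with at least one interface vertex, and the function name, the domain labels and the toy network are hypothetical. The precise definition is given in the Methods and may impose further constraints.

```python
def traversing_paths(successors, domain_of, start_domain, end_domain, interface="PI"):
    """Enumerate directed simple paths start_domain -> interface ... interface -> end_domain."""
    paths = []

    def extend(path):
        for nxt in successors.get(path[-1], ()):
            if nxt in path:
                continue  # keep paths simple
            d = domain_of[nxt]
            if d == interface:
                extend(path + [nxt])            # stay inside the interface
            elif d == end_domain and len(path) > 1:
                paths.append(path + [nxt])      # leave the interface: path complete
            # Direct links (no interface vertex) and steps back into the
            # start domain are not traversing paths and are ignored.

    for v, d in domain_of.items():
        if d == start_domain:
            extend([v])
    return paths


# Hypothetical toy network.
domain_of = {"g1": "RD", "g2": "RD", "p1": "PI", "p2": "PI", "p3": "PI",
             "m1": "MD", "m2": "MD"}
successors = {"g1": ["p1", "m2"], "p1": ["p2", "m1"], "p2": ["m1"],
              "m1": ["p3"], "p3": ["g2"]}

down = traversing_paths(successors, domain_of, "RD", "MD")
up = traversing_paths(successors, domain_of, "MD", "RD")
print(down)  # [['g1', 'p1', 'p2', 'm1'], ['g1', 'p1', 'm1']]
print(up)    # [['m1', 'p3', 'g2']]
```

The direct link from g1 to m2 is deliberately excluded, since direct downwards and upwards links are counted separately from traversing paths in the text.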
Examination of the downwards-upwards subdivision, especially the traversing paths, reveals a considerable (though biologically expected) asymmetry of the interface (Fig 5): The downwards interface is much more pronounced, comprising predominantly the transcription of enzymes, i.e., metabolic genes, and the formation of enzymatic protein complexes. In contrast, the upwards interface is comparably sparse, with roughly half as many direct connections (102/283) and a quarter as many traversing paths (4,070/18,904) as the downwards interface. These few upwards processes mainly include the formation of metabolic regulators, especially transcription factors, and the corresponding regulatory events.
In addition to confirming the interface asymmetry, the traversing paths reveal the bottleneck characteristic of the interface. First indications for this special property are (1) the low number of involved vertices and (2) the distribution of traversing path lengths. For both downwards and upwards traversing paths, the number of distinct interface vertices in the traversing paths is low compared to the total number, i.e., 1,393 and 449 interface vertices of 2,286 in total, respectively (Fig 5). Moreover, a remarkable clustering of paths emerges at lengths 8-10 for downwards and at lengths four, six, and 9-11 for upwards traversing paths (Fig 6). This is in contrast to a smooth distribution one would expect in random graphs. By enumerating the involved vertices it is striking that approximately 46% of traversing paths contain one of five three-vertex combinations, respectively. The respective combinations of downwards and upwards traversing paths pertain to three functional systems, the phosphoenolpyruvate-dependent sugar phosphotransferase system, PTS, the ribonucleotide reducing system, RNR system, as well as the nitrogen regulation two-component signal transduction system, NtrBC system (Table E in S1 Text).
All three biological subsystems, the PTS [2,31], the RNR [32][33][34][35] as well as the NtrBC system [36][37][38] are well-studied with respect to their functionality and their cellular context. A schematic representation of the three subsystems is provided in Fig 7. The PTS is an enzymatically active protein complex involved in the transport and phosphorylation of several sugars, so-called PTS-sugars [2]. In the integrative E. coli network more than 18 different sugars serve as potential substrates which are imported from periplasm to cytosol at the same time (Table F in S1 Text). The substrate variety together with the manifold usage of the associatively produced pyruvate points out the key role of the PTS in E. coli's metabolism and, moreover, suggests that the PTS acts as a bottleneck in the interface.
The RNR system, the second system dominating the downwards traversing paths, provides the major DNA building blocks [34]. Each of the different core enzyme classes, ribonucleotide reductase class I-III, is capable of catalyzing the reduction of all four nucleotides. Its transcriptional and metabolic regulation ensures the balanced supply and, thus, avoids the increase of mutation rates and the loss of DNA replication fidelity [39]. The central cellular role, which is reflected in its regulatory embedding, together with its alternate substrates points to its special position in the interface.
The NtrBC system is a two-component signal transduction system initiating the nitrogen starvation response regulation. More precisely, depending on the nitrogen availability NtrB can autophosphorylate and the transfer of the NtrB phosphate group activates the global transcriptional regulator, NtrC. In E. coli, more than 40 genes known to be activated are involved in the nitrogen-response reaction such as active transport and mobilization of nitrogen in terms of N-containing compounds (for the integrative E. coli network see Table G in S1 Text). The extensive regulatory function and the linkage to metabolism due to the allocation of ATP for NtrB autophosphorylation indicate that the NtrBC system also acts as a bottleneck in the interface, in the opposite direction to the PTS and RNR system.
The three central traversing paths systems and their biological relevance suggest that a topologically prominent position can be indicative of a biologically important functional entity. To corroborate the general validity of this indication, in the following section different topological properties have been analyzed and the prominent elements have been further characterized from a functional perspective. In order to also assess these traversing paths on a statistical level, we studied the percentage of traversing paths passing through specific vertices, pairs of vertices and triples of vertices (i.e., three-vertex combinations). Fig E in S1 Text shows the percentage of downwards (RD → MD) and upwards (MD → RD) traversing paths for each of these three cases. The vertices and vertex combinations listed in Table E in S1 Text are highlighted in red. The histograms in Fig E in S1 Text show that, while the vast majority of vertices or vertex combinations is only involved in a small fraction of paths, some vertices or vertex combinations are involved in a much larger fraction of paths. These 'outlier' vertex groups (a large number of paths is explained by these groups of vertices) also appear on the level of pairs and individual vertices, but on the level of triples they become biologically meaningful. Note that large parts of each of the domains lie in the largest strongly connected component (SCC), thus tightly coupling the three domains (Fig F in S1 Text). Regarding the domain-traversing paths we observe that about 96% of the downwards and 67% of the upwards traversing paths are fully contained in the largest SCC.
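The share of traversing paths passing through a given vertex, pair or triple can be computed with a simple counting routine. The sketch below is illustrative only: it assumes that combinations are formed from the interior (interface) vertices of each path and that a path counts once per combination regardless of vertex order; the function name and the toy paths are hypothetical.

```python
from collections import Counter
from itertools import combinations


def combination_shares(paths, size):
    """Fraction of paths containing each combination of `size` interior vertices."""
    counts = Counter()
    for path in paths:
        interior = sorted(set(path[1:-1]))  # drop the start and end vertex
        for combo in combinations(interior, size):
            counts[combo] += 1
    return {combo: n / len(paths) for combo, n in counts.items()}


# Hypothetical traversing paths: first and last entries lie outside the interface.
paths = [
    ["g1", "a", "b", "c", "m1"],
    ["g2", "a", "b", "c", "m2"],
    ["g1", "a", "d", "m1"],
    ["g3", "e", "m1"],
]

triples = combination_shares(paths, 3)
singles = combination_shares(paths, 1)
print(triples[("a", "b", "c")])  # 0.5: half of the paths contain this triple
print(singles[("a",)])           # 0.75
```

Sorting the resulting shares in descending order would reproduce the kind of 'outlier' ranking described for Fig E in S1 Text, here of course only for toy data.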
Is the 'interface' nature of the protein domain also visible on a purely structural level? Fig G in S1 Text shows that the average betweenness centrality of the vertices in the interface domain is typically higher than the average betweenness centralities of randomly chosen subsets of vertices from the whole network of the same size as the interface domain, even though this distribution has a long tail to high values going beyond the average value of the interface domain.
Prompted by the recent study [19], we analyzed the feedback loops formed by our upwards and downwards paths. We passed from our sets of upwards and downwards traversing paths to traversing feedback loops by searching for combinations of upwards and downwards paths linked both in the gene regulatory domain and in the metabolic domain. A closure of the loop in the regulatory domain is, for example, a transcription factor at the end of the upwards path regulating the gene which serves as the starting node of the downwards path. A closure of the loop in the metabolic domain can be a direct path between the compound ending the downwards path and the compound starting the upwards path. Surprisingly, the downwards paths (from RD to MD) contributing to loops are also dominated by the same triples, pairs of vertices and individual vertices as listed in Table E in S1 Text (and highlighted in the further statistical analysis in Fig E in S1 Text), while the upwards paths (from MD to RD) contributing to feedback loops deviate visibly from the set obtained by analyzing the upwards paths alone (i.e., as listed in Table E in S1 Text). Fig H in S1 Text summarizes this observation in the same format as Fig E in S1 Text.

Cross-systemic key elements of E. coli
The integration of metabolic and regulatory events allows us to determine the key elements of E. coli, especially those beyond the individual processes. In particular, the functional three-domain partition facilitates recovering network components (in terms of individual vertices) of evident biological relevance, e.g., by means of simple centrality measures. In the following, two different aspects of centrality have been examined [40]: degree centrality depicting the direct linkage of a vertex, and betweenness centrality which can be thought of as the participation of a vertex in the network flow [41].
Starting with the prominent local vertex structure, the so-called hubs (here, vertices with a total degree larger than 50), it is noticeable that they are primarily compounds and proteins, in particular protein complexes, and appear in all three domains (see Table H in S1 Text, columns 3-5). In the metabolic domain, hubs include trivial compounds such as H+ and H2O and so-called currency metabolites, e.g., ATP, NAD(P)H and coenzyme A, while hubs of regulatory processes are global regulators which characteristically exhibit a remarkably strong asymmetry of in-degree and out-degree. In particular, well-known transcription factors top this list, such as FNR (fumarate and nitrate reduction) [42], Fis (factor for inversion stimulation) and H-NS (histone-like nucleoid structuring protein) [43]. Hubs predominantly occur in the metabolic and gene regulatory domains, while only a few are affiliated to the protein interface. However, it was not to be expected that cross-systemic elements could be identified solely based on their degree.
To detect cross-systemic key elements, an extended approach of degree centrality has been used that additionally accounts for the domain boundaries. The intra-domain degree fraction ξ, also termed embeddedness [44], denotes the ratio of the internal degree of a vertex, within a domain, and the total degree in the network. This measure very clearly distinguishes between, on the one hand, metabolic and regulatory hubs which show intra-domain degree fractions ξ > 0.87 (except one single compound with ξ = 0.185) and, on the other hand, hubs in the interface which in contrast have ξ ≤ 0.06 (see Table H in S1 Text, last column). Thus, while metabolic and regulatory hubs are embedded in their respective domains, hubs in the protein interface are mainly connected to vertices in the neighboring domains. In total, seven hubs show a significantly low intra-domain degree fraction pointing to their prevalent interactions with the other two domains (Fig A in S1 Text and Table K in S1 Text, column 5). Six of them are affiliated to the protein interface, exhibiting numerous interactions with the regulatory domain. Their linkages to the metabolic domain become visible when considering their composition, in case of the protein complexes, and their modes of action, respectively. The former involve the four protein-compound complexes Crp-cAMP (cyclic-AMP receptor protein binding cyclic-AMP) [31,45,46], DksA-ppGpp (dnaK suppressor binding guanosine 3'-diphosphate 5'-diphosphate) [47][48][49], NsrR-NO (nitrite-sensitive repressor binding nitric oxide) [50][51][52] and Lrp-Leu (leucine-responsive regulatory protein binding leucine) [53][54][55] whose naming schemes already indicate the metabolic link. The latter, namely, protein complex Cra (catabolite repressor activator) and protein monomer Lrp (leucine-responsive regulatory protein), form complexes in the presence of appropriate metabolites, i.e., fructose 1,6-bisphosphate/fructose 1-phosphate and leucine, which affect their regulatory effect.
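The intra-domain degree fraction ξ follows directly from its definition as internal degree over total degree. The sketch below is a minimal illustration; the function name and the toy vertices are hypothetical, and the total degree is assumed to be the sum of in-degree and out-degree.

```python
def intra_domain_degree_fraction(edges, domain_of):
    """xi(v) = (edges of v to vertices in its own domain) / (all edges of v)."""
    total = {}
    internal = {}
    for u, v in edges:
        same_domain = domain_of[u] == domain_of[v]
        for w in (u, v):
            total[w] = total.get(w, 0) + 1
            if same_domain:
                internal[w] = internal.get(w, 0) + 1
    return {w: internal.get(w, 0) / total[w] for w in total}


# Hypothetical regulator complex "TF" in the interface: it is formed from a
# protein and a metabolite and regulates three genes.
domain_of = {"TF": "PI", "p1": "PI", "m1": "MD",
             "g1": "RD", "g2": "RD", "g3": "RD"}
edges = [("p1", "TF"), ("m1", "TF"), ("TF", "g1"), ("TF", "g2"), ("TF", "g3"),
         ("g1", "g2")]

xi = intra_domain_degree_fraction(edges, domain_of)
print(xi["TF"])  # 0.2: only one of its five edges stays inside the interface
print(xi["g1"])  # 0.5
```

A low ξ for a high-degree vertex is the signature used in the text to flag interface hubs as cross-systemic candidates.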
The remaining hub is the metabolic-domain vertex representing guanosine 5'-diphosphate 3'-diphosphate (ppGpp). Besides its special domain-affiliation among the low intra-domain degree hubs, ppGpp acts as an important regulator of both, metabolism and transcriptional processes. More precisely, it regulates several enzyme activities as well as numerous transcription initiations by allosterically binding to RNA polymerase.
So far, we have demonstrated that the protein interface of the E. coli network reconstruction acts as a bridging module between the regulatory and metabolic domains, enabling their interaction and communication. Therefore, we expect the betweenness centrality to directly highlight vertices from the interface. Indeed, ten of the top 25 ranked vertices (still including currency metabolites) are from the interface (see Table I in S1 Text, column 5), while overall the interface accounts for only about 18% of the vertices of the network. In particular, the already mentioned protein-compound complexes Crp-cAMP and DksA-ppGpp are among these vertices. In general, currency metabolites and trivial compounds (see above) as well as global regulators are among the central components with respect to betweenness. Apart from that, the biochemical reactions building up and/or breaking down these metabolites and proteins, as well as the other reactants involved, belong to the most betweenness-central components. Associating components with functional systems allows the systemic feature to be assessed, and considering the corresponding network affiliation identifies the candidates for cross-systemic key elements. In this manner, the network analysis allows us to detect the central role of Crp-cAMP, Lrp-Leu and ppGpp on purely topological grounds, as each component is the focus of such a functional system with high betweenness. Also among the top-ranked vertices with respect to betweenness centrality are five further cross-systemic components which are assigned to the protein interface, namely phosphorylated PhoB (PhoB-P), Fur-Fe2+, and three outer membrane proteins (Omp), OmpC, OmpE and OmpF (Table I in S1 Text). The former two components are transcription factors and therefore act in the gene regulatory domain, while at the same time they are protein complexes binding a metabolic small molecule, which marks the connection to the metabolic processes.
The latter three, the outer membrane porins, form hydrophilic channels, enabling non-specific diffusion of small molecules across the outer membrane [56][57][58]. In this role, these proteins represent the most obvious connections between the gene regulatory and metabolic domains: their encoding genes are highly regulated, while the porins enable numerous metabolic transport reactions.
By focusing on the domain connecting gene regulation and metabolism, the two centrality measures reinforce the key role of further cross-systemic elements. Considering the subgraph induced by the protein interface, both centralities point out the vertices that top the list of the above-discussed downwards traversing paths (Table J in S1 Text). In more detail, both major systems contributing to the downwards traversing paths, namely the PTS and the RNR system, are each represented by three vertices (Fig 7, panels A and B). The intra-domain degree fraction, which puts the focus on protein interface vertices as described above, additionally highlights a representative of the upwards traversing path system NtrBC (Fig 7, panel C) as the second non-hub (Table K in S1 Text). This corroborates the predictions from the traversing paths and thus shows that our new topological measure reveals cross-systemic elements which otherwise only stand out under detailed scrutiny of a large amount of biological information.

Discussion
Here we present an integrative network covering metabolic processes as well as regulatory events of E. coli and, especially, the interaction between both systems. With more than 10,000 vertices, it comprises around two thirds of the metabolic processes currently integrated in metabolic reconstructions [6] and, concerning regulatory events, the presented network incorporates more than 95% of the established transcription-related processes [10]. Both metabolic and gene regulatory processes are integrated on a genome scale. This approach differs from procedures in which one of the two provides the network basis, which is then expanded by closely related processes in the other subsystem. The latter is the dominant approach, for example, in conventional metabolic reconstructions, which involve the encoding genes only indirectly. Hitherto, integration of transcriptomics data could only be achieved using the so-called gene-protein-reaction (GPR) associations. On the one hand, this procedure limits the applicable data set to metabolic genes and, on the other hand, it rests on the assumption that all expressed enzymes are present in their active form. Starting from the integrative E. coli network, integrating transcriptomics data is much more straightforward and, more importantly, the complete data set can be applied. In this way, multi-domain variants of the frequently employed network-based interpretation of 'omics' data [59][60][61][62][63] can be formulated, and indirect and regulatory impacts on metabolism can be examined.
The novelty of the reconstruction, the connection of metabolism and gene regulation, allows us not only to investigate the separate systems but also to assess their interactions. The most relevant connecting links are proteins, on the one hand, those acting as enzymes and, on the other hand, metabolic transcription factors. The functional classification, together with the topological analysis, suggests a network division into three domains: metabolic domain, protein interface and regulatory domain. This partition was corroborated by different connectivity measures and reflects a biologically reliable categorization in two delimited modules linked by a bridging module.
The principal structural feature of the network model, the three-domain organization, is reminiscent of the 'bow-tie' architectures frequently discussed in the theory of complex systems, where an input and an output layer are connected via a (typically much smaller) intermediate network [64][65][66]. Such a bow-tie structure (or, rather, the presence of several nested bow-tie architectures) has for example been discussed for metabolic networks [67], where the diversity of inputs (nutrients) and outputs (biomass components) is much larger than the intermediate processing layer. It has been hypothesized that such a bow-tie organization is a prerequisite for the robust operation of a complex system [64,65]. Here we observe a bow-tie organization in a system consisting of a rich 'material flow' system (metabolism) and a similarly rich 'control' system (gene regulation) connected via a protein interface.
As our topological assessment shows, the bridging character of the protein interface entails a bottleneck functionality. The analysis of the new topological measure, termed traversing paths, highlighted three major biological systems represented by 12 vertices forming more than 40% of these paths (comprising in total 1465 distinct vertices). These traversing path systems, namely phosphotransferase system (PTS), ribonucleotide reducing (RNR) and nitrogen regulation two-component signal transduction (NtrBC) system, are well-investigated ones with key biological relevance for E. coli's metabolism as well as its gene regulation suggesting that a topologically prominent position points to an important biologically functional entity.
Further detection of cross-systemic key elements in the network was accomplished using additional topological measures. In particular, two centrality measures were studied to account for different aspects of importance in terms of direct linkage and participation in network flow. Apart from conspicuous components, such as trivial compounds, currency metabolites and global regulators, degree centrality revealed a group of seven hubs characterized by a significantly low intra-domain degree fraction, which numerically reflects the bridging feature of the protein interface. As expected, these components are located in the interface except for one, the vertex representing guanosine 5'-diphosphate 3'-diphosphate (ppGpp), which is affiliated to the metabolic domain. On the other hand, the inspection of betweenness centrality highlights biological systems rather than single components and, as such, points to key components detected before in their functional context. Besides trivial compounds and currency metabolites, this includes Crp-cAMP (cyclic-AMP receptor protein binding cyclic-AMP), Lrp-Leu (leucine-responsive regulatory protein binding leucine) and ppGpp, which stand out due to their intra-domain degree fraction, as well as seven further components already revealed as hubs.
Intriguingly, the interface-specific key elements of the network could be corroborated by exactly these two centrality measures. The assessment of the interface-induced subgraph using both centralities emphasizes altogether eight vertices of the downwards traversing paths discussed above contributing to the two major systems PTS and RNR. Taking into account the intra-domain degree fraction points out a representative of the upwards traversing path system NtrBC. In conclusion, the importance of vertices revealed by the here presented traversing paths could be reinforced by well-established topological measures showing the predictive power of the new measure.
Finally, the key elements of the integrative E. coli network according to both centralities illustrate the importance of the different domains and their combined consideration (Fig 8). Unsurprisingly, the majority of key elements are affiliated to the metabolic domain and represent trivial compounds and currency metabolites, e.g., H+, H2O, ATP and NAD(P)+. Moreover, predominantly cross-systemic components top this combined list of central elements. First of all, the vertices also emphasized by their low intra-domain degree fraction attract attention, namely Crp-cAMP, Lrp-Leu and ppGpp. These vertices demonstrate the value of the integrative approach: their importance emerges only when they are embedded in the domain context. In the case of the former two components, the composition additionally unveils the cross-systemic role, i.e., a transcription factor protein binding a metabolic small molecule that affects its regulatory activity. Likewise, the two regulatory key elements, Fur-Fe2+ and PhoB-P, exhibit this conspicuous linkage to the metabolic domain, illustrating their cross-systemic property. In other words, they belong to the so-called metabolic transcription factors and are thus related to the upwards interface. The opposite is the case for the three metabolic Omp (outer membrane porin) transporters that are among the key elements. While their metabolic linkage is more than obvious, the relation to the regulatory domain appears when the encoding genes are examined. These are highly regulated, amongst others by the global regulators Crp-cAMP, Fur-Fe2+, Lrp-Leu and PhoB-P. In this manner, the Omps are classical representatives of proteins related to the downwards interface, even though they are not affiliated with it. The remaining key elements are three metabolic small molecules which, counter-intuitively, are also related to the interface and to the cross-systemic elements detected by the traversing paths.
While in the case of pyruvate the connection to the PTS is apparent at first glance (Fig 7, panel A), the link between glutamate and ammonium and the NtrBC system is less perceptible. The actual connecting element is glutamine, which is the ligase product of glutamate and ammonium. It activates the (de)uridylylation of the regulatory protein PII which, in turn, inhibits NtrB autophosphorylation [36,68]. Altogether, the links to the three major traversing path systems are certainly not the only important processes these elements are involved in, but they reinforce their biologically central roles. Remarkably, these connecting elements show up when considering the entire network, while the interface-specific analysis is needed to acknowledge their importance. Beyond the detection of key elements, the integrative approach will allow the interplay and distribution of short-term and long-term regulation in E. coli's metabolism to be examined. While metabolic regulation of, for instance, enzyme activities occurs on a short time-scale, regulation of gene expression is a long-term control process. Both types of regulation have been incorporated in the network, albeit only on a qualitative level, i.e., as activator or inhibitor. In this way, the different effective ranges in metabolism can be assessed and, thus, the extent to which metabolism is covered by one or both regulation types, with central metabolism said to be highly controlled.

From the perspective of recent advances in network theory
With their balance of structural detail and functional simplicity, network models are capable of revealing organizational principles, which are hard to recognize on a smaller systemic scale (e.g., by analyzing individual pathways) or in functionally richer system representations (e.g., in dynamical models). One purpose of the network provided here is to enable work at the interface of statistical physics and systems biology, where the rich toolbox of complex network analysis is employed to identify functionally relevant non-random features of such biological networks.
The recent work of [69], for example, showed that network structure can reveal whether an enzyme is susceptible to genetic knockdown rather than to pharmacologic inhibition. While the network measures in the present study do not distinguish between different kinds of vertices or links, the rich biological metadata concerning the different biological roles of the components could be translated into distinct vertex and edge classes. In our own investigation [29] we used this fact to study, in a further example of such an interdisciplinary effort, the balance of robustness and sensitivity in the interdependent network of gene regulation and metabolism, based on the reconstructed network provided here.
In general, we expect that our network reconstruction can serve as a relevant data resource for the application of methods from the analysis of multiplex [70] and other multilayer networks [20,71]. Recently, there has been a growing interest in the properties of these systems, especially in the presence of explicit interdependencies between vertices [70,72]. In contrast to monoplex networks, interdependent networks can show a qualitatively different robustness against failures, i.e., cascading failures leading to a sudden system breakdown at a critical initial attack size [73,74]. The case of different vertex types (as opposed to different edge types) has been considered, for example, in the context of secure communication in a network where eavesdroppers control sets of vertices [75].
On a general level, analyzing statistics of paths with respect to the network's large-scale structure, like the domain-traversing paths used here, might prove useful for the evaluation of other networks that show (possibly more than one) interface-like features.

Concluding remarks
In summary, the analysis of network topology allows key system components in the integrative E. coli network to be determined. In line with expectations, trivial compounds as well as currency metabolites showed up regardless of the measure that was applied. In addition, further obvious components including several global regulators were identified. More striking is the detection of components and systems which emerge solely when specifically analyzing the interface. These hidden elements are associated with two of the biologically well-investigated functional subsystems, PTS and NtrBC. Both well-established and newly designed measures of the interface point out the same subsystems, and even the analysis of the entire network discloses components indirectly related to these hidden subsystems.
Apart from trivial and currency metabolites, every detected key element of the entire network contributes to some extent to the downwards and/or upwards interface. This unlooked-for cross-systemic property is reflected in the complex composition, the intra-domain degree fraction, the proximity to key systems, and/or the interplay with regulatory and metabolic processes. The biological relevance of these components supports their detection and reinforces the predictive power of the novel traversing path measure. In general, we believe that the presented integrative E. coli network allows further investigations of the interplay of metabolism and gene regulation, which will provide insights into cellular, system-wide responses.

Methods
The interconnected E. coli network is based on the EcoCyc database [22], release 20.0, which includes verified information of metabolic and regulatory processes (corresponds to RegulonDB 8.6 [10]) for E. coli K-12 substr. MG1655. The network is represented as a graph comprising four different types of vertices: (1) encoding genes, (2) protein monomers and complexes (including enzymes), (3) small compounds, and (4) (bio)chemical reactions (Table A in S1 Text). The protein vertices are further subclassified into protein monomers, protein-protein complexes, protein-compound complexes, and protein-RNA complexes. Regarding the edges, we distinguish three types: encoding and catalyzing associations, reaction connections to educts and products, and regulatory links to sources and targets (Table B in S1 Text).

Extraction of database information
First, relevant information of the database has been extracted and arranged (Algorithm 1 in Fig 9). For each regulatory process, the respective source and target were specified and converted to match one of the vertex types ('regulation.dat', file name of the EcoCyc-archive). To this end, the transcript units were separated into promoter, genes and terminator (if applicable), and the regulatory processes were multiplied per comprising gene. Moreover, each regulating RNA has been translated into its encoding gene to meet the vertex types. In case of the metabolic processes, the reaction educts and products as well as the catalyzing enzymes have been assembled and converted to match one of the vertex groups, the respective educt and product stoichiometry have been assigned and the reaction compartmentation and reversibility have been assessed ('reactions.dat'). Thereby, as cell compartments the periplasmic space, the inner membrane, and the cytosol have been taken into account and reversible reactions have been split up.
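The splitting of reversible reactions mentioned above can be illustrated with a minimal sketch. The dictionary layout, the field names and the `_fwd`/`_bwd` identifier suffixes are assumptions made for illustration only; they are not taken from the EcoCyc files or from Algorithm 1.

```python
def split_reversible(reaction):
    """Return one directed reaction, or two if the reaction is reversible.

    `reaction` is a hypothetical record with keys 'id', 'educts', 'products'
    (compound -> stoichiometry) and 'reversible' (bool).
    """
    if not reaction["reversible"]:
        return [dict(reaction)]
    forward = {
        "id": reaction["id"] + "_fwd",  # suffix is an illustrative convention
        "educts": dict(reaction["educts"]),
        "products": dict(reaction["products"]),
        "reversible": False,
    }
    backward = {
        "id": reaction["id"] + "_bwd",
        "educts": dict(reaction["products"]),  # roles are swapped
        "products": dict(reaction["educts"]),
        "reversible": False,
    }
    return [forward, backward]


example = {"id": "RXN-X", "educts": {"A": 1}, "products": {"B": 2}, "reversible": True}
directed = split_reversible(example)
```

Splitting in this way lets every reaction vertex carry an unambiguous direction, so that educt and product edges can be stored as ordinary directed edges.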
Second, vertex candidates have been validated ('reactions.dat', 'compounds.dat', 'proteins.dat', 'genes.dat', 'rnas.dat') and divided into reaction, compound, protein monomer, protein-protein complex, protein-compound complex, protein-RNA complex, and gene. In doing so, generic terms such as DIPEPTIDES have been substituted ('classes.dat') and double annotations, e.g., CPD-15709 and FRUCTOSE-6P, have been resolved. Thereupon, the compositions and the encoding genes of the assembled proteins have been gathered and matched to the vertex groups, and the respective logical operation and stoichiometry have been annotated ('protcplxs.col'). Based on the validated vertex lists, the regulatory and metabolic processes have been updated, whereby each process with at least one unidentified vertex was removed, resulting in the final edge lists. Fig C in S1 Text provides a flowchart of this algorithmic procedure.
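The final filtering step, in which a process is dropped as soon as one of its vertices could not be validated, can be sketched as follows. The record layout and the identifiers other than FRUCTOSE-6P are hypothetical and serve only as an illustration.

```python
def filter_processes(processes, valid_vertices):
    """Keep only processes whose participating vertices are all validated.

    `processes` is a list of hypothetical records with a 'vertices' list;
    `valid_vertices` is the set of validated vertex identifiers.
    """
    return [
        process
        for process in processes
        if all(vertex in valid_vertices for vertex in process["vertices"])
    ]


valid = {"FRUCTOSE-6P", "ENZ-1", "GENE-1"}
candidates = [
    {"id": "P1", "vertices": ["ENZ-1", "FRUCTOSE-6P"]},
    {"id": "P2", "vertices": ["GENE-1", "UNKNOWN-1"]},  # contains an unidentified vertex
]
kept = filter_processes(candidates, valid)  # only P1 survives
```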

Network implementation
With the validated vertex and edge lists, the graph has been assembled and its largest weakly connected component has been extracted. The three-domain partition MD-PI-RD (Table 1 and A in S1 Text) as well as the two-domain partition are implemented as the vertex properties affiliation and metabolic. The initial categorization of both partitions is based on the vertex type 'reaction', which is denoted as purely metabolic and interface-related if all educts and products are compounds and proteins, respectively. Mixed educt and product types demand further clarification later on. Similarly, non-ambiguous vertices of type 'compound', 'protein' and 'gene' are affiliated based on the affiliation of their neighbor vertices. This means that, if the influential adjacent vertices have the same affiliation, the vertex is assigned to that affiliation; otherwise, its assignment needs detailed consideration. In this way, genes and proteins that are
Moreover, the mapping to the E. coli model of [11] has been annotated, which integrates the metabolic network iJR904 published by [5] and the transcriptional regulatory events related to the encoding genes of the catalyzing enzymes. To this end, genes, proteins, metabolites as well as biochemical reactions of the metabolic model have been mapped to the EcoCyc database (release 20.0), in a first step automatically based on their identifiers, and the resulting dictionaries have been manually curated. As the EcoCyc database does not account for compartmentation of compounds and reactions or for exchange reactions, unique metabolites and internal reactions have been considered, resulting in a coverage of more than 93%. By additionally disregarding internal transport reactions, a coverage of 96.5% can be achieved (Table 1).
Integrating the manually curated Covert dictionaries, each vertex has been attributed with (1) a unique identifier, following the EcoCyc identifier but also indicating the compartment, (2) a unique type reference, (3) a unique assignment of the model components from [11], if applicable, and (4) the affiliations of the two- and three-domain partitions. Furthermore, vertices of types gene and reaction have (5a) a name assigned, as well as the Blattner ID and the EC number, if applicable. The remaining vertices additionally have (5b) a compartment assigned, where cytosol (c), extracellular space (e), periplasmic space (p), inner membrane (i), outer membrane (o) and membrane in general (m) were taken into account. Similarly, each edge of the network has the attributes (1) type, specifying the connected vertices, and (2) the corresponding stoichiometry, where zero is assigned if not applicable or ambiguous. For edges depicting regulatory processes, the stoichiometry actually denotes the mode of regulation, namely activation (+, 1), inhibition (−, −1) or combined (0). These edges additionally have (3) an identifier assigned, following the EcoCyc identifier, and (4) a name, specifying the regulation type. All other edge types can be classified as representing either conjunct or disjunct links, in the sense that all or solely one of the incoming links is required for functionality (Table B in S1 Text).
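Extracting the largest weakly connected component, as used in the assembly step, amounts to ignoring edge directions and keeping the biggest connected vertex set. The following is a minimal sketch in plain Python, assuming hashable vertex identifiers and an edge list whose endpoints all appear in the vertex list; it is not the authors' implementation.

```python
from collections import deque


def largest_weakly_connected_component(vertices, edges):
    """Return the vertex set of the largest weakly connected component."""
    # Build an undirected neighbourhood view of the directed edge list.
    neighbours = {v: set() for v in vertices}
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen, best = set(), set()
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        component, queue = {start}, deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


core = largest_weakly_connected_component(
    ["a", "b", "c", "d", "e"], [("a", "b"), ("c", "b"), ("d", "e")]
)
```

In the toy example, the vertices a, b and c form the largest component even though no directed path leads from a to c, because direction is disregarded.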
The fully annotated integrative reconstruction of E. coli's metabolic and regulatory processes is provided as a graph representation in S1 File.

Graph properties concerning intra- and inter-module connectivity
The following measures have been used in the assessment of the graph partitioning scheme.
Inter-module edge fraction c. Given the set of vertices with the domain label D, edges connecting these vertices to a vertex of a different label are considered external, while edges between vertices of the same label are internal. We call
$$c_D = \frac{\#(\text{external edges of } D)}{\#(\text{external} + \text{internal edges of } D)}$$
the inter-module edge fraction of domain D.
Network modularity M. The modularity denotes the degree to which a given partition divides the network into highly connected groups (modules) which are comparatively sparsely connected among each other. To this end, the intra-module links are counted against the total degree of the module vertices (Eq 1). Here, k_v is the degree of vertex v and link(v, w) denotes an undirected edge between vertices v and w. Note that this formulation of modularity (taken from [30]) coincides with the definition from [76].
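As an illustration of this definition, the inter-module edge fraction can be computed directly from an edge list and a vertex-to-domain mapping. The sketch below assumes simple edges given as vertex pairs; the toy labels X and Y are hypothetical.

```python
def inter_module_edge_fraction(edges, domain, label):
    """Fraction of the edges touching domain `label` that leave the domain."""
    internal = external = 0
    for v, w in edges:
        in_v, in_w = domain[v] == label, domain[w] == label
        if in_v and in_w:
            internal += 1  # both endpoints carry the label
        elif in_v or in_w:
            external += 1  # exactly one endpoint carries the label
    total = internal + external
    return external / total if total else 0.0


toy_domain = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
c_X = inter_module_edge_fraction(toy_edges, toy_domain, "X")  # 2 external, 1 internal
```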
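For orientation, a modularity of this kind can be computed as in the following sketch. It assumes the widely used Newman-Girvan form for undirected graphs, M = Σ_D [ L_D / L − (K_D / 2L)² ], where L is the total number of edges, L_D the number of intra-module edges of module D and K_D the summed degree of its vertices. Whether this matches Eq 1 term by term is an assumption.

```python
def modularity(edges, module):
    """Newman-Girvan modularity of a partition (assumed form, undirected edges)."""
    n_edges = len(edges)
    intra = {}       # module label -> number of intra-module edges
    degree_sum = {}  # module label -> summed degree of its vertices
    for v, w in edges:
        for x in (v, w):
            degree_sum[module[x]] = degree_sum.get(module[x], 0) + 1
        if module[v] == module[w]:
            intra[module[v]] = intra.get(module[v], 0) + 1
    return sum(
        intra.get(label, 0) / n_edges - (k / (2 * n_edges)) ** 2
        for label, k in degree_sum.items()
    )


# Two triangles joined by a single edge, partitioned into the two triangles.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
toy_module = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
M = modularity(toy_edges, toy_module)
```

For the toy graph, each triangle contributes 3/7 − (7/14)², so M = 5/14.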

Domain-traversing paths
A traversing path connects the regulatory and the metabolic domains via the protein interface. Specifically, a traversing path of length k is of the form
$$[(u, v_1), (v_1, v_2), \ldots, (v_{k-1}, w)] \qquad (2)$$
where the vertices u and w are from the regulatory and the metabolic domain, respectively (or vice versa), and the vertices v_i are distinct and part of the protein interface. Starting from the set of edges directly at the intersection of two domains, the vertex successors within the interface domain as well as the final, first successor in the third domain have been determined iteratively (Algorithm 3 in Fig 12).
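The definition can be made concrete with a small depth-first enumeration. This is a sketch under stated assumptions, not a transcription of Algorithm 3: the graph is given as a successor mapping, the domain labels "RD", "PI" and "MD" are taken from the partition name MD-PI-RD, and the vertex names are hypothetical. The recursion is only suitable for small examples.

```python
def traversing_paths(successors, domain, start_label, end_label, interface="PI"):
    """Enumerate paths from `start_label` to `end_label` whose inner vertices
    are distinct and all belong to the interface."""
    paths = []

    def extend(path):
        for nxt in successors.get(path[-1], ()):
            label = domain[nxt]
            if label == end_label and len(path) > 1:
                paths.append(path + [nxt])  # reached the third domain
            elif label == interface and nxt not in path:
                extend(path + [nxt])        # continue inside the interface

    for vertex, label in domain.items():
        if label == start_label:
            extend([vertex])
    return paths


toy_domain = {"gene": "RD", "p1": "PI", "p2": "PI", "rxn": "MD"}
toy_successors = {"gene": ["p1"], "p1": ["p2", "rxn"], "p2": ["rxn", "p1"]}
downwards = traversing_paths(toy_successors, toy_domain, "RD", "MD")
```

Swapping the start and end labels yields the paths in the opposite direction; direct edges between the two outer domains are not counted because they do not pass through the interface.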

Vertex centrality
The key elements of the integrative E. coli network have been determined based on two graph properties.
Degree Centrality DC. The degree centrality is a local centrality measure and denotes the total number of in- and out-going edges of a vertex (Eq 3). Here, the vertices with a total degree greater than 50 are termed hubs (see the degree distribution in Fig D in S1 Text).
By additionally accounting for the domain boundaries, the intra-domain degree fraction ξ (also termed embeddedness [44]) has been defined as the ratio of the internal degree, within domain D, and the total degree of a vertex (Eq 4), where A denotes the adjacency matrix of the graph.
Betweenness Centrality BC. The betweenness centrality describes the impact on the flux through the network, under the assumption that the transfer follows the shortest paths. In particular, it quantifies the fraction of shortest paths between all pairs of vertices which involve the designated vertex (Eq 5), where σ_st is the number of all shortest paths between the vertices s and t, while σ_st(v) yields the number of these paths that run through v [41].
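A minimal sketch of the hub criterion, assuming a directed edge list of vertex pairs: the total degree is the sum of in- and out-going edges, and a vertex is a hub if this sum exceeds the threshold of 50 used here. The vertex names in the example are hypothetical.

```python
from collections import Counter


def total_degree(edges):
    """Total degree (in-degree plus out-degree) of every vertex."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def hubs(edges, threshold=50):
    """Vertices whose total degree is strictly greater than `threshold`."""
    return {v for v, k in total_degree(edges).items() if k > threshold}


# A toy regulator with 51 targets qualifies as a hub; its targets do not.
toy_edges = [("regulator", f"gene{i}") for i in range(51)]
toy_hubs = hubs(toy_edges)
```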
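Both measures can be sketched in plain Python. The intra-domain degree fraction below counts, for every vertex, the share of its incident edges that stay within its own domain. The betweenness uses Brandes' accumulation scheme for unweighted directed graphs and returns the unnormalized sum of σ_st(v)/σ_st over all pairs s ≠ v ≠ t. Any normalization applied in Eq 5 is not reproduced here, and the successor mapping is assumed to contain no duplicate edges.

```python
from collections import deque


def intra_domain_degree_fraction(edges, domain):
    """xi(v): edges of v to vertices of the same domain, divided by all edges of v."""
    internal = dict.fromkeys(domain, 0)
    total = dict.fromkeys(domain, 0)
    for s, t in edges:
        same = domain[s] == domain[t]
        for v in (s, t):
            total[v] += 1
            internal[v] += same
    return {v: internal[v] / total[v] for v in domain if total[v] > 0}


def betweenness(successors):
    """Unnormalized shortest-path betweenness (Brandes' algorithm, directed)."""
    vertices = set(successors) | {w for ws in successors.values() for w in ws}
    bc = dict.fromkeys(vertices, 0.0)
    for s in vertices:
        sigma = dict.fromkeys(vertices, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in vertices}
        order, queue = [], deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in successors.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(vertices, 0.0)
        for w in reversed(order):  # back-propagate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


xi = intra_domain_degree_fraction(
    [("tf", "g1"), ("tf", "g2"), ("m", "tf"), ("tf", "p")],
    {"tf": "PI", "p": "PI", "g1": "RD", "g2": "RD", "m": "MD"},
)
bc = betweenness({"a": ["b", "c"], "b": ["d"], "c": ["d"]})
```

In the toy example, the interface vertex tf has three of its four edges leaving the interface, giving ξ = 0.25, and the two parallel routes from a to d share the single shortest-path pair equally.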