Using gene expression data and network topology to detect substantial pathways, clusters and switches during oxygen deprivation of Escherichia coli

Background Biochemical investigations over the last decades have elucidated an increasingly complete image of the cellular metabolism. To derive a systems view for the regulation of the metabolism when cells adapt to environmental changes, whole genome gene expression profiles can be analysed. Moreover, utilising a network topology based on gene relationships may facilitate interpreting this vast amount of information, and extracting significant patterns within the networks. Results Interpreting expression levels as pixels with grey value intensities and network topology as relationships between pixels, allows for an image-like representation of cellular metabolism. While the topology of a regular image is a lattice grid, biological networks demonstrate scale-free architecture and thus advanced image processing methods such as wavelet transforms cannot directly be applied. In the study reported here, one-dimensional enzyme-enzyme pairs were tracked to reveal sub-graphs of a biological interaction network which showed significant adaptations to a changing environment. As a case study, the response of the hetero-fermentative bacterium E. coli to oxygen deprivation was investigated. With our novel method, we detected, as expected, an up-regulation in the pathways of hexose nutrients up-take and metabolism and formate fermentation. Furthermore, our approach revealed a down-regulation in iron processing as well as the up-regulation of the histidine biosynthesis pathway. The latter may reflect an adaptive response of E. coli against an increasingly acidic environment due to the excretion of acidic products during anaerobic growth in a batch culture. Conclusion Based on microarray expression profiling data of prokaryotic cells exposed to fundamental treatment changes, our novel technique proved to extract system changes for a rather broad spectrum of the biochemical network.


Background
Over the last decades our understanding of cellular metabolism has increased considerably [1], in particular for less complex organisms such as Escherichia coli [2][3][4]. The gained knowledge includes cellular adaptation programs that respond to changing environmental conditions such as nutrient excess and starvation [5]. Current microarray technology allows for the investigation of all genes of an organism under various conditions, resulting in the generation of a massive amount of expression data. One of the greatest challenge we are faced with is to then analyse the data as a whole and extract the meaningful relationships among specific genes. Standard methods such as SAM [6] or machine learning algorithms [7] are able to detect patterns in gene expression data, distinguishing between different states of a cell. However, the above methods for classification and pattern discovery do not consider interactions between different genes and their corresponding proteins. Functional relationships between genes can be assembled by e.g. regulatory, signal transduction and metabolic networks. Gardner and coworkers used gene expression microarray data to infer a regulatory network for E. coli [8]. They developed a linear model and effectively reduced the number of parameters by assuming a sparse regulatory network. Finally, they verified their inferred regulatory network on a smaller subset, i.e. the regulation of the SOS pathway. In a recent study, a large compendium of gene expression microarray data for E. coli was analysed using an information theoretical approach revealing new regulatory interactions [9]. When analysing a metabolic network, every enzyme can be represented by its corresponding gene. For sets of genes, pathway scores have been calculated improving the sensitivity to detect crucial enzymatic pathways when taking network distances for enzyme pairs into account [10]. Transcription data and the topological information derived from the metabolic network was connected by calculating Zscores of highly correlated sub-networks [11]. Genes with common biological processes or functions were grouped by their gene ontology terms [12] and gene set enrichment tests performed on these groupings [13]. Additionally, gene set enrichments were tested by their common pathways in the corresponding networks [14,15]. However, these approaches do not take into account direct interactions within the network. In contrast, a Potts-spin clustering algorithm on metabolic networks was developed depending on direct nearest-neighbour relationships. It was applied yielding sub-graphs stimulated by environmental conditions [16]. Furthermore, common gene expression levels of neighbouring nodes in a metabolic network were calculated by averaging over all neighbours of a gene and revealed several interesting regulated pathways for the human immune system [17]. Rapaport and co-workers extracted gene expression patterns of neighbouring genes in the network yielding good classification of the profiled samples by calculating Fourier transformations and rejecting high frequency signals [18]. However, these approaches did not consider switch like behaviours of neighbouring genes. To detect common and contrasting tendencies, an image-like representation of the cellular metabolism can be used by interpreting expression levels as pixel intensities with grey values and the network topology as relationships between pixels. Image processing methods may then be applied to extract crucial features from such an image. Wavelet transforms are such an image processing method and were applied for texture classification [19], for feature generation to automatically classify microscope images [20] and large-scale functional genetic screens [21]. Without taking any network information into account, wavelet transforms have been used together with other image processing methods for analysing microarray data [22,23], in particular the Haar wavelet power spectrum for feature selection [24]. The application potential of this powerful technology to analyse biological networks is clear, yet challenging. While the underlying topology of an ordinary image consists of a lattice grid, biological networks have a rather scale-free architecture [25]. We recently reported one approach that applied image processing methods on the two-dimensional and therefore image-like adjacency matrix of the network [26]. In the present study we expand upon this method using the original architecture of the metabolic network. We analysed gene expression changes for each pair of neighbouring nodes combining their values additive (common response) and subtractive (opposing response, switch like behaviour). In a second step all combined nodes with a common response were again combined to yield significant clusters of co-expression. Such a simple approach allowed the analysis of the cellular stress response, not only for highly connected regions of the network but also for linear chains as well as the identification of specific switches. We analysed gene expression changes of E. coli during oxygen deprivation. With this technique we were able to detect the expected substantial regulatory adaptation programs, including up-regulated formate fermentation, mixed acid fermentation, metabolisms of hexoses and down regulation of the respiratory TCA cycle (see Figure 1). Furthermore, our technique revealed a down-regulation of the iron processing metabolism due to reduced oxidative stress during oxygen deprivation. The revealed up-regulation of the histidine biosynthesis pathway may constitute the adaptive response of E. coli to an acidic environment due to the excretion of acidic products during anaerobic growth in a batch culture.

Results and Discussion
Testing the method with simulated data Setting up the network and calculating the simulated expression data To test our method with simulated data on a simplified model network, we constructed a regular grid of 30 × 40 artificial reactions (workflow see Figure 2). On this simulated image-like metabolic network we randomly selected pathways of connected reactions with lengths 7, 10 and 24. These lengths corresponded to an expected length of a biological pathway (7,10) and to the most frequent path length of the shortest paths between all pairs of nodes in the regular grid, respectively. 100 runs were performed generating 44 experiments of simulated expression data with a ground level of 6, in rather good agreement with our normalised gene expression data. To this, a Gaussian noise of mean 0 and standard deviation 1 was added. Two classes were formed with 22 experiments each. In one class the reactions of the randomly chosen pathways were up-regulated by adding a constant level ∆ to the random expression levels.
Performing the method on random data In each run the expression data of all 44 samples (22 class 1 and 22 class 2) were mapped on the nodes (reactions) of the simulated metabolic network. Features were generated by applying the one dimensional Haar-wavelet transform onto each pair of neighbouring nodes. This yielded 9320 features for every sample. A t-test was applied for every feature to rank the features with respect to their discriminating property while correcting p-values for multiple testing (Bonferroni) [27]. Every feature for all reaction-pairs was ranked according to its p-value. The pvalue cut-off was set to 0.01. Reactions were regarded as up-regulated if the corresponding simulated genes were significantly differentially expressed (p-value of a t-test ≤ 0.05) and not significantly differentially expressed otherwise. Not differentially expressed end-nodes were discarded. We compared our technique to a standard method.
Comparison to a standard method A standard Students t-test was applied on the simulated expression data without taking any network information into account. For both methods true positives, false positives, false negatives and true negatives were calculated. To investigate a broader spectrum for the precision and sensitivity of our technique, the validation was performed with a variety of added constants (∆ = 2, 4, 6). Our tech-General workflow of the method Figure 2 General workflow of the method. The metabolic network of E. coli was put up using the EcoCyc database. Gene expression data was mapped onto the reactions of the network resulting in an image like representation (red boxes). Features were generated by using the Haar wavelet transformation on every connected reaction pair. The most discriminative features were identified by a t-test. Sub-graphs were built by connecting significant reaction pairs. Regions with identical regulation of more than four reactions were extracted (clusters). Reaction pairs with opposite regulation were identified as switches and were also extracted. The resulting pathways were analysed by literature scanning in-depth. Assembling the found pathways yielded an overall picture of the metabolic processes.

Pathways
Overview of the found significantly regulated metabolic path-ways during oxygen deprivation (green: up-regulated, red: down-regulated, blue: metabolites) Figure 1 Overview of the found significantly regulated metabolic pathways during oxygen deprivation (green: up-regulated, red: down-regulated, blue: metabolites). For more details, see: Figure 4 (yellow cross-hatched), Figure 5 (blue hatched) and Figure 6 and 7 (red and light blue boxes, respectively). Note that in this Figure, the metabolic pathways of Figure 7 are represented by two boxes. This is due to the unspecific hublike nature of L-glutamine (see Conclusions). nique decreased the number of false positives significantly ( Figure 3). In a step further we investigated how our technique performed on a biological network, choosing the metabolic network of E. coli, constructed as described in Methods. Out of this network we selected randomly pathways of lengths 5, 7, 10 and performed the same method as described above for different constants (∆ = 2, 4, 6). We obtained a similar superior performance of our approach. The number of false positives was reduced nearly threefold, while the detection power of true positives was identical (Results not shown).

The metabolism of E. coli under oxygen deprivation
The general workflow was briefly as follows:

C B A
regarded as up-regulated if the corresponding genes were significantly over-expressed under anaerobic conditions (p-value ≤ 0.05 of a mutant corrected t-test, see Methods), as down-regulated if significantly under-expressed and not significantly differentially expressed otherwise. Neighbouring, connected nodes that showed identical regulation (up or down) were grouped together to simplify interpretation. We refer to these groups as "clusters" in the following. Not differentially expressed end-nodes in a cluster were discarded. The resulting 10 clusters containing at least five reactions were interpreted in more detail and grouped according to their functional role ( Table 1, supplement 1 contains the corresponding EcoCyc reac-tion-ids). Furthermore, pairs of reactions with significantly opposing regulatory behaviour were defined as switches. All significant switches were extracted (p-value ≤ 0.01). 64 such switches were identified. The first 20 switches are discussed in detail ( Table 2 shows the first 20 switches, supplement 2 provides all 64 switches). An overview of the extracted pathways is given in the next paragraph.

Functional description of the extracted clusters
Pyruvate processing, formate fermentation, anaerobic respiration and anaerobic synthesis of deoxyribonucleosides Two clusters belonged to this sub-group (Table 1). The first cluster (Figure 4) consisted of nine reactions. Edges were due to the metabolites formate, fumarate and reduced menaquinone. Reactions connected via formate: Pyruvate formate lyase was up-regulated under anaerobic Fermentation of formate was up-regulated processing pyru-vate into formate via pyruvate lyase Figure 4 Fermentation of formate was up-regulated processing pyruvate into formate via pyruvate lyase. Pyruvate is degraded to formic acid (formate), which then is either expelled (via transporters), or further degraded into H 2 and CO 2 by the formate hydrogenlyase complex (for more details see text). Reactions are symbolised by squares, metabolites by circles. Green   Iron processing in an anaerobic environment Figure 6 Iron processing in an anaerobic environment. Iron is scavenged by E. coli using enterobactin, whose biosynthesis (blue bordered nodes) was down-regulated. For box colours see Figure 4.

Processing of hexoses
Two clusters represented the processing of hexose nutrients during anaerobic growth ( Table 1). The first cluster was formed by eight up-regulated reactions ( Figure 5). Connections between reactions were due to metabolites D-fructose-6-phosphate, fructose-1,6-bisphosphate and β-D-glucose-6-phosphate. Two major pathways were involved in the cluster: glycolysis, and fructose/mannose metabolism. The Embden-Meyerhof pathway is used when switching from aerobic respiration to fermentation during growth under anaerobic conditions on minimal medium with glucose [46], yielding in a strong increase of glucose consumption [29,46]. All reactions processing glucose down to fructose-1,6-biphosphate were up-regulated: Glucokinase converting glucose to glucose-6-phosphate and phosphoglucose isomerase transforming glucose-6-phosphate to fructose-6-phosphate/6-phosphofructokinase yielding fructose-1,6-bisphosphate. The increased conversion of D-fructose-6-phosphate to mannitol-1-phosphate generating the electron acceptor NAD+ normally produced in the Krebs cycle [47] explains the up-regulation of mannitol-1-phosphate 5-dehydrogenase. The higher amount of 1-phosphofructokinase is in agreement with previous findings [46]. The EIIMan transporter was up-regulated to increase the up-take of glucose. The second cluster of reactions processing hexose nutrients contained five up-regulated reactions which were connected due to the metabolites 2-keto-3-deoxy-6-phosphogluconate, D-glyceraldehyde-3-phosphate and 1,3diphosphateglycerate. Phosphoglycerate kinase, glyceraldehyde 3-phosphate dehydrogenase and triose phosphate isomerase are induced by anaerobiosis [48,49]. Note that they are part of the glycolytic pathway of E. coli. Finally, phosphogluconate dehydratase and 2-keto-3-deoxy-6phosphogluconate aldolase which are key enzymes of the Entner-Doudoroff pathway, were up-regulated, further demonstrating anaerobic glucose metabolism [50].

Iron processing
Two clusters represented the processing of iron in an anaerobic environment ( Table 1). The first cluster contained eight down-regulated reactions. From these, seven belonged to the complete biosynthesis pathway of enterobactin which is used by E. coli to scavenge iron, starting with isochorismate synthase and ending at the aryl carrier protein [51] (see Figure 6). Enterobactin biosynthesis is repressed under anaerobic conditions as it is used for aerobic iron transport [52]. Directly connected to the enterobactin pathway is enterochelin esterase. Enterochelin esterase uses enterobactin as an educt [53]. As biosynthesis of enterobactin was down-regulated, the down-regulation of enterochelin esterase is explained by the lower availability of its educts. The second cluster for iron processing contained five down-regulated reactions. Metabolites connecting the reactions were L-alanine and L-cysteine. The majority of the reactions are involved in Fe-S cluster biogenesis. The most connected node was cysteine desulfurase. This reaction assembles Fe-S complexes into Fe-S proteins to repair them when damaged during oxidative stress [54]. Under anaerobic conditions damage by oxidative stress is negligible explaining the down-regulation of cysteine desulfurase [55], whereas upregulation as an oxidative stress response has been reported under aerobic conditions [56]. Directly linked to cysteine desulfurase was the thiamin (thiazole moiety) biosynthesis protein, which is a catalyser transferring sul-During anaerobic growth E. coli performed mixed acid fer-mentation, resulting in a more acidic environment  fur from cysteine to the ThiS protein. It was down-regulated because during anaerobic growth a lower level of thiamin is needed compared to aerobic conditions [54]. Furthermore, selenocysteine lyase was connected to cysteine desulfurase via alanine. Selenocysteine lyase seems to be regulated by IscR and to form an alternate pathway involved in Fe-S biogenesis under aerobic conditions. In an anaerobic environment this reaction is known to be down-regulated [55].

Acid response
One prominent cluster was formed by 18 up-regulated reactions. Ten of these represent the complete histidine biosynthesis pathway, beginning with ATP phosphoribosyltransferase and ending at histidinal dehydrogenase ( Figure 7). When growing anaerobically on glucose, E. coli synthesises acids via mixed acid fermentation [46,57] and histidine is used to buffer acidic milieu [58]. Another reaction in the cluster was CTP synthetase which was also upregulated. Due to the down-regulation of NDP kinase the elevated levels of CTP synthetase are in agreement with previous findings [59] while the concrete functionality of this remains unclear. Furthermore, the cluster consisted of up-regulated reactions that needed or produced aspartate under anaerobic conditions. In yeast it was shown that the aspartate concentration is roughly 100 times higher in the cells under anaerobic conditions [60]. Generating aspartate may facilitate the biosynthesis of further amino acids and other essential compounds. E. coli has two known reactions catalysing the synthesis of asparagine, asparagine synthetase and aspartate-ammonia ligase. Both reactions were up-regulated during anaerobic growth, in agreement with previous findings [61]. The role of aspartate was further reinforced by the up-regulation of the GltP glutamate/aspartate DAACS transporter. Finally the cluster consisted of the starting points for the anaerobic de novo biosynthesis of NAD which were also up-regulated. This pathway uses L-aspartate to form NAD via L-aspartate oxidase and Quinolinate synthase [62,63]. Although NAD may be constitutively produced, the up-regulation of both reactions makes sense, as it has been shown that Quinolinate synthetase is inactive when exposed to oxygen [62].

Nucleosides metabolisms
Two clusters indicating a change in the processing of nucleosides were found (Table 1). One cluster contained eight down-regulated reactions processing GTP, GDP and dGDP. GTP cyclohydrolase I was down-regulated to limit the biosynthesis of cost intensive folate and highly abundant formate under anaerobic conditions. Similarly, GDP kinase, dGDP kinase, GDP reductase, deoxyguanylate kinase and ribonucleoside-diphosphate reductase 2 were down-regulated, which may be due to reducing the metabolism of cost intensive purines. Similarly the downregulation of GDP diphosphokinase and deoxyguanylate kinase can be explained. The second cluster consisted of six down-regulated reactions. Edges between reactions were due to metabolites UTP, UDP, UDP-galactose, α-Dglucose 1-phosphate or dTTP. The highest connected node was UDP kinase. Interestingly, this cluster compares to the cluster above showing down-regulated processing of cost intensive nucleosides.

One carbon units
Six down-regulated reactions formed a cluster showing the processing of one carbon units under anaerobic conditions. Metabolites connecting the reactions were glycine, H-protein-S-(aminomethyldihydrolipoyl)lysine and H-protein-(lipoyl)lysine. The central reaction was glycine dehydrogenase (decarboxylating) which together with aminomethyltransferase is part of the glycine cleavage system. Although it is reported that the glycine cleavage system is active under anaerobic conditions [41], the downregulation stems from the fact that the corresponding reaction reduces NAD + to NADH which is very costly due to the low availability of NAD + [64]. The production of one-carbon units, for which the glycine cleavage system is used [65], was taken over by glycine hydroxymethyltransferase (see switches). Furthermore, lipoyl-protein ligase A was down-regulated to reduce pyruvate dehydrogenase and to increase pyruvate formate lyase activity [66]. As a response to oxidative stress the expression of glutathione synthetase increases [67]. In an anaerobic environment no oxidative stress is prevalent, explaining the down-regulation of glutathione synthetase. Glycine-tRNA synthetase was down-regulated which may be due to reduced growth under oxygen deprivation.
Functional description of significant switches 64 significant switches were found (p-value ≤ 0.01, Table  2, supplement 2). The first 20 are interpreted here. Switches belonging to the same metabolic process are described in common paragraphs.

Formate fermentation
Five switches (1,3,4,8,15) belonged to the fermentation of formate. The following reaction-pairs were up-regulated and down-regulated respectively: formate hydrogenase complex and formyltetrahydrofolate deformylase, FocA formate FNT transporter and formyltetrahydrofolate deformylase, formate hydrogenase complex and GTP cyclohydrolase I, formate dehydrogenase and formyltetrahydrofolate deformylase, FocA formate FNT transporter and GTP cyclohydrolase I. All switches formed an intersection between degradation and formation of formate. Due to the high abundance of formate in the cell under anaerobic conditions, the formation of new formate was downregulated (see e.g. [68]), while the degradation of formate into CO 2 and H 2 and the transport of formate to the periplasm was up-regulated.

Mixed acid fermentation and anaerobic respiration
Four switches (2,10,11,14) belonged to mixed acid fermentation and anaerobic respiration. The first of these switches was formed by up-regulated acetaldehyde dehydrogenase and down-regulated ethanolamine ammonialyase. The reactions were connected via the metabolite acetaldehyde. E. coli ferments glucose via acetyl-CoA to ethanol. The first step in this fermentation is catalysed by acetaldehyde dehydrogenase converting acetyl-CoA to acetaldehyde [69]. Ethanolamine ammonia-lyase catalyses the cleavage of ethanolamine to acetaldehyde and ammonia [70]. Ethanolamine can be used as a carbon and energy source under aerobic conditions [71], resulting in a down-regulation of the reaction under anaerobic conditions. In two switches fumarate reductase was up-regulated where 5'-phosphoribosyl-4-(Nsuccinocarboxamide)-5-aminoimidazole lyase and adenylosuccinate lyase were down-regulated. 5'-phosphoribosyl-4-(N-succinocarboxamide)-5-aminoimidazole lyase and adenylosuccinate lyase form a bifunctional enzyme. The metabolite connecting the differently regulated reactions was fumarate in both cases. Fumarate reductase was up-regulated as it is used by E. coli during anaerobic growth [34] with menaquinol acting as an electron acceptor, while fumarate functions as a terminal electron donor [35]. 5'-phosphoribosyl-4-(Nsuccinocarboxamide)-5-aminoimidazole lyase/adenylosuccinate lyase was down-regulated to reduce the biosynthesis of purines indicating the reduced growth under oxygen deprivation. Another switch consisted of up-regulated phosphoenolpyruvate carboxylase and down-regulated aspartate transaminase, connected by the metabolite oxaloacetate. Phosphoenolpyruvate carboxylase participates in mixed-acid fermentation of glucose [72] and is therefore up-regulated during anaerobic growth. Under anaerobic conditions the citrate cycle is shortened to a reductive branch. CoA and oxaloacetate is then further processed to succinyl-coenzyme A by two possible branches, either using aspartate transaminase or malate dehydrogenase [73]. With our finding it seems that the second branch is favoured.

One carbon units
Two switches (6,7) were part of the metabolism of onecarbon units. The switches consisted of up-regulated serine hydroxymethyltransferase and down-regulated reactions of the glycine cleavage system (gcv system and glycine dehydrogenase (decarboxylating)). Both reactions produce 5,10-methylene-THF and are therefore major contributors of one-carbon units in E. coli [65,74,75]. The switch found here indicates that under anaerobic conditions one-carbon units are more produced by serine hydroxymethyltransferase than in an aerobic environment. The glycine cleavage system reduces NAD + to NADH. This is a costly reaction as NAD + is only available in small quantities [64]. Therefore, the glycine cleavage system is down-regulated resulting in an up-regulation of serine hydroxymethyltransferase to compensate for the loss in one-carbon units.

Processing of hexoses
Two switches (9,13) belonged to the processing of gluconate and glyoxylate. Both up-regulated nodes, 2-ketoaldonate reductase and 2-keto-4-hydroxyglutarate aldolase, participate in the intracellular regulation of glyoxylate levels [76,77]. Gluconokinase converts gluconate to 6-phosphogluconate which then can enter the Entner-Doudoroff or the pentose phosphate pathway [77] while 2-keto-4hydroxyglutarate aldolase forms a part of the Entner-Doudoroff pathway. Normally, the Entner-Doudoroff pathway is used if E. coli grows on gluconate. It exhibits basal levels of activity of this pathway if growing on glucose [78]. This is explained by the steady production of gluconate during glucose metabolism [78]. Under anaerobic conditions on glucose, the glucose metabolism is up-regulated (see Figure 5), which is followed by increased production of gluconate and an up-regulation of gluconate processing reactions. The down-regulated reactions, glyoxylate reductase and 2-ketoaldonate reductase respectively, use NADPH as the electron donor and cooperate with gluconate reductase [76] that, under aerobic conditions, brings glyoxylate into the tricarboxylic acid cycle [76]. Under anaerobic conditions this cycle is limited, resulting in the observed down-regulation.

Branched chain amino acids transporters
Three switches (16,17,18)  consisted of up-regulated CTP synthetase and down-regulated UDP-glucose-hexose-1-phosphate uridylyltransferase. The up-regulation of CTP-synthetase agrees to previous findings [59] although the reason for this remains to be investigated further. Two switches (5,19) were inexplicable to us (see Conclusions).

Comparison to a standard method
We compared the list of genes extracted with our technique to a standard t-test which did not take any network information into account. A mutant-corrected t-test (see Methods) was run on the gene expression levels for the corresponding reactions. Table 3 shows the results for the first 40 highest ranking features. All except six reactions were also found by our technique. The top three reactions (1, 2, 3 of Table 3) are involved in fermentation of formate and were also found with our technique (Table 1, Figure 4). Our technique was capable detecting whole pathways that occurred in the list of our top ranking features. However, the standard method did not detect such pathways or sub-graphs (discussed in the text, see above) supporting our concept for identifying functionally relevant sub-graphs. The six reactions 10,13,15,25,29,32 were not extracted by our technique. Five of these reactions were not found due to the network construction: Unspecific metabolites were deleted resulting in the deletion of reactions that catalyse unspecific substrates, such as pyruvate kinase, glutamate dehydrogenase (NADP+), NAD kinase, NADH oxidoreductase and RhtB homoserine Rht transporter. Putative reactions with undefined metabolites like N-acetyl-anhydromuramyl-L-alanineamidase, were also not included into the studied metabolic network and could therefore not be identified.

Conclusion
We applied simplified first-and second-order Haar-wavelet-transformations to select combined transcription levels of reaction-pairs. We chose the Haar wavelets as they enable connecting two discrete data points (reaction pairs in our case) in a straightforward way. Furthermore, we searched for common and opposing responses between combined gene expression data which matched well to the shape of the Haar wavelet filters. Through using this approach we gained substantial insight into the metabolic regulation of E. coli upon the transition from oxygen-rich to oxygen-deprived conditions. Such an approach complements to the original idea of DeRisi and co-workers to use microarray technology for discovering system changes. For example, they revealed changes in yeast metabolism during the diauxic shift [84]. In the study presented here, we discovered a broad spectrum of responses including direct responses to limited oxygen and changing buffer conditions. As a response to limited oxygen, we identified an up-regulation in glycolysis, other hexose metabolisms, mixed acid fermentation, formate fermenta-tion and the metabolism of aspartate. In summary, we see two interesting implications for our study, (i) data analysis: the implementation of the Haar-wavelet technique on small pairs of nodes is well suited for revealing significant patterns in a cellular network; and (ii) functional: many pathways are regulated on a transcriptional level supporting the concept of hierarchical control analysis for microorganisms [85,86].
The formate fermentation showed an interesting switch like behaviour: for oxygen deprived conditions the degradation of formate was up-regulated while its cost-intensive production was down-regulated. Note that this may be more difficult to reveal when using smoothing techniques (as e.g. [17,18]) and demonstrates the benefit of using wavelets. Furthermore, a decrease in the metabolism of iron was detected as a response to reduced oxygen availability. Interestingly, this agrees with Faith et al. who analysed a large compendium of 445 microarrays for E. coli including a variety of different oxygen conditions [9]. They showed that PdHR which regulates the central metabolism, is also involved in regulating the fec operon which encodes genes for iron transport. We discovered that the entire histidine biosynthesis pathway was up-regulated as a possible response to accumulation of acid products in batch culture [58]. However, essential subgraphs were not only detected in an isolated form, but also in relation to connected pathways which depended on the same metabolites. E.g., the cluster containing the histidine biosynthesis pathway (Figure 7) also contained components for metabolism of aspartate and glutamine. In addition, the cluster of formate fermentation (Figure 4) included parts of the aspartate metabolism. This reflects the unspecific hub-like nature of key metabolites such as L-glutamine and aspartate connecting several pathways. Significant switches supported the yielded adaptation mechanisms of E. coli to changing oxygen abundance, as e.g., switches pertaining to the fermentation of formate and mixed acids.
In our previous study, we used the same microarray dataset and extracted discriminate patterns of highly connected regions in the network [26]. In comparison to the present study here, we got a good consistency of the extracted pathways (glycolysis, aspartate metabolism, formate fermentation, pyruvate metabolism). In the study presented here, we elucidated some new pathways, i.e. the histidine biosynthesis, enterobactin biosynthesis (oxidative stress response), the aerobic part of the TCA cycle, and hexoses and one-carbon-units processing. It is of note that histidine biosynthesis and the biosynthesis of enterobactin are linear chains in the network. In contrast to the previous method, such linear chains can be well recognised by the method we present here which couples pairs of nodes. However, the previous method recognised two interesting, highly-connected regions which were not indicated using our new method (the interface between glycolysis and NAD biosynthesis, and the biosynthesis of lysine, see [26]). Two observed switches remain explainable. One switch (switch 19) consisted of 3-hydroxy acid dehydrogenase and phosphoserine phosphatase. In the second case (switch 5) up-regulated 3-methyl-2-oxobutanoate hydroxymethyltransferase and down-regulated 2dehydropantoate reductase directly followed the up-regulated reaction in the biosynthesis of pantothenate. These results may reveal an incomplete understanding of these metabolic components and the need for further experimental investigation.
On simulated data, the accuracy and precision was significantly better in comparison to the standard method. This allowed us to use a p-value of 0.01 and to focus on more significant changes. We compared our technique with a standard method extracting lists of discriminative genes from the expression data without taking gene relationships into account. We were able to detect all relevant reactions that could also be found by the standard method. In contrast, the standard method failed to reveal comprehensive functional pathways. However, for future studies a general method to validate the functionality of such a broad spectrum of newly revealed pathways remains to be developed. Nevertheless, our technique might be used for analysing signalling networks, e.g., to identify discriminative regulations in cancers with different prognosis, even though reaction and signalling levels might be less related to gene expression levels for higher organisms. Further methodological advances might also include the addition of protein post-transcriptional regulation and the application of more complex image processing methods.

Establishing the metabolic network
All metabolic reactions were extracted from the EcoCyc database (Version 10.0) [87]. A graph was established by defining neighbours of reactions: Two reactions were neighbours if a metabolite existed that was the product of one reaction and the substrate for the other. In this representation the nodes of the graph were the reactions while edges were defined by the metabolites. Metabolites were discarded that were highly connected and therefore pathway unspecific, such as water, oxygen, major coenzymes and prostethic groups. This approach resulted in a graph with 1196 nodes and 3650 edges. . They effect a major portion of all genes in E. coli. All gene expression experiments were done in triplicate under aerobic and anaerobic conditions, except for anaerobic wild-type which was repeated four times. The gene expression data of each data-set was mapped onto the corresponding reactions of the transcribed proteins. If a reaction was catalysed by a complex of proteins the minimal expression value of the genes involved was taken as the value of the corresponding complex. The expression data of all samples was mapped onto each network, yielding 43 different patterns for each graph.

Generating feature modules
To discover specific expression patterns and textures in the network we calculated features with the Haar-wavelet transformation consisting of gene expression combinations of neighbouring reaction-pairs. Haar-wavelet transformations add and subtract the values of neighbouring pairs of nodes and multiply them with a constant factor: Be r 0 , r 1 the gene expression values of a pair of reactions, respectively. Applying the transform yields the feature modules f 0 , f 1 : See also [90] for more details. The Haar wavelet transform can be regarded as a low pass filter when performing the summation and a high pass filter when calculating the difference between neighbouring value pairs. Both filters were applied on all pairs of nodes connected by an edge resulting in calculated feature modules.

Statistical testing of the feature modules
All Haar-wavelet generated features were tested by a multiple t-test between aerobic and anaerobic conditions. To correct for potential influences coming from individual mutants, t-tests were performed for every constellation of samples excluding the sample of one particular mutant, respectively. The wild type sample was never excluded. From this outcome the worst (highest) p-value for each feature was selected. All p-values were corrected for multiple testing (Bonferroni, see [27]). Features were then ranked according to their p-value.

Clustering of significant reaction pairs
All significant feature modules were extracted (p-value ≤ 0.01). Sub-graphs were put up by connecting the found significant feature modules (reaction-pairs). This resulted in five larger sub-graphs. To facilitate the interpretation of the found sub-graphs, nodes with equal expression behaviour (up-, down-regulation) were grouped together. To reduce random fluctuations we focused only on larger patterns, i.e. clusters with a cluster size smaller than five were discarded. In total 10 clusters were extracted. Reaction-pairs having one up-and one down-regulated node were regarded as switches. They were extracted if their pvalue was below 0.01.
Note, that our method yielding these clusters is based on two steps: generating feature modules and combining those with a common response yielding significant clusters of co-expression. The first step compares to a low and a high pass filter of the first order Haar-wavelet-transformation, respectively. The second step compares to a low pass filter of the second order Haar-wavelet-transform.

Analysing the found clusters and switches in-depth
All extracted clusters and switches were functionally characterised (see Results and Discussion). An in-depth analysis was performed by scanning the literature. Finally, the analysed clusters and switches were assembled yielding an overall map of the metabolic changes.
sion arrays uncovers the dynamic architecture of phenotype.