Pan-cancer analysis of the metabolic reaction network

Metabolic reprogramming is considered a hallmark of malignant transformation. However, it is not clear whether the network of metabolic reactions expressed by cancers of different origin differ from each other nor from normal human tissues. In this study, we reconstructed functional and connected genome-scale metabolic models for 917 primary tumors based on the probability of expression for 3,765 reference metabolic genes in the sample. This network-centric approach revealed that tumor metabolic networks are largely similar in terms of accounted reactions, despite diversity in the expression of the associated genes. On average, each network contained 4,721 reactions, of which 74% were core reactions (present in >95% of all models). Whilst 99.3% of the core reactions were classified as housekeeping also in normal tissues, we identified reactions catalyzed by ARG2, RHAG, SLC6 and SLC16 family gene members, and PTGS1 and PTGS2 as core exclusively in cancer. The remaining 26% of the reactions were contextual reactions. Their inclusion was dependent in one case (GLS2) on the absence of TP53 mutations and in 94.6% of cases on differences in cancer types. This dependency largely resembled differences in expression patterns in the corresponding normal tissues, with some exceptions like the presence of the NANP-encoded reaction in tumors not from the female reproductive system or of the SLC5A9-encoded reaction in kidney-pancreatic-colorectal tumors. In conclusion, tumors expressed a metabolic network virtually overlapping the matched normal tissues, raising the possibility that metabolic reprogramming simply reflects cancer cell plasticity to adapt to varying conditions thanks to redundancy and complexity of the underlying metabolic networks. At the same time, the here uncovered exceptions represent a resource to identify selective liabilities of tumor metabolism.


INTRODUCTION 1
Dysregulation of cellular metabolism has been implicated in the progression of several cancers 2 as a consequence of oncogenic mutations (1,2). Despite the fact that the regulatory programs 3 underlying the observed metabolic shifts should be tumor specific, it is also known that at the 4 system level metabolic regulation is substantially similar between the tumor and its tissue of 5 origin (3)(4)(5). Even when shown to be selectively essential to cancer cells, the diversity of 6 metabolic phenotypes associated with cancer questions the extent to which these regulatory 7 programs are context-dependent rather than tumor-specific (6, 7). For example, glucose 8 metabolism was shown to vary within tumor regions and between human tumors in lung cancer 9 patients (8) or to depend strongly on the initiating oncogenic mutation and the tumor tissue of 10 origin in genetically engineered mice (9). Human metabolism is a highly complex system, 11 accounting for thousands of reactions and metabolites that interact with the environment to form 12 a connected and functional metabolic network (10). It is plausible that metabolic shifts so far 13 associated with different cancers are yet another expression of the plasticity of these cells to 14 ever-changing conditions in their genome and their environment (11), with the advantage that in 15 metabolism this adaptation can leverage on the high redundancy and complexity of the human 16 metabolic network. The aim of this study was therefore to characterize the landscape of 17 metabolic reactions expressed in different cancers, to define their occurrence depending on the 18 cancer type or mutations in key cancer genes, and finally to identify any difference from 19 metabolic reactions normally expressed in human tissues. 20 Networks represent the natural structure of biological systems, including metabolism (12). 21 Network-dependent analyses not only enhance the interpretability of genome-scale data by 22 providing a context, but also highlight knowledge gaps and experimental artifacts, e.g. networks 23 under standard medium conditions, and this requirement ensures for example that expressed 1 genes that become isolated due to the malignant transformation are eliminated from the network. 2 A functional GEM, on the other hand, can simulate a flux through 56 fundamental metabolic 3 tasks, including biomass growth. Noteworthy, samples used in this study represent a mixed 4 population of cells, predominantly constituted by cancer cells (>80% of tumor nuclei in each 5 sample according to TCGA quality criteria). Therefore, each GEM should be thought as the 6 metabolic network expressed in the tumor, rather than solely representing the cancerous cells. 7 We enforced quality criteria to the reconstructed GEMs to ensure to work with metabolic 8 networks that include and exclude genes primarily because not likely to be expressed in that 9 sample. Out of the initial 1,082 samples, 917 reconstructed GEMs passed these quality criteria 10 ( Fig. S1). We observed that many discarded models were derived from pancreatic 11 adenocarcinoma (PAAD) and clear cell renal cell carcinoma (KIRC) samples. Specifically, 12 GEMs for 15 PAAD samples (52%) and 53 KIRC samples (51%) were discarded because they 13 failed to converge to an optimal solution (Fig. S2). The lack of convergence in the case of KIRC 14 is consistent with the notion of a highly compromised metabolic network that we uncovered 15 previously for this cancer type (5,26). Finally, we inspected whether the genes included in a 16 GEM have indeed a higher expression level in the corresponding sample compared with the 17 genes excluded. This was the case both when looking at individual models (Fig. S3B) and all 18 models simultaneously (Fig. S3A). Nevertheless, we noted that a fraction of lowly expressed 19 genes was still included in most reconstructed models, likely to preserve connectivity and 20 functionality of the metabolic networks. Taken together, we were able to reconstruct GEMs 21 representative of the metabolic network expressed in 917 cancers. 22

Classification of metabolic reactions in core or contextual 1
Compared to the reference GEM, HMR2, which contains 8,184 reactions associated with 3,765 2 genes, the reconstructed GEMs featured on average 4,721 reactions and 1,995 genes (Fig. S4). 3 We observed small variations in the number of reactions, but substantial differences in the 4 number of genes included across GEMs. We performed principal component analysis (PCA) on 5 the gene inclusion matrix, that is the binary g x m matrix where 1 indicates inclusion of HMR2 6 gene g in the m-th GEM, and 0 vice versa. PCA revealed genetic diversity across GEMs 7 attributable to the different cancer type (Fig. 1A), even by taking into account that the 8 reconstruction might be biased in the case of PAAD and KIRC types. Nevertheless, PCA on the 9 reaction inclusion matrix virtually abolishes the above-seen diversity among models (Fig. 1B). 10 This suggest a considerable similarity in metabolic functions in cancer, in spite of gene 11 expression heterogeneity. 12 To shed light on the extent of the similarity in terms of reactions across cancers, we 13 classified a reaction as "core" if included in >95% of all models, "absent" if excluded in >95% 14 of all models, and "contextual" if otherwise ( Fig. 2A-S5). At this threshold, 3,510 reactions are 15 core (95% bootstrap confidence interval [CI], 3,421 to 3,599), 3,455 are contextual (95% CI,16 3,367 to 3,539), and 1,219 are absent (95% CI, 1,157 to 1,284) (Fig.2B). Since reactions can be 17 associated with more than one gene (i.e. isoenzymes), we further distinguished core reactions 18 into "pan" if included in all models because of the same gene-reaction association or "iso" if 19 otherwise ( Fig. 2A). Finally, some reactions have no gene associations (e.g. spontaneous 20 reactions), but they were classified as core because they connected other core reactions. We 21 termed these "conn" reactions. Among core reactions, 2,850 are pan reactions (95% CI, 2,807 to 22 2,897), 69 are iso reactions (95% CI, 52 to 85), and 590 are conn reactions (95% CI, 546 to 632) 23

Analysis of core metabolic reactions in cancer 1
Core reactions are expected to represent the backbone of cell metabolism. Thus, we compared 2 core reactions as defined here to the list of reactions carried by the housekeeping proteome in 3 normal human tissues based on Human Protein Atlas (HPA) data (27) (Fig. 3A). Excluding conn 4 reactions, 2,900 out of 2,919 core reactions (99.3%) are housekeeping also in normal tissues, as 5 expected. Noteworthy, cancer metabolism seemed to be generally associated with a large loss of 6 function in metabolism, i.e. there were 2,083 housekeeping reactions in normal cells that were 7 not present in any of the cancers. Furthermore, 19 reactions were core only in cancers. These 8 were lumped in 5 reaction clusters, defined as sets of reactions encoded by the same gene(s): the 9 mitochondrial conversion of arginine to ornithine and urea (encoded by ARG2); amino-acid 10 transport (SLC6A9 and SLC6A14); prostaglandin biosynthesis (PTGS1 and PTGS2); sulfate 11 transport (SLC26A1, SLC26A2, SLC26A3, SLC26A7, SLC26A8, SLC26A9); and ammonia 12 transport (RHAG). Except for ammonia transport by RHAG, all other reactions have at least one 13 associated gene that is highly expressed in the sample from which the model was reconstructed 14 ( Fig. 3B). At the same time, the corresponding proteins showed weak to no gene expression 15 evidence in at least 30 out of 32 normal human tissues according to HPA (Fig. 3C). This is 16 suggestive of a small acquisition in metabolic housekeeping functions in cancer compared to 17 normal human cells, which should be otherwise considered entirely overlapping. 18 Among the core reactions, we identified 69 core-iso reactions, which were included in all 19 models but due to different gene-reaction associations. These were lumped in 16 reaction clusters 20 ( Fig. 4A). For example, the conversion of N-acetylputrescine to N4-acetylaminobutanal in the 21 metabolism of arginine can be carried out in principle by 5 different amine oxidases (encoded by 22 AOC1, AOC2, AOC3, MAOA, MAOB), whose expression varied considerably among samples for 23 which GEMs had been reconstructed (Fig. 4B). We sought to identify if inclusion of a given 1 isoenzyme in the GEM correlated with particular features of the modeled sample. In particular, 2 we tested if the inclusion had a significant statistical association with the sample cancer type and 3 on the presence of mutations in key cancer genes. We observed that in all cases the expression of 4 a particular isoenzyme was affected by the sample cancer type, but in no instance by the 5 presence of a particular mutated gene (likelihood ratio test, FDR < 0.001). This was well 6 illustrated by the example of N-acetylputrescine degradation (Fig. 4C). This reaction is likely to 7 be carried out by AOC1-MAOA gene products in rectal adenocarcinoma but AOC3-MAOB in 8 breast invasive carcinoma. To provide a simplified yet exhaustive view of these associations, we 9 computed for each cancer type the isoenzyme with the most frequent association with a given 10 reaction cluster (Fig. 4A). Collectively, this result is consistent with the previously observed 11 notion that some degree of diversity in gene inclusion across models was not reflected at the 12 reaction level, due to the expression of cancer type-specific isoenzymes. 13 Figure 3 -Core reactions in cancer versus reactions carried out by the housekeeping proteome in normal human 1 tissues. A) Venn diagram of the reactions classified as core in this study (excluding conn-reactions) (red circle) or 2 housekeeping on the basis of the gene expression pattern in 32 normal human tissues as reported in HPA (green 3 circle). B) Core reactions in cancer but not housekeeping in normal tissues based on HPA data. C) Inclusion of core 4 reactions not classified as housekeeping based on HPA data across the 917 cancer models and matched gene 5 expression in the corresponding tumor samples. For comparison, the gene expression evidence in 32 normal human 6 tissues for the genes encoding these reactions were retrieved from HPA. 7 8 Figure 4 -Core-iso reactions in cancer. Sixteen core-iso reaction clusters (i.e., set of reactions encoded by the same 1 gene(s)) were identified following differential inclusion of isoenzymes in the different models. In all cases, 2 differential inclusion was significantly associated with the sample cancer type. A) Most recurrent gene-reaction 3 association in the models representing a given cancer type for the 16 core-iso reaction clusters (only one 4 representative reaction shown per cluster). B) Expression plot for an example core-iso reaction, the conversion of N-5 acetylputrescine to N4-acetylaminobutanal in arginine metabolism, which is associated with 5 different genes 6 (AOC1, AOC2, AOC3, MAOA, MAOB). Each bar stacks the expression levels of the 5 genes in the tumor sample 7 from which the model was reconstructed. The bar of genes with size-adjusted log-cpm < 0 was neglected. Models 8 were sorted according to AOC1 expression. C) Expression boxplots for the 5 genes encoding the example reaction in 9 B) when binned by cancer type. Keys as in Fig. 1.

Analysis of contextual metabolic reactions in cancer 1
Contextual reactions are included only in a fraction of models. This can be explained in view of 2 how the algorithm treated the input data. A reaction was prone to be excluded from a GEM in 3 two cases: first, the expected expression of the associated gene(s) in the corresponding sample 4 was not distinguishable from noise; or, the expected expression of genes associated with 5 neighboring reactions in the corresponding sample was not distinguishable from noise, hence 6 prejudicing the connectivity of the network. An example of contextual reactions is the ALOX12-7 mediated peroxidation of 12(S)-hydroperoxy-5Z,8Z,10E,14Z-eicosatetraenoic acid (12(S)-8 HPETE) to 12(S)-hydroxyeicosateraenoic acid (12(S)-HETE), a reaction in the metabolism of 9 arachidonic acid that was included only in 163 out of 917 GEMs (17.8%). This was reflected by 10 the expression level of ALOX12 in the corresponding samples (Fig. 5A). Again, we sought to 11 identify if inclusion of contextual reactions depended on the cancer type or on the presence of 12 mutated genes in the sample. Out of 3,455, 3,269 reactions (94.6%) showed a significant 13 association with the cancer type, while 1 reaction had a significant association with a mutated 14 gene (likelihood ratio test, FDR < 0.001). As an example for the former case, the cancer type-15 dependency of the ALOX12-catalyzed reaction was evident from ALOX12 expression level in 16 the different types, which was particularly high in head and neck and lung squamous cell 17 carcinomas and in bladder adenocarcinomas (Fig. 5B). As for the latter case, the only mutated 18 gene-associated reaction is the mitochondrial hydrolysis of glutamine to glutamate encoded by 19 glutaminase-2 (GLS2), which was excluded preferentially in models for tumors in which TP53 20 was mutated (Fig. S7). 21 In order to reduce the complexity of cancer-type dependency in contextual reactions, we 22 computed for each cancer type the fraction of GEMs where the reaction was included (e.g. 0 23 means that no GEMs from samples belonging to a given cancer type included that contextual 1 reaction). Next we performed consensus hierarchical clustering to detect consensus clusters of 2 cancer types where contextual reactions appeared to segregate (Fig. S8). Four clusters were 3 identified: brain tumors (low grade glioma and glioblastoma multiforme); kidney-pancreatic-4 colorectal tumors (clear cell renal cell carcinoma and pancreatic, colon, and rectum 5 adenocarcinoma); female reproductive system tumors (ovarian cancer and uterine corpus 6 endometrial cancer); other tumors (bladder and lung adenocarcinomas, breast carcinoma, head 7 and neck and lung squamous cell carcinomas). Finally, we performed random forest cross-8 validated variable selection to select contextual reactions that exhibited the strongest 9 representativeness to either cluster. We found 49 representative contextual reactions. For 10 example, the transport of sulfoglycolithocholate was preferentially included in brain tumor 11 GEMs (mean in cluster, 82.4%; outside cluster, 19.7%); the conversion of phenylalanine to 12 phenethylamine was almost exclusively present in kidney-pancreatic-colorectal tumor GEMs (in 13 cluster, 89.9%; outside, 1.9%); the conversion of cytosolic malate to pyruvate was less common 14 in female reproductive system tumor GEMs (in cluster, 56.4%, outside, 99.3%); and the above-15 seen conversion of 12(S)-HPETE to 12(S)-HETE was mostly included in other tumor GEMs (in 16 cluster, 27.3%, outside, 1.9%) (Fig. 5C). 17 Figure 5 -Contextual reactions in cancer were mostly dependent of the cancer type and some of these may 1 represent specific gain or loss of metabolic functions compared to the matched normal tissues. A) An example of 2 contextual reaction, the conversion of 12-HPETE to 12-HETE by arachidonate 12-lipoxygenase, ALOX12), and its 3 expression plot across models. B) Expression boxplots for ALOX12 binned by cancer type. C) Barplots for the 4 fraction of models from a cancer type (columns) that contain 4 representative contextual reactions (rows) for each 5 cancer-type consensus clusters (horizontal colored bars). D) Heatmap of the fraction of models from a cancer type 6 (columns) that contained the 49 most representative contextual reactions (rows) for the four cancer-type consensus 7 clusters (see Fig. S8, colored horizontal bars). A blue entry indicates that all models belonging to that cancer type 8 included a given reaction (white if vice versa). Below, maximum expression detected in matched normal tissues 9 among the genes encoding for each representative contextual reaction. Keys as in Fig. 1.  Since virtually all contextual reactions displayed cancer-type dependency, which in turn 1 appeared to cluster by human tissue similarity, we interrogated whether the context-specificity of 2 these 49 representative reactions simply mirrored the expression patterns of the associated genes 3 in the normal tissues of origin, as previously suggested for metabolic genes (3,5). Therefore, we 4 retrieved RNA-seq gene expression data from HPA for all putative tissues of origin for the 5 cancer types in this study (except breast, unavailable). We then extracted which gene had the 6 maximum expression level in each tissue among all genes associated with each reaction. As 7 hypothesized, the expression patterns in normal tissue closely resembled the availability of a 8 given reaction in the GEMs of their malignant counterpart (Fig. 5D). However, we noticed some 9 exceptions, both in terms of gain and loss of metabolic function. As an example of gains of 10 function, the conversion of N−acetylneuraminate−9−phosphate to N−acetylneuraminate was 11 present in all GEMs except in some belonging to the cluster of female reproductive system 12 tumors. However, the associated gene, NANP, is not expressed in any major normal tissue 13 (range, 0 to 4 FPKM). As a second example, the sodium-dependent transport of mannose and the 14 absorption of cholesterol were both featured in GEMs belonging to the cluster of kidney-15 pancreatic-colorectal tumors. However, the encoding genes, SLC5A9 and NPC1L1 respectively, 16 are not expressed in the matched normal tissues, but only in the duodenum and small intestine 17 (range, 52 to 55 FPKM and 41 to 45 FPKM resp.). An example of loss of function was the 18 mitochondrial oxidation of indole-3-acetaldehyde to indoleacetate in tryptophan catabolism, 19 which is catalyzed by several NAD-dependent aldehyde dehydrogenases with wide specificity 20 (ALDH1B1, ALDH2, ALDH3A2, ALDH7A1, ALDH9A1). While these enzymes are largely 21 expressed in all major normal tissues, the catalyzed reaction was mostly absent in all cancer 22 GEMs, except for a fraction of models belonging to the cluster of kidney-pancreatic-colorectal 1 tumors. 2 In conclusion, contextual reactions represented on average 26% of all metabolic reactions 3 in any given cancer network. This analysis revealed that the context-specificity is often related to 4 the cancer type. In particular, most contextual reactions segregated in four tissue clusters and, 5 with few notable exceptions, this pattern resembled the expression profiles of the cancer tissue of 6 origin. 7

Expression of contextual reactions correctly distinguished cancer type clusters 8
Next we sought to validate the 49 contextual reactions here found to distinguish the metabolic 9 networks of four cancer type clusters defined above. To this end, we retrieved expression data 10 from TCGA for 66 genes encoding these contextual reactions in an independent cohort of 4,462 11 tumor samples spanning the same 13 cancer type. Next, we trained a random forest classifier on 12 2,231 samples (28) to assign samples to one of the four cancer type clusters based on the 13 expression level of this 66 gene signature (Fig. S9A). We then tested the so-trained classifier on 14 the remaining 2,231 samples (Fig. S9B). The out-of-bag error was 3.38% in the training set and 15 2.9% in the test set. The multiclass area-under-the-curve (AUC), as defined by Hand and Till 16 (29), was 0.973. 17 The classification performance might be biased due to inherent differences in the gene 18 expression profile of the tissue of origins in the four clusters. In other words, any set of 66 genes 19 could distinguish the four clusters because their expression is supposedly modulated in a tissue-20 specific fashion in healthy conditions. To control for this, we repeated the above multiclass 21 classification using 1,000 random 66 gene-signatures and verified that the classifier based on the 22 expression of contextual reactions outperformed the random classifier in 98.8% cases 23 (permutation test p = 0.012, Fig. S9C). This suggestive that not to any set of 66 genes is 1 differentially regulated between these cancer type clusters simply because of inherent differences 2 in expression in their respective tissue of origins. To further check whether these 66 genes are 3 differentially expressed between cancer type clusters, but yet substantially similar in their 4 corresponding normal tissues, we performed differential expression analysis using a generalized 5 linear model to fit gene expression to the cancer type cluster while controlling for the matched-6 normal tissues of origin. We retrieved 438 tumor-adjacent normal samples from TCGA for the 7 same cancer types as above except brain tumors, for which only 5 normal samples were available 8 (sample size range per cluster: 24 -287). Thus, we analyzed gene expression for the three 9 remaining cancer type clusters (accounting for 3,867 tumor samples, sample size range per 10 cluster: 459 -2452). We observed that 64 of 66 genes in the signature (97%) displayed 11 differential regulation between cancer type clusters (FDR < 0.001). For example, AOC1 is 12 moderately expressed in the female reproductive system tumor cluster, but lowly expressed in 13 the other tumor cluster (FDR = 9*10 -69 , Fig. S9D). For 48 of 66 genes, the expression was also 14 significantly different from at least one cluster of matched normal samples. However, 15 importantly, we found no evidence of expression difference between the corresponding clusters 16 of matched normal samples (FDR > 0.01). For example, AOC1 expression is overexpressed in 17 female reproductive system tumors compared to matched normal samples (FDR = 2*10 -4 ). But 18 normal samples from the female reproductive system had similar AOC1 expression to normal 19 samples from tissues matched to the other tumor cluster (FDR = 0.988). Besides AOC1, we 20 computed the most significant gene for the remaining two cluster comparisons: ALDH2, 21 repressed in the other tumor cluster compared to kidney-pancreatic-colorectal tumor cluster, but 22 similarly expressed in the respective tissue of origins; and CYP3A5, very lowly expressed in 23 female reproductive system tumor cluster compared to the kidney-pancreatic-colorectal tumor 1 cluster, yet again similarly expressed in the respective tissue of origins (Fig. S9D). 2 In conclusion, even though gene expression analysis is not a direct evidence of the 3 occurrence of the underlying reactions and fails to account for the systems of metabolic reactions 4 in which a gene is expressed, these results in a large independent cohort suggest that the 49 5 contextual reactions in the distinct cancer type clusters arise from cancer type-specific regulation 6 of the corresponding genes that are not attributable to substantial differences in the gene 7 expression profiles of the matched tissue of origins, arguing that they may represent cancer type-8 specific metabolic liabilities. 9

DISCUSSION 10
Reprogramming of cellular metabolism is a hallmark of malignant transformation, as suggested 11 by accumulating evidence that uncovered tumor-specific molecular mechanisms of metabolic 12 regulation (1, 2). However, only few studies explored tumor metabolism at the systems level (7). 13 Since networks are the natural structure of complex systems like metabolism, in this study we 14 adopted a network-centric approach to characterize the landscape of metabolic reactions 15 expressed in different cancers. Despite the substantial diversity of the genes included in each 16 model across cancers, the resulting metabolic networks were strikingly similar in terms of 17 included reactions, probing the robustness and redundancy of the human metabolic network. 18 The overwhelming majority of core reactions in this study are carried out by enzymes 19 classified as housekeeping in normal tissues. This argues that cancer cells, regardless of their 20 origin, vastly maintain the backbone of cellular metabolism and no specific metabolic function is 21 universally acquired as a result of the transformation. Potential exceptions to this could be 22 represented by the residual core reactions, which are so-classified only in cancer metabolism 23 networks. While inclusion of these reactions in all GEMs certainly depends on network 1 connectivity and functionality, we also observed an obvious correlation with gene expression in 2 the underlying cancer samples. There could be many alternative biological reasons why these 3 reactions are universally present in cancer metabolic networks while being seldom expressed in 4 normal human tissues. Perhaps an intriguing pattern is the role of both ARG2 and PTGS1/2 in the 5 generation of metabolites that control endothelial cell proliferation (30, 31) and activation of the 6 immune system (31, 32), which are likely to be dispensable metabolic functions in the majority 7 of healthy human cells. Another interesting observation is the presence of core-iso reactions, 8 included in all models because at least one isoenzyme was expressed in the modeled tumor 9 sample. Core-iso reactions were previously observed by Hu et al., for example in the case of 10 aldolase isoenzymes (ALDOA, ALDOB, ALDOC) (3). Here, we complemented these findings 11 and verified that in all cases the differential expression of isoenzymes could be attributed to 12 differences in cancer type between the modelled samples. For example, oxidation of primary 13 amines (like acetylputrescine) is possible in all models because at least one among the spectrum 14 of monoamine oxidases (MAOA and MAOB) and amine oxidases (AOC1, AOC2, AOC3) was 15 expressed in each underlying sample. 16 Contextual reactions were present only in a fraction of the reconstructed metabolic 17 networks. In one case, which is the mitochondrial synthesis of glutamate from glutamine 18 encoded by GLS2, the contextual reaction was strongly associated with absence of TP53 19 mutations in the modeled sample (regardless of its cancer type). This result is strikingly 20 concordant with the notion that p53 directly controls GLS2 expression in either stressed or not 21 stressed conditions (33) and further suggests that the presence of the underlying reaction is partly 22 dictated by mutations in TP53 independent of the cancer type. In 94.6% of the other cases, the 23 inclusion of contextual reactions could be attributed to differences in cancer types, which we 1 narrowed down to differences across four cancer type clusters. In particular, 49 contextual 2 reactions were strongly representative for the four cancer type clusters because of differential 3 regulation of the associated genes across cancers of different types, irrespective of the expression 4 level in the corresponding tissues of origin. At the same time, in general, the pattern of 5 contextual reaction inclusion in a model largely resembled the expression pattern of the 6 associated gene(s) in the matched normal tissue of origins. Exceptions to this were noted for few 7 reactions, for example the mitochondrial oxidation of indole-3-acetaldehyde to indoleacetate in 8 tryptophan catabolism, by several NAD-dependent aldehyde dehydrogenases (ALDH1B1, 9 ALDH2, ALDH3A2, ALDH7A1, ALDH9A1). This case is intriguing because of ALDH2: its 10 cancer type-cluster specificity and the substantial expression difference with the matched normal 11 tissues seem both to stem from the underlying patterns of ALDH2 expression (Fig. S9D). ALDH2 12 is moderately to highly expressed in most normal samples, but it is strongly repressed in all 13 cancers except those belonging to the kidney-pancreatic-colorectal cluster, as observed in our 14 network-centric approach (Fig. S9C). For the other cases, we could not derive equally 15 straightforward relations with gene expression, likely due to the complexity of metabolic 16 network around these reactions. An interesting outcome of this analysis is that the expression of 17 the 66 genes encoding these 49 contextual reactions could accurately classify tumor samples into 18 four groups of cancer types. This clearly points to metabolic similarities between certain cancer 19 types, which can be used for identification of common drug targets for different tumor types that 20 cluster in this analysis. 21 A surprising result was the apparent enrichment of neurological metabolic functions in 22 cancer compared to normal samples. This was evident at different points: the core status of 23 transport reactions catalyzed by solute carrier 6 gene family (the above mentioned SLC6A9 and 1 SLC6A16), which are traditionally classified as neurotransmitter transporters (34) and indeed 2 were not found to be housekeeping in normal tissues; the differential expression of monoamine 3 oxidases (MAOA and MAOB) in different cancer types possibly to guarantee primary amine 4 oxidation as a core-iso reaction, even though these oxidases are classically linked to the 5 metabolism of neuroactive amines; and the inclusion the NANP-catalyzed reaction in all models 6 except in some of those in the female reproductive system tumor cluster, which generates 7 N−acetylneuraminate (or Neu5Ac), a critical component of neuronal membranes. These 8 observations are at odds with the notion that tumors lack innervation, but other explanations for 9 their expression in the metabolic network are possible. On one hand, all these enzymes might 10 fulfill other less characterized metabolic functions. On the other hand, it has already been 11 observed aberrant expression of neuronal genes in non-brain tumors, which has been attributed to 12 not well understood survival advantages (35-37), but define, for example, an established 13 phenotype of prostate cancers (38). 14 Modeling metabolic networks at genome scale comes with challenges, mostly arising 15 from knowledge gaps in the biochemistry of several metabolic reactions (39). The gene-reaction 16 associations in this study were occasionally inferred, since the specificity and in vivo activity of 17 some enzymes is unknown. Even though we drew our conclusions based on stringent statistical 18 models, these shortcomings may introduce a systematic bias that is difficult to estimate and 19 control for in downstream analysis. Hence, we expect that some quantitative claims will be 20 revisited in light of improved knowledge on the biochemistry of human metabolism. An 21 alternative approach could be (differential) expression analysis for each gene associated with a 22 metabolic reaction, as we and others have implemented previously (3)(4)(5). However, we adopted a 23 network-centric approach because reactions might be expressed or not depending on the 1 availability of the neighboring reactions or, in a broader context, of a metabolic pathway or even 2 a metabolic function; or they could be carried out by differentially expressed isoenzymes. This 3 context is lost in canonical gene expression analyses. The network structure may occasionally 4 override information coming from gene expression data. This provides an opportunity for 5 discovery, but at the same time complicates the interpretability of certain results in light of gene 6 expression evidence, as in the case of RHAG expression in Fig. 3C. This can also result in the 7 opposite situation, i.e. metabolic genes seemingly expressed in the tumor yet absent at the 8 reaction level because either unconnected or supporting an unknown metabolic function. Another 9 challenge is the paucity of large-scale cell type-specific data. Despite the TCGA requirement that In conclusion, our findings suggest that tumors express a metabolic network of core 16 reactions with housekeeping functions and cancer type-specific contextual reactions that are 17 virtually overlapping with the corresponding normal tissue of origin. In light of this vast 18 similarity, metabolic reprogramming implicated with cancer transformation might just reflect the 19 plasticity of tumors to adapt to varying environmental and genetic factors by leveraging on the 20 complexity of the metabolic network. This also suggests that targeting tumor metabolism may 21 result in either toxicity or resistance, because the underlying metabolic network supports 22 essentially the same metabolic functions as in the matched normal tissue. Nonetheless, 23 exceptions to this similarity with normal tissues were uncovered in this study, and we believe 1 that this is a valuable resource to further investigate selective liabilities of tumor metabolic 2 networks. 3

MATERIALS AND METHODS 4
Reconstruction of tumor genome-scale metabolic models. Read count tables for 18,956 genes 5 from 1,082 primary tumor samples were retrieved from The Cancer Genome Atlas (TCGA), for 6 which both gene expression and mutation data were simultaneously present: These samples were 7 sequenced using Illumina HiSeq or Genome Analyzer RNA-seq platforms. For each sample, 8 counts were transformed into log-counts-per-million (log-cpm) and multiplied with a 9 normalization factor that accounts for differences in library sizes across samples, which was 10 calculated using the edgeR R-package (41). The resulting size-adjusted log-cpm for a gene is an 11 expression level measurement comparable across samples (42). The probability that a gene is 12 truly expressed in an individual sample was estimated using a Bayesian statistical framework by 13 computing the probability that its expected distribution of read counts in that sample is better 14 explained by the expected distribution of read counts of genes that should not be expressed in 15 that samples. The expected size-adjusted log-cpm for gene i in sample j was calculated based on 16 specific features of sample j (like the belonging to a certain cancer type or the presence of a 17 mutation in a key gene) using a pre-specified generalized linear model (1) from (43):  18 (1) 19 where the expected size-adjusted log-cpm for gene i in sample j is equal to the linear 21 combination of sample feature variables ( 1 ) and coefficients for the effect of that feature on 22 gene i ( / ). The coefficients were fitted for each gene to the observed size-adjusted log-cpm 1 across the 1,082 samples by ordinary least square regression using voom R-package (42). The 2 linear model for the expected expression of gene i is referred to as θ "#$# % . We reasoned that gene 3 i is truly expressed in sample j if its observed size-adjusted log-cpm is better explained by a 4 normal distribution with parameters derived from θ "#$# % , rather than by linear models 5 constructed on the expected size-adjusted log-cpm of genes that are not supposed to be expressed 6 in sample j. We selected 7 testis-specific gene products as "noise" genes: ACRV1, ADAM2, 7 BOLL, DKKL1, FMR1NB, TEX101, and ZPBP2. These proteins are not detected in any other 8 tissues according to the Human Protein Atlas (27), yet reads that align to their loci are seldom 9 detected by the RNA-seq platform. The posterior probability that gene i is truly expressed in 10 sample j, or in other words the likelihood of the linear model θ "#$# % . for gene i being better than 11 alternative linear models θ "#$# 8 ., was than calculated using the Bayesian formula (2)

15
where the prior probability of the linear model θ "#$# ; was set equal to 50% and the prior 16 probabilities for each alternative linear model θ "#$# T derived from each of 7 "noise" genes was 17 equally split so that they would sum to 50%. The output is a m x n matrix of posterior 18 probabilities P, where m is 18,956 genes and n is 1,082 samples, and V,W ∈ 0,1 . 19 The posterior probabilities' vector for sample j was used to score each reaction in the 20 automatic reconstruction of a genome-scale metabolic model (GEM) from the reference generic 21 human GEM, HMR2. HMR2 contains 8,184 reactions associated with 3,765 genes. The 1 reconstruction was performed using the tINIT algorithm, as described previously (24). Briefly, 2 the algorithm implements a mixed-integer linear problem (MILP) that, starting from a reference 3 metabolic network (in this case HMR2), maximizes the inclusion in the final model of reactions 4 that are positively scored while maximizing the exclusion of reactions that are negatively scored. 5 Other constraints in the MILP guarantee that the final model features a connected and functional 6 network, in that it can simulate a flux in each reaction under standard medium conditions, and it 7 can simulate a flux through 56 fundamental metabolic tasks, including biomass growth (24). In 8 each reconstructed sample j, the reaction score was equal to the posterior probability of the 9 associated gene in sample j, as returned from (2). If more than one gene was associated with a 10 reaction, the most positive score was retained because indicative that the reaction can be carried 11 out by at least one expressed gene product, as previously implemented (25, 44). We selected a 12 threshold equal to 0.99 that we subtracted to the posterior probabilities' vector for sample j, 13 before scoring the reactions, so that only genes with a posterior probability > 99% to be truly 14 expressed (as defined above) have positive values and will tend to be included by tINIT. We 15 selected this threshold as an arbitrary metric reflecting 99% statistical confidence, which we 16 reasoned to be a conceptually superior over arbitrary thresholds based on expression scores as 17 previously implemented in automatic reconstruction of GEMs (25,45,46). The parameters for 18 the reconstruction were identical for all samples, in particular the maximum allowed relative gap 19 in the optimal solution returned from the MILP was set to 5%. The reconstruction was 20 implemented in Matlab 7.11 using Mosek v7 as solver. 21 Quality control of reconstructed tumor GEMs. We evaluated the quality of the reconstruction 22 according to the following criteria: convergence to an optimal solution; the accuracy of gene 23 inclusion should be > 90%; the corresponding Matthews correlation coefficient, an unbiased 1 estimator of accuracy, should be > 0.80; and the number of reactions added during the 2 reconstruction in order to be able to perform the 56 metabolic tasks considered in the tINIT 3 algorithm should be < 1% of all reactions included in the model. These thresholds were selected 4 to prevent reconstructions of questionable quality to be included in downstream analysis, in 5 consideration that most reconstructions converged successfully and only some outliers fall below 6 the selected values ( Figure S1). The accuracy for model k derived from sample j was calculated 7 in terms of true/false positives/negatives in the final model k, where a true positive is a gene with 8 posterior probability > 99% in sample j and included in model k and a true negative is a gene 9 with posterior probability <= 99% in sample j and excluded in model (false positive/negative if 10 vice versa). Of 1,082 initially considered samples, 917 GEMs could be reconstructed that 11 satisfied the above criteria. Principal component analysis (PCA) to evaluate similarity of the 917 12 reconstructed GEMs was performed on the gene/reaction inclusion matrix, where each row is a 13 gene/reaction in the reference GEM, HMR2, and each column represents a reconstructed GEMs. 14 A matrix cell is equal to 1 if the corresponding HMR2 gene/reaction is included in the 15 corresponding reconstructed GEM. PCA was performed on unscaled data with mean centering 16 using the ade4 R-package (47).

Reaction classification and analysis.
A HMR2 reaction was classified as "core" if included in > 18 95% of all reconstructed GEMs; "absent" if included in less < 5% of all reconstructed GEMs; 19 "contextual" if otherwise. We selected these thresholds because the underlying distribution 20 seemed to reach a plateau at these values (Fig. S5). Core reactions were further sub-classified 21 into "pan" if their inclusion was due to inclusion of the same associated gene(s) in > 95% of all 22 reconstructed GEMs and "iso" if otherwise, meaning that different isoenzymes are present in 23 different models for that core reaction. We estimated how robust was the number of core, 1 contextual and absent reactions to potential outliers in the set of reconstructed GEMs by 2 bootstrapping the count of each category 1,000 times. We computed the count of core, contextual 3 and absent reactions in 1,000 sets of 917 random GEMs, reconstructed by including v random 4 genes from the gene-reaction association matrix of HMR2, where v is a 1x917 vector equal to the 5 number of genes included in the actual set of 917 GEMs reconstructed using tINIT. Core and 6 contextual reactions were mapped to the generic KEGG metabolic map using iPath2 (48) (if 7 univocally identified using KEGG IDs) and grouped according to HMR2 metabolic sub-systems. 8 Core reactions in tumors were compared to housekeeping reactions in normal tissues, so 9 classified based on Human Protein Atlas (HPA) evidence for the associated genes. We compiled 10 this list by including genes that were either expressed (> 0.5 FPKM) in all 32 normal tissues 11 considered by HPA ("expressed in all" HPA category); or in > 30 normal tissues; or expressed in 12 a subset of the 32 tissues that share little functional similarity, e.g. CBLN3 in cerebellum, lung 13 and small intestine, which suggests that the gene may be expressed promiscuously across human 14 tissues ("mixed" HPA category). We counted 69 core-iso reactions, which could be lumped in 16 15 reaction clusters, sets of reactions encoded by the same gene(s). We tested whether the 16 preferential inclusion of an isoenzyme in certain model could be attributed to the cancer type or 17 presence of mutations in 9 key cancer-associated genes, which were featured in the linear model 18 (1), using a likelihood ratio test. After adjustment for multiple testing (49), an association was 19 considered significant if the false discovery rate (FDR) < 0.001. The most recurrent gene-20 reaction association present in models belonging to a certain cancer type was chosen as the most 21 likely isoenzyme to carry out the core-iso reaction in that cancer type. 22 We tested if the preferential inclusion of a contextual reaction in a certain model could be 1 attributed to the cancer type or presence of mutations in 9 key cancer-associated genes, using a 2 likelihood ratio test. Since 94.6% of contextual reactions showed an association with cancer type, 3 we sought to reduce the complexity by deriving clusters of cancer types where a contextual 4 reaction is included more or less frequently than expected. We performed consensus clustering 5 on a l x c matrix, where c is the number of cancer types (13) and l is the number of contextual 6 reactions (3,269), and each matrix cell is the fraction of models derived from samples of the 7 corresponding cancer type containing the corresponding contextual reaction. The optimal 8 number of cluster w = 4 was selected based on the observation that for w > 4 the relative change 9 in the area-under-CDF-curve was < 0.1, where CDF stands for the empirical cumulative 10 distribution function of consensus distributions for up to w clusters. Consensus clustering was 11 implemented using ConsensusClusterPlus R-package (50). The 49 contextual reactions most 12 representative of the four clusters were chosen based on variable selection procedure applied to a 13 random forest classifier constructed to assign a model to either of the four consensus clusters 14 based on inclusion of contextual reactions. Random forest-driven variable selection was 15 implemented using the varSelRF R-package (51). 16 Validation of cancer type cluster-dependency of contextual reactions. Gene expression 17 profiles for 4,462 primary tumor samples from the same 13 cancer types (sample size per type: 18 94 to 978) were retrieved from TCGA and transformed to size-adjusted log-cpm as described 19 above. The profiles were randomly split in a training and test set so that 50% of samples 20 belonging to a certain cancer type cluster were included in each set. A random forest classifier 21 was constructed on the expression level in the training set of 66 genes, selected because 22 associated with the 49 contextual reactions most representatives of the different clusters (see 23 above). The performance in the classification of samples in the test set to the 4 cancer type 1 clusters based on the 66 gene expression signature was evaluated by calculating the multiclass 2 AUC (29). The same procedure was repeated for 1,000 random 66 gene expression signatures. 3 The random forest classification was implemented using the RandomForest R-package (28). 4 Differential gene expression analysis for the 66 gene expression signature was performed. 5 Samples belonging to the brain tumor cluster because not enough tumor-adjacent normal samples 6 were found to match these cancer types in TCGA. We retrieved 438 normal samples from TCGA 7 that matched the remaining cancer types. We evaluated differential gene expression between 8 tumor vs. normal samples belonging to the same cluster, between tumor samples belonging to 9 different clusters, and between normal samples belonging to different clusters. After adjustment 10 for multiple testing, a gene was considered differentially expressed between groups if FDR < 11 0.001. The differential gene expression analysis was implemented using the voom and limma R-12 packages (42, 52).     Figure S5 -Fraction of models that include a given reaction from the reference model HMR2 (bars). If a reaction is 1 included in more than 95% of models (872 out of 917 models, red horizontal line), it is classified as a core reaction.

2
If excluded in more than 95% of the model, then it is classified as an absent reaction. Any reaction in between is 3 classified as contextual. 4 5 6 Figure S6 -Classification of reactions in the different metabolic sub-systems (same color code as used in Fig. 2B).  Figure S7 -Glutamine conversion to glutamate by glutaminase 2 (GLS2) is the only contextual reaction with a 1 significant association with a mutation, TP53. In the barplot, the fraction of models derived from samples with wild-2 type TP53 vs. mutated TP53 that include the GLS2-encoded reaction.  NORMAL  TUMOR  NORMAL  TUMOR  NORMAL  TUMOR  NORMAL  TUMOR  NORMAL  TUMOR  NORMAL  TUMOR  NORMAL  TUMOR  NORMAL  TUMOR  NORMAL  TUMOR  COAD -