Epigenetic control of metabolic identity across cell types

Constraint-based network modelling is a powerful tool for analysing cellular metabolism at genomic scale. Here, we conducted an integrative analysis of metabolic networks reconstructed from RNA-seq data with paired epigenomic data from the EpiATLAS resource of the International Human Epigenome Consortium (IHEC). Applying a state-of-the-art contextualisation algorithm, we reconstructed metabolic networks across 1,555 samples corresponding to 58 tissues and cell types. Analysis of these networks revealed the distribution of metabolic functionalities across human cell types and provides a compendium of human metabolic activity. This integrative approach allowed us to define, across tissues and cell types, i) reactions that fulfil the basic metabolic processes (core metabolism), and ii) cell type-specific functions (unique metabolism), that shape the metabolic identity of a cell or a tissue. Integration with EpiATLAS-derived cell type-specific gene-level chromatin states and enhancer-gene interactions identified enhancers, transcription factors, and key nodes controlling core and unique metabolism. Transport and first reactions of pathways were enriched for high expression, active chromatin state, and Polycomb-mediated repression in cell types where pathways are inactive, suggesting that key nodes are targets of repression. This integrative analysis forms the basis for identifying regulation points that control metabolic identity in human cells.


Introduction
Identifying the molecular hallmarks of cell identity is crucial for understanding cellular differentiation in development and disease and can inform strategies for cellular reprogramming.Although the concept of cell identity remains debated, it is often inferred from cellular functions and phenotypes governed by gene expression patterns maintained by identity-specifying transcription factors (TFs) 1 .
Metabolism is a key determinant of cell identity.For example, in Type II Diabetes, high blood glucose levels can disrupt beta cell identity, resulting in loss of beta cell-specific gene expression and gain of gene expression programs of other islet cell types 2 .Similarly, oxidative phosphorylation and ROS signalling drives the differentiation of intestinal stem cells into Paneth cells 3 .The centrality of metabolism in cellular identity is further supported by the observation that culture media, rather than tissue of origin, governs pluripotent stem cell identity.
Metabolic identity extends the cellular identity concept to the activities of biochemical pathways in a cell.Metabolic identity includes the core metabolism, a set of biochemical reactions shared across cell types that maintain energy and chemical homeostasis, and unique metabolism, or cell type-specific metabolic processes in differentiated cells, which correspond to specialized tasks including alcohol detoxification in the liver and melanin production in skin melanocytes.
Metabolic network modelling, which integrates cell type-specific epigenomic maps with the biochemical reactome, is a powerful approach to capture metabolic identity across cell types and identity disruptions in disease states [4][5][6][7][8][9][10] .Metabolic models provide key insights into how the inactivation of rate-limiting enzymes controls metabolic flux, interactions between metabolic pathways and compartments, and the ultimate capacity of cells to perform metabolic functions 11 .
Cell-type-specific metabolism is achieved through precise regulation of pathway enzymes through the epigenetic control of gene expression, post-translational modifications, and compartmental sequestration of metabolites and co-factors.In support of this, we previously demonstrated that cells place high regulatory loads on rate-limiting pathway enzymes 12 .Specifically, loci encoding transporter proteins that control substrate availability show high H3K27ac signal in human macrophages 13 .
More generally, cell-type-specific gene expression is maintained by regulatory elements such as promoters, enhancers, and silencers, whose activities have been mapped genome-wide through comprehensive ChIP-seq, ATAC-seq, and DNase-seq profiling of histone modifications and open chromatin generated by the Roadmap Epigenomics 14 , ENCODE 15 , and other consortia.The recent International Human Epigenome Consortium EpiATLAS resource 16 greatly expands these resources, providing an unprecedented opportunity to dissect the regulatory control of metabolism.Notably, enhancer-gene interactions inferred in the EpiATLAS resource using the generalized Activity-By-Contact (ABC) model 17 provide opportunities to connect regulatory elements, and their upstream transcription factor regulators, to metabolic genes.
Probabilistic graphical model methods, including ChromHMM 18 and Segway 19 , infer cell type-specific chromatin states from combinatorial and spatial patterns of epigenomic signals.While ChromHMM and Segway define chromatin states at the level of individual loci, the recent ChromGene 20 method defines gene-level chromatin state by integrating histone modification signals throughout the gene body and flanking regions using a mixture of hidden Markov models 21 .ChromGene defines six active states (1-6), one poised state (7), and five repressive states (8-12) (Supplementary Table 1), based on combinations of ten histone modifications.This gene-level view provides better insight into which genes encoding for enzymes and transporters in the metabolic networks are active at the chromatin level, and which are transcriptionally repressed.
In this work, we reconstructed metabolic models across 1,555 biosamples from the IHEC EpiATLAS resource spanning 58 human cell types and tissues to identify core and tissue-specific metabolic reactions.We leveraged EpiATLAS enhancer-gene interactions to discover regulatory elements and upstream transcription factors controlling these processes.We observe that transporter genes are enriched for cell type-specific regulatory control and that many core metabolic pathways expressed across tissue types exhibit cell-type-specific regulatory control, providing a mechanistic basis for tissue-specific metabolic reprogramming of core pathways.

Metabolic models capture core and tissue-specific metabolism
We used RNA-seq data of 1,555 samples from the EpiATLAS 16 resource to reconstruct metabolic models that served as scaffolds to map gene-level chromatin states (ChromGene) and enhancer activity data from the corresponding samples to the human metabolic network to define: 1) the core metabolic biochemical reactions across cell types and tissues, 2) the metabolic identity for individual cell types and tissues, and 3) the regulatory mechanisms controlling this identity (Figure 1). Figure 1: Systematic modelling of epigenetic control of metabolism.The number of integrated RNA-seq and ChIP-seq samples for each tissue and cell type is represented as bar plots.We used RNA-seq profiles (left bar plot) from 58 tissues and cell types to reconstruct metabolic networks with rFASTCORMICS 5 .These models were then used as scaffolds to map enhancer-activity scores (1,565 samples across 56 cell types and tissues, right bar plot) as well as gene-level ChromGene chromatin state annotations (1,698 samples across 57 cell types and tissues, not shown as samples distribution is similar to the enhancer activity scores) to identify key regulation points.12 ChromGene states were defined based on the integration of the chromatin marks: strong trans enh (highest expression), trans, and trans wk (transcribed but at different expression levels depending on the presence of H3K27ac H3K4me1, and H3K36m3), ZNF (Zinc Finger-related genes), poised, PC repr wk (weak Polycomb-mediated repression), bivalent (high level of repressive mark H3K27me3, and moderate levels of H3K4me1 and H3K4me3), het (genes associated with heterochromatin), PC repr (Polycomb-mediated repressed genes).
These metabolic reconstructions span 58 tissues and cell types.Blood cells account for the largest fraction of these models, with 356 T cell, 303 monocyte, 224 neutrophil, and 107 lymphocyte B lineage models (Figure 1, left bar plot).The median number of model reactions per cell and tissue type ranged from 2,360 (T cell) to 4,252 (liver) (Figure 2A).The twelve model groups with the lowest median number of reactions are models for blood and immune cells, notably T cell, lymphocyte of B lineage, neutrophil, venous blood, and natural killer cell models (Figure 2A).By contrast, hepatocyte, trophoblast, placenta, colon, and kidney models have the highest number of reactions, an indication of their greater metabolic complexity.The substantial size of the liver and hepatocyte models is consistent with the large number of liver-specific metabolic tasks such as drug metabolism, bile acid synthesis, homeostasis of carbohydrates, amino acids, lipids, and fat-soluble vitamins 22,23 .
While many basic metabolic processes are shared across tissues and cell types, we hypothesized that individual tissues and cell types (1) fulfil core metabolic processes through tissue-specific reactions and regulatory mechanisms and (2) have tissue-specific metabolic processes for tissue-specific functions (e.g., bile acid synthesis in hepatocytes).In support of these hypotheses, we observed that a t-SNE embedding of samples by metabolic models distinguished cell-type clusters (Figure 2B).This result is further supported by greater Jaccard similarity (in reaction composition) within tissue types than between tissue types (Supplementary Figure S1A).Notably, hepatocytes and neutrophils define the extremes of the first principal component (Supplementary Figure S1B), suggesting a highly distinct metabolism from other cell types.Taken together, these data demonstrate that metabolic models capture the cell type-specific metabolic character of cell types.
We next defined the set of 'core' metabolic reactions as the set of 541 reactions (from 10,600 total reactions) in Recon3D 24 which were found in over 99% of models.These include reactions in pathways required for cellular homeostasis, including reactive oxygen species (ROS) detoxification, fatty acid synthesis, squalene and cholesterol synthesis, purine synthesis, and nucleotide interconversion.Moreover, pathways implicated in the maintenance of the extracellular matrix, such as chondroitin sulphate metabolism, and energy metabolism, were present ubiquitously across all cell types and tissues (Figure 3A).
Next, we computed the number of reactions present in at least 90% of samples in each tissue or cell type.253 reactions fulfil this criterion include reactions in fatty acid synthesis and oxidation, inositol phosphate metabolism, and glycolysis/gluconeogenesis.Notably, even for pathways commonly shared across all tissues and cell types, some reactions were specifically included only in some models (Figure 3B and Supplementary Figure S2).In contrast to these highly common reactions, 4,799 reactions were included in less than 1% of models (3,944 were never included, comprising redundant exchange and transporter reactions, drug metabolism reactions, and reactions not under gene control and thus not supported by the data) (Supplementary Table S3).Some bile acid synthesis, drug metabolism, cytochrome C, steroid metabolism, and fatty acid oxidation pathways were unique to a single tissue or cell type, namely hepatocytes.However, we observed that other reactions in these pathways were present also in other tissues, suggesting that pathway activation or repression may occur at specific key reactions rather than at the level of the whole pathway.3: Metabolic models capture core and tissue-specific metabolic reaction pathways.A) Presence rates computed for each reaction and tissue type.Pathways were ranked according to their mean presence rate across models.Columns represent the reactions included in at least one of the models and rows represent the cell and tissue type.Some pathways (e.g., ROS, fatty acid synthesis and squalene pathways) were included in most metabolic models, while other pathways (e.g., drug metabolism, bile acid synthesis) were unique to hepatocytes.Even among the pathways with the highest presence rate, some reactions have a cell type-specific or samplespecific expression pattern.We observe a similar pattern for unique pathways (e.g., bile, drug metabolism) which contain besides unique reactions a small fraction of ubiquitous reactions.B) The number of reactions per pathway present in at least 90% of the tissue and cell type models.Reactions of central carbon metabolism are among the reactions with the highest presence rate.
Lymphoma or leukaemia cell lines, as well as blood and immune cells, have a more compact metabolism (i.e., smaller number of reactions) (Figure 3A), but also tend to have a lower reaction presence rate in most pathways, as evidenced by the brighter horizontal lines in Figure 3A.This finding suggests that, for some samples, these reactions with a lower presence rate can be shut down, possibly due to environmental signals or, in case of cancer cells, due to mutations.Some branches of fatty acid oxidation (FAO) were absent or had low presence rates in most cell types except for liver, placenta, trophoblast, extraembryonic cell, and mole.Finally, the reaction presence rate of metabolic models correlates with the median expression levels associated with the metabolic genes mapped to Recon3D, demonstrating the sensitivity of model building pipeline (Supplementary Figure S3).Taken together, the reconstructed metabolic models reveal core and tissue-specific metabolic characteristics.
Core and unique metabolic genes are associated with higher enhancer activity and expression levels than non-housekeeping genes shared by multiple tissues While genes controlling core metabolism, by definition, must be expressed ubiquitously, we sought to determine whether these commonly expressed genes are differentially regulated across cell types.We defined a set of core genes (n=3,530) as genes with active ChromGene 20 annotation across all EpiATLAS cell types and tissues (Methods).We also defined a set of core metabolic genes (n=306) as the subset of core genes present in Recon3D.Similarly, we defined unique (n=1,013) and unique metabolic genes (n=121) as the set of genes active in only a single tissue or cell type.The core gene set was enriched for genes implicated in the metabolism of RNA and processing of introns, while core metabolic genes were enriched in mRNA splicing, translation, energy production, biosynthesis pathways, lipid metabolism, and ROS metabolism (Supplementary Table S3).Both core gene sets were strongly enriched for common essential genes obtained within the DepMap project 25 (Figure 4A).934 out of 1,549 common essential genes and 48 out of 79 common essential metabolic genes were core and core metabolic genes, respectively.Core genes and core metabolic genes together captured 60% of the common essential genes.The missing 31 essential metabolic genes were associated with active ChromGene states (1-6) in most tissues and cell types, with one exception, gamma-glutamyl transferase (GGTLC2), which carried a repressive chromatin state across most samples (Supplementary Figure S4).We observed similar patterns for non-metabolic core essential genes: except for a few genes that are associated with a repressive state in most cell types, core essential genes carry activating states (ChromGene state 1-6) in most cell types (Supplementary Figure S4).The enrichment for overlaps between common essential and core genes implicates these genes in key metabolic pathways that are highly conserved across cell types and essential for cell survival.
We next used the generalized Activity-By-Contact (ABC) method 17,26 to examine the enhancers linked to core and core metabolic genes.For each gene, we computed an enhancer activity score by summing the activity of all enhancers predicted to interact with a gene per cell type or tissue (Methods).Consistently, higher median enhancer activity scores were observed for the core and core metabolic genes (Figure 4B).Also, the increase of the median gene expression was more pronounced for the core and core metabolic genes compared to all genes and all metabolic genes (Figure 4C).This is consistent with the previous observation that core fitness genes have substantially higher expression than non-fitness genes 27,28 .Also, core metabolic genes are associated, by definition, with activating ChromGene states 1-6 and a higher enhancer activity (Figure 4D).There was a substantial number of genes with ChromGene state 1 (corresponding to highly expressed genes with strong signals of active marks such as H3K4me1 and H3K27ac) among the genes expressed in hepatocytes and extraembryonic cells, as well as neutrophils, monocytes, lymphocyte B lineage, venous blood, peripheral blood mononuclear cells, and myeloid cells.A) Core and core metabolic genes are enriched for essential genes common to most cell types (hypergeometric p-value < 3e-12).B) Core and core metabolic genes have higher enhancer activity scores than genes on average (t-test p-value <1e-12) and C) higher median expression values (t-test p-value <1e-12).D) Core metabolic genes have high expression values (log2 TPM, left), are linked to active ChromGene states (middle), and have a high enhancer activity across most cell types and tissues (right).Rows contain the 306 core metabolic genes and columns contain the cell types with at least 3 samples for each data type.E) Similar to D but representing 121 unique metabolic genes.Unique genes have a high expression in a single tissue or cell type and have low or no expression in the other tissues.There is also a high number of Polycomb-repressed genes, heterochromatin states, and low enhancer activity scores for these unique genes in other tissues.
For unique genes, we found both higher expression and higher enhancer activity to correspond to an active ChromGene state (Figure 4E, Supplementary Figure 5).Of the 1,013 unique genes, 177, 144, and 109 were specific to hepatocytes, brain, and muscle, respectively (Supplementary Figure S6A).These sets were enriched for gene ontology terms matching their cellular identities (e.g., brain enriched for protein-protein interactions at synapses).Among the 121 unique metabolic genes, as many as 67 were hepatocyte-specific genes and were consistently enriched for tasks and pathways specific to hepatocytes (Figure 4E, Supplementary Figure S6B, and Table S4-S5).The high expression of these unique gene sets matched high enhancer activity scores while low expression and enhancer activity was observed in other tissues.Polycomb-mediated repression was the most common repressive chromatin state at the cell type-specific genes (Figure 4E).However, the number of Polycomb-repressed genes among the 121 unique genes varied between tissues.Muscle cell, lymphoma and leukaemia samples, and cancer cell lines had 62, 60 and 57 Polycomb-repressed unique metabolic genes, respectively.This suggests that besides gene activation in unique cell types, repressive states, and extensive Polycomb-mediated repression, could also contribute to shape metabolic identity.

TFs controlling metabolic identity show cell type-specific activity
Next, we asked whether the high and consistent expression of core genes and core metabolic genes was controlled by the same TFs across tissues, or whether core metabolism is subject to cell typespecific regulation.To this end, we integrated cell type-specific enhancer-gene interactions to obtain sets of regulatory elements linked to each core and core metabolic gene for each cell type (Figure 5A).Many TFs were uniformly enriched at enhancers associated with core genes across different cell types (Figure 5A).Conversely, we observed that some TF regulators of core metabolic genes were uniquely enriched in specific cell types and tissues.Overall, 271 non-redundant TF motif modules and tissuespecific regulatory regions were significantly enriched at enhancers associated with core metabolic genes (FDR<0.01),and 40 of these had absolute enrichment scores over 1.2.These TFs control common processes such as growth, proliferation and differentiation (EGR) 29 , oncogenic and embryonic development (TFAP2, SPDEF, ZFX, ZIC1, ZBTB7A, PAX, GLIS) [30][31][32][33][34][35][36][37][38][39][40] , regulation of chromatin structure (MECP2, HINFP, MBD2, CTCF, KAISO) [41][42][43][44][45][46] , and cell cycle (E2F, RUNX) 47,48 .Moreover, MYB is expressed mainly in hematopoietic tissues, colonic crypts, and neural stem cells.In contrast, RUNX1 is expressed in hematopoietic cells 49 , but also plays a role in the innate immune system of nonhematopoietic cells 50 .Nuclear receptors broadly regulate metabolism and inflammation 51 .Motifs of TFs implicated in the maintenance of embryonic development (ZSCAN4) 52 , cell development and homeostasis (FOX2), and liver regeneration (ZNF490) 53 were depleted from enhancers associated with core metabolic genes.HINFP, ZNF490, GMEB2 showed the largest variation in enrichment.Furthermore, only 6 TFs were common essential genes: CTCF, NRF1, THAP1, YY1, ZNF143, and ZNF335, consistent with redundant roles among TFs.Interestingly, we observed an inverse association between TF enrichment values for core metabolic genes and unique metabolic genes (Figure 5B), suggesting distinct upstream regulators of core and tissue-specific metabolic processes.Consistently, some TFs with negative enrichment values in core metabolic genes, such as ZSCAN4 and ZNF24, have high enrichment values for the unique metabolic genes (Figure 5C).A) The enhancergene interactions were used to obtain a union set of regulatory elements for ten samples randomly chosen from the cell types and tissues with more than 10 samples for the core and core metabolic genes.TFs implicated in common biological processes such as growth proliferation, and development response to stress or environmental changes were enriched for core metabolic genes.Only TFs with absolute enrichment above 1.2 are depicted.B) Negative correlation between enrichment (log2, FDR<0.01) of TFs for core metabolic genes vs unique metabolic genes.C) Same analysis as A) but for unique genes.Only TFs with an absolute enrichment above 1.3 are depicted.
Other TF groups with high enrichment at enhancers of unique metabolic genes included CUX 54 (implicated in cell cycle control and DNA repair), POU 55 (warranting cell identity and differentiation) and SOX (SOX9 plays a role in liver homeostasis and regeneration 56 ).
Most of these TFs and motifs have a wide number of target genes and were shown to be dysregulated in numerous cancers [57][58][59] .A more detailed understanding of the cell type-specific control of core and unique genes could allow targeted interventions in specific cell types such as diseased cells.
Epigenetic control shapes metabolic identity by regulating solute carrier genes in a tissuedependent manner Encouraged by our findings that unique genes are marked by Polycomb-mediated repression in other tissues, we next sought to determine if this repression was targeted to genes controlling key nodes in metabolic pathways or broadly across genes in metabolic pathways.In total, 750 metabolic genes were associated with Polycomb-mediated repression in at least one tissue (Figure 6A, Supplementary Figure S8).We observed that leukaemia or lymphoma cell lines (409 metabolic genes), muscle cells (348), natural killer cells (286), and cancer cell lines (282) have the highest number of Polycombrepressed metabolic genes.Conversely, stem cell-derived cell lines, mammary gland epithelial cell lines, and brain samples have the lowest number (38, 55 and 67).Mammary gland epithelial cells are instead enriched for poised promoters (Supplementary Figure S9), possibly linked to their precise, highly regulated response to hormonal signals.The poised and Polycomb-repressed ChromGene states are mutually exclusive, explaining the lower number of Polycomb-repressed genes in the cells carrying many poised promoters.The 409 Polycomb-repressed genes in leukaemia or lymphoma cell lines are enriched for pathways such as bile secretion, purine metabolism, and steroid hormone biosynthesis, which are mostly active in hepatocytes.Furthermore, in all the tissue types Polycombrepressed genes are enriched for genes encoding for solute carriers (SLCs) (Figure 6B).The proportion of SLCs among the Polycomb-repressed genes is between 15.8% (1.3-fold enriched) and 25.5% (2.1fold enriched) for stem-cell derived cell lines and epithelial cells of the endometrial gland, while overall they represent only 12.4% of the metabolic genes (Figure 6A).For lymphoma or leukaemia cell lines, 76 SLCs (out of a total of 229 SLCs, p = 1.e-5) and 8 (ATP)-binding cassette transporters (out of 22, p = 0.08) (Figure 6A) are Polycomb-repressed.
To analyse the distribution of Polycomb-mediated repression across metabolic pathways, we compared the binarized ChromGene states and the Polycomb-repression distribution (ChromGene state 12).A core metabolism cluster highlighting mostly the same pathways as in Figure 3A is enriched for active chromatin state (Figure 6C top) across all tissues and cell types and depleted of any Polycomb-repression (Figure 6C bottom).Several pathways have a presence rate of above 0.6 for ChromGene state 1 in three or more tissues (Supplementary Table 8, Supplementary Figure 10).Hepatocytes have a presence rate above 0.6 for 21 pathways indicating once again high metabolic complexity.For Polycomb-mediated repression only a few pathways have a presence rate above 0.3 in more than one cell type (Supplementary Figure S10).Leukaemia and lymphoma cell lines had the highest presence rates for Polycomb-repression in butonate metabolism, blood group synthesis, alanine and aspartate metabolism, vitamin D, O-glycan metabolism, glycine, serine, alanine, and threonine metabolism.Unlike what is seen for active chromatin states, Polycomb-repression never affects every reaction of a pathway, but only some specific branches of pathways (Supplementary Table S9).ChromGene states 1-6 were associated with a score of 1, and inactive ChromGene states between 7 and 12 with a score of -1.Gene expression was then mapped using the Gene-Protein-Reaction rules of the metabolic network to obtain a score for each reaction.A mean score for each pathway was computed by summing the individual scores and dividing by the number of reactions (top).Bottom: Same plot but considering ChromGene state 12 (Polycomb-repressed, bottom panel).The black ribbon depicts the number of reactions per pathway.

Transporters and other pathway entry points are key nodes for epigenetic regulation of the metabolic networks
As SLCs are enriched for Polycomb-mediated repression, and hence could serve as switches within the metabolic network, we tested whether SLCs were also enriched among the uniquely expressed metabolic genes (Figure 7A).Within the 121 unique genes, we observed an enrichment for cytochrome P450 (CYP) enzymes (12 out of 48, p = 3e-05), which is consistent with the enrichment for hepatocyte-specific genes.Furthermore, 33 out of 371 genes controlling transport reactions, were among the 121 unique metabolic genes (p = 0.02), (3 ATP binding transporters, 21 SLCs and the transporter genes NPC1L1, SULT2A1, AQP8, PRODH2, RBP2, RBP4, RHCG, ATP1A3, ATP1A4).
To take advantage of the structural information of the metabolic reconstruction, the 121 uniquely active metabolic genes were mapped to the networks to unravel key switches.Among the 5,938 reactions under known gene control in the generic Recon3D reconstruction, 3,744 reactions are associated with transport.Of these, 2,241 are controlled by gene products, while the remaining 1,503 are diffusion reactions.
Among the 269 unique reactions under a selective expression regulation (controlled by one or several of the 121 genes), 151 were transporter reactions controlled by 17 genes.Consequently, an enrichment for unique expression regulation for the transporters was also found at the reaction level (p = 1e-10) (Figure 7B).97, 28, 6, and 4 out of the 151 transporter reactions are controlled by SLCO1B1, SLC22A10, SLC5A8 and SLC22A2, respectively.Besides, extracellular transporters (151/2652), bile acid synthesis (27/185), drug metabolism (13/573), glycine, serine, alanine, and threonine metabolism (11/47), cytochrome C (8/15) and glycolysis/gluconeogenesis (6/42) have the largest number of reactions controlled by unique genes.These reactions are often importing metabolites into the cell or are the first reaction after a transport reaction to the endoplasmic reticulum or another compartment, as exemplified by the drug metabolism pathway (Figure 7C).Accordingly, drug metabolism is inactivated in other tissues by repressing their import in the cell or repressing enzymes in the endoplasmic reticulum metabolising drugs (Figure 7C).Similarly, we have already shown the importance of regulation of entry points for the bile acid pathway (IHEC flagship paper, under preparation).This pathway has a large fraction of genes and associated reactions with a ChromGene state of 1 (highly expressed in hepatocytes) (Figure 7D).In other tissues, this pathway seems to be mostly inactive due to the Polycomb-repression of genes controlling switches in the network (Figure 7D).Overall, the metabolic network analysis shows an orchestrated epigenetic regulation combining activating and repressing signals to control key nodes in the networks in a cell-type-specific manner.A) Among the 121 unique genes, 12 out of 48 genes and 33 out of 271 genes code for CYP proteins and transporter coding genes, respectively (hypergeometric test).B) Among the 256 unique reactions, 152 out of 2,241 are transporter reactions (hypergeometric test).C) Schematic representation of the drug metabolism pathway in hepatocytes.Reactions under unique control are depicted in red.These unique reactions are often located in the endoplasmic reticulum and generally after a transport reaction between two compartments or an import into the cell.D) ChromGene states linked to the bile acid genes are depicted.Some genes implicated in drug metabolism are often uniquely expressed in hepatocytes and control 13 out of 573 reactions.In other tissues, these genes are often Polycomb repressed.

Discussion
In this study, we reconstructed 1,555 metabolic network models capturing core metabolic pathways and processes and the basis of the distinct metabolic identities across cell types.Both core and unique metabolic genes have a higher enhancer activity, and consequently, higher expression values, compared to the remaining metabolic genes, providing robustness to the control of vital processes, consistent with the high number of common essential genes among the core genes.60% of common essential genes were included among the core genes.We defined core genes based on strict criteria (active ChromGene states for all cell types and tissues), and consequently, many of the common essential genes missing from the core gene set were associated with active chromatin states in most cell types.Gene set enrichment analysis further showed that unique genes (considering both metabolic and non-metabolic unique genes) are sufficient to infer the identity of the tissue and cell types.Unique genes exhibit active ChromGene state in the cell type they are expressed in and strong Polycomb repression in other tissues.Notably, stem cells exhibit fewer Polycomb-mediated repressed genes, suggesting a relationship between differentiation stage and repressive mechanisms.Conversely, we observe more Polycomb-repressed genes in cancer cell lines, consistent with Polycomb dysregulation in cancer 60 .
We observed that transport reactions, specifically SLCs, were enriched for unique metabolic genes and often Polycomb-repressed.This is consistent with the strict regulation of transporters, as SLCs control the import of metabolites, and channel them into storage or degradation compartments.While precise transporter regulation is energetically expensive, it enables more robust regulatory systems where regulation is enforced only at key points of the networks.SLCs have diverse expression patterns with potential substrates that range from ions, lipids, vitamins, waste products to drugs 61,62 .Some members of the SLC superfamily are ubiquitously expressed, while others are cell-or even context-specific.Mutations in 20 % of SLC genes are linked to Mendelian 63 diseases.SLCs are also often dysregulated in cancer, as the ability to control non-vital metabolic pathways can facilitate tumor growth.SLCs are often hypermethylated in cancer, and SLC expression has been linked to cancer clinical outcomes 64 .
Finally, we showed that core genes are under the regulatory control of TFs linked to essential biological processes such as growth, proliferation, and chromatin structure.However, most of these TFs are not essential, suggesting redundant regulatory systems.Notably, TFs controlling core metabolism were often depleted in the unique gene sets.As an example, binding sites for KAISO were enriched in core metabolism and depleted in the unique gene sets of several tissues.KAISO regulates development in normal tissue and plays a role in tumorigenesis through its impact on cell cycle and methylation.Similarly, binding sites for HINFP, which is critical for late embryogenesis cell cycle control 65 , were broadly depleted from unique gene sets despite strong enrichments in the core metabolic set.HINFP is crucial in the control of the cell cycle during late embryogenesis 65 .
In summary, our analysis, empowered by the comprehensive EpiATLAS resource, has characterized the metabolic identities across 58 cell types and tissues, delineated core and tissue-unique metabolic components, and, by integrating cell-type-specific gene-level chromatin states and enhancer-gene interactions, identified regulatory elements, upstream transcription factor regulators, and key pathway nodes controlling metabolism.We observed that transporters, specifically SLCs, and first reactions of pathways were enriched for high expression, active chromatin state, and Polycombmediated repression.This analysis serves as the foundation for uncovering regulatory sites that govern metabolic identity in human cells, providing a basis for tissue-specific metabolic reprogramming.

Methods
We accessed RNA-seq profiles (1,555 samples across 58 cell types and tissues), ChromGene HMM states (1,698 samples), and enhancer activity data (1565 samples) from the IHEC EpiATLAS resource (https://epigenomesportal.ca/ihec/) 16 .Of these, 965 samples had expression, ChromGene, and enhancer activity data and 938 samples belonged to tissues and cell types with at least 3 samples.RNA-seq and ChromGene data were accessed from the IHEC sftp server in batch mode.The summed enhancer activity on gene-level was taken from the IHEC sftp as well ('gABC_GeneSummedEnhancer_geq2.txt.gz').This file sums the H3K27ac signal of all enhancers of a gene, based on interactions predicted with the gABC-score 17 .The score of genes without sufficient activity at any of their promoter (≥2 H3K27ac signal bp around any annotated TSS) was set to zero.

Reconstruction of metabolic models
1,555 sample models were reconstructed for each RNA-seq sample by the rFASTCORMICS 5 using Recon 3D 24 as input reconstruction, as previously described.Briefly, rFASTCORMICS discretizes RNAseq values into expressed, not expressed, and undetermined expression status.Expressed and not expressed genes are mapped to the generic reconstruction (here Recon3D) to form two sets of reactions: core and inactive reactions.Core reactions are supported by the data and should be included in the model.Inactive reactions are under the control of genes that are not expressed and hence are removed from the network.rFASTCORMICS uses FASTCORE 9 to build a flux-consistent network by adding to the set of core reactions (which are not blocked due to the presence of inactive reactions in the same pathway), a next-to-minimal set of non-core reactions.The models were not medium-constrained.

Jaccard Similarity Index, t-SNE, and Principal Component Analysis
For each pair of models, the Jaccard Similarity Index (JSI) was computed, and the models were then clustered according to the JSI using the Euclidean distance.To verify that the clustering on the models captured metabolic variations, we performed principal component analysis on the metabolic genes retrieved from Recon3D using the factoextra package 66 .
We generated a t-SNE embedding of the sample models using the tSNE package 67 in R. Briefly, we constructed a binary matrix encoding reaction presence across 10,600 Recon 3D reactions in 1,555 samples.We excluded reactions absent from all samples in our dimensionality reduction analysis.

Core, core metabolic, unique, and unique metabolic genes
A consensus ChromGene score is the most represented cell type and tissue gene mark.If the most represented mark accounts for less than 50% of the samples and is an activating mark and the other marks are repressive then a score of zero is assigned.Similarly, if the majority mark accounts for less than 50% of the samples and is repressive and the other marks are activating marks, the samples get a consensus score of zero.We defined a set of core genes as those with active consensus ChromGene states (states below 7) for all 33 cell types in the set of 938 EpiATLAS samples with expression profiles, ChromGene annotations, and enhancer activity data.Conversely, we defined unique genes as those which contained active consensus ChromGene states in only one tissue.We then defined core metabolic and unique metabolic gene sets as the subsets of core and metabolic genes included in Recon3D.The log2 expression values, consensus ChromGene states, as well as the enhancer activity were mapped to the metabolic genes of Recon3D and plotted with the ComplexHeatmap package 68 .

Enrichment of common essential genes among core genes and gene set enrichments
To validate the core sets, we computed the enrichment of common essential genes (genes essential across all cell lines in DepMap) downloaded from https://depmap.org/portal/download/all/(CRISPRInferredCommonEssentials.csv) in the core sets using a hypergeometric test.

Presence rate of ChromGene states
The consensus ChromGene states were mapped to the metabolic genes, and a ChromGene score for each reaction was computed using the Gene-Protein-Reaction rules (GPR rules).GPR rules state which genes control a reaction and if the gene products act as protein complexes (denoted by BOOLEAN AND), or as isozymes (BOOLEAN OR) e.g., Gene 1 AND Gene 2 OR Gene 3. To obtain a single score for each reaction, the maximum ChromGene value was taken for genes linked by a BOOLEAN OR and the minimum for BOOLEAN AND.The ChromGene was binarized to obtain a negative (-1) and a positive score (1) for ChromGene states below or equal to 7 and above 7, respectively.A mean score was then computed for each pathway.A similar approach was used for the genes linked to ChromGene 12 (Polycomb-repression) (Figure 6).

Enrichment of transporter genes and reactions
Hypergeometric tests were performed to assess the enrichment of SLCs and CYPs in unique metabolic genes.Furthermore, genes controlling reactions belonging to the transporters subsystems (pathways) in Recon3D were also assessed for enrichment.Finally, reactions of the transporter reactions were also tested for enrichment among the set of unique reactions (reactions that have an activating consensus ChromGene in a unique cell type and tissue).

Transcription Factor Motif Enrichment Analysis
To identify upstream regulators of shared and tissue-specific metabolic gene regulation, we performed overlap enrichment analysis between transcription factor motif occurrences and metabolic gene-linked regulatory regions.Briefly, we began with the previously described sets for core and unique metabolic genes.For a particular sample, we limited these sets to genes with active ChromGene states in the corresponding tissue type.We used the previously described set of samplespecific enhancer-gene interactions to identify sets of gene-associated regulatory elements.We used the union set of regulatory elements from 10 samples in each of 20 tissue categories in our analysis.
We accessed the human reference genome-wide motif occurrences of 280 non-redundant archetypal transcription factor motif modules 70 .We calculated a tissue-specific binomial-based enrichment between pathway-associated enhancers and motif archetype occurrences 71 using the joint set of all core and unique gene regulatory regions as background regions.
To identify tissue-specific regulators of core and unique metabolic processes, we visualized enrichments across tissue categories.We pruned the resulting heatmaps by retaining motif modules with absolute enrichment of at least 1.2 and 1.3, for modules regulating core genes and unique metabolic genes, respectively.
than between different cell types and tissues.B) Principal component analysis of the metabolic gene expression data.Neutrophils are separated from the other blood cell samples.Non-blood cells are separated from the blood cells along PC1.The hepatocyte and liver samples like neutrophils also form a distinct cluster but at the other extremity of PC1 and PC2.

Figure 2 :
Figure 2: Sample models capture metabolic complexity across tissues and cell types.A) The distribution of the number of model reactions per cell type and tissue is represented as a box plot.The number of reactions in each

Figure
Figure3: Metabolic models capture core and tissue-specific metabolic reaction pathways.A) Presence rates computed for each reaction and tissue type.Pathways were ranked according to their mean presence rate across models.Columns represent the reactions included in at least one of the models and rows represent the cell and tissue type.Some pathways (e.g., ROS, fatty acid synthesis and squalene pathways) were included in most metabolic models, while other pathways (e.g., drug metabolism, bile acid synthesis) were unique to hepatocytes.Even among the pathways with the highest presence rate, some reactions have a cell type-specific or samplespecific expression pattern.We observe a similar pattern for unique pathways (e.g., bile, drug metabolism)

Figure 4 :
Figure 4: Core and core metabolic genes are essential and exhibit high enhancer activity and high expression.A) Core and core metabolic genes are enriched for essential genes common to most cell types (hypergeometric p-value < 3e-12).B) Core and core metabolic genes have higher enhancer activity scores than genes on average (t-test p-value <1e-12) and C) higher median expression values (t-test p-value <1e-12).D) Core metabolic genes have high expression values (log2 TPM, left), are linked to active ChromGene states (middle), and have a high enhancer activity across most cell types and tissues (right).Rows contain the 306 core metabolic genes and

Figure 5 :
Figure 5: Core metabolic genes and unique metabolic genes are enriched for different TF sets.A) The enhancergene interactions were used to obtain a union set of regulatory elements for ten samples randomly chosen from the cell types and tissues with more than 10 samples for the core and core metabolic genes.TFs implicated in

Figure 6 :
Figure 6: Polycomb repression shapes metabolic networks by repressing solute carriers.A) Number of Polycomb-repressed SLCs over number of Polycomb-repressed genes across cell types.B) Proportion of SLCs in

Figure 7 :
Figure 7: Transporter and CYP genes are enriched among the unique genes and transport reactions are enriched among the unique reactions.A)Among the 121 unique genes, 12 out of 48 genes and 33 out of 271 genes code for CYP proteins and transporter coding genes, respectively (hypergeometric test).B) Among the 256 unique reactions, 152 out of 2,241 are transporter reactions (hypergeometric test).C) Schematic representation of the drug metabolism pathway in hepatocytes.Reactions under unique control are depicted in red.These unique reactions are often located in the endoplasmic reticulum and generally after a transport reaction between two compartments or an import into the cell.D) ChromGene states linked to the bile acid genes are depicted.Some genes implicated in drug metabolism are often uniquely expressed in hepatocytes and control 13 out of 573 reactions.In other tissues, these genes are often Polycomb repressed.
gene ChromGene States across cell types.60% of common (left) and metabolic essential genes (right) were in inactive ChromGene states (above seven, in blue) across 33 cell types.The remaining genes were tagged with active ChromGene state (1-6, yellow and green) for most cell types.However, several genes, such as gamma-glutamyl transferase (GGTLC2) were associated with repressive marks in almost all cell types.

Table S5 : Unique metabolic genes for hepatocyte, brain, and muscle are enriched for organ-matched gene ontology terms.
Unique genes specific to one tissue were defined by having a consensus ChromGene below 7 in one single tissue or cell type.