Large-Scale Differential Gene Expression Transcriptomic Analysis Identifies a Metabolic Signature Shared by All Cancer Cells

Cancer-dependent metabolic rewiring is often manifested by selective expression of enzymes essential for the transformed cells’ viability. However, the metabolic variations between normal and transformed cells are not fully characterized, and therefore, a systematic analysis will result in the identification of unknown cellular mechanisms crucial for tumorigenesis. Here, we applied differential gene expression transcriptome analysis to examine the changes in metabolic gene profiles between a wide range of normal tissues and cancer samples. We found that, in contrast to normal tissues which exhibit a tissue-specific expression profile, cancer samples are more homogenous despite their diverse origins. This similarity is due to a “proliferation metabolic signature” (PMS), composed of 158 genes (87 upregulated and 71 downregulated gene sets), where 143 are common to all proliferative cells but 15 are cancer specific. Intriguingly, the PMS gene set is enriched for genes encoding rate-limiting enzymes, and its upregulated set with genes associated with poor patient outcome and essential genes. Among these essential genes is ribulose-5-phosphate-3-epimerase (RPE), which encodes a pentose phosphate pathway enzyme and whose role in cancer is still unclear. Collectively, we identified a set of metabolic genes that can serve as novel cancer biomarkers and potential targets for drug development.


Introduction
Recent advances in cancer research have demonstrated the remarkable complexity of this disease [1]. In addition to the classification based on the tissue of origin, tumors are further characterized into distinct tumor subtypes [2]. These subtypes can then be categorized by the expression of selective markers that, in many instances, have a definite effect on the patient outcomes [3]. However, despite this complexity and heterogeneity, about 20 years ago, a series of shared hallmarks that broadly describes the biology of cancer was proposed [4] and refined a decade later [5]. One such hallmark that is emerging is the recognition of the unique metabolic profile that distinguishes cancer cells from non-proliferating normal tissues [6].
The transition of cells from quiescence to a proliferative state is accompanied by substantial metabolic rewiring. These alterations are thought to enable cells to synthesize sufficient biomolecules to support the highly-demanding replication machinery, as well as to maintain other cellular functions,

Median-of-Medians Calculation
In order to calculate the median of normal expression, we first calculated the median of each gene in a given tissue G tissue . Following this, we calculated the median of all of the G tissue to get the median-of-medians for each gene (G all ).

PMS Calculation
The MERAV database contains 16 sample sets, in which the expression patterns of normal tissue, primary tumors, and cancer cell lines originating from the same tissue are presented. The tissues that expressed all three types were identified, and the median of each normal (normal median-of-medians) tissue was determined ( Figure S2a). Then, for each tissue, we compared the cancer cell lines' expression to the normal median-of-medians. The median of all the tissues was combined to one matrix, by which the median value of each gene was then calculated. For each gene, the positive and negative values were separated to generate a score that calculates the median and the number of positive arrays ( Figure S2b,c).

RNA Preparation and RT-PCR Analysis
Total RNA was isolated from cells using the NucleoSpin ® RNA Kit (MACHEREY-NAGEL, Germany), and reverse-transcription was performed using qScript cDNA Synthesis Kit (Quantabio, Beverly, MA, USA). The resulting cDNA was diluted in DNase-free water (1:10) before quantification by real-time quantitative PCR. The mRNA transcription levels were measured using SYBR Green PCR master mix Blue Mix HI-ROX (PCR Biosystems, London, UK) and StepOnePlus (Applied Biosystems, Foster City, CA, USA). All data are expressed as the ratio between the expression level of the target gene mRNA and that for actin. Primers used for qRT-PCR were obtained from Integrated DNA Technology and are listed in Table S8.

Analysis of Different Databases
The Rosario et al. database [20] includes the expression ratio between 24 normal tissues and tumors as provided by the cancer genome atlas (TCGA). For each cancer type, expression profile in the three gene sets (all metabolic genes, PMS upregulated, and PMS downregulated) was determined. Following this analysis, we calculated the mean expression profile of all gene set in each cancer type and presented it as a scatter plot. In addition, we analyzed the "gene expression profiling interactive analysis" (GEPIA, http://gepia.cancer-pku.cn/index.html) [21]. By applying this database, we compared the median expression of the PMS genes between normal and tumors from the same tissue of origin.

Determining the Correlation between the PMS Gene Set and Patient Outcomes
For each member of the PMS gene set (both up and downregulated), we determined the overall survival (OS) using the Kaplan-Meier plotter website (http://kmplot.com/analysis/) [22]. The combined data of all PMS gene set hazard ratio (HR) and their p-values are presented as volcano and violin plots. The p-value and the HR values were determined by the KM plotter website.

Statistical Analysis and Graphs
All of our statistical analyses, plots, and graphs were generated using R version 3.3.3 or Prism 8 (GraphPad Software, San Diego, CA, USA). The violin plots were generated using the R "ggplot2" library. In each violin plot, a boxplot was added that demonstrates the mean and one standard deviation (SD).

Normal Tissues Demonstrate a Tissue-Specific Metabolic Gene Expression Pattern
To gain a comprehensive view of the expression of all the metabolic genes in cancer, we first determined their expression pattern in normal samples [23,24]. The MERAV database contains 726 arrays derived from normal samples, which represent the expression profile of 31 different tissues (Table S1) [18]. This sample number is relatively large and therefore provides a method to analyze the metabolic gene expression in different tissues. The drawback of analyzing normal human tissues is the data collection process, which is mainly performed post-mortem, which can affect the analysis. Thus, we first compared our human gene expression profile to gene expression in mice, as sample collection in mice is faster and more reliable [25,26]. Pearson correlation between mouse and human metabolic gene expression ( Figure S1a) demonstrated a statistically significant (p < 0.001, Mann-Whitney U test) high correlation (mean = 0.898 ± 0.143) between samples derived from the same tissue relative to the comparison with other tissues (mean = 0.762 ± 0.2) ( Figure S1b). This high correlation between the species verified that human metabolic gene expression reflects the actual expression of normal healthy tissues, and reinforces the accuracy of the MERAV database.
In order to achieve a clear tissue-selective expression profile, we analyzed the metabolic gene expression pattern in normal human samples. In the MERAV database, the normal samples are represented by 726 arrays that are not equally distributed between the tissues, and therefore can affect the analysis. For example, the central nervous system (CNS) is represented by 149 arrays, whereas the spleen by only five (Table S1). To avoid a skew towards tissues with many samples, we chose a maximum of 10 arrays to represent each tissue expression profile (representative normal arrays) (the exact representative arrays that were used for the analysis are listed in Table S1). These representative arrays were then assembled to a new matrix and subjected to an unsupervised hierarchical clustering analysis (Figure 1a). In addition, we excluded nine tissues that were represented by four or fewer arrays in the analysis (Table S1). We found that most of the samples derived from the same tissues co-cluster together including samples obtained from similar lineages, such as the immune system (bone marrow, hematopoietic, and tonsil), or reproductive system (embryo, oocyte, and testis). However, several of the samples, including the breast tissue, adrenal gland, and thyroid gland clustered separately, suggesting a unique metabolic gene expression pattern ( Figure 1a). Interestingly, samples generated from the lung tissue were divided into two groups, one co-cluster with the stomach and the second with the hematopoietic lineage samples (Figure 1a). This co-cluster with hematopoietic lineage conforms with a recent study which reported the presence of hematopoietic progenitor cells in the lung [27]. Importantly, despite the fact that the data was generated from different studies, the samples of the same tissue still clustered together, and this implies the robustness of the MERAV database.
Next, we defined a selective expression profile of metabolic genes in different tissues. We calculated the median-of-the-medians for each metabolic gene (G all ) (see Section 2), which resulted in a value representing each gene's basal expression level. Then, we compared all of the normal representative samples to G all and presented their relative expression as a heatmap (Figure 1b). Based on this heatmap, we subdivided the metabolic genes into three groups ( Figure 1c and Table S2): (I) "Ubiquitously expressed genes", a set of metabolic genes that do not demonstrate any tissue-selective expression pattern (37% of the genes; 630 genes). (II) "Tissue enriched", metabolic genes that show selective expression in several tissues (40% of the genes; 675 genes) ( Figure S1c). (III) "Tissue-specific" metabolic genes that are highly expressed only in a single tissue (23% of the genes; 399 genes) ( Figure 1e and Table S3). We referred to tissue enriched and tissue specific groups as the "tissue-selective" gene set, composed of 1074 genes. Furthermore, based on the presence/absence matrix [28], we subdivided this gene set into "uniquely expressed genes" (i.e., genes that are only expressed in a given tissue) and "highly expressed genes" (i.e., genes that are highly expressed in a given tissue) ( Figure 1d). Then, we validated our gene expression analysis by correlating our identified tissue-selective genes with known tissue-exclusive metabolic pathways. Therefore, we assigned a corresponding pathway for each gene, as described previously [29], and applied a Fisher's exact test to determine the enriched metabolic pathways in each tissue (Table S4). For example, the liver selective genes include those encoding known liver-specific metabolic pathways, such as drug metabolism cytochrome P450, amino acid metabolism, fatty acid metabolism, and urea cycle (Table S4). Together, we identified that most of the metabolic genes demonstrate a selective expression pattern, which corresponds to tissue metabolic pathways.
Our analysis identified the testis, after the liver, to be the tissue with the highest number of specific genes (Figure 1e). Interestingly, we found this tissue to selectively express glycolysis/pentose phosphate pathway (PPP) isozymes (Figure 1f) such as pyruvate dehydrogenase alpha 2 (PDHA2) [30], glyceraldehyde-3-phosphate dehydrogenase spermatogenic (GAPDHS), phosphoglycerate kinase 2 (PGK2) [31], and lactate dehydrogenase C (LDHC) [32]. We validated this bioinformatic analysis by quantitative polymerase chain reaction (qPCR), which confirmed the selective expression of testis-specific genes ( Figure 1g). Together, our analysis recognized the tissue-specific metabolic expression, which can serve as a platform to determine cancer-dependent metabolic rewiring among genes. subdivided this gene set into "uniquely expressed genes" (i.e., genes that are only expressed in a given tissue) and "highly expressed genes" (i.e., genes that are highly expressed in a given tissue) ( Figure 1d). Then, we validated our gene expression analysis by correlating our identified tissueselective genes with known tissue-exclusive metabolic pathways. Therefore, we assigned a corresponding pathway for each gene, as described previously [29], and applied a Fisher's exact test to determine the enriched metabolic pathways in each tissue (Table S4). For example, the liver selective genes include those encoding known liver-specific metabolic pathways, such as drug metabolism cytochrome P450, amino acid metabolism, fatty acid metabolism, and urea cycle (Table  S4). Together, we identified that most of the metabolic genes demonstrate a selective expression pattern, which corresponds to tissue metabolic pathways.
Our analysis identified the testis, after the liver, to be the tissue with the highest number of specific genes (Figure 1e). Interestingly, we found this tissue to selectively express glycolysis/pentose phosphate pathway (PPP) isozymes (Figure 1f) such as pyruvate dehydrogenase alpha 2 (PDHA2) [30], glyceraldehyde-3-phosphate dehydrogenase spermatogenic (GAPDHS), phosphoglycerate kinase 2 (PGK2) [31], and lactate dehydrogenase C (LDHC) [32]. We validated this bioinformatic analysis by quantitative polymerase chain reaction (qPCR), which confirmed the selective expression of testis-specific genes ( Figure 1g). Together, our analysis recognized the tissue-specific metabolic expression, which can serve as a platform to determine cancer-dependent metabolic rewiring among genes.  The bar on the right represents the selective and ubiquitous expression pattern. (e) A pie chart that represents the expression of tissue specific genes by tissues. All the tissues are determined based on the number of their specific genes. This chart represents the top 10 tissues with the most specific genes; the number of specific genes is indicated. CNS-central nervous system. (f) The glycolysis and the pentose phosphate pathway (PPP) contain testis selective isoforms. The testis selective enzymes are in red. Arrows represent an enzymatic reaction; metabolites are presented as dots. (g) Gene expression analysis validation. The liver, lung, and testis RNA were converted to cDNA and subjected to quantitative real-time PCR (qPCR). The expression level of all tissues is relative to that of the lung. Each value represents the mean ± SD for n = 3.

Cell Transformation Is Accompanied by a Loss of the Tissue-Specific Expression Profile
The MERAV database contains the expression profile of samples derived from both normal tissues and cancers, which were normalized together [18]. This feature provides a tool to analyze cancer-dependent metabolic gene expression profiles globally. Thus, we first separated the arrays by sample type (normal tissues, primary tumors, and cancer cell lines), and determined the Pearson correlation coefficient of all the samples (Figure 2a). We found that as opposed to normal samples that presented a broad distribution (mean = 0.75 ± 0.1), the distribution of both primary tumors (mean = 0.83 ± 0.05) and cancer cell lines (mean = 0.84 ± 0.03) are narrow relative to normal tissues (standard deviation of 0.1 in normal tissues versus 0.05 and 0.03 in primary tumors and cancer cell lines, respectively). This broad distribution pattern in normal samples supports our finding of tissue-selective genes, as each tissue has a clear and defined expression profile that distinguishes it from other tissues ( Figure 1b). However, it also indicates that the transformation from normal tissues to proliferating cancer cells results in samples with similar metabolic gene expression profiles. This loss of tissue identity was further demonstrated in liver and kidney samples, which were chosen since they had the highest tissue-enriched metabolic gene expression ( Figure S1c). For both tissue types, we confirmed a significant downregulation in tissue-enriched gene expression in both primary tumors and cancer cell lines (Figure 2b). This finding demonstrates that upon transformation, cells tend to lose their tissue-selective genes and gain a more homogenous expression pattern.
Next, we generated a systematic tool to identify the metabolic gene signature shared by all cancers. Specifically, we compared all the arrays in the MERAV database (normal tissues, primary tumors, and cancer cell lines (4281 arrays)) to the previously calculated basal gene expression values for normal tissues (G all ) (Figure 1b). Then, we applied hierarchical clustering analysis and identified that the arrays were separated based on the sample type ( Figure 2c). As described above (Figure 1b), the normal tissues displayed a clear tissue enriched metabolic expression. In contrast, proliferative samples (primary tumors and cancer cell lines) exhibited a set of metabolic genes that are consistently upregulated or downregulated in all of their arrays (Figure 2c). The location of primary tumor samples between normal tissues and cancer cell lines is attributed to their nature of being a mixture of cancer and normal samples. Interestingly, a small fraction of the normal tissues, including samples mostly derived from embryos, clusters with the cancer cell lines. This suggests that the shared signature is not restricted to cancer but is present in all proliferating cells.
We isolated the metabolic gene set (Figure 2c) that is upregulated (87 genes) and downregulated (71 genes) throughout all the cancer cell lines and primary tumor arrays ( Figure S2a) and designated it as the "proliferation metabolic signature" (PMS). We found that all cancer types, despite their tissue of origin, express the PMS gene set, indicating its essential role in proliferation (Figure S2b-e and Table S5). The proliferation-dependent expression alteration of PMS genes was then individually tested in liver and lung cell lines by qPCR (Figure 2d). We further validated that the PMS gene set is not limited to the MERAV database, as it demonstrated the same expression pattern both in data generated by Rosario et al. [20] (Figure 2e) and by GEPIA (http://gepia.cancer-pku.cn/index.html) [21] ( Figure 2f). In addition, by literature search, we identified examples of PMS genes that demonstrated a cancer-dependent alteration in their expression at the protein level (Table 1). Together, our bioinformatics tools identified the PMS dataset, which enables the cancer cells to gain similar expression patterns. Furthermore, the PMS cancer dependent expression pattern was validated by other datasets and at the protein level.

The Proliferation and Cancer-Specific Signature
The normal proliferative samples clustered at the edge of the cancer cells' array ( Figure 2c), suggesting that several of the PMS genes are proliferation-dependent and not unique to tumors. To identify these candidates, we took advantage of our large-scale gene expression database that includes the expression of 79 non-cancer cell lines [18]. These cell lines contain the human breast epithelial cell line MCF10A, the vascular endothelium cell line HUVEC, the testis cell lines HS-1-T, primary human osteoblast, bone marrow, and mesenchymal stromal cells. In order to systematically identify the PMS genes that are cancer-specific, we first determined the expression of each gene in normal tissues (N) and cancer cell lines (C). Then, we defined the expression of the same gene in the non-cancer cell lines (NC) and compared the distribution with that of C as well as N. Accordingly; we determined a distance score, which is defined by the absolute value (NC-C)-(NC-N) for each gene of the universal signature. Genes that demonstrate expression changes in both cancer and non-cancer cell lines (distance > 0) were considered as proliferation-driven expression (Figure 3a and Table S5), whereas genes that demonstrate expression changes only in the cancer cells (distance < 0) were considered as cancer-specific. By applying this score, we found that 144 genes (91% of the genes) are indeed expressed in both cancer and non-cancer cell lines, whereas 14 genes (9%) demonstrated a cancer-specific signature (Figure 3a).
From the PMS gene set, we selected four representative genes for each group. (i) Carbamoylphosphate synthetase 2 (CAD), the rate limiting enzyme in the pyrimidine biosynthesis pathway [46], was upregulated in both cancer and non-cancer cell lines. (ii) Glycine amidinotransferase (GATM), which catalyzes the rate-limiting step in the synthesis of creatine [47], is downregulated in both cancer and non-cancer cell lines (Figure 3b). (iii) More cancer-specific genes, including mannosidase, endo-alpha-like (MANEAL) that plays a role in the N-glycan maturation [48], are overexpressed only in cancer cell lines. (iv) Glutathione S-transferase alpha 4 (GSTA4), which is one of the 16 human cytosolic GSTs [49], is downregulated only in cancer cell lines (Figure 3b). Together, we conclude that the PMS is mainly composed of proliferative genes, which are not specific to cancer. However, some PMS genes demonstrated a tumor-dependent expression.

The PMS Gene Set Is Enriched in Rate-Limiting Enzymes
The PMS is composed of genes encoding enzymes belonging to diverse metabolic pathways (Table S6). However, in all of these pathways, only a fraction of these members belongs to the PMS gene set. Thus, we speculated that one of the mechanisms by which cancer cells regulate proliferation is through the activity of the rate-limiting enzymes (RLEs), rather than the upregulation of the entire pathway. These RLEs are characterized by their relatively low rate of catalysis, hence, regulating the metabolic flux [50]. Therefore, we assume that tight control of RLE expression levels can influence the entire metabolic pathway. In humans, there are 149 RLEs [50], but not all of them are found suitable for our analysis, as nine enzymes are demarcated as "non-metabolic" based on our definition, eight enzymes were duplicates, and two (Gamma-Glutamyltransferase 1 (GGT1), and UDP Glucuronosyltransferase Family 1 Member A1 (UGT1A1)) did not pass the array quality control of the MERAV database [18]. We found that the PMS set is enriched in genes that encode to RLEs (41 out of 158) (Figure 3c and Table S7), demonstrating that the proliferation machinery is regulated by the upregulation of selective enzymes in the pathway and not by all the members. These results suggest that focusing on gene-based analysis, specifically on RLEs, rather than on the pathway levels, can lead to the identification of metabolic pathways that are essential for cancer cell proliferation.
Biomolecules 2020, 10, 701 9 of 17 which catalyzes the rate-limiting step in the synthesis of creatine [47], is downregulated in both cancer and non-cancer cell lines (Figure 3b). (iii) More cancer-specific genes, including mannosidase, endoalpha-like (MANEAL) that plays a role in the N-glycan maturation [48], are overexpressed only in cancer cell lines. (iv) Glutathione S-transferase alpha 4 (GSTA4), which is one of the 16 human cytosolic GSTs [49], is downregulated only in cancer cell lines (Figure 3b). Together, we conclude that the PMS is mainly composed of proliferative genes, which are not specific to cancer. However, some PMS genes demonstrated a tumor-dependent expression.

The PMS Gene Set is Enriched in Rate-Limiting Enzymes
The PMS is composed of genes encoding enzymes belonging to diverse metabolic pathways (Table S6). However, in all of these pathways, only a fraction of these members belongs to the PMS gene set. Thus, we speculated that one of the mechanisms by which cancer cells regulate proliferation is through the activity of the rate-limiting enzymes (RLEs), rather than the upregulation of the entire pathway. These RLEs are characterized by their relatively low rate of catalysis, hence, regulating the metabolic flux [50]. Therefore, we assume that tight control of RLE expression levels can influence the entire metabolic pathway. In humans, there are 149 RLEs [50], but not all of them are found suitable for our analysis, as nine enzymes are demarcated as "non-metabolic" based on our definition, eight enzymes were duplicates, and two (Gamma-Glutamyltransferase 1 (GGT1), and UDP Glucuronosyltransferase Family 1 Member A1 (UGT1A1)) did not pass the array quality control of the MERAV database [18]. We found that the PMS set is enriched in genes that encode to RLEs (41 out of 158) (Figure 3c and Table S7), demonstrating that the proliferation machinery is regulated by the upregulation of selective enzymes in the pathway and not by all the members. These results suggest that focusing on gene-based analysis, specifically on RLEs, rather than on the pathway levels, can lead to the identification of metabolic pathways that are essential for cancer cell proliferation.  -N), was devised to classify a gene's expression pattern. Genes exhibiting a proliferation-driven expression signature (distance > 0) were expressed at similar levels in the NC and C groups, whereas cancer-specific genes (distance < 0) were expressed at similar levels in the NC and N groups. The enrichment of genes exhibiting a proliferation-driven signature was quantified using a Fisher's exact test. (b) Violin plots of selected genes with proliferation-driven or cancer-specific expression patterns. Examples of upregulated and downregulated genes in the two gene classes. The boxplot in the violin plot demonstrates the mean value with one SD. (c) PMS gene set is enriched in rate-limiting enzymes. A pie chart representing the proportion of rate-limiting enzymes in the PMS signature versus all other metabolic genes is presented. The p-value was determined by Student's t-test using Prism. Using the mean expression value of each gene in normal (N) tissues, non-cancer (NC), and cancer (C) cells lines, a distance score, defined as the absolute value of (NC-C)-(NC-N), was devised to classify a gene's expression pattern. Genes exhibiting a proliferation-driven expression signature (distance > 0) were expressed at similar levels in the NC and C groups, whereas cancer-specific genes (distance < 0) were expressed at similar levels in the NC and N groups. The enrichment of genes exhibiting a proliferation-driven signature was quantified using a Fisher's exact test. (b) Violin plots of selected genes with proliferation-driven or cancer-specific expression patterns. Examples of upregulated and downregulated genes in the two gene classes. The boxplot in the violin plot demonstrates the mean value with one SD. (c) PMS gene set is enriched in rate-limiting enzymes. A pie chart representing the proportion of rate-limiting enzymes in the PMS signature versus all other metabolic genes is presented. The p-value was determined by Student's t-test using Prism.

The Upregulated PMS Gene Set Is Enriched in Essential Genes
The PMS is composed of metabolic genes that are either upregulated or downregulated in all cancer cell lines. The upregulated gene set is expressed in all cancer cells, suggesting that it can play a central role in tumor survival. Thus, we determined the clinical and biological meaning of the PMS signature by systematically determining their role in patient outcome and on cell viability. We analyzed the Kaplan-Meier Plotter tool (http://kmplot.com/analysis/) [22] for the effect of both the PMS upregulation and the downregulation gene set on patient outcome (for more see Section 2). We found that the PMS-upregulation dataset is significantly enriched with genes and that their overexpression results in poor patient outcome in liver, breast, and lung cancer relative to the PMS-downregulated set (Figure 4a and Figure S3a). Moreover, the mean of the PMS-downregulation dataset KM is below the hazard ratio (HR) of one, indicating the high expression of most of these genes will improve patient outcome, further supporting their role as inhibitors of cell proliferation. Together, we determined a direct correlation between the PMS expression and patient outcome, indicating the role of this gene set in cancer.
To systemically determine the essential genes in the PMS set, we utilized the DepMap portal (https://depmap.org/portal/) [51,52], as this tool determined gene essentiality in 689 cell lines. First, we differentiated between the upregulated (58,982 guides) and the downregulated (45,277 guides) dependency scores (Figure 4b). This analysis resulted in a significantly lower dependency score of the upregulated gene set (mean = −0.2159) in comparison to the downregulated set (mean = 0.02436). In contrast, knockout of the downregulated gene set resulted in a significant growth advantage (Figure 4b). Notably, the DepMap portal defines that a dependency score of <−1 indicates an essential gene. We combined the dependency scores of all the guides in all cell lines and gene sets and then compared the number of essential hits. This analysis resulted in a distinct difference between the two gene sets as the upregulated set includes 5394 guides that have a <−1 score, and the downregulated set had only 2 (Figure 4c).
We then tested the dependency score based on genes to identify specific PMS genes that are essential for cell proliferation. We ranked all of the PMS genes and found that the upregulated gene-set contains eight genes that are essential genes (their mean distribution is <−1), which is in comparison to the downregulated genes that were all non-essential (Figure 4d). One of the essential PMS genes is the pentose phosphate pathway (PPP) enzyme ribulose-5-phosphate-3-epimerase (RPE) [53]. The RPE enzyme belongs to the nonoxidative arm of the PPP, and its expression is regulated by KRAS in pancreatic cancers (PDAC) [54]. We found RPE overexpression in many cancer types, including breast, CNS, hematopoietic, liver, lung, pancreatic, and prostate ( Figure 4e). We then validated this bioinformatic analysis by (qPCR), which confirmed selective overexpression of RPE in both lung and liver cell lines (Figure 4f). RPE overexpression and its low dependency score predict that its expression affects tumor aggressiveness. Indeed, the high expression of RPE in liver and breast cancer samples correlates with the poor patient outcome as determined by the Kaplan-Meier Plotter tool (http://kmplot.com/analysis/) [22] (Figure 4g). Moreover, in these PDAC cancers, RPE expression is essentially associated with overall survival [55]. Altogether, we conclude that only the upregulated gene set contains essential genes. This result further demonstrates that proliferating cells become addicted to several of the upregulating genes, a characteristic that can be exploited for future identification of anticancer drug targets.  The dependency score distribution of PMS-upregulated genes is lower than the PMS-downregulated genes. The data was collected from the Dependency Map (DepMap) portal (https://depmap.org/portal/). The PMS distribution was compared to non-PMS genes (Other Metabolic genes). The p-value was determined by Student's t-test using Prism. The dashed line in the volcano plot indicates dependency score = 0. (c) The PMS-Up-Genes are enriched in guides with dependency score less than −1. The DepMap portal defines an essential gene as one whose median dependency score is less than 1. The pie chart represents the number of guides for each group. Red slice is number of guides below −1. (d) Comparing the dependency score distribution of selected genes. The PMS upregulated and downregulated genes were ranked based on their average dependency score. The dependency score distribution of top rank of eight upregulated and downregulated genes is presented as violin plots. The dashed line in the volcano plot indicates dependency score = −1 (e) RPE expression is elevated in cancer samples relative to normal tissues in multiple cancers derived from different tissues. The distribution of RPE expression in different tissues is presented as violin plots. In each tissue RPE expression was compared between normal tissues (blue), primary tumors (light green), and cancer cell lines (orange). The expression data was obtained from the MERAV webtool. (f) Gene expression analysis validation of RPE in selected tissues and cancer cell lines. RNA was converted to cDNA and subjected to qPCR. The expression level of all tissues and cancer cell lines is relative to the normal lung or liver. Each value represents the mean ± SD for n = 3. (g) RPE expression is associated with poor patient prognosis. Liver and breast patients were divided into two groups ("high" and "low") according to RPE expression, and their survival rate is presented as Kaplan-Meier survival plots. These plots were generated in the Kaplan-Meier plotter website.

Mapping the PMS Gene in Selected Metabolic Pathways
We generated a map locating the PMS genes in the glycolysis, PPP, one carbon, and nucleotide biosynthesis metabolic pathways. We annotated the genes to indicate the correlation between cancer/proliferation-dependent expression and essentiality ( Figure 5). We used the Kyoto Encyclopedia of Genes and Genome (KEGG [56,57]) as well as the HumanCyc (http://humancyc.org, [58]) as references to generate this map. The upregulated gene set includes multiple genes encoding for enzymes belonging to the glycolysis, PPP, one carbon, and nucleotide biosynthesis pathways ( Figure 5). However, the downregulated sets comprise genes such as phosphoenolpyruvate carboxylase (PCK1). This enzyme belongs to the gluconeogenesis enzyme [59], a pathway which functions opposite to glycolysis. The most abundant metabolic pathway identified in the PMS is nucleotide biosynthesis, composed of 34 genes (Table S6). This gene set is subdivided between the purine and pyrimidine synthesis and to enzymes that catalyze both types of nucleic acid metabolism, such as ribonucleotide reductase isozymes M1 and M2 (RRM1 and RRM2, respectively).

Discussion
Major metabolic rewiring accompanies the transformation from resting normal tissues to proliferating cancer cells. Here, we present a systematic analysis of the alteration in metabolic gene expression that is associated with cancer. Specifically, we identified PMS, a set of metabolic genes Both purines and pyrimidines can be synthesized by the de novo pathways that generate nucleotides from building blocks, which mainly originate from glucose (de novo pathway); or through utilizing nucleotides that might be present in the environment (salvage pathway). As indicated in Table S6, members of the de novo and salvage pathways generating both purines and pyrimidines are present in the PMS gene set. A more detailed analysis of the nucleotide biosynthesis identified the first enzyme in the de novo pathway amidophosphoribosyltransferase (PPAT) as the RLE. This enzyme converts 5-phospho-α-D-ribose 1-diphosphate (PRPP) into 5-phospho-β-D-ribosyl-amine (PRA) through the use of glutamine as an amine donor ( Figure 5). This pathway ends with the generation of inosine monophosphate (IMP) that is the precursor for adenosine monophosphate (AMP) and guanosine monophosphate (GMP) synthesis. Interestingly, the majority of enzymes in this pathway are upregulated in cancer, reflecting its importance in the proliferation machinery.

Discussion
Major metabolic rewiring accompanies the transformation from resting normal tissues to proliferating cancer cells. Here, we present a systematic analysis of the alteration in metabolic gene expression that is associated with cancer. Specifically, we identified PMS, a set of metabolic genes that are upregulated in all proliferative samples, which is enriched in genes encoding for rate-limiting enzymes. Moreover, we identified that the PMS upregulated gene set has a significant abundance of essential genes. This analysis can serve as a platform for the identification of metabolic genes essential for tumor viability, and therefore can function as a potential tool for the identification of drug targets and cancer biomarkers.
Several studies compared the gene expression profile between normal tissue and cancer cells [17,20,60,61]. These studies mainly emphasize the cancer-dependent changes that are induced in the metabolic pathways, whereas our current study focuses more on the gene levels. Here, we show that the PMS is enriched in RLE, suggesting that modifying the expression of central enzymes in a given metabolic pathway is sufficient to activate it. Thus, if most members in a given metabolic pathway do not demonstrate any cancer-dependent expression changes, but the rate-limiting enzymatic genes do, this can still result in alteration in the metabolic flux. Therefore, while focusing the analysis on metabolic pathways is important, it may wrongly rule out key regulatory elements.
It is well established that the nucleotide biosynthesis pathways are essential for cancer, as many of its members serve as targets of anticancer drugs [11]. Among these targets are the PMS genes RRM1 and the pyrimidine de novo synthesis rate-limiting enzyme thymidylate synthetase (TYMS). Both are well-known drug targets, with multiple molecules aiming to inhibit their function. The existence of a substantial number of genes encoding RLEs and drug targets in the PMS gene set indicates that the analysis can detect established cancer-related metabolic pathways. Therefore, we predict that further studies of the PMS genes would lead to the identification of potential targets for anticancer therapeutics.
Previously, we identified a set of metabolic genes that are regulated by the EMT program [19] and designated as "mesenchymal metabolic signature" (MMS). There are striking differences between the PMS and the MMS gene sets. As opposed to the PMS gene set that is enriched with rate-limiting step enzymes, the MMS has only two. In addition, the PMS is over-represented with genes encoding the enzymes of the nucleotide biosynthesis pathway (24% of the genes in the signature), whereas the MMS encodes only three of these enzymatic genes (5% of the genes in the signature). However, glycan biosynthesis, which was under-represented in the PMS (10% of the genes in the signature), is highly present in the MMS (36% of the genes in the signature). Thus, the universal signature is composed of genes that are important for the generation of building blocks that serve the proliferation machinery. The mesenchymal signature is mainly composed of genes that function in the generation of the extracellular matrix.

Conclusions
This current study identifies the differential gene expression pattern of normal tissue vs. cancer samples. Specifically, a set of 158 genes that encode for enzymes in multiple metabolic pathways were identified to be differentially regulated in all proliferative cells. This set designated as proliferation metabolic signature (PMS), is enriched in rate-limiting enzymes, and in essential genes, suggesting it can function as a platform for identification of unknown cell vulnerabilities that can be exploited for the development of novel anticancer drugs.
Supplementary Materials: The following are available online at http://www.mdpi.com/2218-273X/10/5/701/s1, Figure S1: In normal tissues, most metabolic genes demonstrate a selective expression profile, Figure S2: Determining the PMS gene set, Figure S3: The PMS-up set is enriched with essential genes, Table S1: Each normal tissue is represented by a different number of arrays, Table S2: Summarizing the expression pattern of each metabolic genes, Table S3: Distribution of tissue-selective genes in each tissue, Table S4: Elevated metabolic pathways within each normal tissue, Table S5: Distance score calculation, Table S6: Defining the PMS gene set based on their metabolic pathways, Table S7: Defining the PMS genes that encode for rate-limiting enzymes, Table  S8: A list of the used qPCR primers.