Identifying functional metabolic shifts in heart failure with the integration of omics data and a heart-speciﬁc, genome-scale model

In diseased states, the heart can shift to use different carbon substrates, measured through changes in uptake of metabolites by imaging methods or blood metabolomics. However, it is not known whether these measured changes are a result of transcriptional changes or external factors. Here, we explore transcriptional changes in late-stage heart failure using publicly available data integrated with a model of heart metabolism. First, we present a heart-speciﬁc genome-scale metabolic network reconstruction (GENRE), iCardio . Next, we demonstrate the utility of iCardio in interpreting heart failure gene expression data by identifying tasks inferred from differential expression (TIDEs), which represent metabolic functions associated with changes in gene expression. We identify decreased gene expression for nitric oxide (NO) and N -acetylneuraminic acid (Neu5Ac) synthesis as common metabolic markers of heart failure. The methods presented here for constructing a tissue-speciﬁc model and identifying TIDEs can be extended to multiple tissues and diseases of interest.


Correspondence papin@virginia.edu
In brief Dougherty et al. build a heart-specific metabolic model by using tissue-specific data and broad metabolic functions in the heart. Using that model, they integrate publicly available transcriptomics data for human heart failure to identify conserved transcriptional changes in the synthesis of nitric oxide and N-acetylneuraminic acid across studies.

INTRODUCTION
The heart maintains its functions by using oxygen and various nutrients through metabolic pathways to synthesize contractile proteins, generate necessary lipid species, and produce ATP as a fuel for muscle contraction. Metabolic changes, such as changes in substrate utilization, have been measured in many diseased states, such as left ventricular hypertrophy (Kundu et al., 2015) and cardiotoxicity (Bauckneht et al., 2017;Borde et al., 2012). In some cases, changes in substrate utilization occur before functional and/or structural changes to the heart, suggesting that metabolism has a key role in the downstream development of disease or could be a target to prevent disease (Kundu et al., 2015;Li et al., 2019). However, it is not clear whether measured changes in substrate utilization are driven by changes in the transcriptome or other factors. Further, consistent transcriptional changes in metabolic pathways could point toward potential contributors to, or biomarkers of, heart failure. Therefore, there is a need for a comprehensive, descriptive model of the metabolic function of the heart to interrogate the relationships among substrates, gene expression, and metabolic functions.
A common tool to interrogate relationships among metabolic substrates, gene expression, and metabolic functions is a genome-scale metabolic network reconstruction (GENRE). GENREs are mathematical representations of metabolism, which use the enzymes encoded in an organism's genome to define the biochemical metabolic reactions and associated metabolites that comprise that organism's metabolism. Each metabolic reaction is associated with a gene-protein-reaction (GPR) rule relating genes to the proteins they encode and proteins to the reactions they catalyze. Human GENREs account for the function of the biochemical reactions that humans catalyze according to annotation of the human genome. Further, the complex GPR rules associated with each reaction allow for the systematic integration of different types of data, such as transcriptomics or proteomics data.
However, not all genes are expressed in every tissue, necessitating the construction of tissue-specific models of metabolism. GENREs of human metabolism (Blais et al., 2017;Brunk et al., 2018;Duarte et al., 2007;Ma et al., 2007;Mardinoglu et al., 2013;Swainston et al., 2016;Thiele et al., 2013) have been used to generate tissue-, disease-, or cell-specific models for various analyses, such as predicting drug targets (Agren et al., Folger et al., 2011;Rawls et al., 2020), identifying biomarkers of disease (Zhang et al., 2013;Zhao and Huang, 2011), and understanding drug toxicity or side effects (Blais et al., 2017;Shaked et al., 2016;Zielinski et al., 2015). Tissuespecific models are built by integrating tissue-specific omics data, usually transcriptomic or proteomic data, with a human GENRE using various integration algorithms (some of which are summarized in Blazier and Papin (2012) (Robaina Esté vez and Nikoloski, 2014) to obtain a tissue-or cell-type-specific model. To date, there are two existing heart models (Karlstä dt et al., 2012;Zhao and Huang, 2011), both of which were built from Recon1, the first human GENRE (Duarte et al., 2007). These models were used to examine the relationship between substrate use and the efficiency of the heart (Karlstä dt et al., 2012) and to predict epistatic interactions in the heart and biomarkers of heart disease (Zhao and Huang, 2011). However, human models have been expanded since Recon1 to more comprehensively describe human metabolism. A recently published human GENRE, iHsa (Blais et al., 2017), is more comprehensive than Recon1 (8,336 reactions versus 3,311 reactions) and, therefore, offers the potential to generate a more comprehensive heartspecific model. An expanded, heart-specific model can be used as a tool to interpret large datasets, such as transcriptomic data, to provide functional insight into how metabolism might change in a diseased state.
Here, we present a validated, heart-specific metabolic model, iCardio, which was built using iHsa (Blais et al., 2017) with tissuespecific protein data from the Human Protein Atlas (HPA) (version 18;https://www.proteinstlas.org;Uhlé n et al., 2015). The draft model was curated using metabolic tasks, which are mathematical descriptions of metabolic functions that a model should be able to perform. The metabolic tasks represent both previously published tasks (Blais et al., 2017) and curated, heart-relevant tasks. We validated the model qualitatively by the number of reactions covered by the metabolic tasks and quantitatively by ATP predictions for common carbon sources.
Finally, we demonstrate the utility of metabolic models in identifying functional changes in metabolism using an approach that identifies metabolic tasks associated with significant changes in gene expression, called tasks inferred from differential expression (TIDEs). The TIDEs approach represents a reaction-centric view in contrast to standard gene-centric views, which are taken with approaches such as standard gene enrichment or gene set enrichment analysis (GSEA). This reaction-centric view accounts for both the stoichiometric balance of reactions necessary to achieve metabolic functions as well as the complex relationships among genes, proteins, and the reactions they catalyze. Therefore, the TIDEs approach offers biological insight into metabolic functions affected in a disease state that are not readily apparent from gene expression data or GSEA alone. Here, we demonstrate the utility of the TIDEs approach using heart failure as a case study because of the critical role that metabolism has in the progression and diagnosis of disease. In contrast to previous studies which measure changes in the uptake of metabolic substrates in the heart in diseased states (Bauckneht et al., 2017;Borde et al., 2012;Kundu et al., 2015;Murashige et al., 2020;Taylor et al., 2001), here, we identify metabolic functions that are associated with significant changes in transcription across previously published transcriptomics heart-failure datasets. Using publicly available data for ischemic and idiopathic heart failure, we demonstrate that both gene expression and metabolic functions cluster based on studies rather than origin of heart failure. Across the different studies and types of heart failure, we identified two common metabolic functions associated with changes in gene expression, the synthesis of nitric oxide (NO) and the synthesis of N-acetylneuraminic acid (Neu5Ac), a sialic acid. Further, we demonstrate that both NO and Neu5Ac synthesis are consistently associated with decreased gene expression across other independent, previously published datasets for general heart failure, including both microarray and RNA sequencing (RNA-seq). Together, this suggests that both NO and Neu5Ac synthesis are robust metabolic functions associated with changes in gene expression in heart failure across studies and can serve as either biomarkers of heart failure or potential targets for the treatment of heart failure.

RESULTS
Building and validating iCardio using metabolic tasks Before generating a heart-specific GENRE, we turned to the literature to identify metabolic functions that are known to be important for heart metabolism. These identified metabolic functions were formulated as metabolic tasks and added to the previously published metabolic task list (Blais et al., 2017), which, together, served as a benchmark for model function. Although multiple algorithms exist for integrating tissue-specific data with metabolic models, we selected the cost-optimization reaction dependency assessment (CORDA) algorithm because it generated the bestperforming draft model for heart-specific task accuracy (method details).
The CORDA algorithm was used to build the draft iCardio model from iHsa by integrating tissue-specific protein data from the HPA (method details); CORDA removed 4,203 reactions from iHsa to build a draft iCardio model that had a heart-specific task accuracy of 89% ( Figures 1B and 1C; Data S2). The draft model was then curated using metabolic tasks to obtain the final iCardio model (Figure 1). To achieve 100% task accuracy, 79 reactions were added to, and 12 removed from, the draft iCardio model ( Figure 1A), based on literature and manual curation (Data S2). The 79 reactions that were added to iCardio are reactions that were removed from iHsa to build the iCardio draft model and were added back to the iCardio model to ensure functionality. We can map the protein data from HPA using the associated GPR rules. Of the 79 reactions added back to iCardio, 73% (58) are associated with either high (1), medium (8), or low (16) protein evidence, no GPR rule (28), or no data (5); the remaining 21 reactions were associated with no protein evidence. As an example, the synthesis of NO was a metabolic task that originally failed but was curated to pass in the final iCardio model. For that task to pass, three reactions were added back to the model, including two reactions that were associated with proteins that were not detected and one that had no GPR rule (Data S2). All three isoforms of NO synthase (NOS), which catalyze the two reactions with GPR rules, were not detected in version 18 of the HPA tissue data, but two of the NOS isoforms, NOS1 and NOS2, are associated with low protein evidence in version 19, whereas NOS3 has significantly higher transcript counts in heart muscle in the RNA consensus tissue HPA data (Uhlé n et al., 2015).
Of note, after curation, the final iCardio model contains approximately 50% of the reactions that are present in the iHsa network reconstruction. Using the complex GPR rules that are associated with each reaction, we can also identify the number of genes that are associated with a reaction in iHsa but are not associated with a reaction in the iCardio model; only 583 genes are associated with a reaction in iHsa but not included in iCardio, highlighting both the promiscuity of enzymes and the complex relationship between genes and metabolic functions ( Figure 1C).
Because metabolic tasks have been used as a metric for building iCardio, the number of reactions covered by this set of metabolic tasks provides a qualitative validation of the model. Similar to what has been done with other models , parsimonious flux balance analysis (pFBA) was used to determine the reactions used in iCardio for each passing metabolic task. The 216 previously published heart-relevant tasks (Blais et al., 2017) covered 38% (1,593/4,200) of reactions in iCardio and the 93 passing heart-relevant tasks covered 21% (874/ 4,200) of reactions in iCardio ( Figure 2A). There is overlap between these two sets of reactions; in total, the two sets of tasks covered 41% (1,714/4,200) of reactions in iCardio. It is important to note that, although the tasks may cover the same reactions, they cover different combinations of reactions and pathways for each task. The two sets of tasks together may cover 1,700 reactions, but in total, more than 20,000 reactions are used in different combinations to complete the metabolic tasks, indicating that a number of reactions are repeated among tasks. This result is to be expected, especially for reactions that involve central carbon metabolism and ATP production, such as ATP synthase. The maximum number of reactions covered by a given task was 788 reactions (1 task), the metabolic task for the de novo synthesis of lipids from glucose and essential fatty acids, and the minimum number of reactions covered by a task was one reaction (24 tasks), metabolic tasks that describe transport reactions. Overall, the reaction coverage of tasks demonstrates a qualitative validation of iCardio.
To provide a quantitative validation of iCardio, ATP yields were predicted for a number of carbon substrates and amino acids and compared to another metabolic model, MitoCore (Smith et al., 2017). The MitoCore model was chosen for comparison because of its focus on mitochondrial metabolism. For almost all carbon sources (other than methionine), iCardio predicted ATP yields within 10% of the values calculated with MitoCore (Smith et al., 2017) ( Figure 2B). For methionine, the ATP prediction from iCardio matches the methionine prediction from Recon 2.2 (Swainston et al., 2016). It is important to note the difference in scope between the two models: MitoCore contains 342 reactions, whereas iCardio contains 4,200 reactions and still maintains accurate ATP yields. This result highlights that, even with the increased size of iCardio, there are not infeasible energygenerating loops, which would artificially inflate ATP yield predictions and influence the reactions necessary for different metabolic tasks. Together, the qualitative and quantitative validations of iCardio demonstrate the ability of the model to accurately and more comprehensively represent heart metabolism.
Identifying TIDEs using iCardio for heart-failure geneexpression data Metabolic models provide an alternative approach for interpreting changes in gene expression to yield insight into metabolic shifts that may be contributing to a diseased state. Here, we use iCardio to identify metabolic tasks that were significantly associated with differentially expressed genes (DEGs), called TIDEs, which represent metabolic functions that are associated  (Blais et al., 2017) with additional metabolic tasks that were curated from the literature based on heart metabolism. These tasks were used to test whether the model could perform functions known to be present in the heart and remove metabolic functions not present in the heart. (C) The CORDA algorithm removed 4,203 reactions from iHsa to produce a draft iCardio model that had a cardiomyocyte-specific task accuracy of 89%. Further curation, based on the previously defined list of metabolic tasks, resulted in a final model with 100% heart-specific task accuracy.
Cell Reports 34, 108836, March 9, 2021 3 Article ll OPEN ACCESS with significant shifts in gene expression in an experimental versus control state ( Figure 3). Heart failure is a complex disease, both in etiology and presentation, but heart failure is associated with a shift in metabolism (Wende et al., 2017). Several different metabolic shifts have been measured in heart failure, such as decreased fatty acid utilization (Lopaschuk, 2017;Wende et al., 2017), increased ketone body utilization (Janardhan et al., 2011), and decreased utilization of branched-chain amino acids (Sun et al., 2016). However, there has been limited systematic analysis of the transcriptional metabolic changes that could be influencing these measured changes in metabolic uptake in heart failure. iCardio provides an opportunity to contextualize gene expression data from patients with end-stage heart failure to identify transcriptional changes in metabolic functions, such as those listed above, and to identify metabolic functions associated with changes in transcription across studies.
We identified DEGs from microarray data for samples collected from non-infarcted regions of the left ventricle for patients undergoing heart transplants for either ischemic or idiopathic heart failure compared with samples from healthy hearts. We integrated theese DEGs and their associated logfold changes with iCardio to determine TIDEs, representing metabolic functions associated with a significant change in gene expression. For example, when compared with healthy hearts, the ischemic hearts from GEO: GSE5406 (Hannenhalli et al., 2006) had 2,678 DEGs; 392 of these DEGs were represented in iCardio (Table S3). After integrating these DEGs using iCardio and the TIDEs pipeline, 85 of the 307 metabolic tasks were designated as TIDEs (Figure 4; Data S3), representing metabolic functions associated with significant changes in gene expression for this ischemic heart-failure dataset. In the randomized data used to determine the statistical significance for that dataset, only 13 of the 1,000 random iterations had more than 85 tasks identified as significantly changed, indicating that the identified TIDEs cannot be attributed to random changes in gene expression, but rather, to distinct and coordinated shifts in gene expression resulting in changes in specific metabolic functions. Across the 307 metabolic tasks, there are varied distributions in the underlying randomized task scores (Figure 4), representing the complex relationships between gene expression and metabolic function that are captured with iCardio and the TIDEs reaction-centric approach.
Some general trends appear from the TIDEs identified from this ischemic heart-failure dataset ( Figure 4). First, metabolic tasks related to fatty acid synthesis (39 tasks) were decreased. Within tasks related to fatty acid synthesis, significant decreases were observed for synthesis of saturated fatty acids but not for synthesis of unsaturated fatty acids. Metabolic tasks related to lipid synthesis were decreased (6 tasks). Finally, metabolic tasks related to signaling metabolism, the synthesis of NO and Neu5Ac, were associated with decreased gene expression. Taken together, these TIDEs support a potential role for decreased gene expression for fatty acid synthesis and lipid synthesis as well as decreased synthesis of NO and Neu5Ac for signaling in heart failure. However, other, previously reported, metabolic signatures of heart failure, such as increased ketone body degradation (Janardhan et al., 2011) and decreased breakdown of branched-chain amino acids (Sun et al., 2016), were not associated with significant changes in transcription for this dataset.
To determine common shifted metabolic functions across datasets and types of heart failure (ischemic versus idiopathic), we expanded the TIDE analysis to two additional studies (Table S3), which included samples for idiopathic and ischemic heart failure. Although we examined more than these three datasets, these were the only datasets that had publicly available data and at least 50 DEGs (Table S4). First, we see clustering of both the DEGs ( Figure S1A) and the TIDEs ( Figure S1B; Data S4) within each dataset, rather than across the types of heart failure, highlighting the complexity and heterogeneity of the disease. Although we see no common TIDE appears across datasets (Figure S2A), there are some TIDEs that appear in at least four of the six datasets ( Figure 5A).
First, de novo synthesis of Neu5Ac was associated with decreased gene expression in five of the six datasets (excluding GEO: GSE1869 ischemic). Second, both the synthesis of NO and the synthesis of NAD from tryptophan were associated with decreased gene expression in four of the six datasets (excluding ischemic and idiopathic GEO: GSE1869). Decreased synthesis of NO, NAD, and Neu5Ac represent potential biomarkers of heart Figure 2. Validating iCardio descriptively using the number of reactions covered by metabolic tasks and quantitatively using ATP yields for common carbon substrates (A) A descriptive validation of iCardio using the number of reactions covered using the different metabolic tasks. Together, the tasks account for almost half of the reactions in iCardio. The remaining reactions represent areas for future improvement of metabolic tasks. (B) A qualitative validation of iCardio using ATP yields for a variety of common carbon sources. The ATP yields for iCardio (y axis) are compared with another recently published, but smaller, metabolic model, MitoCore (Smith et al., 2017), which contains 324 reactions. The agreement between the models demonstrates the lack of energy generating cycles and infinite loops in iCardio.
4 Cell Reports 34, 108836, March 9, 2021 Article ll OPEN ACCESS failure that are shared across types of heart failure (ischemic versus idiopathic) and datasets. To confirm the role of these metabolic functions in heart failure, four additional datasets (Greco et al., 2012;Ren et al., 2020;Schiano et al., 2017) (Table  S5) for general heart failure were analyzed using the TIDEs pipeline ( Figure S3; Data S5). Both the synthesis of NO and Neu5Ac are significantly associated with decreased gene expression changes in three of the four heart failure datasets, whereas the synthesis of NAD from tryptophan is significantly associated with decreased gene expression in two of the four datasets.
As mentioned previously, several metabolic shifts are associated with heart failure: decreased fatty acid utilization (Lopaschuk, 2017;Murashige et al., 2020;Wende et al., 2017), increased ketone body utilization (Janardhan et al., 2011;Murashige et al., 2020), and decreased utilization of branched-chain amino acids (Sun et al., 2016). However, it is unclear whether these changes are driven by external factors (i.e., changes in hormones or circulating levels of metabolites) or transcriptional changes. Here, we see decreased synthesis of fatty acids in four of the six datasets ( Figure 5A), suggesting downregulation of fatty acid synthesis, rather than consistent increased gene expression for fatty acid oxidation ( Figure 5B). For ketone body metabolism, branched-chain amino acid metabolism, and fatty acid oxidation, we see no consistent changes in gene expression for these metabolic functions across the datasets. Finally, for glucose metabolism, we see decreased gene expression for the breakdown of glucose in the presence of oxygen for three of six datasets, suggesting decreased use of glucose oxidation ( Figure 5B). The lack of transcriptional changes for ketone body metabolism, branched-chain amino acid metabolism, and fatty acid metabolism are confirmed in the general heart-failure dataset ( Figure S3; Data S5). Additionally, decreased gene expression for the metabolic function of the breakdown of glucose in the presence of oxygen was seen in three of the four heart-failure datasets, confirming the role of decreased gene expression for glucose oxidation in heart failure ( Figure S3B).

Comparison of TIDEs with GSEA by KEGG pathway
Next, we wanted to compare the TIDEs analysis to a common gene-centric approach, GSEA, using the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways to define the gene sets. Although KEGG includes pathways other than metabolism ( Figure S4), we have chosen to focus on a comparison of metabolic pathways. Using GSEA with the data from the studies cited above, the most commonly changed metabolic pathways across the datasets were increased beta alanine metabolism in all datasets, increased propanoate metabolism in five of the six datasets, decreased selenoamino acid metabolism in four of the six datasets and increased valine, isoleucine, and leucine degradation in three of the six datasets (Figure S2B). For the commonly cited changes in uptake of metabolites in heart failure, the GSEA analysis shows no change in fatty acid metabolism, one dataset with a decrease in genes associated with glycolysis, two studies that show increased oxidative phosphorylation, and three datasets with increased valine, isoleucine, and leucine metabolism. For comparison with the metabolic tasks, the breakdown of propanoate and synthesis of beta-alanine were included as metabolic tasks but showed no change across datasets. Selenoamino acids are included in the model but are not included as a metabolic task. Valine, isoleucine, and leucine metabolism cover six metabolic tasks in the model covering uptake and breakdown of each amino acid, and only one task, breakdown of valine, was significantly changed in one dataset.
The difference in identified changes in metabolism highlights the differences between a reaction-centric approach, such as TIDEs, versus a gene-centric approach. GSEA KEGG metabolic pathways can cover more than one metabolic function, such as fatty acid metabolism, whereas the TIDEs approach allows for a distinction between the synthesis and degradation of different fatty acids. We can use iCardio for a similar general metabolic approach by using the metabolic subsystems assigned to each reaction to identify sets of reactions. Using TIDEs, we can determine whether the reactions in a given subsystem are Metabolic tasks quantitatively describe metabolic functions that a tissue or organism is known to catalyze. iCardio was used to identify the reactions that are used to perform each of the previously defined metabolic tasks. Next, each reaction is assigned a reaction weight based on gene expression fold-changes. A task score is calculated based on the weights for each reaction and represents an average gene expression value over reactions used in that metabolic task. To determine the statistical significance of this task score, the original gene expression data is shuffled over all genes with data, and task scores are recalculated to give a distribution of task scores. A p value is assigned based on where the original task score falls in the distribution of randomized data.
Cell Reports 34, 108836, March 9, 2021 5 Article ll OPEN ACCESS associated with increased or decreased gene expression (Figure S5). Although this approach also reveals some interesting trends, such as a decrease in expression for reactions in the arachidonic acid metabolic subsystem, it fails to highlight changes in fatty acid metabolism or other pathways, such as NO synthesis or Neu5Ac synthesis.
Second, a GSEA approach includes every gene that can be present in the pathway, even if the gene function is redundant, such as with isozymes. The TIDEs approach allows for the fold-change of one gene to determine the weight of each reaction. For example, one of the metabolic tasks related to synthesis of NO from arginine ( Figure 6), a subset of the reactions necessary for that metabolic task is displayed in Figure 6. Two of those reactions can be catalyzed by one of three enzymes, NOS1, NOS2, or NOS3, which are known to have tissue-specific expression (Mattila and Thomas, 2014). The TIDEs reactioncentric approach uses the log-fold-change of one gene to determine the reaction weight, allowing a different gene to determine a reaction's weight among studies and across tissues. In this study, the reaction weight for the transporter responsible for the import of arginine to, and the export of citrulline from, the cytosol was determined by different genes among the idiopathic heart-failure data in GEO: GSE5406 and GSE57345 ( Figure 6C). However, the metabolic task of NO production from arginine was statistically associated with decreased gene expression in both datasets ( Figure 5A). A similar GSEA analysis would have included all five genes rather than the two to three used in the TIDEs analysis.

DISCUSSION
Here, we present a validated model of heart metabolism, iCardio, with an approach for analyzing changes in metabolic functions based on gene expression data. We present an approach for building tissue-specific models using tissue-specific protein data from the HPA integrated with a general human reconstruction, iHsa, followed by manual curation with metabolic tasks to ensure general metabolic functionality. Using a task-driven approach for model curation demonstrated that the CORDA algorithm (Schultz and Qutub, 2016) produced a draft tissue-specific model that was more accurate with respect to performance of heart-relevant metabolic tasks than other integration methods (Table S2). It is interesting to note that the CORDA algorithm produced one of the smaller draft models and still maintained the highest task accuracy. As noted, although the CORDA algorithm produced one of the smaller draft models and resulted in the final iCardio model that contained 50% of the reactions in iHsa, approximately 75% of the genes in iHsa are still represented in the iCardio model. This result highlights the ability of the complex GPR rules to capture tissue-specific function. For example, the conversion of arginine to NO is an established mechanism for intracellular and extracellular signaling, and the production of NO is mediated through tissue-specific expression of different NOS isoforms (Figure 6). iCardio captures the complex relationship between gene expression and function through the complex GPR rules, even though all of the isoforms were not detected in the original HPA dataset.
The metabolic tasks used for benchmarking the integration algorithms and further manual curation cover 41% of the network and also identifies areas for future development of both general and tissue-specific metabolic tasks. Here, metabolic tasks were formulated agnostic to the gene expression data that was integrated with the resulting model. Additional tasks can serve as hypotheses for changes in metabolic functions of the heart or other tissues that could then be tested using gene expression datasets. Finally, although not every reaction is covered by a metabolic task, these reactions represent metabolic functionality with either evidence of protein expression in the HPA or pathway Gene expression data for ischemic heart failure versus healthy hearts from GEO: GSE5406 was integrated with iCardio to identify shifts in metabolic functions. The red dashed lines indicate the task score for the heart-failure dataset, and the black histograms are the distribution of randomized task scores. A red line to the right indicates a metabolic function associated with increased gene expression, whereas a red line to the left indicates a metabolic function associated with decreased gene expression. The color of the background indicates whether the metabolic function had a statistically significant (p < 0.025) increased (red) or decreased (blue) task score.
6 Cell Reports 34, 108836, March 9, 2021 Article ll connectivity. These reactions and associated pathways can serve as starting points for future metabolic task curation and further support the use of metabolic network reconstructions for generating hypotheses for important, tissue-specific metabolic functions.
When identifying reactions necessary for each metabolic task, the pFBA approach assumes that the pathway for each task remains the same, independent of the data. This reaction-centric TIDEs approach for determining significantly changed metabolic functions emphasizes that (1) metabolic functions require multiple complex, stoichiometrically balanced reactions; and (2) some genes may disproportionately influence the completion of metabolic functions. For example, two metabolic tasks cover the synthesis of NO from arginine (Data S1). The first task comprehensively covers the entire pathway, providing only arginine and other ions extracellularly as inputs while requiring the production of NO and the release of only metabolites that have transport reactions from the cytosol to the extracellular space. The second task only accounts for the central reactions for NO production, providing arginine and dihydronicotinamideadenine dinucleotide phosphate (NADPH), and requires the production of NO. Although both tasks cover the synthesis of NO from arginine, the first metabolic task uses 53 reactions and covers the entire pathway, including transport of metabolites between compartments, which is necessary to maintain the stoichiometric balance of the pathway. However, because the second metabolic task covers mainly reactions in the cytosol, it requires only five reactions. Although the first task was present in the original iHsa task list, the second task was added because of the importance of the synthesis of NO in the heart and, therefore, covers the core reactions for NO synthesis. Together, these two tasks highlight the potential for both breadth and specificity in identifying shifts in metabolic functions.
Second, reactions are associated with complex GPR rules, allowing for multiple genes representing different isozymes to catalyze the same reaction. By accounting for these complex relationships in the calculation of task scores, the expression values of individual genes can affect multiple functions. For example, fatty acid synthase, which catalyzes multiple reactions, can strongly influence a final task score by determining the reaction weight for multiple reactions in a metabolic task. In contrast, previous gene-centric approaches would have evenly weighted all genes in a specific pathway. The TIDEs analysis represents a reaction-centric approach that focuses on metabolic functions that can provide insight into broad metabolic changes that may not be immediately apparent using gene-centric approaches.
The presented TIDEs pipeline offers an alternative, reactioncentric approach to interpret complex changes in gene expression data to identify non-obvious changes in metabolic functions. In the case of heart failure, multiple studies have noted changes in metabolite uptake in late-stage heart failure (Kundu et al., 2015;Paolisso et al., 1994;Taylor et al., 2001), including a recent, comprehensive metabolomics analysis (Murashige et al., 2020). However, it is unclear whether these changes are driven by changes in gene expression in the heart or other factors, such as changes in circulating levels of metabolites or Figure 5. TIDEs analysis identifies functional metabolic changes shared across different datasets, whereas no changes in transcription were seen for common metabolic shifts in heart failure A red box indicates a significant positive association (p value < 0.025), and a blue box indicates a significant negative association (p value < 0.025). Three datasets that contained samples for ischemic, idiopathic, and healthy hearts were downloaded, analyzed, and integrated with iCardio using the TIDEs approach to identify shifts in metabolic functions. (A) Here, we display a subset of those metabolic functions that were identified to be either changing consistently transcriptionally across datasets or had been previously reported to be associated with a change in uptake rates in heart failure (B).
Cell Reports 34, 108836, March 9, 2021 7 Article ll OPEN ACCESS hormones. The TIDEs pipeline did not identify consistent, significant changes in transcription for ketone body utilization, branched-chain amino acid metabolism, or fatty acid oxidation ( Figure 5). In the case of ketone body utilization, a recent metabolomics analysis suggests that changes in circulating levels of ketone bodies influence increased use in late-stage heart failure (Murashige et al., 2020). Although the role of fatty acids in latestage heart failure remains controversial, with some studies measuring increased use (Paolisso et al., 1994;Taylor et al., 2001) whereas animal models measure no change (Zhabyeyev et al., 2013) or decreased use (Allard et al., 1994;Neubauer, 2007), the TIDEs analysis demonstrated no consistent change in gene expression for fatty acid oxidation in late-stage heart failure ( Figures 5B and S3B). Together, this suggests that factors other than changes in transcription are driving the measured changes in metabolite uptake for ketone bodies, amino acids, and fatty acids in late-stage heart failure.
The TIDEs pipeline did identify three common transcriptional changes across the late-stage ischemic and idiopathic heart failure datasets: decreased gene expression for the use of glucose in the presence of oxygen, decreased gene expression for NO synthesis, and decreased gene expression for Neu5Ac synthesis. Although decreased glucose uptake has been noted in human heart failure (Paolisso et al., 1994;Taylor et al., 2001), here, we confirm the potential role for transcriptional changes as influencing changes in glucose metabolism in late-stage heart failure (Mori et al., 2012). In the ischemic and idiopathic datasets ( Figure 5) and the general heart-failure datasets ( Figure S3), we identified significant decreased gene expression for the use of glucose in the presence of oxygen. Recent work in animal models (Fernandez-Caggiano et al., 2020;McCommis et al., 2020;Zhang et al., 2020) demonstrated the role that the mito-chondrial pyruvate carrier (MTC) and pyruvate metabolism have in the development of heart failure. Of note, two of these studies analyzed the expression of MTC in failing hearts (Fernandez-Caggiano et al., 2020;McCommis et al., 2020), noting decreased protein or normalized mRNA levels. However, we see no decreased gene expression for either MCT-1 or -2 in the datasets used in this current study. This result would suggest that, although decreased MCT-1/2 expression is one potential mechanism, other potential points of regulation can influence the decreased use of glucose, which is captured in the metabolic functions presented in this study. In the case of NO synthesis, previous work has highlighted the important role of NO in cardiac function Massion et al., 2003), and more recent work has suggested a role for increasing NO synthesis for the increasing efficacy of beta-blockers (Hayashi et al., 2018).
Neu5Ac is the most common sialic acid associated with multiple functional roles in the body. Increased Neu5Ac levels in blood plasma have been measured for coronary artery disease , atrial fibrillation (Hu et al., 2020), and chronic heart failure (Rajendiran et al., 2014), and Neu5Ac is generally associated with increased risk for cardiovascular diseases (Gopaul and Crook, 2006). However, increased Neu5Ac in blood plasma has mainly been attributed to changes in glycosylation in circulating lipoproteins and has been correlated with increased circulating triglyceride levels (Israr et al., 2018). Here, we identified a significant decrease in gene expression for the production of Neu5Ac in the heart: first in the idiopathic and ischemic heart-failure datasets (Figure 5), and second, in the general heart-failure datasets ( Figure S3). Previous work in a mouse model demonstrated that inhibiting protein glycosylation through deletion of ST3Gal4, a sialyltransferase, resulted in the development of dilated cardiomyopathy (DCM) and stress-induced heart failure (B) The model uses the GPR rules associated with each of these reactions to determine the reaction weight based off gene expression data, thereby assigning the expression of one gene to govern the reaction. (C) The gene whose gene expression was used in the analysis can, therefore, differ between different datasets and still produce the same result, as seen here with both datasets still showing a statistically significant decrease the metabolic function of conversion of arginine to nitric oxide.
8 Cell Reports 34, 108836, March 9, 2021 Article ll OPEN ACCESS (Deng et al., 2016). Knockouts of other genes related to protein glycosylation have also been shown to alter ion signaling in cardiomyocytes (Ednie et al., 2019;Montpetit et al., 2009), resulting in cardiac abnormalities. Finally, a congenital genetic defect in Neu5Ac synthesis led to dilated cardiomyopathy, which was confirmed in a genetic knockout in zebrafish (Wen et al., 2018). Together with the results presented here, this observation suggests a role for Neu5Ac metabolism as a marker of, and potential contributing mechanism for, heart failure.
Although there were some common trends across datasets, no TIDEs could discriminate between ischemic and idiopathic heart failure across studies. This characteristic was also true for the KEGG GSEA results. Together, this result highlights the complexity of heart failure, both in etiology and presentation, suggesting that classifications, such as ischemic and idiopathic, may be insufficient to capture distinct metabolic changes. Further, differences in classifications between different publicly available datasets (i.e., ischemic, dilated, idiopathic, etc.; Table  S4) makes it increasingly difficult to draw conclusions across studies. In addition, the datasets cluster within each study for gene expression, the TIDEs approach, and the GSEA KEGG metabolic pathway analysis, suggesting, again, that there is a large amount of heterogeneity in changes in gene expression for heart failure. However, pathway-based approaches, such as the TIDEs approach, have been shown to better reconcile both species-specific differences (Brubaker et al., 2019;Normand et al., 2018) and platform differences (e.g., microarray or RNA-seq) (van der Kloet et al., 2020) when compared with using gene expression alone.
It is important to note that changes in gene expression are not the only drivers of changes in metabolic function. Other studies have noted the role of changing metabolic milieu in the blood as a driver of changes in the metabolic functions of the heart during heart failure (Murashige et al., 2020;Neubauer, 2007). Future work can integrate clinical measures, such as left ventricular ejection fraction (LVEF) or fluorodeoxyglucose-positron emission tomography (FDG-PET) uptake measures, which could help to separate clusters of patients and more clearly identify the influence of metabolism in the progression of heart failure.
Here, we provide an approach for constructing a tissue-specific metabolic model and demonstrate the utility of metabolic models to interpret changes in metabolic functions based on gene-expression data. The model-building process can be extended to other cell-or tissue-type-specific models. The metabolic tasks provide a 2-fold role for model validation and concrete metabolic functions to identify metabolic shifts in gene expression data. TIDEs represent a reaction-centric approach to identifying changes in metabolic functions and testing hypotheses for changes in gene expression for metabolic functions. These hypotheses can be formulated as metabolic tasks based on the current literature, based on reactions present in a metabolic model but not covered in the current list of metabolic tasks, or results from other gene-centric approaches, such as GSEA. The method is not limited to use with iCardio but can be used with any published metabolic model that includes GPR rules. Further, the method can be used with either microarray or RNA-seq data. We demonstrate that a heart-specific model, iCardio, with the TIDEs pipeline was able to identify decreased NO synthesis and decreased Neu5Ac synthesis across different heart failure datasets that were not identified using conventional gene-centric approaches, such as gene set enrichment analysis.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

DECLARATION OF INTERESTS
The authors declare no competing interests.
was necessary to prune reactions from iHsa that do not have evidence for presence in the heart. To do this, we integrated protein data from the HPA (data available from version 18, https://www.proteinatlas.org; Uhlé n et al., 2015) with iHsa (Blais et al., 2017). Various algorithms have been published to generate tissue-specific models from tissue transcriptomics or proteomics data. Here, we implemented 5 of these algorithms (Becker and Palsson, 2008;Jerby et al., 2010;Schultz and Qutub, 2016;Vlassis et al., 2014;Zur et al., 2010) to generate draft heart-specific models. For GIMME, ATP production was used as the objective function. Data was integrated from the HPA, which contains tissue-specific protein expression where each protein is assigned either a high, medium, low, or no expression based on data from antibody-based immunohistochemistry (Uhlé n et al., 2015). Code for implementing each algorithm is available at https://github.com/csbl/iCardio. The CORDA algorithm was chosen from among the algorithms given its accuracy for the pre-defined list of heart-specific metabolic tasks (Data S1; Table S2). The CORDA algorithm takes as an input user-defined high, medium, and negative confidence reactions to produce a model that is (1) consistent (i.e., all reactions can carry flux) and (2) maximizes high and medium confidence reactions while minimizing the number of negative confidence reactions. Proteins that corresponded with high, medium, or low/no expression in the heart as indicated in the HPA were included as high (n = 1005), medium (n = 2168), or no confidence reactions (n = 5163) respectively based on the model's GPR rules. Reactions without GPR rules (2300 reactions) or reactions associated with no data were included in the negative confidence reactions.
Validating iCardio iCardio was validated qualitatively by determining the number of reactions covered by each metabolic task and quantitatively by comparing ATP yields for common carbon sources between iCardio and another metabolic model, MitoCore (Smith et al., 2017). Parsimonious flux balance analysis (pFBA) (Lewis et al., 2010) determines the lowest sum of fluxes, and therefore reactions, necessary to complete an objective. Here, pFBA was used to identify the reactions necessary for each metabolic task. Previous work has shown that, in most cases, pFBA produces more physiologically relevant flux distributions compared to flux balance analysis (FBA) or flux-based algorithms which incorporate data (Machado and Herrgå rd, 2014). As a final, quantitative validation step, we calculated ATP yields for a variety of carbon sources in silico as the maximum flux through the ATP synthase reaction for one unit of each carbon source and compared the results to a smaller heart-specific model of mitochondrial metabolism (Smith et al., 2017).

QUANTIFICATION AND STATISTICAL ANALYSIS
Analyzing transcriptomics data Microarray data (Table S3) (Hannenhalli et al., 2006;Kittleson et al., 2005;Liu et al., 2015) from patients undergoing heart transplants for advanced ischemic or idiopathic heart failure were downloaded from the Gene Expression Omnibus (GEO) database (Barrett et al., 2013). Datasets were selected that (a) contained samples for both ischemic and idiopathic heart failure and (b) resulted in at least 50 differentially expressed genes for each type of heart failure. Since all the datasets had been background corrected using RMA, the limma package in R (Ritchie et al., 2015) was used to determine differentially expressed genes (DEGs) between healthy hearts and ischemic or idiopathic hearts. Genes with an FDR < 0.1 were considered to be differentially expressed and their corresponding fold change was used in subsequent analysis. Although one study contained patient metadata, this data was not included in downstream analysis.
To validate the identified metabolic functions associated with significant changes in transcription in general heart failure, four additional datasets were analyzed (Table S5) (Greco et al., 2012;Ren et al., 2020;Schiano et al., 2017), which included one microarray and three RNA-seq datasets. For the microarray dataset, DEGs were determined using the above methods. For the RNA-seq datasets, raw FASTQ files were downloaded from GEO (Barrett et al., 2013), pseudo aligned to the Homo sapiens transcriptome Ensembl v96 transcriptome using kallisto (Bray et al., 2016), transcript abundances were aggregated to the Entrez gene level R v. 3.6.3 with the package tximport (Soneson et al., 2015), and fold changes were calculated using DESeq2 (Love et al., 2014). Genes with an FDR < 0.1 were considered to be DEGs and their corresponding fold change was used in subsequent analysis.

Identifying TIDEs
Metabolic tasks and their associated reactions, as identified using iCardio with pFBA, were used to identify metabolic functions that are significantly associated with differentially expressed genes in a condition of interest. This method is referred to as Tasks Inferred from Differential Expression (TIDEs) (Figure 3). A total of 307 metabolic tasks was used for this analysis, representing the tasks functionally present in iCardio from the original task list (Data S1) that also contained at least one reaction with an associated GPR rule (Data S4). Reactions that carry flux for each task are identified using a pFBA assumption without previous knowledge of the gene expression data ( Figure 3A), as has been done with a related approach (Jerby and Ruppin, 2012). Gene expression log fold changes are overlaid onto reactions in the network using the GPR rules to give each reaction a weight. GPR rules represent the proteins necessary to catalyze a specific reaction through AND or OR relationships. The AND relationships represent a protein complex where different genes encode unique protein subunits necessary for enzyme function while OR relationships represent isozymes. Reactions with complex GPR rules are assigned the maximum absolute fold change for OR relationships and the minimum fold change for AND relationships. For OR relationships where there is a disagreement in the direction of change, i.e., where there are genes associated with both a positive and negative, the fold change with the highest absolute weight is taken as the reaction weight. The assigned Cell Reports 34, 108836, March 9, 2021 e2 Article ll reaction weight values across all reactions in a task are averaged to calculate the task score ( Figure 3B). To assign statistical significance to these task scores, the gene expression fold changes are randomized 1000 times among the genes measured in each dataset and task scores are recalculated based on the randomized data to create a distribution of task scores. The p value for each task score that corresponded to the original data is calculated as the number of random task scores greater/less than the original data, depending on how the task score falls relative to the mean randomized task score ( Figure 3C). TIDEs are identified as tasks with a p value < 0.025. A p value of 0.025 was chosen over a p value of 0.05 based on the use of a two-sided t test for calculating significance. For validation of selected TIDEs using general heart failure data, a p value < 0.05 was used for determining significance. Finally, iCardio was also used to identify reactions that belonged to each metabolic subsystem in the model. These sets of reactions were also used with the TIDEs method to identify reaction subsystems that were significantly associated with changes in gene expression data.

Gene set enrichment analysis
The same gene expression datasets (GSE1869, GSE5406, GSE57345) were also analyzed using gene set enrichment analysis (GSEA) (Subramanian et al., 2005) using the pre-defined gene sets by KEGG pathways. To more closely replicate the TIDE analysis, genes were shuffled within a sample rather than shuffling across samples within each dataset. Pathway enrichment scores with a nominal p value < 0.05 were defined as statistically significant.     Subsystems that were significantly increased or decreased based on the gene expression data are shown here.