Integration of omics data to generate and analyse COVID-19 specific genome-scale metabolic models

COVID-19 presents a complex disease that needs to be addressed using systems medicine approaches that include genome-scale metabolic models (GEMs). Previous studies have used a single model extraction method (MEM) and/or a single transcriptomic dataset to reconstruct context-specific models, which proved to be insufficient for the broader biological contexts. We have applied four MEMs in combination with five COVID-19 datasets. Models produced by GIMME were separated by infection, while tINIT preserved the biological variability in the data and enabled the best prediction of the enrichment of metabolic subsystems. Vitamin D3 metabolism was predicted to be down-regulated in one dataset by GIMME, and in all by tINIT. Models generated by tINIT and GIMME predicted downregulation of retinol metabolism in different datasets, while downregulated cholesterol metabolism was predicted only by tINIT-generated models. Predictions are in line with the observations in COVID-19 patients. Our data indicated that GIMME and tINIT models provided the most biologically relevant results and should have a larger emphasis in further analyses. Particularly tINIT models identified the metabolic pathways that are a part of the host response and are potential antiviral targets. The code and the results of the analyses are available to download from https://github.com/CompBioLj/COVID_GEMs_and_MEMs.


Introduction
The novel coronavirus (SARS-CoV-2) and the COVID-19 disease pandemic have impacted many aspects of our lives since its initial emergence and spread in the early 2020s. Various computational approaches have quickly been developed and have already been used to aid diagnosis, predict outcomes, analyse dynamics, and improve treatment of this disease. For example, Zoabi et al. [1] applied machine learning approaches to predict COVID-19 diagnosis based on the presence and severity of clinical symptoms [2]. used artificial intelligence methods to diagnose and predict the COVID-19 patient's response to treatment. The COPASI tool [3] was used to model the dynamics of SARS-CoV-2 using differential equations [4]. A recent review of computational approaches dedicated to combating the COVID-19 pandemic has been presented by Hufsky et al. [4].
Among recent computational approaches aimed to address the so-far limited understanding of COVID-19 disease and dynamics are also Genome-scale metabolic models (GEMs). These models are gaining relevance in various scientific fields from bioengineering and production of biologics to the understanding of complex diseases [5]. Manual reconstruction of an integrated host-virus GEM has already been used to identify and confirm potential antiviral targets [6,7]. However, because this model only incorporated a SARS-CoV-2 virus biomass target function (VBOF) into a GEM of human alveolar macrophages, its predictive power could be further improved by fitting metabolic constraints to the diseased state of a metabolic network. A similar approach of modelling the infection in the hepatocyte-derived cellular carcinoma cell line Huh7 using the manual extension of the Recon 2.2 model was reported by Yaneske et al. [8].
GEMs can be used in a combination with different model extractions methods (MEMs), which allow us to automatically adapt a model and its constraints to a specific context using omics data [9]. One of the main problems is that different MEMs produce variable, sometimes contradictory results [10]. Different datasets gathered either from COVID-19 patients or cell lines have been applied to the analysis of COVID-19 metabolic signatures by genome-scale metabolic modelling and automatically generated context-specific GEMs [11,12]. However, only selected MEM approaches have been applied in the present studies. Namely, Nanda and Ghosh [11] applied iMAT and Cheng et al. [12] applied tINIT extraction method to variable scope of COVID-19 datasets.
In this study, we describe the integration of various publicly available COVID-19 datasets using different MEMs in combination with the Human-GEM metabolic model [13]. We reconstruct an individual metabolic model for each of the datasets and healthy groups and each of the MEMs, namely iMAT [14], INIT [15], tINIT [16] and GIMME [17]. We perform a thorough comparison of obtained models and discuss their biological relevance. Moreover, we generate a set of flux samples [18] using each of these models, which are used to identify enriched metabolic subsystems in each of the models between the infected and healthy groups. These are then used to perform a comparative analysis of the obtained results, which we discuss in a broader biological context.

Background
The coronavirus causing COVID-19, SARS-CoV-2, belongs to the family Coronaviridae and is a single-stranded RNA virus surrounded by a lipid envelope containing several viral proteins [19]. Viruses do not possess enzymes to produce the necessary ingredients and energy to build new viruses. Therefore, upon entry into the host cell, the virus hijacks host cell machinery, modifies many metabolic pathways and even the composition of membranes [20]. Antivirals targeting the host metabolic factors are emerging as an alternative therapeutic strategy [21]. Predicting and understanding the changes in the host metabolism is a necessary step towards the development of new antivirals. Computational approaches, such as GEMs, enabling the reconstruction of host metabolism after viral infection can aid in the search for new antiviral targets.
GEMs have gained large importance in different fields ranging from biotechnology to systems biology and systems medicine [5,22]. For example, GEMs have been applied in cancer research [23,24], drug targeting in pathogens [25], prediction of drug side effects [26], and optimisation of production of recombinant proteins using Chinese hamster ovary (CHO) cells [27]. Reconstruction and analysis of GEMs are mostly based on constraint-based approaches. The main representatives of these approaches are flux balance analysis (FBA) and its extensions [28,29]. FBA performs a steady-state analysis of the metabolic network. This is based on the constraints of the observed metabolic reactions (i.e., feasible metabolic flux ranges), and a predefined objective function. FBA yields a solution describing metabolic flux values that maximise the objective function within the given constraints. A limitation of FBA is that it yields a single non-unique solution, which describes only one of the optimal steady-states of the system. This can be addressed with the application of different FBA extensions, such as flux variability analysis (FVA), which assesses the metabolic flux ranges that maintain a specific state of the metabolic network [30]. Another alternative is to apply parsimonious FBA (pFBA) that yields a unique solution by maximising the objective function and simultaneously minimising the overall fluxes through the metabolic network [31]. However, these approaches still require a specification of one or more optimisation functions. These might strongly affect the obtained results and are hard to define in a general setting [32,33]. Even though using a biomass accumulation function as an optimisation criterion is plausible in certain scenarios (e.g., for studies involving microorganisms or cancer cells), its general applicability is questionable [34]. An alternative approach to the analysis of GEMs without the requirement to specify a cellular objective is flux sampling. This is based on a generation of representative metabolic fluxes covering the feasible solution space [35]. Flux sampling generates a possible range of metabolic flux values as in the case of FVA, however, in an unbiased manner.
Available GEM reconstructions usually describe a biological system in a generic state. For example, the Human-GEM model describes a reference reconstruction of a human cell [13]. This needs to be adapted to a specific cell type in a specific context before further analyses are performed [23]. Model extractions methods (MEMs) allow the automatic integration of different omics (usually transcriptomics) data to adapt a generic model to a given context using gene-protein-reaction (GPR) rules encoded in a model [36]. For example, GIMME aims to remove the reactions catalysed by the products of genes with expression levels below a predefined threshold [17,37]. iMAT additionally aims to pertain to reactions catalysed by the products of highly expressed genes [9,14,37]. The latter is also the case of INIT, which does not impose a strict steady-state assumption and thus allows the accumulation of certain metabolites [15]. tINIT presents an extension of INIT, in which the resulting model needs to be able to perform a given set of metabolic tasks. Context-specific GEMs have already been applied to the analysis of omics data describing different conditions and for understanding complex systems disorders (e.g., see Ref. [38]). Such analyses have also been conducted in the domain of the analysis of COVID-19 metabolic signatures [11,12]. However, present studies have only applied a single MEM to the analysed datasets. Since each MEM is based on certain assumptions different MEMs might produce significantly different or even contradictory results [10]. The predictive power of extracted models can be increased if different MEMs are analysed in a given context to guide the selection of a method that yields models with the largest significance (e.g., models preserving the separation between the observed experimental groups) [39]. Furthermore, the selection of pre-processing and omics data integration steps also has a significant effect on obtained results and the use of various tools, approaches and datasets significantly increase the accuracy, precision and robustness of analyses [40][41][42][43]. Setting up workflows combining the integration of different datasets and tools enabling systematic assessment of omics data is one of the main goals of systems medicine [44,45].
In this work, we describe an approach to perform a selection of the most suitable MEMs in a combination with different COVID-19-specific datasets. The proposed selection is guided by different analytical approaches including principal component analysis (PCA), evaluation of model distances based on Jaccard index metrics, and analysis of the reactions and their flux values observed in different settings. Moreover, using the selected MEMs we perform detailed analyses of obtained context-specific GEMs to compare the metabolic states between healthy and infected cells. There are two main contributions of this study. Firstly, we propose a set of analyses that can be used to identify the most suitable MEM for an exact GEM-based analysis of observed datasets. Secondly, we analyse the models produced by selected MEMs in a wider biological context and identify the metabolic pathways that are a part of the host response and thus present potential antiviral targets.

Methods and data
The whole process of data acquisition and preprocessing, integration of data into a reference model with the extraction of context-specific models and their analysis is illustrated in Fig. 1 and described in the following section.

COVID-19 datasets
We collected the data needed to reconstruct the models from publicly available repositories. We identified the repositories with transcriptomes of healthy and COVID-19-specific human bronchial epithelial (HBE), lung biopsy cells, human embryonic kidney (293T), Calu-3, and adenocarcinoma human alveolar basal epithelial (A549) cell lines as described by Blanco-Melo et al. [46] and Weingarten-Gabbay et al. [47]. Since each dataset belongs to a different cell type, it also defines the biological context in which the measurement of the transcriptome was performed. Next, we used the SRA toolkit [48,49] to download the raw transcriptomic data, from each of the repositories. We used the Kallisto tool [50] to obtain the transcripts-per-million (TPM) values for the observed transcripts. However, to extract the models, we had to evaluate the expression levels of each of the observed genes. Since a gene can have multiple transcripts, its expression level was evaluated by summing the levels of all of the transcripts belonging to a gene. Finally, we calculated the average TPM values for each gene for each of the observed experiments and each of the observed conditions, namely healthy and infected.

Extraction of COVID-19 specific models
We used the Human-GEM model version 1.6.0 [13] as a scaffold model for the extraction process. 1 Before reconstructing infected models, the model was augmented with the virus biomass objective function (VBOF) as described by Nanda and Ghosh [11]. Different model extraction methods (MEMs) were used in the reconstruction, namely iMAT [14], INIT, tINIT [16], GIMME [17]. Models were reconstructed for both the infected and the healthy group. 75th percentile of the average TPM values within a dataset and within a condition (infected or healthy) was used as a threshold value for each of the algorithms. During the reconstruction, the default Human-GEM model biomass function was used for the healthy state and VBOF for the infected state.
After the reconstruction, we verified that a generated model could perform the essential metabolic tasks. For healthy cells, these were obtained from the Human-GEM model repository [13]. However, for infected cells, the metabolic task file was acquired from the repository accompanying the paper [11]. If the models could not perform all the specified metabolic tasks, they were augmented with necessary reactions as described by Robinson et al. [13].

Analysis of extracted models
We evaluated how the differences in model sizes depend on different factors. The significance of model size differences in dependence on a factor was evaluated using the Kruskal-Wallis H-test in cases where the number of groups was larger than 2 and the two-sample Kolmogorov-Smirnov test elsewhere. Moreover, we analysed the specificity of active reactions in dependence on each factor using the Jaccard index metric, which can be used to assess the similarity between a pair of models as Here M i presents a set of reactions in model i and M j a set of reactions in model j [51]. Jaccard indices were assessed for all pairs of models and then used to compare the reaction specificity within each of the observed groups of models, and between the observed group of models and the remaining models (boxplot visualisation and two-sample Kolmogorov-Smirnov test). The reference model, which was used as a scaffold model in the extraction process, has more than 13,000 reactions. Extracted models eliminate inactive reactions for a given context but can still preserve up to more than 3,000 active reactions (see 2). Dimensionality reduction techniques such as principal component analysis (PCA) allow us to project high-dimensional data to a space with a manageable level of dimensions. This projection can be used to guide further analyses. For example, PCA plots in two-dimensional space can be used to assess how well obtained models cluster together in dependence on different factors. In our case, we aimed to find a configuration of the extraction process (namely, selection of the most suitable MEM) that would be able to separate the healthy models from the infected models. Moreover, PCA can be used to assess the relative impact of each of the observed factors (namely MEM, dataset, and infection) on a model content [10]. We thus opted to analyse the obtained models using PCA, the results of which were also confirmed by t-SNE (t-distributed stochastic neighbour embedding) plots [52,53]. The analysis was conducted in a similar manner as described by Opdam et al. [9] and later applied by Walakira et al. [39]. We constructed a matrix describing the presence or absence of each metabolic reaction in each of the reconstructed models. We evaluated the amount of the variability between the models that is explained by each of the factors of extraction, namely MEM, dataset, and presence or absence of the infection. We observed the clustering of obtained models and their compliance with predefined groups using PCA and t-SNE plots.
To analyse the dynamics of extracted models and to identify an active set of reactions for a given context, we generated 1000 flux samples for each of the reconstructed models using the Artificial Centering Hit-and-Run (ACHR) sampler [54]. These samples were then used to evaluate the effects of different factors on model sizes, specificity of reactions, and variability between the generated models. Moreover, we used the generated flux samples to identify the enriched metabolic reaction. The two-sample Kolmogorov-Smirnov test was used to identify the reactions that were differentially expressed between healthy and infected models. Within the enrichment analysis additional criteria to identify the significantly changed reactions needs to be established. If flux sampling is used, any number of samples and thus an arbitrary low p-value for a statistical test can be obtained [12]. We used fold-change criteria (FC) between the healthy and infected groups, as described previously by Nanda and Ghosh [11]: where R i is the average reaction activity in infected and R h is the average reaction activity in healthy models. We used the FC criteria to further identify the significantly altered reactions, namely to detect up-and down-regulated reactions after the infection. For the former, we used the FC value cutoff value of at least 0.82 (10-fold activation), and for the latter the cutoff value of at most − 0.82 (10-fold inhibition) between the infected and the healthy groups [11]. GEMs are composed of different metabolic subsystems containing metabolic reactions with specific functionalities. For example, Human-GEM, which was used as a reference model in our analysis, is composed of 142 subsystems ranging from Glycolysis/Gluconeogenesis to Cholesterol Metabolism. We used the generated lists of significantly up-and down-regulated reactions for each of the MEMs and each of the datasets to perform the enrichment analysis of the metabolic subsystems with the hypergeometric test as described by Walakira et al. [39]. We identified the metabolic subsystems that were consistently enriched in the same dataset using different MEMs and/or consistently enriched in different datasets using the same MEMs. These results were then used to perform the biological interpretation of the analysed datasets and compared with related COVID-19 studies.
Reconstruction of context-specific models, their curation, and sampling of model fluxes were performed with the COBRA [55] and RAVEN Toolboxes [56] in Matlab R2019b (MathWorks Inc., Natick, Massachusetts, USA) using the Gurobi solver (Gurobi Optimisation, LLC, Beaverton, Oregon, USA). Further analyses of reconstructed models, visualisation, and enrichment analysis were performed in Python 3. 2

Models vary by size in dependence on different factors
We analysed how the size of the extracted models depends on different factors, namely on a MEM, a dataset, and infection. The distributions of model sizes in dependence on different factors are presented in Fig. 2(A). Differences in model sizes were significant only between different MEMs (p < 10 − 5 , Kruskal-Wallis H-test). Models produced with GIMME pertained to the largest number of reactions (mean model size was 7134.6 reactions) and models produced with iMAT the smallest number of reactions (mean model size was 3035.4 reactions).
To identify the number of reactions that carry nonzero flux in simulations, we generated a set of flux samples for each of the models. We evaluated the size for each of the models by counting the reactions that had nonzero flux in at least one of the samples. The distributions of model sizes after flux sampling is presented in Fig. 2(B). This analysis indicated that the majority of reactions pertained in the GIMMEproduced model were inactive in simulations (mean model size was 2751.8 reactions). On the other hand, iMAT-produced models pertained to the majority of the reactions even after the flux sampling analysis (mean model size was 2903.9 reactions). Even though differences between model sizes were smaller in this experiment, these differences were significant between MEMs (p < 10 − 6 , Kruskal-Wallis H-test) and between the infected and uninfected group of models (p = 0.023, twosample Kolmogorov-Smirnov test). This indicates that the reaction sets preserved after the flux sampling reflected larger biological significance in a given context than the full sets of reactions identified solely by a MEM.

Metabolic reactions should be analysed on the quantitative level
We proceeded with the analysis of the obtained models based on the generated flux samples. We analysed the specificity of active reactions in dependence on each factor using the Jaccard index metric [51]. We analysed the distribution of this metric within each factor and between the factors (see Fig. 3).
We further analysed the significance of specificity of active reactions of each of the factors using the two-sample Kolmogorov-Smirnov test. We adjusted the calculated p-values for multiple testing using the Benjamini-Hochberg procedure. Active reactions were significantly specific for each of the MEMs and all the datasets except for the HBE data. Reactions were not significantly specific for neither healthy nor infected models. This indicated that differences between healthy and infected models should be investigated further based on quantitative flux values through the observed reactions and/or within a specific MEM/dataset. Complete results of the reaction specificity analysis are available in Supplementary Table 1.

Infection and cell lines separate GIMME-and tINIT-produced models
Principal component analysis (PCA) can be used to evaluate the degree of variability in the models explained by each of the factors [9,39]. As in the case of the reaction specificity analysis, we conducted the PCA analysis based on generated flux samples. Namely, reactions that had zero fluxes in all generated samples were removed from a model.
As expected, the model extraction method (MEM) described the largest amount of variability in the data. More precisely, the first principal component (PC1) was able to explain around 30% of the variability and MEM described almost 89% of the variability in the PC1. The separation of models observed in the PCA plot using the first two principal components (see Supplementary Fig. 1 and Supplementary Table 2) complied with the separation observed in the t-SNE plot (see Supplementary Fig. 2). We further analysed the models produced with each MEM to evaluate the amount of variability explained by the remaining factors. In all cases, a dataset explained a significantly larger amount of variability than infection (see Supplementary Table 3). Within the PC1 the former equalled almost 94% for iMAT (see Supplementary Figure 3), 82.3% for GIMME (see Supplementary Figure 4), 54.6% for INIT (see Supplementary Figure 5), and 71.5% for tINIT (see Supplementary  Fig. 6). However, even though a dataset explained a larger amount of variability, models extracted using GIMME could be separated by infection using the PC2 explaining 21.4% of variability in the data, which was almost as much as PC1, which explained 25.3% of variability (see Supplementary Fig. 4).
Models were also partially separated by infection using tINIT and PC1, which explained around 23.3% of variability in the data. The only outliers were healthy models obtained using lung biopsy cell data (see Supplementary Fig. 6). However, models produced by tINIT were separated also by the datasets, which represent separate biological entities. These were cells or tissue infected by the SARS-CoV virus, which differed in cell-specific metabolic steady-states, levels of viral load and host response to the virus. GIMME models did not separate by the dataset, while PCA analysis of the original data showed strong separation by the dataset/cell type except for lung biopsy cell data. Separation by infection and cell type in GIMME and tINIT models indicated that these two MEMs can generate models which reflect large compliance with predefined groups of data and should have a larger emphasis in further analysis.

Results of the enrichment analysis vary with a MEM
We performed the enrichment analysis using the two-sample Kolmogorov-Smirnov test to identify up-and down-regulated metabolic reactions. Full results of the enrichment analysis are available at the supplementary GitHub repository. Furthermore, we used the lists of enriched metabolic reactions in a combination with metabolic subsystems as defined in the reference model [13] to perform the hypergeometric test, and to identify up-and down-regulated metabolic subsystems in each of the models (see also Section 3).
Different MEMs and different datasets produce different, in some cases contradictory results [9]. We analysed a specific MEM by extracting the metabolic subsystems that were consistently enriched (either up-or down-regulated) in a predefined number of models obtained with all of the observed datasets and with a given MEM. Fig. 4 visualises the 23 metabolic subsystems that were consistently enriched in at least two out of five datasets for a MEM. All enriched subsystems are available as Supplementary Fig. 7.
tINIT has identified more than twice the number of enriched metabolic subsystems than GIMME combined for all datasets. Interestingly, GIMME found the lowest number of enriched subsystems in comparison to all MEMs. Enriched metabolic subsystems were mainly from the lipid-related metabolism, from fatty acids, glycerophospholipids to cholesterol and their metabolites, ubiquinone, vitamins, etc. GIMME and tINIT had few commonly enriched subsystems in the same dataset. These were downregulation of vitamin E and D metabolism, upregulation of transport reactions, and opposite direction of fatty acid biosynthesis (odd chain). These results confirm that the selection of MEM has a higher effect on results than the dataset itself.

Results of the enrichment analysis vary with a dataset
We opted to analyse the subsystems which were consistently enriched in at least a predefined number of MEMs for each dataset in a similar way as described in Section 4.4. Fig. 5 visualises the 12 metabolic subsystems that were consistently enriched in at least two out of four MEMs for a dataset. All enriched subsystems are available as Supplementary Fig. 8. Commonly enriched subsystems were relatively specific for each of the datasets, similarly as observed in the case of MEM-based enrichment analysis. This is in line with the fact that each dataset represents a different cell type, which differs in the level of susceptibility to the infection and host response. There were only a few enriched metabolic subsystems common between three MEMs, while there is not even one subsystem commonly predicted by all MEMs in the same dataset. Therefore, we can confirm that the selection of MEM had the highest effect on the results and can significantly affect the biological interpretation.

Discussion
Survivors of severe or critical COVID-19 have long-lasting metabolic abnormalities [57]. Multiple studies confirmed changes in metabolism in COVID-19 patients with significant changes in serum lipids [58]. Moreover, serum metabolome could be used as a predictive and diagnostic biomarker [59,60]. Serum cholesterol and fatty acids were among the metabolites, which can predict progression to severe COVID-19. Our analysis correctly identified that modulation of lipid metabolic pathways in cells infected by SARS-CoV-2 is in line with the observations in COVID-19 patients. Several of the enriched pathways have been also indicated for antiviral therapy to decrease the severity of the disease, and have been confirmed to have an active role in the SARS-CoV-2 infection and replication in preclinical models. Taking into account the PCA results and the data from COVID-19 patients, tINIT best preserved the biological variability in the data and enabled the best prediction of enrichment of metabolic subsystems in comparisons to other

MEMs.
Vitamin D3 is produced in the skin by UV irradiation from 7-dehydrocholesterol. It is biologically active after it is converted by enzymes in the liver and kidney to 1,25-dihydroxycholecalciferol. Vitamin D3 metabolism was predicted to be down-regulated in one dataset by GIMME models, and in all by tINIT models. This is in line with several meta-analyses of published data, which found significantly lower 25 (OH)D concentration, and significant relation between the low concentration and infection, severity or mortality of COVID-19 patients [61][62][63][64][65][66]. Vitamin D3 supplementation has been hypothesised to affect clinical outcomes in patients with COVID-19, however; this was not successfully confirmed [61,67]. Currently, vitamin D3 supplementation is not recommended [68]. However, all studies agree that the lack of randomised controlled studies, small cohorts, different dosages and vitamin D formulations, non-standardised study conditions and large heterogeneity between the studies hinders the evaluation and conclusions. Especially since studies in preclinical models showed that vitamin D3 inhibits SARS-CoV-2 virus replication in cells [69,70], but affect also other aspects of COVID-19 disease [71]. Additionally, the analysis of tINIT models proposed that the metabolism of vitamin D3 molecules is changed in different cell types and this could affect the efficacy and availability of vitamin D3 in the body. Therefore, different dosages and vitamin D formulations would need to be tested in clinical studies.
Retinol, a biologically active form of vitamin A, maintains the innate and adaptive immunity in viral infections [72]. Both tINIT and GIMME models predicted downregulation of retinol metabolism in different datasets, which is in line with the observations in COVID-19 patients. Low levels of vitamin A in plasma of COVID-19 patients was observed and this was significantly associated with the severity of the disease [73,74]. High-throughput screening of natural compounds found that all-trans retinoic acid inhibited 3C-like protease of SARS-CoV-2 and by this exhibited antiviral effect [75]. Additionally, an AM580 compound that has inhibited viral replication of MERS-CoV (Middle East respiratory syndrome) in vitro and in vivo in mice, is a retinoid derivate and RARα (Retinoic Acid Receptor alpha) agonist [76]. However, a meta-analysis of studies on vitamin A supplements in children did not confirm the effect on respiratory tract infection [77]. Both tINIT and GIMME models predicted also downregulation of vitamin E metabolism in different datasets, however, only one study so far measured the vitamin E level in COVID-19 patients, which was found unchanged [73].
Modulation of different pathways involved in fatty acid metabolism was predicted by GEMs in applied datasets or cells. Increased levels of plasma free fatty acids and triglycerides were observed in multiple COVID-19 patient studies. Meta-analysis of omics data confirmed alterations in fatty acid metabolism [58]. The viral infections are known to hijack the lipid biosynthesis to enable viral reproduction [78]. The SARS-CoV-2 spike protein undergoes palmitoyl modification [79] and needs new fatty acids for membrane formation. FASN (Fatty Acid Synthase) is one of the rate-limiting enzymes in fatty acid synthesis and the key enzyme in the synthesis of palmitate. Knockdown of FASN in cell lines resulted in lower SARS-CoV-2 infection and lower quantity of viral RNA [80]. The same study also showed that inhibiting fatty acid synthesis by drugs, such as orlistat, lowered viral levels in the lung and increased survival in a mouse model. Interestingly, a study in elderly patients showed a lower metabolic flux in the fatty acid pathway in survivors vs the deceased [81]. A more detailed analysis at the level of a single fatty acid showed opposite changes in levels of different fatty acids in serum of patients with severe COVID-19 depending on the type of the fatty acid and its desaturation index [82]. For example, the palmitic and stearic acid, needed for the formation of viral membranes, were decreased. A decrease in serum 2-palmitoyl-glycerol in COVID-19 patients was one of the three potential diagnostic markers of the disease [59]. The retinoid derivate, AM580, is also an SREBF (Sterol Regulatory Element Binding Transcription Factor) inhibitor. SREBF is the major regulator of lipid metabolism and could express its antiviral function through this mechanism [76].
Downregulation of cholesterol and terpenoid biosynthesis was predicted by tINIT models in almost all datasets. Numerous studies have compared the plasma cholesterol level in COVID-19 patients, and two meta-analyses of this data concluded that total, HLD and LDL cholesterol are significantly lower in hospitalized severe patients and non-survivors [83,84]. Although there are some discrepancies between the studies in terms of the direction of change and which serum/plasma parameter is changed, they all confirm that cholesterol metabolism is affected by the SARS-CoV-2 infection. Moreover, factors involved in cholesterol homeostasis are actively involved in viral infection [85]. Cholesterol-rich domains are relevant for viral budding [86]. Membrane cholesterol was found to be essential for SARS-CoV-2 entry into the cell [87]. SARS-CoV-2 nucleoprotein interacts with NPC1 (Niemann-Pick type C1), an intracellular cholesterol transporter, and inhibitors of this interaction were able to reduce infection in vitro [33]. Moreover, functional interrogation of host factors required for SARS-CoV-2 infection identified several major regulators of cholesterol biosynthesis and homeostasis [88][89][90]. For example, HDL scavenger receptor SCARB1 (Scavenger Receptor Class B Member 1) facilitates virus entry via cholesterol-binding site [91]. The 25-hydroxycholesterol was identified as an antiviral host factor that inhibits SARS-CoV-2 infection [92,93].

Conclusion
We applied different MEMs in a combination with various scopes of publicly available COVID-19 specific datasets to analyse the metabolic signatures and potentially propose novel antiviral targets. We proposed a set of analyses that can be used to identify the most suitable model extraction method that can be used in the precise analysis of observed datasets. We propose that PCA and/or tSNE analyses can help explore the data and select the MEM that is the best in preserving the biological variability of the data, which seems to be crucial for generating precise predictions. When we paralleled the enrichment of perturbed metabolic pathways with patient data, we confirmed that tINIT-produced models have identified several fatty acid and cholesterol metabolic pathways, which were modulated also in COVID-19 patients. Moreover, these pathways were identified as factors of host response and potential drug targets. Most interestingly, tINIT models have predicted lower metabolism of several vitamins and thus a potential lack of vitamins needed to fight the virus.

Declaration of competing interest
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.