A benchmark-driven approach to reconstruct metabolic networks for studying cancer metabolism

Genome-scale metabolic modeling has emerged as a promising way to study the metabolic alterations underlying cancer by identifying novel drug targets and biomarkers. To date, several computational methods have been developed to integrate high-throughput data with existing human metabolic reconstructions to generate context-specific cancer metabolic models. Despite a number of studies focusing on benchmarking the context-specific algorithms, no quantitative assessment has been made to compare the predictive performance of these methods. Here, we integrated various and different datasets used in previous works to design a quantitative platform to examine functional and consistency performance of several existing genome-scale cancer modeling approaches. Next, we used the results obtained here to develop a method for the reconstruction of context-specific metabolic models. We then compared the predictive power and consistency of networks generated by our method to other computational approaches investigated here. Our results showed a satisfactory performance of the developed method in most of the benchmarks. This benchmarking platform is of particular use in algorithm selection and assessing the performance of newly developed algorithms. More importantly, it can serve as guidelines for designing and developing new methods focusing on weaknesses and strengths of existing algorithms.


S1 Text. A benchmark-driven approach to reconstruct metabolic networks for studying cancer metabolism
Oveis Jamialahmadi, Sameereh Hashemi-Najafabadi, Ehsan Motamedian, Stefano Romeo, Fatemeh Bagheri  Parameter optimization GIMME. GEMs were generated by simultaneously varying the fraction of objective function and gene expression threshold linearly from 10 -6 to 1 (11 points), and between 1 st and 99 th percentile of the input expression profile (21 points), respectively. Figure A shows the simultaneous effect of 2 adjustable parameters of GIMME in terms of growth rate prediction ( Figure A (a)) and the number of non-empty GEMs ( Figure A (b)). While the sensitivity of GIMME to expression threshold was relatively low (except at moderate objective fractions), the decision on the fraction of objective function (i.e. growth) greatly affected the growth predictions, which was mainly due to the large difference between measured and predicted growth rates for cancer cell lines. The parameters corresponding to the lower relative error and higher model count (i.e. number of GEMs) were selected for all simulations (i.e. the fraction of objective function = 0.1 of maximum growth rate and expression threshold = 6 th percentile). iMAT. GEMs were generated by simultaneously varying 3 parameters of the algorithm, namely, flux activation threshold (ԑ), low and high expression thresholds were selected based on sensitivity analysis. More than 14500 GEMs were generated by simultaneously varying the flux activation threshold (ԑ) from 10 -3 to 10 (log-scale, 5 points), low expression threshold from 1 st to 75 th (7 points) percentile, and high expression threshold from 25 th to 99 th percentile (7 points). Figure B shows the simultaneous effect of 3 adjustable parameters of iMAT in terms of growth rate prediction and the number of non-empty (functional) GEMs. As depicted, the number of functional GEMs was profoundly affected by increasing ԑ (Figure B (f, h and j)). Moreover, at lower ԑ values, lower expression thresholds resulted in better growth rate predictions and higher number of functional GEMs. Best results were achieved when high expression threshold was set to 25 th percentile ( Figure B (a and b)). Therefore, moderately expressed genes in the default parameters (i.e. low threshold= 25 th , and high threshold = 75 th percentile) were considered as highly expressed genes [1].

Figure B. Parameter optimization for iMAT.
More than 14500 GEMs were generated by simultaneously varying the flux activation threshold (ԑ), low and high expression thresholds. The 3 adjustable parameters were optimized based on the performance in growth rate predictions (a, c, e, g and i), and number of functional GEMs (b, d, f, h and j).
CORDA. GEMs were generated by linearly varying the constraint value for defining the reaction dependency from 1% to 99% of maximal flux rate (20 points). As shown in Figure C, the decision on this parameter did not affect the generated GEMs. Therefore, the constraint was set to 10% of maximal flux rate.

Figure C. Parameter optimization for CORDA.
A total of 1200 GEMs were generated by linearly varying the constraint parameter from 1% to 99% of maximum flux rate. The parameter did not exhibit any evident effect on predicted growth rates (represented as mean relative error ± standard deviation) (a) or the number of functional GEMs (b).

TRFBA.
The constant parameter C in TRFBA was calculated according to the original publication [2]. To determine the range at which TRFBA shows a non-monotonic dependence on growth rate, C was varied between several extreme points (from 0 to 1). Next, the C was varied linearly from 5 0.03 to 0.05 (15 points) and the mean relative error was calculated at each C. The C corresponding to the minimum relative error (C = 0.0405) was selected for all simulations (Figure. D).

Figure D.
Parameter optimization for TRFBA. GEMs were generated by linearly varying the parameter C between 0.03 and 0.05. Data shown as mean relative error ± standard error.