Improving the phenotype predictions of a yeast genome‐scale metabolic model by incorporating enzymatic constraints

Abstract Genome‐scale metabolic models (GEMs) are widely used to calculate metabolic phenotypes. They rely on defining a set of constraints, the most common of which is that the production of metabolites and/or growth are limited by the carbon source uptake rate. However, enzyme abundances and kinetics, which act as limitations on metabolic fluxes, are not taken into account. Here, we present GECKO, a method that enhances a GEM to account for enzymes as part of reactions, thereby ensuring that each metabolic flux does not exceed its maximum capacity, equal to the product of the enzyme's abundance and turnover number. We applied GECKO to a Saccharomyces cerevisiae GEM and demonstrated that the new model could correctly describe phenotypes that the previous model could not, particularly under high enzymatic pressure conditions, such as yeast growing on different carbon sources in excess, coping with stress, or overexpressing a specific pathway. GECKO also allows to directly integrate quantitative proteomics data; by doing so, we significantly reduced flux variability of the model, in over 60% of metabolic reactions. Additionally, the model gives insight into the distribution of enzyme usage between and within metabolic pathways. The developed method and model are expected to increase the use of model‐based design in metabolic engineering.

If you feel you can satisfactorily deal with these points and those listed by the referees, you may wish to submit a revised version of your manuscript. Please attach a covering letter giving details of the way in which you have handled each of the points raised by the referees. A revised manuscript will be once again subject to review and you probably understand that we can give you no guarantee at this stage that the eventual outcome will be favorable. Here the authors propose a set of new constraints for use in flux balance analysis of genome-scale metabolic models. These constraints account for the flux capacity of enzymatic reactions. They apply these constraints with a model of yeast, then demonstrate how a model with these constraints is capable of generating predictions that a standard model cannot produce. Interestingly, the constraints are all encoded as additional reactions and stoichiometric coefficients in the model, meaning the new model is completely compatible with existing FBA software.
The idea of adding constraints to FBA to account for enzyme capacity and overall enzyme availability has been thoroughly explored in previous work, and the authors did an excellent job of reviewing this work in their introduction. The authors also did a good job of contrasting previous methods with their own technique. Despite all of these existing methods, it appears as though what the authors are presenting is a novel and significant advance. The authors also present a compelling set of demonstrations for their model, showing how the model with their enhanced constraints can capture biology that the existing model cannot.
Overall, this is an extremely well-written manuscript, with a very clearly and concisely described approach, and it represents a significant advance to the art.
I have only two significant comments, and one minor comment.
Significant comments: 1.) The authors have done a great job of showing a number of case-studies where their new technique captures biology that standard FBA approaches cannot. However, it would be useful if the authors also compared the enhanced predictive capabilities of their algorithm with other existing similar techniques (e.g. ME modeling, FBAwMC, RBA). Although it is certainly out of scope to actually apply the competing methods to each of the described case studies, it would be useful if the authors could comment on whether or not the existing competing methods would be expected to perform worse, the same, or better than GECKO. Without this discussion, it's difficult to fully evaluate GECKO vs these competing methods.
2.) Models commonly map genes to metabolic reactions, while the new constraints introduced by the authors relate to proteins. In prokaryotes, there is typically a 1-1 relationship between genes and proteins, but in eukaryotes, this is not the case. Due to splice variants, an individual gene may map to several different proteins, likely with different cat values. Can the authors comment on this? Is their model mapped to protein IDs or genes? Is it expected that this might impact results? How might this impact the proteomics-based analysis, if at all?
Minor comments: 1.) It could be I missed it somewhere in the manuscript, but it appears as though the authors never indicate how many of model reactions they were able to find kcat values for in BRENDA. Did kcat values exist for all reactions? If not, what kcat value was used for reactions where no measured value could be found? How many kcat values were not exact matches but exploited the "flexible" matching to brenda mentioned in the methods? Are the measured kcat values all collected for similar/identical enzymes? Reviewer #2: Overall, the formulation presented in this is elegant and represents a very good contribution to the field. Moreover, the model provided is also a relevant tool for the community. However, the full formulation of the method with individual enzyme levels has not been sufficiently exploited and the result shown of decrease flux variability seems somewhat obvious. Although the impact of decreasing flux variability in metabolic engineering design has been explored, I would expect a wider variety of analysis here. The results shown for the more general constraint of total enzyme mass are somewhat more consistent and span a variety of effects. However, novelty in the approach here is somewhat more limited compared with existing methods, models and concepts. Thus, I believe the authors need to expand the results section for the full formulation of the method such that this paper can bring the expected level of contribution for MSB. Moreover, in section 2.4 it is essential to know if the experiments with Yeast 6 have been performed with pFBA or FBA. Given the high variability found, I suspect they were performed with FBA and, given the formulation of GECKO, inspired in pFBA, I believe this is not a fair comparison Some minor concerns: Introduction Sentence "However, when considering the production of a metabolite of interest, these models typically make the assumption that the uptake rate of the carbon source (e.g., glucose) limits production. This may be an oversimplification because experimental yields are usually considerably lower than the maximum theoretical yields5" Although the first sentence is correct, it does not imply the second observation. Usually in Metabolic Engineering design projects the maximum theoretical yield is not assumed.
In terms of the present formulation, the authors need to refer to the paper Machado et al 2016 PLOS Comput Biol given the obvious similarities (although overall the GECKO method brings a sufficient level of innovation)

Results
Overall, sections 222 and 223 are of moderate interest but I feel some observations would need to be discussed. As this would probably expand this section, it might make sense to put part of it as supplementary material. As an example: Why proteins in complexes and promiscuous are faster and smaller (for complexes)? Also, the network properties also require further discussion Regarding data in Figure 3E -I miss a comparison with predictions made with nominal maintenance Growth rates (233) For comparison purposes, the authors either use their method with random kcat values or use normal FBA simulations using the uptake rates given by the GECKO simulation. How would the original model behave in FBA in defined medium if all input fluxes are experimentally given (as this is the standard simulation method)? Although for complex medium this might be difficult to get, it is usually obtained in experiments using defined media at least for the carbon source uptake rate. A related question: how do the uptake rates obtained with the new model / formulation compare with the ones observed experimentally? Also in this section, flux values could have been compared with experimental 13C data, for example for one fermentable and one non-fermentable carbon source Section 2.5 "Given their unconstrained nature, GEMs tend to overestimate biological performance such as knockout growth and/or production of a specific metabolite of interest." I believe there is a confusion with model and simulation. In this case, other simulation tools (non FBA) such as MOMA or ROOM provide less overestimates for biomass performance.

Reviewer #1:
Here the authors propose a set of new constraints for use in flux balance analysis of genome-scale metabolic models. These constraints account for the flux capacity of enzymatic reactions. They apply these constraints with a model of yeast, then demonstrate how a model with these constraints is capable of generating predictions that a standard model cannot produce. Interestingly, the constraints are all encoded as additional reactions and stoichiometric coefficients in the model, meaning the new model is completely compatible with existing FBA software.
The idea of adding constraints to FBA to account for enzyme capacity and overall enzyme availability has been thoroughly explored in previous work, and the authors did an excellent job of reviewing this work in their introduction. The authors also did a good job of contrasting previous methods with their own technique. Despite all of these existing methods, it appears as though what the authors are presenting is a novel and significant advance. The authors also present a compelling set of demonstrations for their model, showing how the model with their enhanced constraints can capture biology that the existing model cannot.
Overall, this is an extremely well-written manuscript, with a very clearly and concisely described approach, and it represents a significant advance to the art.
We thank the reviewer for the kind words.
I have only two significant comments, and one minor comment. Significant comments: 1.) The authors have done a great job of showing a number of case-studies where their new technique captures biology that standard FBA approaches cannot. However, it would be useful if the authors also compared the enhanced predictive capabilities of their algorithm with other existing similar techniques (e.g. ME modeling, FBAwMC, RBA). Although it is certainly out of scope to actually apply the competing methods to each of the described case studies, it would be useful if the authors could comment on whether or not the existing competing methods would be expected to perform worse, the same, or better than GECKO. Without this discussion, it's difficult to fully evaluate GECKO vs these competing methods.

"GECKO is based on the FBAwMC approach but extended to limit each individual enzyme, thereby giving a physiologically constrained and thus more feasible solution. On the other hand, as GECKO uses inequalities instead of equalities, it is less constrained than RBA, thus relying less on the quality of the experimental data.
Finally, GECKO does not require a detailed description of protein synthesis, and therefore its implementation to model eukaryal organisms is less demanding compared to the ME-modeling strategy. Furthermore, the resulting models have the same structure as any GEM, such that it can be used for any constrained-based analysis method (e.g. FBA, FVA, random sampling, etc.); and it can do so in similar computational times compared to purely metabolic models, further differentiating them from ME-models, which require larger computational resources." 2.) Models commonly map genes to metabolic reactions, while the new constraints introduced by the authors relate to proteins. In prokaryotes, there is typically a 1-1 relationship between genes and proteins, but in eukaryotes, this is not the case. Due to splice variants, an individual gene may map to several different proteins, likely with different cat values. Can the authors comment on this? Is their model mapped to protein IDs or genes? Is it expected that this might impact results? How might this impact the proteomics-based analysis, if at all?
We agree that splice variants could eventually impact results, however in yeast the amount of splice variants is very low. In particular, no splice variants are reported in Swissprot for any of the genes in the Yeast7 model, therefore our model uses always the correct match gene-protein. The reviewer's observation is nonetheless a very important consideration if this method is to be implemented in other organisms with a higher frequency of splice variants and we therefore added a comment about this in the discussion (5 th paragraph), as such:

"[…]Special care should be taken to, for instance, distinguish how kinetics vary among different isoforms of the same gene, in the case of eukaryal organisms that exhibit splice variants. It is worthy to mention here that no splice variants are reported for any of the genes in the Yeast7 model."
Minor comments: 1.) It could be I missed it somewhere in the manuscript, but it appears as though the authors never indicate how many of model reactions they were able to find kcat values for in BRENDA. Did kcat values exist for all reactions? If not, what kcat value was used for reactions where no measured value could be found? How many kcat values were not exact matches but exploited the "flexible" matching to brenda mentioned in the methods? Are the measured kcat values all collected for similar/identical enzymes?
Overall our method extracted 3249 values from BRENDA, from which more than 90% come from using at most 1 wild card, and more than 50% are values from S. cerevisiae. Further details can be found in supplementary  table S3 (section 4.1 in the supplementary material).

Reviewer #2:
Overall, the formulation presented in this is elegant and represents a very good contribution to the field. Moreover, the model provided is also a relevant tool for the community.
However, the full formulation of the method with individual enzyme levels has not been sufficiently exploited and the result shown of decrease flux variability seems somewhat obvious. Although the impact of decreasing flux variability in metabolic engineering design has been explored, I would expect a wider variety of analysis here.
The results shown for the more general constraint of total enzyme mass are somewhat more consistent and span a variety of effects. However, novelty in the approach here is somewhat more limited compared with existing methods, models and concepts. Thus, I believe the authors need to expand the results section for the full formulation of the method such that this paper can bring the expected level of contribution for MSB.
We thank the reviewer for his very good suggestion. We have included as additional analysis in section 2.4: 1 (table S5, figure  S12).

A study that shows how flux variability reduction in different pathways relates to total enzyme usage (figure 5C).
Moreover, in section 2.4 it is essential to know if the experiments with Yeast 6 have been performed with pFBA or FBA. Given the high variability found, I suspect they were performed with FBA and, given the formulation of GECKO, inspired in pFBA, I believe this is not a fair comparison In the comparison made of FVA results in sections 2.3.1 and 2.4, both the purely metabolic model and the enzyme-constrained model were tested using the same procedure: 1) Exchange rates for glucose, growth, oxygen, carbon dioxide, ethanol, glycerol, acetate and pyruvate were fixed at the values predicted by the enzyme-constrained model when minimizing for glucose at a fixed dilution rate, in order to compare the internal flux distribution at the same physiological conditions.

2) Each reaction in the model was first minimized and then maximized to find the flux variability. When doing so, any corresponding reversible reaction was blocked.
Some minor concerns: Introduction Sentence "However, when considering the production of a metabolite of interest, these models typically make the assumption that the uptake rate of the carbon source (e.g., glucose) limits production. This may be an oversimplification because experimental yields are usually considerably lower than the maximum theoretical yields5" Although the first sentence is correct, it does not imply the second observation. Usually in Metabolic Engineering design projects the maximum theoretical yield is not assumed.
We agree with the reviewer; this sentence has been therefore rewritten (section 1, 1 st paragraph), as such:

"[…] This may be an oversimplification, as metabolic fluxes are limited by their corresponding enzyme levels. However, this cannot be directly tested in traditional GEMs because they do not allow for connecting enzyme concentrations to metabolic fluxes.."
In terms of the present formulation, the authors need to refer to the paper Machado et al 2016 PLOS Comput Biol given the obvious similarities (although overall the GECKO method brings a sufficient level of innovation) We have now referenced this paper as suggested (section 2.1, 2 nd paragraph).

Results
Overall, sections 222 and 223 are of moderate interest but I feel some observations would need to be discussed. As this would probably expand this section, it might make sense to put part of it as supplementary material. As an example: Why proteins in complexes and promiscuous are faster and smaller (for complexes)? Also, the network properties also require further discussion We think that additional discussion regarding why different types of enzymes are faster and/or smaller would be mostly speculative, and therefore we decided not to address it. Regarding the network properties, even though one could analyze even further the difference between both networks, we believe it would be peripheral and would draw attention away from our main message.
Regarding data in Figure 3E -I miss a comparison with predictions made with nominal maintenance

We have included this analysis in supplementary material (figure S10) and referred to it in the manuscript (section 2.3.2).
Growth rates (233) For comparison purposes, the authors either use their method with random kcat values or use normal FBA simulations using the uptake rates given by the GECKO simulation. How would the original model behave in FBA in defined medium if all input fluxes are experimentally given (as this is the standard simulation method)? Although for complex medium this might be difficult to get, it is usually obtained in experiments using defined media at least for the carbon source uptake rate. A related question: how do the uptake rates obtained with the new model / formulation compare with the ones observed experimentally?
The studies from which the experimental data was taken in this section 2,3 are from shake-flask cultivations and do not report the substrate uptake rates, therefore we can only compare the specific growth rates and cannot perform the analysis the reviewer suggests. An exception is one batch performed in a bioreactor under aerobic conditions on glucose 3 , for which an average biomass yield of 0.12 g/g is reported, which for the specific growth rate reported of 0.4 h -1 corresponds to a specific glucose uptake rate of 19.0 mmol/gDWh. This is in very good agreement with the enzyme-constrained model, which predicts a value of 17.9 mmol/gDWh. Furthermore, we have already shown that both specific uptake and production rates are correctly simulated by our model in chemostats, both under aerobic (figure 3A, 3E and S11) and anaerobic (figure 3D) conditions. Overall, we therefore believe that our enzyme-constrained model correctly predicts specific substrate uptake rates under batch conditions.
Coming back to the first question, if the predicted specific uptake rate by the enzyme-constrained model is a good proxy for the experimental value, then the simulations we show in figure 4A show how the metabolic model would perform with experimental uptake rates, given that the model was limited with all specific uptake rates predicted by the enzyme-constrained model in each condition. We can therefore infer that the purely metabolic model will over-predict growth if we assign experimental uptake rates, as said in the manuscript.
Also in this section, flux values could have been compared with experimental 13C data, for example for one fermentable and one non-fermentable carbon source We have included a detailed comparison of both Yeast7 and ecYeast7 to experimental 13C chemostat data in section 2.4 (table S5 and figure S12).
Section 2.5 "Given their unconstrained nature, GEMs tend to overestimate biological performance such as knockout growth and/or production of a specific metabolite of interest." I believe there is a confusion with model and simulation. In this case, other simulation tools (non FBA) such as MOMA or ROOM provide less overestimates for biomass performance.
We agree with the reviewer and hence we have changed the wording of the sentence (section 2.5, 1 st paragraph), as such:

"Constrained-based modeling techniques such as FBA tend to overestimate biological performance under perturbed conditions, e.g. knockout growth and/or production of a specific metabolite of interest."
2nd Editorial Decision 14 June 2017 Thank you again for submitting your work to Molecular Systems Biology. I greatly apologize for the delay in getting back to you. Unfortunately one of the reviewers failed to respond to our multiple reminders. Rather than delaying further the process even further, I prefer to make a decision now. As you will see, reviewer #1 is now fully supportive and I am pleased to inform you that we will be able to accept your paper for publication pending the following minor amendments: -Appendix Figures S1-S15 and Appendix Tables should be called out from the main text as "Appendix Figure S1", "Appendix Figure S2", "Appendix Thank you for submitting this paper to Molecular Systems Biology.
Reviewer #1: The authors have responded well to all significant comments in my previous review. Overall, I find this paper to be suitable for publication.
2nd Revision -authors' response 14 June 2017 The authors made the requested changes and submitted the final version of their manuscript. 3. Were any steps taken to minimize the effects of subjective bias when allocating animals/samples to treatment (e.g. randomization procedure)? If yes, please describe.
For animal studies, include a statement about randomization even if no randomization was used.
4.a. Were any steps taken to minimize the effects of subjective bias during group allocation or/and when assessing results (e.g. blinding of the investigator)? If yes please describe. Do the data meet the assumptions of the tests (e.g., normal distribution)? Describe any methods used to assess it.
Is there an estimate of variation within each group of data?
Is the variance similar between the groups that are being statistically compared?

Captions
The data shown in figures should satisfy the following conditions: Source Data should be included to report the data underlying graphs. Please follow the guidelines set out in the author ship guidelines on Data Presentation. a statement of how many times the experiment shown was independently replicated in the laboratory.
Any descriptions too long for the figure legend should be included in the methods section and/or with the source data.
Please ensure that the answers to the following questions are reported in the manuscript itself. We encourage you to include a specific subsection in the methods section for statistics, reagents, animal models and human subjects.
In the pink boxes below, provide the page number(s) of the manuscript draft or figure legend(s) where the information can be located. Every question should be answered. If the question is not relevant to your research, please write NA (non applicable).

B--Statistics and general methods
the assay(s) and method(s) used to carry out the reported observations and measurements an explicit mention of the biological and chemical entity(ies) that are being measured. an explicit mention of the biological and chemical entity(ies) that are altered/varied/perturbed in a controlled manner. the exact sample size (n) for each experimental group/condition, given as a number, not a range; a description of the sample collection allowing the reader to understand whether the samples represent technical or biological replicates (including how many animals, litters, cultures, etc.).

Reporting Checklist For Life Sciences Articles (Rev. July 2015)
This checklist is used to ensure good reporting standards and to improve the reproducibility of published results. These guidelines are consistent with the Principles and Guidelines for Reporting Preclinical Research issued by the NIH in 2014. Please follow the journal's authorship guidelines in preparing your manuscript.