A benchmark of optimization solvers for genome-scale metabolic modeling of organisms and communities

ABSTRACT Genome-scale metabolic modeling is a powerful framework for predicting metabolic phenotypes of any organism with an annotated genome. For two decades, this framework has been used for the rational design of microbial cell factories. In the last decade, the range of applications has exploded, and new frontiers have emerged, including the study of the gut microbiome and its health implications and the role of microbial communities in global ecosystems. However, all the critical steps in this framework, from model construction to simulation, require the use of powerful linear optimization solvers, with the choice often relying on commercial solvers for their well-known computational efficiency. In this work, I benchmark a total of six solvers (two commercial and four open source) and measure their performance to solve linear and mixed-integer linear problems of increasing complexity. Although commercial solvers are still the fastest, at least two open-source solvers show comparable performance. These results show that genome-scale metabolic modeling does not need to be hindered by commercial licensing schemes and can become a truly open science framework for solving urgent societal challenges.

IMPORTANCE Modeling the metabolism of organisms and communities allows for computational exploration of their metabolic capabilities and testing their response to genetic and environmental perturbations. This holds the potential to address multiple societal issues related to human health and the environment. One of the current limitations is the use of commercial optimization solvers with restrictive licenses for academic and non-academic use. This work compares the performance of several commercial and open-source solvers to solve some of the most complex problems in the field. Benchmarking results show that, although commercial solvers are indeed faster, some of the open-source options can also efficiently tackle the hardest problems, showing great promise for the development of open science applications.

• Point-by-point responses to the issues raised by the reviewers in a file named "Response to Reviewers," NOT IN YOUR COVER LETTER.
• Upload a compare copy of the manuscript (without figures) as a "Marked-Up Manuscript" file.
• Each figure must be uploaded as a separate file, and any multipanel figures must be assembled into one file.
• Manuscript: A .DOC version of the revised manuscript
• Figures: Editable, high-resolution, individual figure files are required at revision, TIFF or EPS files are preferred

ASM policy requires that data be available to the public upon online posting of the article, so please verify all links to sequence records, if present, and make sure that each number retrieves the full record of the data. If a new accession number is not linked or a link is broken, provide production staff with the correct URL for the record. If the accession numbers for new data are not publicly accessible before the expected online posting of the article, publication of your article may be delayed; please contact the ASM production staff immediately with the expected release date.
For complete guidelines on revision requirements, please see the journal Submission and Review Process requirements at https://journals.asm.org/journal/mSystems/submission-review-process. Submission of a paper that does not conform to mSystems guidelines will delay acceptance of your manuscript.
Please return the manuscript within 60 days; if you cannot complete the modification within this time period, please contact me. If you do not wish to modify the manuscript and prefer to submit it to another journal, please notify me of your decision immediately so that the manuscript may be formally withdrawn from consideration by mSystems.
Major Comments:
1. I am wondering how much the efficacy of commercial packages is due to the optimization algorithms used for solving LP and MILP problems. I think the value of this work can improve markedly given information and comparison on the specific optimization methods in different packages.
2. The quality of the results, in addition of the computational efficacy, should be presented. In the Discussion mentioned that different packages produced the same optimal values. This result should be provided in the Result.
3. Besides computational speed, I wonder if memory requirement ever become a (limiting) factor when scaling up model size. For example, I wonder if the unexpected trends observed for COIN and CPLEX have to do with the memory requirement and/or management.
4. The author mentioned that a computer with multiple cores were used. Do all packages take advantage of these multiple cores?
5. If I understand correctly, the FBA considered here is the standard formulation. Providing the optimization problems explicitly in the Methods will give clarity.
6. Last but not least, the omission of parsimonious FBA from the comparison is rather surprising, especially considering the author's 2014 PLoS Comp. Biology paper.

Reviewer #2 (Comments for the Author):
This thorough and well executed assessment of numerical solvers used in the context of constraint-based modeling is timely and extremely useful for the practitioner. I only have minor suggestions for improvement. One point considers the growth media used in the simulations, these should shortly be described.
In the absence of line numbers, I quote original text fragments prior to my remaining comments:
1. "creation of large model collections": consider referencing additionally the AGORA / AGORA2 model collections, as the human gut microbiome is mentioned multiple times as an area of application
2. "to address most societal issues": this appears to be quite an exaggeration, consider a more modest claim
3. "reconstruction (COBRA) methods": consider moving the parenthesis after "methods" to improve reading flow
4. "requires efficient linear optimization problems": consider replacing "problems" by "solvers"
5. "due to their known computational efficency": consider dropping "known"
6. "that can be a promising replacement for": consider replacing by "that might be competitive alternatives to"

Reviewer #1 (Comments for the Author):
In this work, the author performed a comparative analysis of computational solver packages for linear programming (LP) problems with a particular focus on Flux Balance Analysis (FBA) of COBRA (constrained-based analysis and reconstruction) models. In total, six solvers were included in the comparison: two commercial packages (CPLEX and GUROBI) and four non-commercial packages (GLPK, COIN, SCIP, and HiGHS). Two types of FBA were considered, the standard LP and a mixed integer LP (MILP). The results conclusively showed that the commercial packages, particularly GUROBI, are the most efficacious solvers for these types of problems. Among the non-commercial packages, HiGHS has a notable performance (efficacy and robustness), but still is slower than the commercial packages.
As a practitioner in the field, I found the work to be of high quality and well written despite the unsurprising results. Still, the comparison and findings will be useful, especially the option of free, albeit slower, noncommercial packages.
I thank the reviewer for the positive feedback and for all the constructive criticism. Please find below a response to the comments.
Major Comments:
1. I am wondering how much the efficacy of commercial packages is due to the optimization algorithms used for solving LP and MILP problems. I think the value of this work can improve markedly given information and comparison on the specific optimization methods in different packages.
For solving LPs all solvers (except SCIP) offer the option to alternate between simplex (primal or dual) and the barrier algorithm (also known as the interior point method). I performed an additional benchmark to compare the different algorithms for each solver and expanded the results and discussion. Regarding MILPs the scenario is a bit more complex. Each solver implements some variation of the branch-and-cut method with its own additional heuristics. These heuristics are not described in detail, especially for the commercial solvers since they are part of their intellectual property for competitive advantage.
2. The quality of the results, in addition of the computational efficacy, should be presented. In the Discussion mentioned that different packages produced the same optimal values. This result should be provided in the Result.
I considered plotting this, but since the results are effectively the same across all solvers and replicated simulations, the result would be a single overlapping dot, which is not very informative. Nonetheless, all the simulation results are stored as CSV tables in the supplementary GitHub repository, as well as the code that generated the results. Any user can therefore confirm and reproduce this observation.
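A check of this kind could be sketched as follows. This is only an illustration under stated assumptions: the column names (`model`, `solver`, `objective`), the example values, and the tolerances are hypothetical and do not reflect the actual layout of the CSV tables in the repository.

```python
import csv
import io
import math
from collections import defaultdict

# Hypothetical results table: one row per (model, solver) run.
# Column names and values are illustrative assumptions, not the real data.
EXAMPLE_CSV = """model,solver,objective
model_a,gurobi,0.873921
model_a,highs,0.873921
model_a,glpk,0.8739210001
model_b,gurobi,1.25
model_b,highs,1.25
"""


def find_disagreements(rows, rel_tol=1e-6, abs_tol=1e-9):
    """Return the models whose solvers report objective values
    that differ by more than the given tolerance."""
    by_model = defaultdict(dict)
    for row in rows:
        by_model[row["model"]][row["solver"]] = float(row["objective"])

    disagreements = {}
    for model, values in by_model.items():
        lowest = min(values.values())
        highest = max(values.values())
        # Compare the extremes: if they agree, every solver agrees.
        if not math.isclose(lowest, highest, rel_tol=rel_tol, abs_tol=abs_tol):
            disagreements[model] = dict(values)
    return disagreements


rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print(find_disagreements(rows))
```

A tolerance is used instead of exact equality because solvers typically stop at slightly different floating-point values even when they reach the same optimum.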
3. Besides computational speed, I wonder if memory requirement ever become a (limiting) factor when scaling up model size. For example, I wonder if the unexpected trends observed for COIN and CPLEX have to do with the memory requirement and/or management.
Thank you for the suggestion. I have now also performed memory benchmarks (supp fig 2). It seems that memory is not an issue. Most executions required a maximum of 200-300 MB of memory, even for the largest models. These observations have been added to the results section.
4. The author mentioned that a computer with multiple cores were used. Do all packages take advantage of these multiple cores?
It seems that only the commercial solvers use multiple cores. This point has been added to the discussion.
5. If I understand correctly, the FBA considered here is the standard formulation. Providing the optimization problems explicitly in the Methods will give clarity.
The equations for the standard FBA formulation (LP problem) and for computing a minimal medium (MILP problem) are now provided in the methods section.
6. Last but not least, the omission of parsimonious FBA from the comparison is rather surprising, especially considering the author's 2014 PLoS Comp. Biology paper.
I had initially also performed pFBA simulations. I later removed them because the results weren't much more informative than the FBA simulations alone. Nevertheless, I agree that having a comparison with a second method might be valuable. I have re-included the pFBA simulations in the paper (supp fig. 1).
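For reference, the standard FBA formulation mentioned in the response to comment 5 is conventionally written as the linear program below. This is the textbook form, given here as an assumption about what the Methods section contains; the symbols are the usual ones, with $S$ the stoichiometric matrix, $v$ the vector of reaction fluxes, $c$ the objective coefficients (typically selecting the biomass reaction), and $lb$, $ub$ the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{T} v \\
\text{s.t.} \quad & S v = 0 \\
& lb \le v \le ub
\end{aligned}
```

The equality constraint expresses the steady-state assumption (no net accumulation of any metabolite), and the bounds encode reaction reversibility and the availability of nutrients in the growth medium.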

Reviewer #2 (Comments for the Author):
This thorough and well executed assessment of numerical solvers used in the context of constraint-based modeling is timely and extremely useful for the practitioner. I only have minor suggestions for improvement. One point considers the growth media used in the simulations, these should shortly be described.
I thank the reviewer for the very positive comments and for the detailed proof-reading that has considerably improved the quality of the text. I have made all requested modifications accordingly. I think that a point-by-point response is not really appropriate in this case.
In the absence of line numbers, I quote original text fragments prior to my remaining comments:

December 11, 2023
1st Revision - Editorial Decision
Re: mSystems00833-23R1 (A benchmark of optimization solvers for genome-scale metabolic modeling of organisms and communities)
Dear Prof. Daniel Machado:
Your manuscript has been accepted, and I am forwarding it to the ASM production staff for publication. Your paper will first be checked to make sure all elements meet the technical requirements. ASM staff will contact you if anything needs to be revised before copyediting and production can begin. Otherwise, you will be notified when your proofs are ready to be viewed.
Data Availability: ASM policy requires that data be available to the public upon online posting of the article, so please verify all links to sequence records, if present, and make sure that each number retrieves the full record of the data. If a new accession number is not linked or a link is broken, provide production staff with the correct URL for the record. If the accession numbers for new data are not publicly accessible before the expected online posting of the article, publication may be delayed; please contact ASM production staff immediately with the expected release date.
Publication Fees: For information on publication fees and which article types have charges, please visit our website. We have partnered with Copyright Clearance Center (CCC) to collect author charges. If fees apply to your paper, you will receive a message from no-reply@copyright.com with further instructions. For questions related to paying charges through RightsLink, please contact CCC at ASM_Support@copyright.com or toll free at +1-877-622-5543. CCC makes every attempt to respond to all emails within 24 hours.
ASM Membership: Corresponding authors may join or renew ASM membership to obtain discounts on publication fees. Need to upgrade your membership level? Please contact Customer Service at Service@asmusa.org.
PubMed Central: ASM deposits all mSystems articles in PubMed Central and international PubMed Central-like repositories immediately after publication. Thus, your article is automatically in compliance with the NIH access mandate. If your work was supported by a funding agency that has public access requirements like those of the NIH (e.g., the Wellcome Trust), you may post your article in a similar public access site, but we ask that you specify that the release date be no earlier than the date of publication on the mSystems website.

Embargo Policy:
A press release may be issued as soon as the manuscript is posted on the mSystems Latest Articles webpage. The corresponding author will receive an email with the subject line "ASM Journals Author Services Notification" when the article is available online.
Featured Image Submissions: If you would like to submit a potential Featured Image, please email a file and a short legend to mSystems@asmusa.org. Please note that we can only consider images that (i) the authors created or own and (ii) have not been previously published. By submitting, you agree that the image can be used under the same terms as the published article. File requirements: square dimensions (4" x 4"), 300 dpi resolution, RGB colorspace, TIF file format.
Author Video: For mSystems research articles, you are welcome to submit a short author video for your recently accepted paper. Videos are normally 1 minute long and are a great opportunity for junior authors to get greater exposure. Importantly, this video will not hold up the publication of your paper and you can submit it at any time.

Details of the video are:
• Minimum resolution of 1280 x 720
• .mov or .mp4 video format
• Provide video in the highest quality possible but do not exceed 1080p
• Provide a still/profile picture that is 640 (w) x 720 (h) max
• Provide the script that was used

We recognize that the video files can become quite large, so to avoid quality loss ASM suggests sending the video file via https://www.wetransfer.com/. When you have a final version of the video and the still ready to share, please send it to mSystems staff at mSystems@asmusa.org.
Thank you for submitting your paper to mSystems.