CFSA: Comparative flux sampling analysis as a guide for strain design

Genome-scale metabolic models of microbial metabolism have extensively been used to guide the design of microbial cell factories, still, many of the available strain design algorithms often fail to produce a reduced list of targets for improved performance that can be implemented and validated in a step-wise manner. We present Comparative Flux Sampling Analysis (CFSA), a strain design method based on the extensive comparison of complete metabolic spaces corresponding to maximal or near-maximal growth and production phenotypes. The comparison is complemented by statistical analysis to identify reactions with altered flux that are suggested as targets for genetic interventions including up-regulations, down-regulations and gene deletions. We applied CFSA to the production of lipids by Cutaneotrichosporon oleaginosus and naringenin by Saccharomyces cerevisiae identifying engineering targets in agreement with previous studies as well as new interventions. CFSA is an easy-to-use, robust method that suggests potential metabolic engineering targets for growth-uncoupled production that can be applied to the design of microbial cell factories.


Introduction
Microbial cell factories (MCF) are microorganisms engineered for the production of bio-molecules that thrive on renewable carbon sources.A broad range of bio-molecules can be produced using MCF from nonedible feedstocks such as recalcitrant biomass or industrial waste streams thereby providing sustainable replacements for production systems based on fossil fuels (Lee et al., 2019).MCF design requires the choice of an appropriate host strain (chassis), and selection of a suitable available pathway, the discovery of new pathways, or the design of synthetic pathways for new-to-nature compounds.Still, industrial feasibility requires extensive engineering to improve MCF performance (Cho et al., 2022).
GEnome-scale Metabolic models (GEM) comprehensively represent cellular metabolism and allow the simulation of metabolic fluxes and the prediction of cellular phenotypes.They have largely been used to identify metabolic engineering targets to optimize pathway performance (Fang et al., 2020).Machado et al. (Machado and Herrgård, 2015) classify strain design algorithms relying on GEMs in two main groups: methods based on the analysis of elementary flux modes (EFM), and those based on the optimization of an objective function.
EFMs are minimal sets of reactions that can jointly operate at steadystate such that all steady-state solutions can be described as a combination of EFMs (Trinh et al., 2009).They provide an unbiased framework to explore the metabolic space but have limited scaling potential and applicability to larger models.Instead, related approaches, based on minimal cut sets (MCS) such as minimal metabolic functionality (MMF) and FluxDesign, scale better to genome-scale (Machado and Herrgård, 2015;von Kamp and Klamt, 2014).
Optimization-based approaches rely on the simulation of metabolic fluxes of wild type and/or mutant strains using an objective function.Many of these methods use Flux Balance Analysis (FBA) to calculate fluxes and therefore only explore one of the multiple flux distributions that can lead to the optimal objective ignoring the rest.In this way, OptKnock (Burgard et al., 2003) and derived algorithms such as RobustKnock (Tepper and Shlomi, 2010), OptGene (Patil et al., 2005), and OptCouple (Jensen et al., 2019), aim at identifying gene knock-outs to couple the production of the compound of interest to the production of biomass using growth as the objective function.Other methods such as OptForce (Ranganathan et al., 2010) and OptDesign (Jiang et al., 2022) are used to predict modulation of gene expression, including up and down-regulations.They compare simulated fluxes of growth and production phenotypes and minimize the number of interventions required for the overproduction phenotype.Although flux variability analysis (FVA) might be used to explore feasible flux ranges and constrain the solution space, comparisons among phenotypes are based on non-unique FBA solutions.Although FVA can find the ranges of each metabolic flux at the optimum, it considers reactions individually.Therefore, within the FVA flux bounds of multiple reactions, there can be flux sets that together result in an infeasible or sub-optimal solution (Ebrahim et al., 2013).Alternatively, flux sampling allows the exploration of the full space of feasible flux distribution of all reactions given a set of constraints on the metabolic model (Herrmann et al., 2019).Contrarily to FBA, flux sampling does not require the selection of an objective function, which biases FBA predictions.Besides, in contrast to FVA, flux sampling results in sets of feasible solutions considering all model's reactions simultaneously, which allows a better exploration of the design space.Advantages of flux sampling have been exploited to evaluate metabolic flux differences between different conditions (Ravi and Gunawan, 2021), and to identify flux profiles compatible with fluxomic data (Backman et al., 2023).Although the latter application can additionally be used for the prediction of gene knock-outs, a strain design method based on flux sampling that does not require a priori experimental data is not yet available.
The described strain design algorithms focus on the identification of growth-coupling strategies, where production becomes a requirement for growth (Burgard et al., 2003;Tepper and Shlomi, 2010;Patil et al., 2005;Jensen et al., 2019).This strategy is suitable for experimental implementation and further optimization using adaptive laboratory evolution but requires multiple simultaneous interventions.Alternatively, strains with growth-uncoupled production can be used in two-stage fermentation processes where the growth and production phases are sequential.This strategy can alleviate metabolic stress and improve productivity (Lo et al., 2016;Raj et al., 2020).
We present Comparative Flux Space Analysis (CFSA), a modelguided strain design approach based on extensive sampling of the feasible solution space in alternative scenarios.Growth and production phenotypes are simulated and compared, also with a growth-limited scenario, which serves as a negative control for down-regulation targets.Flux distributions are statistically compared resulting on the identification of potential down-regulations, knock-outs and overexpressions targets leading to growth-uncoupled increased production.As a proof of concept, we use CFSA to identify metabolic engineering targets for lipid production by Cutaneotrichosporon oleaginosus and naringenin production by Saccharomyces cerevisiae and compare them with available data.

Comparative flux sampling analysis (CFSA)
The workflow used by CFSA is presented in Fig. 1 and described in detail below.The code of the algorithm and examples of its use are available in GitLab.

Flux sampling
To implement CFSA a GEM of the desired organism including the production pathway of choice is required.As the first step, media conditions (e.g.substrate uptake rates, aerobic/anaerobic growth) are specified.Model reactions are grouped in seven categories to facilitate later filtering: required reactions including the growth and maintenance reactions; not biological reactions including boundary, exchange, sink and demand reactions; blocked reactions (unable to carry flux); reactions without associated genes; essential reactions; reactions containing essential genes and transport reactions.
Reaction fluxes are sampled from the metabolic solution space in three scenarios: growth, slow growth, and production (Fig. 1).In the growth and production scenarios, the optimality parameter ensures that sampled flux distributions result in at least a specified fraction of the optimal growth or production predicted by FBA.This is achieved by constraining the lower bound of the biomass or product exchange reactions.In the slow growth scenario, the maximum growth rate compatible with the specified minimal production rate is calculated and used as an upper bound for the biomass synthesis reaction.To limit the solution space a parsimonious FBA approach (Lewis et al., 2010) is implemented by introducing an additional constraint to limit the total sum of fluxes to the minimum value compatible with maximal growth given by the flux fraction parameter.This extra constraint is applied in the production and slow growth scenario simulations to limit unrealistic futile cycles.
The Optimal Gaussian Process (OptGP) sampler (Megchelenbrink et al., 2014) implemented in cobrapy (Ebrahim et al., 2013) is used to model the distribution of the target space and iteratively sample from this distribution.A thinning parameter is used to reduce the correlation between samples.Invalid samples (i.e.those that do not meet constraints specific to each scenario) are discarded.Last, the Geweke diagnostic is used to calculate the chain convergence for each process, and samples corresponding to reactions whose distributions have not converged are discarded (Cowles and Carlin, 1996).

Fig. 1.
Comparative Flux Sampling Analysis (CFSA).CFSA requires a genome scale metabolic model (GEM) as input.In the "Flux Sampling" module samples are taken in three different scenarios: growth, production and slow growth.This module provides two PDF files with distribution graphs for down-regulation (KD) and over-expression (OV) targets and an Excel file with the sampling results.The sampling results are used as input in the "Target filtering" module.Here, the user can filter the metabolic engineering targets based on a pre-defined set of parameters or additional, user-defined filters.This generates the filtered results Excel file including all the metabolic engineering targets found.Intermediate files are shown in light grey boxes, user-friendly files are shown in dark grey boxes.R.P. van Rosmalen et al.

Filtering of metabolic targets
For each reaction, the two-sample Kolmogorov-Smirnov (KS) test is used to compare samples from different scenarios and determine if they belong to the same continuous distribution.Potential targets are selected if, for a specific reaction, distributions differ in the scenarios based on KS-statistics and p-values corrected for multiple testing using Bonferroni.The p-value and KS cut-offs can be adjusted by the user in the Target filtering module (Fig. 1).Besides, reactions whose fluxes correlate with fluxes through the biomass synthesis reaction, or that are not associated to genes (through a gene-protein-reaction association) are discarded as potential targets.Only reactions whose change in flux between the growth and production scenario is bigger than a user-specified threshold are considered to be suitable targets.CFSA allows the comparison of fluxes based on mean absolute changes, mean relative changes and mean fold changes.Similarly, targets can be filtered based on the standard deviation of the samples taken in the production scenario.
Potential targets are then divided into over-expression or downregulation targets depending on whether the mean fold change comparing growth and production scenarios is above or below 1.To account for reversible reactions, absolute flux changes are used to compute the mean fold change.Knock-down targets that correspond to non-essential genes are classified as possible knock-out targets.Reactions are clustered based on the correlation of the absolute fluxes between samples in order to identify redundant targets (i.e.belonging to the same metabolic pathway).

Selected applications 2.2.1. Production of lipids by C. oleaginosus
The iNP636_Coleaginosus_ATCC20509 genome-scale metabolic model was manually curated to provide all fatty acid elongation reactions (Pham et al., 2021).In total 7 reactions were added, 1 of them in the cytoplasm and 6 of them in the mitochondria, and all reactions were associated with the corresponding genes.Details of the model curation can be found on Gitlab.Glycerol and urea were used as the carbon and nitrogen sources respectively and nitrogen-depleted biomass composition was assumed.CFSA was used (optimality = 0.90, flux fraction = 1.25, KS1 --KS2 ≥ 0.75, mean absolute change ≥0.01, standard deviation in production ≤50) using the lipid synthesis reaction (lipid_synthesis) as a target for the production scenario.The feasibility of selected targets was evaluated based on previous studies.

Production of naringenin by S. cerevisiae
The Yeast8 genome-scale metabolic model of S. cerevisiae (Lu et al., 2019) was modified to include the naringenin production pathway (Table 1) and glucose was the selected carbon source.CFSA was used (optimality = 0.90, flux fraction = 1.25, KS1 --KS2 ≥ 0.75, mean absolute change ≥0.01, standard deviation in production ≤50) with the naringenin exchange reaction (EX_NAR) as an objective for the production scenario.Two proteomic datasets representative of S. cerevisiae aerobic growth on glucose during shake flask and chemostat fermentations were used as additional filters to restrict down-regulation targets to detected proteins (Costenoble et al., 2011;Lahtvee et al., 2017).

Comparative flux sampling analysis (CFSA)
In CFSA flux sampling is used to explore the size and shape of the convex flux polytope that defines the solution space in different scenarios (Wiback et al., 2004).However, the obtained flux distributions should not directly be understood as probabilities (Wiback et al., 2004).CFSA was implemented in Python (v3.7) using the cobrapy (v0.26.2) toolbox.Statistical analysis methods based on the Kolmogorov-Smirnov test were implemented using scipy stats package.The CFSA output consists of a first excel file with sampling results for all model reactions that is used as input for filtering.After filtering (based on user defined parameters), a new excel file containing the filtered results is generated (Fig. 1).This file contains the suggested reaction targets and their associated genes as well as possible off-targets caused by multifunctional enzymes.It provides a summary of the sampling results including the mean fluxes in the growth and production scenario as well as the absolute flux change and the reaction equation.Additionally, distribution graphs for suggested targets are generated and can be checked before experimental implementation.The complete files for both case studies are available in GitLab.

Distribution graphs
A distribution graph can be generated for each reaction in the model.It shows the distribution of sampled fluxes in each scenario: growth, production, and slow growth (i.e. in how many samples a reaction had a specific flux).Genes are classified as over-expression targets when the absolute mean flux through the corresponding reaction is higher in the production scenario compared to the growth scenario and the flux distributions do not overlap (Fig. 2A and B).Similarly, genes are classified as down-regulation targets when the absolute flux through the corresponding reactions is lower in the production scenario compared to the growth scenario and distributions do not overlap (Fig. 2C and D).The most extreme case of a down-regulation, a knock-out, is obtained when, for a down-regulation target, the flux in the production scenario is zero and the gene is not classified as essential (Fig. 2E and F).
The slow growth scenario is used to reduce the number of false positive down-regulation targets.Growth and production are competing objectives and often low fluxes obtained in the production scenario are not related to increased production but to a decreased growth rate (e.g.fluxes through reactions for biomass components).In order to avoid the identification of these genes as down-regulation targets, reactions where the production and slow growth distributions overlap are considered false positives (Fig. 2G).Distribution graphs also show the allowed variability of a reaction flux.Reactions with sharp distributions require a specific flux to obtain high production.Conversely, reactions with broad distributions are allowed to carry different fluxes without impacting production and are hence not suitable metabolic engineering targets.
Although targets are automatically filtered based on the overlap between flux distributions in different scenarios, the mean flux, and the range of the distribution during production, visual examination of distribution graphs is recommended as the final filter before experimental implementation.This allows the identification of reactions with nonnormal distributions as well as reactions with changed directionality in the different scenarios.

Effect of sampling parameters on target identification
Samples are taken in three scenarios: growth, slow growth and production.The user might choose an optimality constraint that determines the minimum growth or production in the growth and production scenarios respectively.The optimality parameter can take values from 0 to 1, where 0 indicates that flux distributions resulting in zero growth or production are allowed, and 1 indicates that only flux distributions with maximum growth or production are allowed.Increasing the optimality parameter tightens the flux constraints, reducing the feasible solution space and decreasing the number of overexpression targets (Fig. 3, Sup.Fig. 5).When the optimality parameter is increased, the minimum allowed production increases which results in sharper distributions an increased number of down-regulation targets (Fig. 3, Sup.Fig. 5).Reducing the solution space with the optimality parameter ensures that only relevant phenotypes are captured and a value of 0.9 is recommended as default.This optimality value ensures high production while allowing the sampling of sub-optimal phenotypes which increases the robustness of the predictions.
Similarly, a flux faction parameter is used to limit the total sum of fluxes in the production scenario based on the total sum of fluxes in the growth scenario.This parameter can take values equal of bigger than 1, so higher flux fractions increase the available total flux and, therefore, enlarge the solution space.Lastiri-Pancardo et al. show that the fraction of unused proteome available for the expression of heterologous pathways is limited and influences production (Lastiri-Pancardo et al., 2020).However, estimating this fraction is difficult and depends on the growth conditions (Elsemman et al., 2022).Reducing the solution space with the flux fraction parameter as a proxy for proteome constraints reduces the risk of identifying unrealistic loops or excessively long pathways as engineering targets.We show that increasing the flux fraction parameter widens the solution space, increasing the number of down-regulation targets.At the same time, increasing the solution space might results in broader distributions that reduces mean flux differences between growth and production scenarios, reducing the number of possible over-expressions (Fig. 3).We tested flux fraction values in the 1 to 1.5 range and found 1.25 to be a suitable value for the case studies here presented, although it can be adapted for other applications.
Changing these parameters did not affect the sampling run time which is determined by the model size (94 ± 1 min for S. cerevisiae model and 46 ± 1 min for C. oleaginosus model on an Intel(R) Xeon(R) CPU E5-2650 v4 @2.2 GHz).

Effect of filtering parameters on target identification
Samples can be evaluated based on the KS test, the mean flux in the different conditions, and the variability of the mean flux in the production conditions (its standard deviation).Here, the effect of these filtering parameters was evaluated using the recommended sampling parameters (optimality = 90% and flux fraction = 1.25).
KS-test identifies whether samples belong to the same distributions (p-value) and the overlap between distributions (KS-statistic).Due to the large number of samples, the test is overpowered and p-values do not constitute a good filtering criteria even if corrections for multiple testing such as Bonferroni are applied.The KS-statistics determines the overlap between distributions, where a KS value of zero indicates complete overlap.This parameter largely affects the number of selected targets.KS1 indicates the overlap between the growth and production scenarios and ensures significantly different flux distributions among these conditions.Therefore, increasing KS1 reduces the number of overexpression and down-regulation targets (Fig. 3).KS2 indicates the overlap between the slow growth and production scenarios.This condition is used to distinguish reactions in which flux decreases due to the decreased growth in the production scenario instead of the increased production.Therefore, increasing KS2 reduces the number of downregulation targets.As default, the use of 0.75 for KS1 and KS2 is recommended but this value can be tuned based on the distribution of KS values obtained (Fig. 1).
The absolute flux change between growth and production scenarios is used to rank the targets and can be used as an additional filter (Fig. 3).Targeting reactions with considerable flux changes when growth or production are maximized, increases the chance of significant phenotype changes in vivo.Contrarily, low flow changes are unlikely to be achieved adjusting gene expression.A value of 0.001 mmol/gDW/h is used as default.Although we recommend the use of mean absolute change to favour reactions with larger fluxes, CFSA allows alternative filtering based on mean fold changes and mean relative changes of reaction fluxes.
The standard deviation of the flux distributions in the production scenario is used as an additional filter, since reactions with broad distributions are not considered relevant targets.Higher standard deviations indicate that production is not affected by the flux of the studied reaction.Decreasing the maximum allowed standard deviation therefore reduces the number of suggested targets for up and downregulation (Fig. 3).Considering that reactions with high mean fluxes also have a higher standard deviation, the default value of this parameter is set to 50.

Case studies
The number of possible metabolic-engineering targets obtained using CFSA for lipid production in C. oleaginosus and naringenin production in S. cerevisiae is presented in Table 2.These values were obtained using optimality = 0.90, flux fraction = 1.25, KS1 --KS2 ≥ 0.75, mean absolute change ≥0.01 and standard deviation in production ≤50.The complete list of target reactions can be found in GitLab.The sections below elaborate on some of the targets found.

Production of lipids by C. oleaginosus
C. oleaginosus is an oleaginous yeast able to accumulate lipids above 40% (w/w) of its biomass when growing on a nitrogen-limited cultivation medium (Duman-Özdamar et al., 2022).The composition of the produced fatty acids, stored as triacylglycerols (TAGs) within lipid bodies in the cell, is comparable to commonly used plant-derived oils.
Therefore it has been flagged as an auspicious MCF for sustainable lipid production at an industrial scale (Zhang et al., 2011;Di Fidio et al., 2021).The lipid accumulation initiates with the transport of citrate from the mitochondria to the cytosol where it is cleaved into acetyl-CoA by ATP citrate lyase (ACL).Acetyl-CoA is converted into malonyl-CoA by acetyl-CoA carboxylase (ACC) leading to the activation of the lipid synthesis and elongation pathways (Fig. 4 A) (Bracharz et al., 2017;Görner et al., 2016).
CFSA was applied to investigate metabolic engineering strategies for enhanced lipid production in C. oleaginosus.As a result, we obtained 1 candidate reaction for down-regulation and 25 candidate reactions (belonging to 11 groups) for over-expression.The complete list of reactions is available in GitLab.The only down-regulation target found corresponds to ATP diphosphohydrolase that catalyzes the conversion of ATP to AMP and reflects the high energy requirements of lipid production compared to growth.
As expected, reactions from the fatty acid synthesis pathway (fatty acyl-acp synthase (ACPS) and stearoyl-CoA desaturase (SCD)) were suggested as over-expression targets (Fig. 4A).Besides, key reactions ACL and ACC, which over-expression has improved lipid accumulation in other oleaginous yeast, were also predicted as targets (Zhang et al., 2014;Yan et al., 2020).Additionally, pyruvate dehydrogenase (PDH), which connects the glycolytic pathway to the TCA cycle, and citrate synthase (CS), which synthesizes citrate from acetyl-CoA and oxaloacetate, were predicted as up-regulation targets to improve cytoplasmic citrate supply for lipid synthesis.However, CFSA did not predict down-regulation targets from the β-oxidation pathway that are commonly used to increase the availability of acetyl-CoA for fatty acid synthesis (Madzak, 2021;Blazeck et al., 2013).
Along with these experimentally validated targets, unanticipated reactions from the mevalonate pathway e.g., hydroxymethylglutaryl CoA synthase (HMGS) were suggested as beneficial up-regulations by CFSA.Although these reactions consume acetyl-CoA, simulations that reduce HMGS flux result in a diminished lipid production (Sup Fig. 6).The complex relationship between the mevalonate pathway and lipid synthesis, intrinsic to the GEM used, is captured by CFSA as a possible strategy to increase fatty acid production.
Finally, CFSA suggested over-expression targets from amino acid metabolism, including reactions involved in threonine synthesis (aspartate-semialdehyde dehydrogenase (ASAD), aspartate kinase (ASPK), threonine synthase (TS)), as well as glutamate and glutamine synthases.Threonine synthesis requires cytosolic oxaloacetate which is produced during citrate conversion to acetyl-CoA by ACL.Therefore, the over-expression of the theronine synthesis pathway could improve lipid production balancing the over-production of oxaloacetate.Furthermore, Kim et al. suggested that upregulation of threonine synthesis could potentially increase the fluxes through the TCA cycle which increases citrate supply for acetyl-CoA synthesis (Kim et al., 2019).

Production of naringenin by S. cerevisiae
S. cerevisiae is a model organism with in-depth genetic and physiological characterization, ample application in industrial bioprocesses and Generally Regarded as Safe (GRAS) status (Gottardi et al., 2017;Lian et al., 2018).Flavonoids such as naringenin are precursors of anthocyanins and have traditionally been used for fragrance, flavour, and colour in various food types (Nigam and Luke, 2016;Rodriguez et al., 2017).Naringenin is derived from the shikimate (SHK) pathway for aromatic amino acid biosynthesis, which starts with the condensation of erythrose-4-phosphate (E4P) and phosphoenolpyruvate (PEP) (Fig. 4 B).Production of this compound requires the heterologous expression of 4CL, CHS, CHI, and either PAL, C4H and CPR for production from phenylalanine, or TAL for production from tyrosine (Table 1).Moreover, naringenin production requires an appropriate supply of malonyl-CoA (Lyu et al., 2017).
We used CFSA to find metabolic engineering strategies that improve naringenin production.We obtained 41 reaction candidates for downregulation (belonging to 28 groups) and 50 targets for up-regulation (belonging to 35 groups).The complete list of reactions is available in GitLab.Two proteomic datasets covering 48.4% of the genes and 23.6% of the reactions in the model were used as additional filtering step to prioritize the 34 detected proteins as down-regulation targets.
As expected, reactions from the shikimate and naringenin production pathways were predicted as over-expression targets, with a preference for the tyrosine branch.Reactions belonging to glycolysis and nonoxidative pentose phosphate pathway were also predicted as suitable targets (Fig. 4B).Interestingly, CFSA suggested a decreased flux through the phosphofructokinase (PFK) and aldolase (FBA) reactions that result in the conversion of fructose-6-phosphate (F6P) to dihydroxyacetonephosphate (DAP) and glyceraldehyde-3-phosphate (GAP).Instead, it proposed F6P conversion to E4P and DAP via sedoheptulose-1,7- biphosphate (S1,7bP).In this way E4P and DAP, that can be later converted to PEP, are simultaneously produced, feeding the shikimate pathway with its two precursors.In the cells, PFK and FBA are responsible for both pathways and conversion of F6P to DAP, and GAP is favoured due to a higher affinity of FBA towards F1,6-bP.According to the simulations, expressing pfk genes with higher affinity towards S7P such as ppi-pfk from Clostridium thermosuccionogenes (Koendjbiharie et al., 2020) is suggested as a novel strategy to improve naringenin production.
CFSA also suggested experimentally validated strategies to increase the production of acetyl-CoA and malonyl-CoA syntheses such as downregulation of PDH and CIT2 and up-regulation of PDCS, ALD, ACS and ACC (Fig. 4B) (Chen et al., 2014).However, it fails to predict the down-regulation of fatty acid synthesis as a strategy to improve malonyl-CoA availability (Lian et al., 2018;Chen et al., 2014;Li et al., 2021).Similarly, CFSA does not suggest the deletion of Ehrlich pathway (EP) genes, involved in the degradation of intermediates that have shown to improve the production of other aromatic compounds (Rodriguez et al., 2017;Li et al., 2021;Milke et al., 2018;Gold et al., 2015).
The model suggests down-regulation of reactions involved in serine and branched chain amino acid synthesis (leucine, isoleucine and valine), likely to decrease the conversion of pyruvate into biomass components.Besides, down-regulation of reactions involving tetrahydrofolic acid (THF) and glycine are also suggested, probably to reduce chorismate consumption for THF formation (Fig. 4B).
Last, CFSA suggested the over-expression of adenylate kinase (ADK) which has been reported to increase the production of malonyl-CoA derived products (Ferreira et al., 2019).Similarly, although experimentally unrealistic, CFSA suggested down-regulation of ATP synthase and reactions from the electron transport chain, reflecting the lower energy requirements of production compared to growth.

Discussion
Inspired by other strain design algorithms that suggest metabolic engineering targets based on the comparison of predicted fluxes between wild type and production phenotypes (Burgard et al., 2003;Tepper and Shlomi, 2010;Patil et al., 2005;Jensen et al., 2019), we present CFSA, a tool based on flux sampling that, by analyzing the complete GEM solution space, can guide the design of MCF.While current tools focus on design approaches for growth-couple production, we present a first tool that allows the design of growth-uncoupled production strategies.Besides, as opposed to previous tools, CFSA designs are based on the complete exploration of the solution space achieved using flux sampling (Herrmann et al., 2019) instead of non-unique FBA solutions, which ensures the full inspection of cell metabolism.
We apply CFSA to improve the production of lipids by C. oleaginosus, an endogenous product in an emerging MCF, as well as the production of naringenin in S. cerevisiae, a heterologous product in an established MCF.In both cases, CFSA suggested experimentally validated targets including evident up-regulations belonging to the product synthesis pathway and distant targets that improve precursor availability.It also suggested new engineering strategies involving alternative PPP reactions for naringenin synthesis and threonine and mevalonate pathways gene over-expressions for lipid production.Although the lack of mechanistic understanding of some of the suggested targets could question their implementation, it also highlights the potential of modeldriven approaches to find non-obvious engineering strategies.
Regarding down-regulations, CFSA fails to predict reported successful targets from pathways involved in the degradation of production pathway intermediates (e.g.Ehrlich pathway genes deletion for naringenin production (Rodriguez et al., 2017;Li et al., 2021;Milke et al., 2018;Gold et al., 2015) or β-oxidation gene knock-outs for improved lipid production (Madzak, 2021;Blazeck et al., 2013)).This pitfall is shared with other GEM-based strain design approaches since, although active in vivo, these reactions remain inactive in GEM simulations.
In addition to the simulation of growth and production phenotypes, we include the simulation of slow growth.This scenario is used to differentiate between fluxes that change due to the simulated lowgrowth in the production scenario, from fluxes that are potentially related to increased production (Fig. 2).The simulation of this phenotype is used as negative control reducing false positive target predictions (Fig. 3).Other tools use FBA to maximize growth or production and use the obtained solution to compare reaction fluxes and identify engineering targets.Instead, we include the optimality parameter to ensure that not only the optimal flux profile is sampled, but less efficient behaviour is also tolerated leading to increased robustness.
CFSA is easy to implement and the filtering criteria used will result in a reduced list of potential targets for up-, down-regulations, and knockouts for subsequent inspection.Default parameters can easily be modified according to user needs and additional filtering criteria such as the use of proteomic data can be integrated into the workflow.Other strain design methods provide minimal intervention strategies that, although useful in theory, might not return the expected results in practice.For example, when entire pathways are predicted as up-regulation targets, GEM cannot identify limiting reactions as enzyme kinetics and regulation are not considered in constraint-based models (Machado and Herrgård, 2015).Instead, we provide a complete list of possible interventions and endow the user with additional information to make a decision based on the most feasible suggestions.
Disadvantages of CFSA include the difficulty of estimating the quantitative effect of the suggested manipulations, which are identified based on statistical testing.There is no guarantee that the effect size in flux correlates to the strength of the knock-down or over-expression required.Besides, targets are suggested as individual interventions and the effect of possible combinations of targets is not provided.Still, after a first round of CFSA, the algorithm can be re-applied using GEMs including desired interventions (i.e. with modified reaction bounds) to find potential complementary targets.
As with other methods based on GEMs, errors in the models can lead to unexpected outcomes.For example, loops in the metabolic network can lead to artificially high fluxes and thus false predictions.Often parsimonious flux balance analysis (Lewis et al., 2010) or loop-less flux balance analysis (Schellenberger et al., 2011;Desouki et al., 2015) is used to limit these errors, however, implementing these in flux sampling is non-trivial.We mimicked the parsimonious approach by adding a flux fraction parameter that constrains the total sum of fluxes in the production scenario based on the growing phenotype.However, this approach, while biologically reasonable for the reference condition at a high growth rate, does not necessarily apply to the production condition, as the assumption of minimal flux and thus protein usage does not hold for this artificial scenario.Instead, loop-less flux sampling approaches such as the loopless Artificially Centered Hit-and-Run on a Box algorithm (ll-ACHRB) (Saa and Nielsen, 2016) could be used to improve this process.Although we did not test the effect of different sampling algorithms in CFSA results, as additional sampling algorithms become available in cobrapy, they can easily be incorporated in the CFSA workflow (Ebrahim et al., 2013).
CFSA is the first strain design algorithm based on flux sampling that explores the whole solution space of a GEM and suggests metabolic engineering targets for growth-uncoupled production.Its robustness, simplicity, and flexibility make it ideal to complement and systematize MCF design.CFSA predictions, including non-obvious targets, can be sequentially tested using high-throughput approaches such as automated platforms and biosensor-aid screening accelerating and broadening the strain design process.

Fig. 2 .
Fig. 2. Example of flux distribution plots illustrating behaviors of candidate targets for metabolic engineering.A-B.Overexpression targets (r_0109 and r_1050 from Yeat8 as examples).C-D.Down-regulation targets (r_0471 and r_0503 from Yeat8 as examples).E-F.Knock-out targets (r_0961 and r_2093 from Yeat8 as examples).G. False positive (r_0938 from Yeat8 as example).In each panel, the x-axis represents possible flux values, and the y-axis represents the frequency of each flux value obtained when sampling the solution space, normalized to an area of one.Note that over-expression and down-regulation targets are obtained based on absolute flux values.

Fig. 3 .
Fig. 3. Effect of sampling and filtering parameters on the number of reported targets.The optimality parameter determines the minimum fraction of the optimum value allowed in the growth and production scenario.The flux fraction parameter determines a constraint on the total sum of fluxes on the production and slow growth scenarios compared to the growth scenario.The Kolmogorov-Smirnov (KS) statistic determines the overlap of distributions in the growth and production scenarios (KS1) or the slow growth and production scenarios (KS2).The absolute flux change between the growth and production scenario and the standard deviation of the distributions in the production scenario are shown.The effect of sampling parameters was assessed using loose values for the filtering parameters (KS1 --KS2 ≤ 1, absolute flux change ≥0, std production ≤1000).The effect of the filtering parameters was assessed on samples taken using the recommended sampling parameters (optimality = 90% and flux fraction = 1.25).Filtering parameters not under study were set to KS1 --KS2 ≤ 1, absolute flux change ≥0.std production ≤1000.

Table 2
Number of targets obtained in the C. oleaginosus and S. cerevisiae case studies.