Kinetic models of metabolism that consider alternative steady-state solutions of intracellular fluxes and concentrations

Large-scale kinetic models are used for designing, predicting, and understanding the metabolic responses of living cells. Kinetic models are particularly attractive for the biosynthesis of target molecules in cells as they are typically better than other types of models at capturing the complex cellular biochemistry. Using simpler stoichiometric models as scaffolds, kinetic models are built around a steady-state flux profile and a metabolite concentration vector that are typically determined via optimization. However, as the underlying optimization problem is underdetermined, even after incorporating available experimental omics data, one cannot uniquely determine the operational configuration in terms of metabolic fluxes and metabolite concentrations. As a result, some reactions can operate in either the forward or reverse direction while still agreeing with the observed physiology. Here, we analyze how the underlying uncertainty in intracellular fluxes and concentrations affects predictions of constructed kinetic models and their design in metabolic engineering and systems biology studies. To this end, we integrated the omics data of optimally grown Escherichia coli into a stoichiometric model and constructed populations of non-linear large-scale kinetic models of alternative steady-state solutions consistent with the physiology of the E. coli aerobic metabolism. We performed metabolic control analysis (MCA) on these models, highlighting that MCA-based metabolic engineering decisions are strongly affected by the selected steady state and appear to be more sensitive to concentration values rather than flux values. To incorporate this into future studies, we propose a workflow for moving towards more reliable and robust predictions that are consistent with all alternative steady-state solutions. This workflow can be applied to all kinetic models to improve the consistency and accuracy of their predictions. Additionally, we show that, irrespective of the alternative steady-state solution, increased activity of phosphofructokinase and decreased ATP maintenance requirements would improve cellular growth of optimally grown E. coli.


Introduction
Over the last decades, advances in genome editing technologies have allowed the redirection of carbon flow within the organism towards specialty products of interest and desired physiologies [1].
Identifying candidate enzymes is fundamental for genetic modifications that have seen applications in metabolic engineering, basic and applied biology, biotechnology and medical sciences [2][3][4].
Increasingly available high-throughput sequencing data has enabled the construction of stoichiometric genome-scale metabolic models (GEMs) that describe mathematically the balanced metabolic fluxes within an organism [5]. Metabolic models such as these GEMs have been extensively used to characterize overall network behavior of organisms, which can provide guidance about the genes that can be modified to improve a desired product biosynthesis. Improved guidance for metabolic engineering and basic biology will be achieved with kinetic models of the reactions/networks in GEMs.
The construction of a kinetic model of metabolism requires knowledge of steady states and/or dynamics of metabolic fluxes and metabolite concentrations that can be used to estimate the unknown kinetic parameters that describe these data. However, there are many sources of uncertainty in metabolic fluxes and metabolite concentrations that hamper the accurate estimation of kinetic parameters. Advances in C13 isotopomer techniques facilitated the measurement of fluxes across cellular reactions and promoted the development of metabolic flux analysis (MFA) [6]. One main uncertainty in fluxes is the flux directionality as reactions can be thermodynamically bidirectional [7]. Metabolomics and thermodynamics can be used as it is done in thermodynamicbased flux analysis (TFA) [7][8][9] to constrain the direction of some of these fluxes. But even when information about the directionality of all the reactions and fluxomics from labeling experiments are used, there is still a great uncertainty on exact estimation of fluxes as the degrees of freedom remain high, especially as we increase the size of the networks. The addition of constraints based on measured gene expression data [10,11] and enzymatic data [12] can reduce the degrees of freedom. However, the system remains underdetermined, resulting in multiple alternative steadystate flux distributions corresponding to the physiology under study. Different steady-state solutions could directly affect the predictions of kinetic models, leading towards very distinct conclusions and guidance for metabolic engineering.
Several promising methods exist for constructing kinetic models around representative steady states of metabolic fluxes and metabolite concentrations [13,14]. The Optimization and Risk Analysis of Complex Living Entities (ORACLE) workflow [15][16][17] and frameworks built around ensemble modeling [18,19] have made significant strides towards genome-scale kinetic modeling of metabolism. These methods generate populations of non-linear kinetic models around a selected reference steady state (RSS) that is chosen based on its ability to characterize the observed physiology. Methods commonly used for selecting a RSS include using the computed optimal solution to an objective function that defines physiological tasks [20], fitting the data from MFA [6], or performing principal component analysis (PCA) on a sampled solution space [15]. Once a RSS is established, kinetic models are constructed around it, which allows the study and prediction of cellular metabolic response to perturbations [21]. These populations of kinetic models can be studied using statistical procedures to identify target enzymes, sensitively analyze kinetic parameters, and design experiments [22,23].
There is no unique and evident approach for selecting a RSS for such an underdetermined system.
To our knowledge, the impact of alternative RSSs describing a physiology using the kinetic parameters and the outputs of these kinetic models have not been studied.
Hereby, we examine how uncertainty in intracellular flux solutions and metabolite concentrations influences the metabolic control analysis (MCA) of populations of non-linear kinetic models built around alternative steady states. We integrated physiological data from E. coli grown aerobically in a batch cultivation [24] into a reduced core model derived from the iJO1366 E. coli GEM [25][26][27] and found that the data were not sufficient to uniquely determine the steady state metabolic flux distribution as several reactions could operate in either the forward or reverse direction. These socalled bi-directional reactions result in the existence of multiple feasible flux directionality profiles (FDPs) that represent the same physiology, because in any FDP, reactions operate only in one direction [8]. We constructed populations of kinetic models for 4 selected FDPs to demonstrate how significantly MCA outputs and metabolic engineering decisions are affected.

Results and Discussion
The procedure for characterization and analysis of steady-state multiplicities arising from the underdetermined nature of the system is a constitutive part of the ORACLE workflow [15-17, 23, 28-30]. The workflow assists with more reliable and robust MCA-based metabolic engineering decisions that will enable the identification of study-specific target enzymes, independent of the steady state.
Various types of biological data are combined into a thermodynamically feasible stoichiometric model of a given physiology ( Figure 1). We follow this workflow to discuss our results. At first, we identify the bi-directional reactions and determine feasible flux directionality profiles (FDPs). We discuss how alternative FDPs affect the conclusions of kinetic models. We then consider how the flux values and the metabolite concentration levels within a FDP affect kinetic model predictions. The MCA outputs of the kinetic models are studied to systematically derive metabolic engineering decisions. For further information on the methodologies used, we refer the reader to the methods section of the manuscript.  The procedure   consists of several computational steps wherein the available data are integrated, the alternative solutions are identified, the populations of non-linear models are built, and the output variables are analyzed to make robust conclusions (for details see main text).

Multiplicity of flux directionality profiles
To derive a reduced E. coli metabolic model from the iJO1366 GEM [25], we used the redGEM and lumpGEM algorithms as they provide a systematic and modular way for reducing GEMs, whilst preserving growth and gene essentiality [26,27]. The obtained core stoichiometric model of the E.
coli metabolism consisted of 277 reactions and 160 metabolites distributed over the cytosol and the extracellular space (Methods). To constrain the model and derive alternative steady states, we integrated fluxomics and metabolomics data [24] (Supp 1) within the thermodynamic formulation of the stoichiometric model for aerobically grown E.coli. Hence, we set the glucose uptake to 7.54 mmol/gDW/h, the growth rate to 0.61 /h and, the excretions of acetate, formate and succinate to 3.5 mmol/gDW/h, 0.5 mmol/gDW/h and 10 -4 mmol/gDW/h, respectively ( Figure 2A). We made assumptions about reaction directionalities based on available literature [24,[31][32][33][34] (Methods).
Some of these reactions such as PGI and FUM are commonly considered as unidirectional. However, Rabinowitz and coworkers reported that these seven identified reactions are bi-directional in E.coli, yeast, and immortalized baby mouse kidney cells [36]. This suggests that, for previously uncharacterized physiologies and/or for reactions with no fluxomics data, we should consider all feasible reaction directionalities. This is a way of ensuring that we account for the flexibility of cellular metabolism. For simplicity and clarity of further discussion, we wanted to analyze four FDPs with the most distinct physiologies out of 25 feasible ones. We assumed that changing the directionality of reactions with the largest TVA flux range would result in the most distinct FDPs. PGI and FUM had the largest feasible TVA flux ranges from the seven bi-directional reactions. Hence, to generate these four distinct FDPs, we were changing the directionality of both PGI and FUM in either the forward or backward direction while keeping the directionalities fixed for the 5 remaining bidirectional reactions (Figure 2A). The directionality of these 5 bi-directional reactions (other than PGI and FUM) was determined as follows. We first defined and calculated the flux variability score for each of the 25 FDPs (Methods). A higher flux variability score suggests that the reactions of the FDP are on average more flexible and can operate in relatively wider flux ranges. We then took the directionalities of the remaining five bi-directional reactions from the FDP with the highest score. In this study, we assessed the model predictions and their implications on metabolic engineering decisions around these four FDPs. Nevertheless, different study-dependent criteria for selecting the FDPs could be devised.     Figure   S4). Based on a consistent magnitude across all the FDPs, PGM and PFK were the top two target enzymes that seemed to control the glucose uptake of optimally grown E. coli. TPI, PPC, NADTRHD, glucose 6-phosphate dehydrogenase (G6PDH2r), and 6-phosphogluconolactonase (PGL) control GLCptspp in at least one FDP but not across all. As for the control of cellular growth, the differences in thermodynamic equilibrium and kinetic coupling between the FDPs explain these results.
These observations emphasize the importance of considering enzyme kinetics and the existence of alternative steady states before making metabolic engineering conclusions based on kinetic models, especially considering that further similar observations were made for FCCs of other fluxes.
Generally, we noticed that the enzymes whose control remained unchanged across FDPs were found in the central carbon metabolism. Inconsistent enzyme control was observed in peripheral and transport reaction enzymes topologically further away from the central carbon pathways.

Study of uncertainty in FCCs.
To characterize the variability of cellular growth FCCs with respect to all central carbon enzymes (i.e., no transporter nor exporter) for the populations of 50,000 models (Figure 7), we studied the uncertainty across the four FDPs using PCA (Methods). The first two principal components (PCs) covered a majority of the variance, with 93%, 62%, 56%, and 69% for FDP1-4, respectively. For FDP2-4, at least seven, eight and six PCs, respectively, were required to account for more than 90% of the variance between the FCC populations. This suggests that the uncertainty in the cellular growth FCCs was considerably more distributed for FDP2-4 than for FDP1 that required only two PCs.
In PCA, each variable has a score on the PCs that are under consideration, which corresponds to its contribution to the variability described by the given PC. In FDP1, the growth FCCs with respect to the enzymes NADK and NADPPPS corresponded to the highest PC scores in terms of magnitude along the first PC, suggesting that most of the uncertainty comes from these ETC enzymes (Figure 7).
Their scores on both PCs were strongly opposed in terms of sign, suggesting that these FCCs anticorrelate. In fact, the cellular growth FCCs of NADK and NADP phosphatase (NADPPPS) had a -1.00 Pearson correlation coefficient, further indicating that they were exactly anti-correlated. On the other hand, enzymes PGM and RPM had very similar PC scores, and we note that the correlation coefficient of these FCCs was 0.91, indicating a near-perfect correlation.
Similarly to FDP1, we studied FDP2-4 to find underlying covariance patterns between the cellular growth FCCs (Figure 7). We noticed that certain trends were preserved between the FDPs as, for NADTRHD, PFK, and ATPM were the top candidates for improving cellular growth, as determined previously based on absolute means ( Figure 6). If we had to select one of these three enzymes for genetic engineering, we want it to be the one with the least uncertainty. We observed that PFK scores lower than NADTRHD and ATPM on the PCs across all the FDPs (Figure 7), demonstrating the least uncertainty and suggesting that it could be the most prominent target enzyme. A similar analysis could be performed for FCCs related to other reactions, such as glucose uptake (Supp 4, Figure S5). We conclude that to improve growth of aerobically grown E. coli, PFK would also be a top candidate enzyme to metabolically engineer, despite the uncertainties.
The uncertainty in the kinetic parameters and its impact on our studies remains difficult to quantify due to the underdetermined nature of this highly non-linear solution space but it could be further characterized by methods such as the one developed by Andreozzi et al [22]. For our study, when we next compare the effect of uncertainties stemming from the flux and the concentration steady-state solutions, we decided to fix the distributions of the sampled enzyme saturations to ones obtained from a previous RSS using beta distributions (Methods). This allowed us to keep the same level of uncertainty in all the kinetic parameters for our comparisons.

Impact of flux and concentration profiles
We next assume that we know the directionality of each reaction in our network but the system still G6PDH2r, may appear to be attractive targets based on some of the ESSs, but since this property is not in consensus agreement across all ESSs, they are less reliable targets. The top enzymes controlling cellular growth were sensitive to the ESSs, as just 47% of them were in agreement signwise with each other (Supp 4, Figure S6). However, we can still find reliable target enzymes, such as ENO, AKGDH, and glutamate dehydrogenase (GLUDy), mainly in the central carbon reactions that have a reasonable magnitude and consensus agreement.   Figure S6). NADTRHD and ATPM were the most appealing enzymes for controlling cellular growth due to sign and magnitude consistency across the ESSs of their FCCs.

Conclusions
This work studied the impact of alternative concentration and flux steady states on the conclusions derived from the MCA outputs of the non-linear kinetic models built around them using the physiology of optimally grown E. coli. We show that different FDPs can lead to distinct metabolic engineering conclusions when analyzing output FCCs of the non-linear models. The ME2, PPC, and PGI examples illustrate how thermodynamics and kinetic coupling can change the control from one FDP to another. These enzymatic reactions were topologically close to the bidirectional reactions that changed between the FDPs, though, less intuitively, we also noticed that there were changes in thermodynamic displacements across FDPs in enzymes that were topologically far away from the bidirectional reactions. We then studied the uncertainty within a single FDP, and using PCA to study the extremes of the solution space, found that within a FDP, MCA outputs appeared to be more sensitive to concentration values rather than flux values. These observations emphasized the importance of considering alternative solutions when studying a physiology as the steady state affects directly the decisions for hypothesis generation in basic research and design in synthetic biology and metabolic engineering. Hence, we propose a workflow for assessing this uncertainty to make more reliable metabolic engineering decisions that can be broadly applied to any kinetic model to improve the predictions resulting from it.
We then used our workflow to pick target enzymes for genetic modification, identifying NADTRHD, PFK, and ATPM as the top target enzymes independent of the FDP for improving the cellular growth of optimally grown E. coli. PFK and PGM were selected as top enzymes independent of the FDP for improving glucose uptake of optimally grown E. coli. We stress the importance of selecting target enzymes that exhibit control across all the FDPs to make more reliable decisions, highlighting the need to consider alternative steady states when building non-linear kinetic models for a given physiology, as they have imminent implications on the conclusions derived from the MCA. The herein proposed workflow can be used to suggest metabolic engineering decisions for a given study and can provide insights into the design of experiments, as the ranking of candidate enzymes can highlight reactions or enzymes that need further characterization and study due to their variability.

Reduced E. coli model
The model stoichiometry for this study was derived from E. coli iJO1366 [25] using redGEM, a systematic framework for developing core models that are consistent with their genome-scale counterparts [26,27]. The resulting reduced models are context-specific and in the process of reduction it is important to define the carbon sources, the content of media and also the metabolic subsystems of interest for the study. We applied the redGEM algorithm on the latest genome scale model. We used a minimal media with glucose as the sole carbon source and the selected starting subsystems were ones pertaining to central carbon metabolism (glycolysis/gluconeogenesis, citric acid cycle, pentose phosphate pathway, pyruvate metabolism, and glyoxylate metabolism). Omics data for the physiology of optimally grown E. coli under aerobic conditions were extracted (Supp 1) from McClosekey et al. [24]. The data were integrated in the form of constraints into the MILP formulation of the thermodynamics-based metabolic flux analysis [35].
We make the following directionality assumptions for several bi-directional reactions: 1. Fructose-biphosphate aldolase (FBA) that is part of mid-lower glycolysis is set towards catabolism [31].
2. The bi-directional transports of magnesium and phosphate are both set towards uptake [32,33].
3. Acetate kinase (ACKr) and phospho-transacetylase (PTAr) are both set towards the acetate production, because acetate is one of the main by-products [24].
For some of intracellular metabolites, a corresponding transport reaction has not been biochemically characterized and does not appear in the E. coli iJO1366 and in our reduced model. However, these metabolites, unless they are highly polar or very large, are subject to passive diffusive transport through the cell membrane. Therefore, we explicitly added transport reactions for these metabolites that operate at least at basal level (10^-6 mmol/(gDW*h)).

Identification of alternative flux directionality profiles
As first step (Figure 1), in order to identify the reactions that are able to operate in both directions, flux variability analysis (FVA) was performed [40]. If the system has a number z of bi-directional reactions, it could have up to 2 z FDPs as in a FDP, reactions can only operate in a unique direction.
We enumerated the FDPs by adjusting the boundaries of the bi-directional reaction so that they can only operate in a unique direction. We define the coefficient of variability, CV i , as: where, UB and LB are the upper and lower bounds respectively of the flux i derived using thermodynamic-based variability analysis (TVA) [7,35]. F is the average of UB and LB. We define the flux variability score of each FDP as the Euclidean norm of the vector whose entries are the CV of each flux. The FDP with the highest flux variability score has the highest relative flexibility in terms of the allowable flux ranges.

Computation of reference and extreme steady states for alternative FDPs
For each of the identified FDPs, in step two (Figure 1

Thermodynamic displacement analysis.
Within each FDP, in step 3 (Figure 1), we compute the displacement of the reactions from thermodynamic equilibrium, Γ [17,45,46]. For a simple uni-uni reaction with a substrate S and a product P, the thermodynamic displacement, Γ, is defined as: where, k eq is thermodynamic equilibrium. More specifically, we first use the vector of the reference steady-state concentrations together with values of standard Gibbs free energies of reactions to compute Γ. For reactions with negative Gibbs free energy, 0 < Γ < 1. For reactions that are far away from equilibrium Γ is close to 0, and for reactions near equilibrium Γ ≈ 1. We then classify the reactions in terms of Γ in the following four classes: reactions that operate (i) near equilibrium (NE), 0.9 ≤ Γ ≤ 1; (ii) near to middle equilibrium (NM), 0.5 ≤ Γ ≤ 0.9; (iii) middle to far from equilibrium (MF), 0.1 ≤ Γ ≤ 0.5; and (iv) far from equilibrium (FE), 0 ≤ Γ ≤ 0.1. The information about Γ is important, as it is known that enzymes that operate near equilibrium do not have control over fluxes and concentrations in the network [17].
Within the MCA framework, Kaeser and Burns [47] define the concentration control coefficients, ! ! , and the flux control coefficients, ! ! , as the fracitonal change of metabolite concentrations and metabolic fluxes, respectively, in response to fractional change of system parameters. According to the log(linear) formalism [48,49], we can derive ! ! and ! ! as: where, N is the stoichiometric matrix, V is the diagonal matrix whose whole elements are the steadystate fluxes, E is the elasticity matrix with respect to metabolites and is the matrix of elasticities with respect to parameters. If we now consider a uni-uni reaction, i, with a substrate S and a product P we write its reaction rate v i as follows: where, V max is the maximum velocity at enzyme saturation, and, ! ! and ! ! , are the Michaelis constants of S and P, respectively. We define, as done previously by Hatzimanikatis and coworkers [15], the elasticities with respect to S and P, respectively, as: where, ! ! ! and ! ! ! are entries of the elasticity matrix E. Evidently, if the reaction is at thermodynamic equilibrium (i.e. Γ ≈ 1), the first terms of the elasticity terms ! ! ! and ! ! ! tend towards infinity and we consequently have no control with respect the considered enzyme. However, if the reaction is far away from thermodynamic equilibrium (i.e. Γ ≈ 0), the second terms of ! ! ! and ! ! ! can have impact on the elasticities, potentially resulting in control. Hence, it is essential to consider thermodynamic displacement with the kinetics in order to understand control at systems level. The elasticity matrix E is directly affected by Γ, and hence the control coefficients ! ! and ! ! will also be impacted.

Kinetic parameter sampling.
We build populations of kinetic models for the computed vectors of the reference steady-state fluxes and concentrations. We integrate the information about the kinetic properties of enzymes available from the literature [50] and the databases [51,52]. We use the reversible Hill kinetics [53] and convenience kinetics [54] for reactions with unknown kinetic mechanism. For kinetic mechanisms with no or partial information about their parameter values we sample the space of kinetic parameters by direct sampling of the degree of saturation of the active site of an enzyme considering one [17] or multiple enzymatic steps [45]. We then parameterize a population of kinetic models (Supp 5-6), perform consistency verifications [14,16,17], and compute the flux and concentration control coefficients [17,55] where, ! is the flux across a reaction i and, ! is the concentration perturbation of an enzyme k. We then compute the mean of the FCCs, ! ! , across the kinetic models for an FDP as shown in Table 1.
We considered the FCCs for fluxes that were larger than 0.01 mmol/gDW/h across all FDPs because we wanted to focus our study around the reactions that carry more significant amount of carbon  Figure S7). To compare more significant FCCs, we only considered ones that had more than absolute 0.1 fold change across the 4 FDPs so that we focus on FCCs with significant control. This meant that we kept 1'263 out of the previous 34'650 FCCs (Supp 4, Figure S8).

4.5.
Characterizing the distribution of kinetic parameters 4.5.1. Beta distributions. The kinetic parameter solution space is studied in step 4 ( Figure 1) by sampling uniformly the degree of saturation of an enzyme's active site as defined by Wang et al [17].
We obtain distributions of scaled metabolite concentrations from this sampling and consequently, kinetic parameter distributions. The degree of saturation of an enzyme's active site has a welldefined range from zero to one, allowing us to resort to parametric distributions for their characterization. Beta distributions provide an efficient way of quantitatively expressing variability over a fixed range by estimating its two parameters [56]. These parameters can be obtained and compared for populations of kinetic parameters generated with different operational configurations.

Implying prior beta distributions for sampling.
In this work we compare how alternative steady states describing a physiology impact metabolic engineering conclusions. It is thus desirable to ensure that the sampled degrees of saturation of enzyme active sites are similar for the populations of kinetic models built around alternative steady states within FDPs in step 5 ( Figure 1).
Hence, we compute beta parameters describing the distributions of the kinetic parameters of a given RSS. These beta parameters are used to sample degrees of saturation of enzyme active sites for alternative steady states from similar density distributions using the prior samples. The Beta distribution parameters are implied within the ORACLE workflow as input for sampling degrees of saturations of enzymes when parameterizing new kinetic models. Beta distributions hence bias sample densities for the sampling of degrees of saturation states for an enzyme.

Analysis of alternative solutions within FDPs
We investigate in step 5 ( Figure 1) how different flux profiles and metabolite concentration vectors, within FDPs, affect the populations of control coefficients. We separately studied the effects of the flux profiles and the metabolite concentration vectors, in order to decouple their effects on control coefficients. We take the reference steady-state concentration vector and we form the pairs with the extreme steady-state flux profiles computed in step 2 of the procedure (Figure 1). We then generate populations of kinetic models as described in step 3. In the generation of missing kinetic information, we use the distributions of kinetic parameters that have been characterized in step 4 for this FDP. This way, we obtain alternative populations of kinetic models that have in common the reference steady-state concentration and the distribution of kinetic parameters. We compare these populations of kinetic models together with the population of kinetic models that was computed in step 3 for the reference steady state of this FDP. This enables the assessment of the effects of alternative flux profiles within the FDP onto the control coefficients.
The effects of alternative values of concentrations on control coefficients are estimated in an analogous way, where we take the reference steady-state flux and we form the pairs with the extreme steady-state concentrations and we repeat the procedure discussed above. Taken together these two comparisons of alternative solutions allow us to identify sets of enzymes within a FDP whose control over the fluxes and concentrations in the network is robust both with respect to the alternative concentrations and fluxes. We also identify enzymes that are robust only with respect to the alternative concentrations or alternative fluxes.

Metabolic Engineering and Synthetic Biology Design
We next analyze in step 6 the results obtained in steps 3-5 ( Figure 1) in the light of metabolic engineering and synthetic biology design. We single out the enzymes whose control over fluxes and concentrations of interest is consistent over all FDPs and within FDPs. In this step we can also design the experiments that would give sufficient information for discriminating alternative solutions between FDPs and within FDPs.