SUPPLEMENTARY INFORMATION for Metabolic adaptation processes that converge to optimal biomass flux distributions

In simple organisms like E.coli, the metabolic response to an external perturbation passes through a transient phase in which the activation of a number of latent pathways can guarantee survival at the expenses of growth. Growth is gradually recovered as the organism adapts to the new condition. This adaptation can be modeled as a process of repeated metabolic adjustments obtained through the resilencings of the non-essential metabolic reactions, using growth rate as selection probability for the phenotypes obtained. The resulting metabolic adaptation process tends naturally to steer the metabolic fluxes towards high growth phenotypes. Quite remarkably, when applied to the central carbon metabolism of E.coli, it follows that nearly all flux distributions converge to the flux vector representing optimal growth, i.e., the solution of the biomass optimization problem turns out to be the dominant attractor of the metabolic adaptation process.


Introduction
Constraint-based computational methods such as Flux Balance Analysis (FBA) are nowadays widely used when investigating metabolism of bacteria and other simple unicellular organisms [1,2]. Within the framework of FBA, a commonly accepted hypothesis is that biomass production has a special role: evolution has shaped cellular metabolism of these organisms so as to optimize growth, hence if growth is used as objective function of an optimization problem, the vector of fluxes found in correspondence of the optimum represents a plausible flux distribution for the organism. Although such a criterion is phenomenological, it is reasonable, and indeed the fluxes constructed by FBA methods describe well the empirical fluxes observed in many experimental situations, dealing with wild type organisms [3], knockout mutants [4], engineered strains, screenings of drugs [5], nutrient shifts [6] or stress responses.
For bacteria like E.coli, the short-term metabolic response to genetic and environmental perturbations is characterized by a growth arrest and by the activation of a number of latent pathways, a strategy which can favor the survival of the organism at the expenses of efficient biomass production [4,6,7]. This activation is however only transient, and most latent reactions become resilenced as the microorganism adapts to the new condition [7][8][9]. Although experimental data describing this adaptation process at metabolic, genomic, gene expression and proteomic level are starting to appear [6][7][8][9][10][11][12], it is still unclear how this dynamical recovery is implemented by the organism. From the experimental data one can deduce for instance that the FBA criterion is inadequate to describe the transient response, but that it can still be used to characterize the end-point of the metabolic adaptation [4,6]. The two main criteria proposed in the literature to describe the metabolic response to a perturbation are MOMA (Minimization Of Metabolic Adjustment [13]) and ROOM (Regulatory On/Off Minimization [14]). Both capture the idea that metabolism tends to minimize the adjustment with respect to the pre-perturbation fluxes at the expenses of growth, and for both criteria this results in a number of non-essential reactions being activated, which is coherent with the aforementioned experimental evidence [7,8]. However, these methods can provide only a static snapshot of the early adjustments that follow a perturbation. Attempts to model the dynamical changes happening during adaptation have been made for example using kinetic models [15] or combining pseudo steady-states of FBA with kinetic models as in dynamical FBA [16], see [17] for an overview. Other types of proposals include the incorporation of extra time-dependent constraints in the model, representing for instance molecular crowding [18] or other growth-limiting factors [19]. An alternative to adding kinetic parameters or constraints is to combine multiple datasets, such as gene expression [10,20] and/or proteomic [12], see [21,22] for reviews. The transcriptional or translational information obtained in this way can be used to tune the constraints of an FBA model, leading to improved matches with empirically observed fluxes [12,20]. None of these methods is however able to provide a systematic interpretation of how and why the organism accomplishes the adaptation, let alone to propose a mathematical principle combining adaptation and FBA.
A possible way to obtain a dynamical description of metabolic adaptation is proposed in [23]. Starting from a non-adapted progenitor metabolism, a population of phenotypes is obtained through the resilencing of a single reaction (i.e., letting the corresponding flux become negligible). If the growth rate of the different phenotypes is taken as measure of fitness, then a selection biased towards the fittest phenotypes favors the recovery of growth, see Fig 1. If the procedure is iterated, then a Markov chain is obtained. Since at each step of the chain the metabolism of the selected phenotype differs from its predecessor only for a single silenced reaction, it can be seen as a short-term adjustment and computed through a MOMA. For the central carbon metabolism, the resulting process of iterated metabolic adjustments leads rapidly to metabolic adaptation of the microorganism. It is shown in [24] that this model can be used to describe a series of experimental results dealing with adaptation to single carbon sources of various E.coli knockout strains [6]. In particular, it allows to achieve a good agreement with both the experimental growth rates and measured flux data reported in [6,8].
The aim of this paper is to take the approach one step further, by showing that for the core metabolism of E.coli, the Markov chains of recursive resilencings constructed in this way exhibit a single dominant end-point flux vector, and that this vector corresponds to one of the FBA optima, namely the parsimonious enzyme usage FBA optimum (pFBA, minimizing the number of fluxes [12]).
To do so, we compute a large number of trajectories for our Markov chains from random initial conditions, and show that they tend to become absorbed into the pFBA flux distribution or at least to become highly correlated with it. More specifically, the chain of pseudo steadystate fluxes computed through the adjustments that follow the resilencings steers the vast majority of all admissible flux vectors towards alignment with the pFBA vector, regardless of the norm (and growth rate) achieved by the flux vectors at the end-point of the process.
In dynamical systems terms, we can say that the pFBA flux vector constitutes the dominant attractor of the fitness landscape associated to the process of metabolic adaptation. The fact that the single dominant peak of this landscape corresponds to the pFBA flux distribution sheds a novel perspective on FBA optimization, and may contribute to turning this phenomenological argument into a rigorous mathematical model.

FBA and pFBA
In FBA [2], the polytope of admissible steady state metabolic fluxes is represented by where v is the vector of fluxes, of lower and upper bounds l = [ℓ 1 . . . ℓ n ] and u = [u 1 . . . u n ], and S is the stoichiometric matrix. The FBA optimal flux vector is given by where g = ξ T v is the growth rate, i.e., the linear combination of fluxes that constitutes the biomass reaction. When the optimum is degenerate, a secondary optimization criterion can be used to discriminate among equivalent optimal solutions. For example, overall enzyme investment is minimized by the pFBA solution v pFBA , which corresponds to minimization of the sum of the (absolute values of the) fluxes [12].

Adaptation as a Markov chain of repeated resilencing
Following [23], we assume that the adaptation dynamics form a stochastic process of recursive resilencings described by the Markov chain where v 0 is a randomly chosen initial condition in Γ 0 = Γ. The stochastic process can be summarized as follows. At step k, assume the population of bacteria has an homogeneous metabolism, i.e., all cells have the same n k active reactions with the same fluxes v k−1 (this corresponds to a specific sampling of our Markov chains). From v k−1 , it is possible to obtain n k + 1 different phenotypes, corresponding to the resilencing of one of the enzymes (n k possibilities) or to the current phenotype remaining unchanged for another step. The n k possible silencings of a reaction yield the n k reduced poly- The corresponding fluxes v k,i are computed via a MOMA projection on these reduced polytopes: where k Á k is the Euclidean norm, see Fig 1 for a sketch. Each choice of v k,i leads to a possible growth rate: g k,i = ξ T v k,i , i = 1, . . ., n k . Viable phenotypes have g k,i > 0 while non-viable phenotypes (e.g. when an essential reaction is suppressed) have g k,i = 0. In what follows these growth rates will be placed on the diagonal of a fitness matrix where g k,0 represents the current growth rate.

Selection probabilities as solutions of a replicator equation
To the n k + 1 possible choices g k,i , it is possible to associate selection probabilities through a basic replicator equation which uses the g k,i as fitness function. Denote p k,i , i = 0, 1, . . ., n k , the probabilities (or frequencies) associated to the g k,i . If Δt is the time duration of each step, then the replicator equation is and ðp k Þ ¼ P n k i¼0 g k;i p k;i is the average fitness. The explicit solution of Eq (2) can be expressed as a Boltzmann distribution, see S1 Text for the details. In synthesis, in the two cases we can distinguish (sketched in Fig. B of S1 Text) one gets for the selection probabilities: 1. uniform priors: at the begin of the time interval the selection probability is p k , all choices are equiprobable. In this case the Boltzmann distribution for the selection probabilities at the end of the time interval is where Z k ðDtÞ¼ P n k i¼0 e g k;i Dt is a partition function. 2. non-uniform priors: at the begin of the time interval the selection frequencies are not uniform but are themselves expressible as a Boltzmann distribution where β k has the interpretation of an inverse temperature. In this case, at the end of the time interval we obtain A through derivation of these selection probabilities is available in the S1 Text.

Metabolic adaptation as a completely reducible Markov chain
In both cases described above p k (Δt) has the meaning of transition probability between the current state S k−1 = {v k−1 , Γ k−1 } and the possible states achievable at the k-th step Since the fluxes v k,i can take any value between lower and upper bound, the corresponding transition matrix is infinite dimensional. However, in order to understand the properties of the stochastic process we are considering, it is useful to look at its projection over the subspace of active reactions (i.e., over the binary equivalent of the polytope Γ k ). In terms of this projection, the possible states of the Markov chain are the 2 r possible combinations of the r reactions of the network, see Fig 2 for a toy example with r = 4. Denote Z 1 , . . ., Z 2 r these discrete states and P ij = P(X k = Z i j X k−1 = Z j ) the corresponding transition probabilities. As in our model the resilencings are irreversible, P is triangular, i.e., it is completely reducible, see Fig 2D. Since in reality P is the result of a projection, it is P = P(v), i.e., the exact transition probabilities p k depend on the values of the fluxes and hence on v 0 . However, even in the complete model the fully reducible structure is preserved. In particular it follows that a certain number of states Z i must correspond to ergodic classes, i.e., "absorbing" states in which the Markov chain stabilizes. Full reducibility implies that each ergodic class is composed of a single state. Periodic chains of states are impossible.

Results
The metabolic adaptation process described in Methods and in Fig 1 is applied to the network that describes the central carbon metabolism of E.coli [25]. In order to do this, a large number of realizations of the Markov chain is produced. Even in a network of modest dimensions like that of E.coli central metabolism (r = 95 reactions, see S1 Text for details), the number of possible discrete states Z of the Markov chain is enormous (2 95 * 10 28 ), hence numerical computations are necessarily limited to a fraction of all possible trajectories. For this study, some 10 5 trajectories have been generated, starting from randomly chosen initial conditions in the polytope of admissible fluxes Γ and using various forms for the priors. Given the irreversibility of the resilencing, the number of steps required to reach an endpoint state can be computed from the trajectories. A trajectory is considered absorbed into an end-point state (i.e., it has reached a local maximum of the fitness landscape) when no further silencing happens for 5 consecutive steps. With this stopping condition, the expected time until absorption of our trajectories is 24.6 ± 4.1 steps. Except for pathological initial conditions (those missing some essential reactions), all trajectories stabilize to flux vectors of positive growth. Nearly 20% of all trajectories reach the maximal growth computed by the FBA criterion, g FBA , and for nearly 50% of all trajectories the growth at the end-point is ! 0.85 g FBA , see Fig 3. The remaining 50% of trajectories are more or less uniformly distributed in the interval 0.2 < g/g FBA < 0.85. Much more remarkable is the correlation between the end-point flux distribution v and the flux distribution given by the pFBA criterion v pFBA : the mean of the correlation is 0.96 and the median is 0.98, with 88% of all trajectories achieving a correlation of at least 0.9, see Fig 3. The meaning of this result is that nearly all initial conditions in Γ tend to become aligned with the flux distribution v pFBA , regardless of the biomass they can produce, see Fig. A of S1 Text for a sketch.
The time evolution of the 3D histogram of Fig 3 during adaptation is shown in Fig. C of S1 Text. It can be seen that while randomly chosen initial conditions in Γ usually give a zerogrowth, already with the first silencings growth starts to recover, and gradually improves in the first 10 steps of the Markov chain. During the transient, no significant intermediate peak is visible, meaning that many different routes are explored by the trajectories. After * 10 steps, the high correlation / high growth peak starts to appear, and rapidly becomes dominant.
Examples of the resulting trajectories are shown in Figs. D-F of S1 Text. For instance, the first row of Fig. D of S1 Text shows a set of trajectories originating from the same random initial condition, all converging towards v pFBA , although through slightly different paths. None of the trajectories of the second row of Fig. D of S1 Text instead achieves a growth rate higher than 0.75g FBA . However, all of the end-points flux vectors become aligned with v pFBA (correlation higher than 0.97). In this case the two values of g reached by the trajectories correspond to two different ergodic states, as can be seen by the grouping of the number of active reactions eventually reached. It should be observed how for this phenotype of non-optimal growth the number of reactions is much less than for v pFBA . This is indeed a constant pattern in our metabolic adaptation strategy. As can be seen in Fig. G of S1 Text, at the end-point the number R of active reactions of v and g are positively correlated: for strains that have sub-optimal growth more resilencings are possible i.e., more directions with slow but positive Δg exist and are explored. In fact, Fig. G of S1 Text shows that indeed the length L of a trajectory is inversely correlated with the growth g of v.
Other than the peak at high correlation / high growth, Fig 3 does not show any other sufficiently significant peak (and nor does Fig. C of S1 Text). It is however worthwhile observing that a small fraction of trajectories is steered towards flux distributions of maximal growth different from v pFBA , i.e., to alternative FBA optima. An example of such trajectory is shown in Fig. E of S1 Text (bottom row): while most of the trajectories converge to v pFBA , a few do not (one is shown in green), and stabilize in an alternative FBA flux vector of correlation 0.75 with v pFBA . Cases like this lead to a correlation corr(v pFBA , v k ) which decreases when v k falls into the basin of attraction of a local maximum other than v pFBA . In the ensemble of the trajectories, however, these situations are unfrequent: if we look at the average of all trajectories, corr(v pFBA , v k ) is always monotonically increasing, regardless of the final g achieved, see Fig 3 and Fig. H of S1 Text. Similarly, also g is monotonically growing on the vast majority of the trajectories (Fig. I of S1 Text).
Interestingly, if we start relaxing the assumption of irreversibility that characterizes a large fraction of the metabolic reactions (49 of the 95 reaction are irreversible in our network), then the convergence rate to v pFBA quickly decreases, in favor of other v FBA , see Figs. J-L of S1 Text. In particular, when all reactions are considered as reversible, then the correlation between v pFBA and v at absorption is completely lost, although optimal biomass is still achieved by most trajectories, see Fig. L of S1 Text.
It follows directly from the linearity of the expression for the biomass that when a trajectory v becomes aligned with v pFBA , the growth rate it can produce depends on the norm of v. In fact, Fig 4 shows that there is a sharp direct proportionality between kvk at ergodicity and g: when Mean ± std of the correlation between v k and v pFBA during adaptation (solid lines). At absorption, the mean of the correlation is 0.96. As can be seen in the histogram of the end-points, the distribution is highly skewed towards maximal correlation, with 68% of end-points above the mean. In fact the median is 0.98 (dashed line). C: The 3D histogram shows the correlation between v pFBA and the end-point of the trajectories v versus the growth rate g reached by v. Of the trajectories reaching g/g FBA > 0.85, 99% have correlation ! 0.85. the recursive process of silencings and adjustments leads to a v which is smaller in norm than v pFBA , also the corresponding g will be smaller than g FBA .
In order to understand when an initial condition can lead to an end-point v of norm comparable to v pFBA , one can look at how many of the bounds that delimit the polytope of admissible fluxes at step k, Γ k , become active during the adaptation (i.e., a flux for a reaction becomes equal to one of its lower or upper bounds). Fig. N of S1 Text shows that in the early part of the adaptation, trajectories that do not achieve high growth (which, from Fig 4, correspond to trajectories having kvk < kv pFBA k) tend to saturate less than those achieving higher growth. Hence fluxes that tend to stay in the interior of the polytope rarely will reach a high kvk. From Fig. O of S1 Text it can be observed that the difference in active bounds concerns mostly certain specific pathways: in strains achieving high growth, uptake bounds on many exchange reactions tend to become saturated in the early transient, and so do upper bounds of pyruvate metabolism, signs of a more efficient use of the available resources. Coherently, Fig. P of S1 text says that high growth is achieved when TCA cycle and pentose phosphate pathway remain fully functional during adaptation. Notice that uptake bounds of gluconeogenic carbon sources such as acetate are almost never saturated in the high growth trajectories.

Discussion
Mathematically, the dynamical model used in this paper to describe metabolic adaptation has many aspects in common with standard evolutionary models based on natural selection [26].
The only extra assumption we require is that the "selection" that leads to adaptation occurs only through the silencing of non-essential reactions. That a multitude of latent pathways becomes active after an environmental perturbation is a known fact experimentally [7,8]. That during adaptation these low-yield pathways tend to become resilenced is also a commonly accepted hypothesis [8,27], supported for example by gene expression data. It has in fact been observed that e.g. after a change of carbon source a major rearrangement occurs at gene expression level, with more than 10 3 genes differentially expressed [8]. A similar pattern is observed also in response to a wide variety of stress factors [7]. After a strain has adapted to the new condition, however, most differentially expressed genes have returned to their baseline level, and so is probably the concentration of the corresponding enzymes.
As shown in [7], different stress responses elicit early metabolic responses that are less stereotypical than those observed at gene expression level. When growth is recovered, however, the metabolic profiles in the various cases show a high similarity. This picture is coherent with the presence of an attractor in flux space, which can compensate for possibly widely different flux distributions right after a perturbation. For metabolic responses such as the stress responses of [7], it is unclear how to include the direct effect of the perturbation on the metabolic fluxes of an FBA model. To cope with this fact, in our Markov chains the initial condition for the flux vector, v 0 , is chosen randomly in the polytope Γ, which implies that at the begin of the Markov chain most reactions are already active.
When instead the effect of a specific perturbation can be explicitly included in the FBA model, then the Markov chains can be used to investigate also the early stages of the transient, with the activation of the latent pathways. This is the point of view taken in [24], where the experimental setting of [8] is considered. It is shown in [24] that the shift from rich medium to single carbon sources for various E.coli mutants can be reproduced closely by the metabolic adaptation process described in the Methods section. Proceeding in this way corresponds to fixing specific initial conditions on the Markov chains, and following the specific family of trajectories that results from them (activatory phase included). It becomes then interesting to see what happens when these "nominal" trajectories are compared to more general trajectories in which the initial fluxes v 0 are randomly chosen in Γ. For glucose as single carbon source, the two types of trajectories are compared in Fig. M of S1 Text. As can be seen, for all 4 mutant strains there is a high correlation between the end-points achieved by the flux vectors, meaning that the specific pattern of transient activations of the latent pathways is not crucial to the achievement of the adapted flux distribution, as both types of trajectories converge towards v pFBA . Notice how the pgi mutant has a secondary peak at low correlation: this corresponds to a less frequent second phenotype of lower growth, described in [8]. Such a phenotype is sometimes achieved by both the nominal trajectories of [24] and the randomly initialized trajectories computed in this paper.
A number of possible optimality criteria alternative to biomass optimization have been investigated in the literature [2,[28][29][30]. Common choices are for example maximization of yield (instead of biomass), maximization of ATP, minimization of overall intracellular flux (i.e., minimum enzyme investment), minimization of redox potential, etc. In [29] a thorough analysis of their coexistence/complementarity is carried out. By using reaction resilencing to progressively adjust the metabolism to the new environment, two of the most accepted among these criteria, biomass optimization and minimization of overall fluxes, are naturally combined.
It is shown in [31] that in FBA irreversibility of a large fraction of metabolic reactions is a key factor in achieving optimal flux distributions that are sparse, as our pFBA is. Indeed also for our metabolic adaptation process irreversibility is key to convergence to v pFBA , as Figs. J-L of S1 Text clearly show. It is worth observing that when we start relaxing the assumption of irreversibility, what is lost is not the achievement of optimal growth, but only convergence to the sparsest degenerate solution of Eq (1) (i.e. v pFBA ). On the contrary, in the case of all reversible reactions the ratio g/g FBA achieved by the trajectories is even better than in Fig 3, with a mean value for g of 0.91g FBA and a median value of 0.997g FBA , see Fig. L of S1 Text. Given that the irreversibility of the constraints follows from thermodynamic considerations [32] which are usually considered sufficiently reliable, our results provide novel evidence in favor of sparse optimal biomass solutions such as pFBA, and a novel point of view on the coexistence of optimality criteria such as biomass production and enzyme parsimony of the solution. They also confirm that the repeated resilencing process described in this paper is indeed an effective strategy for describing the recovery of growth that occurs in metabolic adaptation.
It is worth remarking that the method used in this paper is fundamentally different from a dynamical FBA [16]. In the latter, in fact, growth is used as the objective function of an optimization problem, and the adjustments of the metabolic fluxes follow the gradient direction indicated by the solution of such a problem. In our case, instead, the growth rate is only used to shape the fitness landscape of a population of possible phenotypes (corresponding to the possible silencings that can occur), but the metabolic adjustments are always computed through MOMA projections. In general, there is no a priori guarantee that a greedy fitness landscape constructed in this way i) may be regular; ii) may achieve maximal growth, and iii) may lead to flux distributions that resemble those of the pFBA. In our trajectories, in fact, what we observe is that the fitness landscape is rugged, but the plethora of local maxima have all a very small basin of attraction, as opposed to the global maximum which attracts around 50% of all trajectories when we count based on growth. If instead we look at normalized flux distributions then the basin of attraction of v pFBA jjv pFBA jj grows to 90% of all v jjvjj . This tells us that for what concerns central metabolism, a procedure like the one described in this paper is substantially a monotonic process of alignement of v k on v pFBA . The robustness of the convergence is also reflected in the low sensitivity to the randomness of the Markov chains, see S1 Text and Fig. T of S1 Text for more details.
In conclusion, one can say that simple flux reorganization rules based on local fitness are sufficient to drive the cell toward a more efficient use of the metabolic resources. It is quite remarkable that most of the trajectories end up in the pFBA optimum, without knowing it, and without ever using growth rate to update metabolic fluxes (growth rate is used only for the selection probabilities in the resilencings; metabolic fluxes are always updated via MOMA). Clearly this fact provides a further evidence in favor of the FBA criterion, and one could even speculate that it provides a more fundamental principle, from which FBA follows as a corollary.
Supporting Information S1 text. Methods. Details of the population dynamics model and of the selection probabilities. Further considerations of the method and on its applicability. (PDF)