Elementary Growth Modes provide a molecular description of cellular self-fabrication

In this paper we try to describe all possible molecular states (phenotypes) for a cell that fabricates itself at a constant rate, given its enzyme kinetics and the stoichiometry of all reactions. For this, we must understand the process of cellular growth: steady-state self-fabrication requires a cell to synthesize all of its components, including metabolites, enzymes and ribosomes, in proportions that match its own composition. Simultaneously, the concentrations of these components affect the rates of metabolism and biosynthesis, and hence the growth rate. We here derive a theory that describes all phenotypes that solve this circular problem. All phenotypes can be described as a combination of minimal building blocks, which we call Elementary Growth Modes (EGMs). EGMs can be used as the theoretical basis for all models that explicitly model self-fabrication, such as the currently popular Metabolism and Expression models. We then use our theory to make concrete biological predictions. We find that natural selection for maximal growth rate drives microorganisms to states of minimal phenotypic complexity: only one EGM will be active when growth rate is maximised. The phenotype of a cell is only extended with one more EGM whenever growth becomes limited by an additional biophysical constraint, such as a limited solvent capacity of a cellular compartment. The theory presented here extends recent results on Elementary Flux Modes: the minimal building blocks of cellular growth models that lack the self-fabrication aspect. Our theory starts from basic biochemical and evolutionary considerations, and describes unicellular life, both in growth-promoting and in stress-inducing environments, in terms of EGMs.


A detailed derivation of the growth rate in terms of reaction rates, at balanced growth
We start with Here N denotes the stoichiometry matrix, and v j (c) is the j-th reaction rate, as a function of concentrations c. Since n k = V c k , we havė In balanced growth, all concentrations are constant over time (1),ċ = 0, which means thaṫ The time derivative of V = V (n) is given byV Hence, using (4) and (1) in (3), we have in balanced growth that  The steady-state assumption ensures that the concentrations c, and therefore also the reaction rates v j (c) are constant in time. Therefore, the complete right-hand side of (5) is constant in time, and thus the left-hand side must be as well. This strongly suggests (but does not directly imply) that ∂V ∂n l is independent of time as well. 1 Since n does change in time, this implies that ∂V ∂n l does not depend explicitly on n, i.e.: ∂V ∂n l = ρ l for all l, for constants ρ l . 2 The biophysically reasonable assumption that total volume is the sum of all the volumes taken up by individual molecules, are not. This is unlikely. 2 If we assume that the osmotic pressure is kept constant in a cell, the import of a particle must lead to the import of water. The molar volume parameters, ρ l , will therefore not only comprise the volume of particle l, but also the volume of the water that is needed to keep the osmotic pressure constant. These parameters thus depend on the osmotic pressure.
is therefore suitable to study balanced growth states. With this definition, we also have at balanced growth. The (instantaneous) growth rate is now defined as Since ρ · c = 1 at balanced growth, = ρ · N v(c) − µ using (7).
We conclude that the growth rate of a cell at balanced growth can be expressed in terms of the catalytic activities of its constituents reactions as,

Some basic Linear Programming
We give a short review of some basic Linear Programming (see e.g. (2; 3)) necessary for the definition of Elementary Growth States and Elementary Growth Modes. Let A be an m × n matrix, m < n. The linear programs in this paper are all in equational or standard form We may assume that A has full rank m (if A does not have full rank the matrix may be reduced to a matrix with fewer rows which does have full rank without changing the solution space of vectors satisfying Ax = b). A vector x is called a feasible solution if it satisfies all the constraints, so Ax = b and x ≥ 0.
Let D ⊆ {1, . . . , n} be an index set, and let A D denote the matrix consisting of columns from A whose indices are in D. We will use similar notation for the restriction of x to the index set D, x D . A vector x is called a basic feasible solution if x is a feasible solution for which there exists a set D ⊆ {1, . . . , n} with m distinct elements such that A D is square and non-singular, x satisfies A D x D = b, and x j = 0 for all j / ∈ D. The vector x D is then uniquely defined, since For any index set D with A D invertible, we can compute x D . If this solution satisfies x D ≥ 0, then it induces a basic feasible solution x that satisfies x j = 0 for all indices of j / ∈ D. D is then called a feasible basis for x. (This parlance is different from the usual definition of a basis in Linear Algebra, in which a basis is formed by a set of vectors.) We also recall a few standard concepts from convex geometry. A set X ⊂ R n is called convex if for each x, y ∈ X and λ ∈ (0, 1), we have λx + (1 − λ)y ∈ X. A set in R n is a convex polyhedron if it is an intersection of finitely many closed half-spaces in R n . A cone is a set X ⊂ R n with the property that x ∈ X implies λx ∈ X for all λ ∈ R and the cone is pointed if λx ∈ X if and only if λ ≥ 0. A convex polytope is a bounded convex polyhedron.
A vertex of a convex polyhedron P ⊂ R n is a point x ∈ P such that there exists a vector c with the property that c · x > c · y for all y ∈ P\{x}. The space of feasible solutions of a Linear Program form a convex polyhedron. A standard result from Linear Programming states that the vertices of this polyhedron coincide exactly with the basic feasible solutions of the LP. Lastly, for a cone defined by A standard result from LP is that the spanning rays of such a cone are those vectors in the cone with minimal support.

A single pathway toy model showing the three inherent nonlinearities of self-fabrication
We here consider the simplest possible model of a self-fabricator, and compare it with a more conventional model. This will allow us to highlight the inherent nonlinearities of self-fabrication that can be captured with EGM theory. We will then extend this model with an additional step so that we create a core model of overflow metabolism, based on a non-self-fabricating model that we analyzed before (4). These models are not made with the intention to make the most realistic model possible. For this, various more extensive models have already been made by others. See (5) for an overview of all resourceallocation models of overflow metabolism. Instead, our models are meant to illustrate the differences in the mathematical structure of models of self-fabrication compared to conventional models. The Matlab code used to generate the plots in this section can be found separately as Supporting Information.

Setting up the stoichiometry and the balanced growth assumption
The first model contains two metabolic reactions: 1) a transporter step that imports the external nutrient and converts it to a metabolite X, and 2) a reaction that generates ATP using this metabolite. In addition, the model contains a ribosome that synthesizes itself and the enzymes needed for the metabolic steps using variable amounts of X and ATP as precursors. This is captured in the following stoichiometry: In non-self-fabricating models the enzyme synthesis is not explicitly modelled. Instead, the demand for precursors is often modelled by a virtual biomass reaction. This reaction needs to be imposed on the model and is often based on experimental data. To compare our selffabricator model with a non-self-fabricating model, we will assume that the biomass reaction is equal to the mean of the enzyme synthesis reactions, i.e., Note that, since we do not keep track of the production of enzymes and ribosomes anymore, the bottom half of the stoichiometric matrix drops out.
We then add the balanced growth assumption to the model, demanding that all concentrations of all cellular compounds are constant. The net synthesis and consumption of the compounds is captured by the stoichiometry, but on top of that, the compounds also dilute by growth. We thus have where v and w are the flux vectors of metabolic reactions (catalyzed by enzymes) and enzyme synthesis reactions (catalyzed by the ribosome), x is the vector of metabolite concentrations, and e is the vector of enzyme concentrations, including the ribosome concentration this time as its last entry.
Linearized models do not include a dilution by growth term. This is probably because metabolite and enzyme concentrations are not explicitly calculated in these models, and the dilution rate is therefore unknown. Instead, a steady-state assumption is used where w is now the biomass reaction rate.

Introducing enzyme/ribosome kinetics
In the model we choose the following rate equations v trans = e trans glc ext k cat,trans where we have denoted the concentrations of metabolites by using lowercase letters. The α-variables denote the fractions of the ribosome that are translating the corresponding enzyme. Although we could have gone for a more general choice, for the current model we choose the same rate equation for the synthesis of the different enzymes: Here we encounter another difference between the general self-fabricator setup and the conventional models. Models with a virtual biomass reaction implicitly assume that the synthesis of all cellular components is equally fast. In contrast, here we could have picked different rate equations for the synthesis of the different enzymes. If the enzyme concentrations would then change because of a different resource allocation, the average rate of enzyme synthesis by the ribosome would change accordingly.

A definition of the volume and an induced growth rate expression
We argue in the main text, and in SI1, that the cellular volume should be the sum of the volumes of its components. Dividing the resulting expression for the volume by the volume on both sides gives: where the ρ, σ-variables are volumetric parameters. For example, ρ X is the volume of a mole of metabolite X. In the model, we choose: ρ X = 0.005, ρ ATP = 0.0002, and σ trans = 0.6538, σ resp = 1.1346, σ rib = 1.2231. This choice will be motivated shortly. Under the balanced growth assumption (Equation (13)) the assumption of Equation 16 is equivalent to an expression for the growth rate: We then introduce a j = such that the expression for the growth rate simplifies to: This notation shows that all reactions can directly contribute to the growth rate. The coefficients a j capture the volume added by the reaction itself, and b j denotes the volume added by the synthesis reaction of the corresponding enzyme. This added volume is equal to the sum of the volumes of the products minus the sum of the volumes of the substrates. Based on the assumption that chemical reactions do not drastically change the volume of the reagents, we have chosen our ρ, σ-parameters such that the coefficients are close to zero: In a typical balanced growth state, the metabolic fluxes (v) are much higher than the enzyme synthesis fluxes (w), so with Equation (18) we can see that the transport reaction will contribute most to volume growth. This seems reasonable: "the transport reaction imports volume''.
In conventional models the growth rate is often assumed to be proportional to the biomass reaction. In the model of self-fabrication that we consider above, however, the growth rate seems to be proportional to the transport reaction. More generally, as long as the assumption holds that the volumes of the products is approximately equal to the volumes of the substrates for each chemical reaction, the growth rate in self-fabrication models will always be determined by the exchange reactions. Below, we will investigate if defining the growth rate as proportional to the biomass reaction is a good approximation, and when it breaks up.

The complete model
Our model has now become the following set of differential equations: µ = a trans e trans f trans (glc ext , x) + a resp e resp f resp (x, atp) If we pick the ribosome allocation fractions α and assume that the systems is in a steady-state, the system has 6 equations for 6 variables: (x, atp, e trans , e resp , r, µ). In the main text we solve the last four of these equations and insert them in the first two to get our so-called balanced growth equations. This is however not necessary in order to solve the optimization problem and, to increase legibility, we will not do this here. Even for a model this small, the balanced growth equations in full are unwieldy. Instead, we can now use the three α-variables to maximize the growth rate under the above steady-state conditions and the constraint that α trans + α resp + α rib ≤ 1. We get max α µ, subject toẋ =ȧ tp =ė trans =ė resp =ṙ = 0, µ = a trans v trans + a resp v resp + b trans w trans + b resp w resp + b rib w rib , α trans + α resp + α rib ≤ 1, α trans , α resp , α rib ≥ 0.

Results
We start by doing a full run of two models: a model of self-fabrication and a non self-fabricating version. For a range of glucose concentrations we find the α's that maximize the growth rate in the first model, and the e's that maximize the biomass production rate in the second model. We see in Figures S1 and S2 that the model of self-fabrication is indeed well approximated by the conventional model. In the following, we will highlight three nonlinearities that are incorporated in the self-fabrication model. Two of these nonlinearities cannot be seen in the conventional model, and we investigate where these differences show.

Enzyme concentrations and fluxes are no longer proportional
A first nonlinearity that is incorporated in both models is due to the nonlinear enzyme kinetics. The dependency of the enzyme activity on the availability of its substrates and products makes sure that fluxes and enzyme concentrations are no longer forced to be proportional. In Figure S3 we plotted the enzyme concentrations and the corresponding fluxes that were calculated by the model. When nonlinear enzyme kinetics are not modelled this would be a straight diagonal line.

The biomass composition changes
In our model of self-fabrication the synthesis of the enzymes and ribosomes is modelled explicitly. The demand for precursors might vary between different enzymes, which is captured in the M -matrix of Equation (11). If the enzyme concentrations change, the precursor demand can change with it: the model thus calculates the demand for cell synthesis. This effect is visible in our model, although it is a small effect, see Figure S4. In conventional models a virtual biomass reaction is added and therefore the demand for cell synthesis is imposed on the model. This will therefore always be constant over the different growth rates. The effect that is shown is small because of the choice of M -matrix: the synthesis of all enzymes and the ribosome requires more ATP than X. Therefore, the decrease in the concentration of the transport-enzyme and the increase of the concentrations of the respiration-enzyme and the ribosome does not change the precursor demand that much. One could however imagine that, especially in larger models, there could be precursors for which the demand per enzyme synthesis reaction varies more. The change in the precursor demand will then have a larger effect on the model behaviour. We changed the M -matrix to emphasize this changing precursor demand a bit more: In Figure S5 we indeed see that the effect is now more pronounced.

The growth rate is not always proportional to the biomass production rate
In Section 3.3 we discussed the expression for the growth rate that is used in our self-fabrication models: where a j and b j denote the differences in the summed volume of the products with the summed volumes of the substrates. In principle, each reaction thus contributes to volume growth and hence to growth rate. However, there are two assumptions that we can make in which this simplifies. First, if for each reaction the summed volume of products is approximately the same as the summed volume of the substrates, then a j and b j will be negligible except for reactions that transport to or from the cell. Second, if we assume that metabolism is in a steady-state, all volume contributions of created internal compounds will cancel because these compounds will also be degraded. In both cases, only the exchange reactions will explicitly contribute to growth. In a situation where both of these assumptions cannot be made, for example in a dynamic model in which polymers take up more or less volume than their constituents because of their folding properties, our definition of the growth rate will clearly deviate from a growth rate that is defined to be proportional to a biomass reaction.
One could now ask if the rate of the biomass reaction rate is a good approximation of the cellular growth rate when one or both of the assumptions is met? In our base model, of which the results were shown in Figure S1, the cellular growth rate is indeed directly proportional to the total synthesis rate. This is because of the following: in our base model the transport step is the only exchange reaction and therefore the only reaction that contributes to volume growth. If the substrate yield (the ratio of the biomass reaction and the transport reaction) is constant, this implies that the growth rate must be proportional to the biomass reaction rate. The substrate yield is indeed quite constant because almost all imported nutrients will be converted to biomass, because the dilution of metabolites is negligible.
The above argumentation immediately indicates that the biomass reaction rate would not approximate the cellular growth rate if the dilution of metabolites is not negligible. We therefore again changed the M -matrix: ] . (20) We also adjust the volumetric parameters ρ k , σ j to keep the summed volume of the products of each reaction more or less equal to the summed volume of the substrates:  The results are shown in Figure S6. The growth rate and the biomass reaction rate are indeed no longer proportional. This is caused by the large dilution rate of the metabolites.

Solutions are not conserved under multiplication
Elementary Flux Modes are defined up to a multiplicative factor. This means that a steady-state flux vector can always be multiplied by a positive number to give another steady-state flux vector. In conventional resource-allocation models it is often assumed that v i = e i k cat,i f i (x), so that the reaction rate is proportional to the enzyme concentration when the metabolite concentrations are kept fixed. This means that a steady-state solution can always be scaled by multiplying all enzyme concentrations by the same factor. Indeed, scalar multiplication of the enzyme concentrations leads to the same multiplication of the reaction rates, and this again leads to a steady state by the aforementioned property of EFMs. For example, assume that (x 0 , atp 0 , e trans,0 , e resp,0 , r 0 ) corresponds to a steady-state solution. In a conventional model, the steady-state equations take the form If we keep the metabolite concentrations constant, and multiply all enzyme concentrations by λ ∈ R, we get for the i-th element of N v, The above is no longer true in a self-fabrication model. We see in Section 3.4 that even if we keep the metabolite concentrations fixed, a solution cannot be easily scaled to another solution. For example, assume that (x 0 , atp 0 , e trans,0 , e resp,0 , r 0 , µ 0 ) corresponds to one balanced growth state. We thus have If we now multiply the enzyme and the ribosome concentrations by λ, we find that The differential equation for the ribosome waṡ but now becomesṙ Similarly, the steady-state equations for the enzymes and the metabolite concentrations are no longer met. Inspection of these equations would suggest multiplying the α-variables by λ too. However, then the time derivatives of the metabolite concentrations will no longer be zero. The above shows that, in contrast to conventional models, the reaction rates in self-fabrication models are not proportional to the growth rate within an Elementary Mode. While Elementary Flux Modes could thus be defined up to this multiplicative factor, this is no longer possible for Elementary Growth Modes. This is why the concept of the Elementary Growth States was introduced.

A two pathway toy model showing overflow metabolism
We now add a fermentation reaction to the model as an alternative for the respiration reaction. It converts the metabolite X in 12 ATP-molecules, instead of the 26 molecules produced by respiration. Furthermore, we set the k cat of the fermentation reaction to be higher than the respiration reaction. These adaptations are based on the model of overflow metabolism that we used in (4). Additionally, we introduce an additional constraint on the concentration of the membrane protein e trans ≤ 0.2. This gives the results shown in S7.
This model illustrates that the co-utilization of two Elementary Modes can still be caused by growth rate maximization under two constraints. This extends the results found for EFMs in (4) to EGMs.

Conclusion
By working out this toy model we have illustrated some important differences between models of selffabrication and conventional models of metabolism. We have pointed out that 1) enzyme concentrations and fluxes are not necessarily proportional, 2) the biomass composition can change, 3) the growth rate is not necessarily proportional to the biomass production rate, 4) flux ratios in Elementary Modes are not constant over different growth rates and metabolite concentrations.
Moreover, we have also explored under what assumptions the above nonlinearities are approximately linear again. This is important because the conventional models were highly successful in fitting experimental data, but less biologically accurate in terms of their mechanistic underpinnings. To reproduce the experimental data with a model of self-fabrication we need some additional assumptions, for example that the biomass composition does not change drastically over different conditions. This thus indicates the importance of assumptions that were first taken for granted.

Extensions of EGM theory 4.1 Including non-enzymatic reactions
The metabolic ODEs are now augmented with non-enzymatic reactions with rate vector u(x), and stoichiometry matrix S, so thaṫ Setting c j = ∑ k ρ k S kj , the definition of µ may be succinctly phrased as The steady state equations for x now read Using the same line of reasoning as before, Eq (21) in the main text becomes Note that the ribosome concentration now appears in the equations, since it occurs on the left hand side, but not on the right due to the nonenzymatic reactions.

Including a positivity constraint for the ribosome concentration
The balanced growth equations were derived by first solving the steady state ribosome and enzyme equations, and substituting the result into the steady state equations for the metabolites. Sinceṙ = r(α n+1 − µ), one does not get immediate information about r in steady state. Since r eventually drops out of the balanced growth equations, because it is a factor in all of them, one needs to derive the steady state ribosome concentration a posteriori. The expression for r is given in Eq (19) in the main text, Since finding any balanced growth solution generally starts with prescribing some metabolite concentration vector x, it is not guaranteed that the resulting ribosome concentration is positive. If it is not, the enzyme concentrations in steady state are all negative as well, since e j = rα j /µ. For certain choices of x, and growth rate µ, some vectors in the polytope P x,µ may correspond to negative ribosome and enzyme concentrations. These should of course be discarded. One could incorporate an additional constraint into the polytope, requiring that the denominator of (29) be positive, This constraint is again of the same nature as the ones in B(x, µ)α = µu m+1 in the definition of P x,µ . Choosing a metabolite vector x such that this constraint is violated for all α ∈ P x,µ implies that the polytope to which the ribosomal constraint is add would be empty. Any biologically reasonable balanced growth solution would have a finite, strictly positive ribosome concentration. Therefore, such solutions will not hit this new constraint.

Including stress responses
Investments in stress responses are often viewed as orthogonal to investments in growth. In EGM theory, this is not necessary. For example, a heat shock response might involve increased expression of chaperone proteins that accelerate the refolding of denaturated proteins. Chaperones themselves do not contribute to the catabolic or anabolic parts of metabolism, and are therefore viewed as not contributing to growth. The chaperones, however, extend the lifetime of proteins (they lower their natural degradation rate, which we have ignored in the whole-cell model but can be put in without any modification of the main results). In this indirect way, chaperones of course increase the growth rate relative to the situation in which they are absent.
An example implementation would be as follows. First we incorporate degradation rates of enzymes, in which d j (T, h) are the degradation rate of enzymes, as a function of temperature (denaturation) and chaperone concentrations h. Next, the chaperones need to be synthesizeḋ net contribution to growth rate by metabolic reaction j mol −1 L b j net contribution to growth rate by enzyme/ribosome synthesis reaction j mol −1 L C x,µ cone corresponding to metabolite concentration x n.a. and growth rate µ P x,µ polytope corresponding to metabolite concentration x n.a. and growth rate µ u j j-th elementary unit vector n.a. Table 1: Overview of all variables and parameters used in this paper.
(here modelled without its own degradation rate, but one could put that in) then one would obtain a new whole cell model, with extended stoichiometry for all the enzymes and now also chaperones. The number of α j increases: the ribosome now needs to be allocated not only over all enzymes and the ribosome, but also over the chaperones.
If the degradation depends linearly on the chaperone concentration, the resulting balanced growth equations would have the same qualitative properties as before. The polytope P x,µ,T of all balanced growth solutions at metabolite concentrations x with growth rate µ and temperature T now has new vertices, which are again called EGMs. Depending on T , the maximal growth rate solution may have a positive α k for the chaperones. In that case, the investment of synthesising chaperones leads to a higher growth rate than if this investment were not made. Similar implementations may be given for other stress responses. A second example is cleaning up of toxins (see Fig 6 in the main text, and code below), either made as an inevitable byproduct of certain metabolic reactions, or passively diffuse into the cell. Such toxins could for instance lower the k cat of metabolic reactions, leading to a lower growth rate. As long as cleaning up such toxins using specific proteins is implemented using kinetics that are linear in those protein concentrations, the EGM theory applies. If the effect of the toxin is sufficiently detrimental to metabolic reaction rates, the growth rate maximiser will feature investment into proteins that naturalise those toxins.