Why Optimal States Recruit Fewer Reactions in Metabolic Networks

The metabolic network of a living cell involves several hundreds or thousands of interconnected biochemical reactions. Previous research has shown that under realistic conditions only a fraction of these reactions is concurrently active in any given cell. This is partially determined by nutrient availability, but is also strongly dependent on the metabolic function and network structure. Here, we establish rigorous bounds showing that the fraction of active reactions is smaller (rather than larger) in metabolic networks evolved or engineered to optimize a specific metabolic task, and we show that this is largely determined by the presence of thermodynamically irreversible reactions in the network. We also show that the inactivation of a certain number of reactions determined by irreversibility can generate a cascade of secondary reaction inactivations that propagates through the network. The mathematical results are complemented with numerical simulations of the metabolic networks of the bacterium Escherichia coli and of human cells, which show, counterintuitively, that even the maximization of the total reaction flux in the network leads to a reduced number of active reactions.

1. Introduction. The mathematical modeling of biological networks has focused on the influence of the network structure on the functional properties of the system [1,2,20]. Insights provided by these studies have shown, for example, that structural modules and hierarchical organization in the network are often related to compartmentalization of functional processes [21,27]. This is important since intracellular processes are rarely carried out by individual elements, and often involve the coordinated activity of multiple genes, proteins, and biochemical transformations. Because different components may be recruited for different processes, the most fundamental aspect of the dynamics of complex intracellular networks concerns precisely the characterization of the specific parts of the network that are active under given conditions. Recent research focused on the modeling of metabolic networks has shown that typical metabolic states tend to recruit a much larger number of reactions than states that maximize growth rate [18]. This counterintuitive property is important in multiple contexts. For example, it provides a partial explanation for the apparent dispensability of a large fraction of genes in single-cell organisms [19,5,7], since the genes associated with reactions that become inactive in growth-maximizing states are expected not to be essential. This also explains why genetic and environmental perturbations that cause growth defect are accompanied by a burst in reaction activity [9,10]. These bursts can be attributed to the transient activation of otherwise inactive reactions that are recruited by the suboptimal states that follow the perturbation [18], since such states tend to have a larger number of active reactions. Another important implication concerns the possibility of synthetic rescues [17,13], where the inactivation of one gene can be compensated by the targeted inactivation of other genes. The inactivation of such rescue genes can thus allow the recovery of lost biological function. This is possible in part because the rescue genes correspond to genes that would be inactive in an optimal state, so that disabling them helps bring the state of the system closer to the desired optimal state [16]. Understanding the root causes of the reduced reaction activity in optimal metabolic states is then of significant interest in the characterization and study of cellular metabolism.
Focusing on steady-state dynamics, here we use flux-balance based analysis and linear programming techniques to establish rigorous results on the number of reactions that can be active in a given metabolic network. We derive conditions for a specific reaction to be inactive in all (Sec. 3) or active in almost all (Sec. 4) feasible metabolic states. We also establish bounds for the number of reactions that can be active in states that optimize an arbitrary linear function of reaction fluxes (Sec. 5), which are derived based on the duality principle in linear programming, and we study the uniqueness of the optimal solution for typical linear objective functions (Sec. 6). Finally, we implement numerical simulations in reconstructed Escherichia coli and human metabolic networks, both to compare with the rigorous bounds and to consider nonlinear objective functions (Sec. 7). Taken together, our results show that the reduced number of active reactions in the optimal states of a linear objective function, including growth rate, are mainly determined by the presence of irreversible reactions in the network. The irreversibility constraints are shown to play a role also in nonlinear objective functions, such as the aggregated flux and mass flow activity, whose optimal solutions are shown to have a number of active reactions comparable to the corresponding number for linear functions.
2. Preliminary remarks. We consider time-independent metabolic states, which serve as an appropriate representation of the state of single cells at time scales much shorter than the lifetime of the cells, as well as of the average behavior of a large population of cells at arbitrary time scales in time-invariant conditions. Under this steady-state assumption, a cellular metabolic state is a solution of a homogeneous linear equation that accounts for all stoichiometric constraints, where S is the m × N stoichiometric matrix and v ∈ R N is the vector of metabolic fluxes. The components of v = (v 1 , . . . , v N ) T include the fluxes of n internal and transport reactions as well as n ex exchange fluxes, which model the transport of metabolic species across the system boundary. Constraints of the form v i ≤ β i imposed on the exchange fluxes are used to limit the maximum uptake rates of substrates in the medium. Additional constraints of the form v i ≥ 0 arise for the reactions that are irreversible. Assuming that the cell's operation is mainly limited by the availability of substrates in the medium, we impose no other constraints on the internal reaction fluxes, except for the ATP maintenance flux, which is set to a fixed positive value. These additional constraints can be organized in the form The set of all flux vectors v satisfying Eqs. (1) and (2) defines the feasible solution space M ⊂ R N , representing the capability of the metabolic network as a system. Because the number of fluxes N is larger than the number of metabolic species m, system (1) is under-determined and M is generally high dimensional.
Our study is formulated in the context of flux balance analysis [6,29], which is based on the maximization of a metabolic objective function c T v within the feasible solution space M (the superscript T is used to denote transpose). This reduces to a linear programming problem maximize: where we set α i = −∞ if v i is not bounded from below and β i = ∞ if v i is not bounded from above. For a given objective function, we can numerically determine an optimal flux distribution for this problem. This formulation is also appropriate for the derivation of the rigorous results presented below. In the particular case of growth maximization, the objective vector c is taken to be parallel to the biomass flux, which is modeled as an effective reaction that converts metabolic species into biomass.
3. Inactivity due to stoichiometric constraints. The first question of interest is to determine the conditions under which a reaction will be inactive for any solution of Eq. (1). Let us define the stoichiometric coefficient vector of reaction i to be the ith column of the stoichiometric matrix S. We similarly define the stoichiometric coefficient vector of an exchange flux. If the stoichiometric vector of reaction i can be written as a linear combination of the stoichiometric vector of reactions/exchange fluxes i 1 , i 2 , . . . , i k , we say that i is a linear combination of i 1 , i 2 , . . . , i k . We use this linear relationship to completely characterize the set of all reactions that are always inactive due to the stoichiometric constraints, regardless of any additionally imposed constraints, such as the availability of substrates in the medium, reaction irreversibility, cell maintenance requirements, and optimum growth condition.
Since v i = 0, we can divide this equation by v i to see that s i is a linear combination of To prove the backward direction, suppose that s i = k =i c k s k . If we choose v so that v k = c k for k = i and v i = −1, then for each j, we have Theorem 1 holds true independently of other constraints because the proof does not involve Eq. (2). In particular, the sufficient condition for inactivity applies to any nutrient medium condition and does not depend on the reversibility of the reactions under consideration. In the case of the reconstructed E. coli (human) metabolic network considered in this study (described in Sec. 7), which includes 922 (3328) unique internal and transport reactions, a total of 141 (475) reactions are always inactive in steady states as a result of the condition in Theorem 1.
4. Activity in typical steady states. The next question of interest concerns the number of reactions that will be active with probability one in typical metabolic states. The stoichiometric constraints Sv = 0 define the linear subspace Nul S = {v ∈ R N | Sv = 0} (the null space of S), which contains the feasible solution space M . However, the set M can possibly be smaller than Nul S because of the additional constraints arising from environmental and physiochemical properties (availability of substrates in the medium, reaction irreversibility, and cell maintenance requirements). Therefore, M may have smaller dimension than Nul S. If we denote the dimension of M by d, there exists a unique d-dimensional linear submanifold of R N that contains M , which we denote by L M . We can then use the Lebesgue measure naturally defined on L M [23] to make probabilistic statements, since we can define the probability of a subset A ⊆ M as the Lebesgue measure of A normalized by the Lebesgue measure of M . In particular, we say that v i = 0 for almost all v ∈ M if the set {v ∈ M | v i = 0} has Lebesgue measure zero on L M . An interpretation of this is that v i = 0 with probability one for an organism in a random state under given environmental conditions, which can be used to prove the following theorem. Consequently, the number n + (v) of active reactions satisfies where n m 0 is the number of inactive reactions due to the stoichiometric constraints (characterized by Theorem 1) and n e 0 is the number of additional reactions in the category 1 above, which are due to the environmental and irreversibility conditions. Combining this result with the finding that optimal states have fewer active reactions (next section), it follows that a typical state v ∈ M is non-optimal. Equation (4) will lead to a different number of active reactions for different nutrient medium conditions [determined by Eq. (2)], with the general trend that this number will be larger in richer medium conditions. In the case of the E. coli (human) reconstructed network simulated in glucose minimal medium, as considered here (see Sec. 7), the number n e 0 of inactive internal and transport reactions is 182 (1274), of which 158 (563) are due to environmental limitations and 24 (711) are due to reaction irreversibility. The latter includes the cascading-induced inactivation of some reactions due to the inactivation of different, irreversible reactions. 5. Activity in optimal states. We now turn to the central part of our study, which concerns the number of reactions that can be active in steady states that optimize a linear function of the metabolic fluxes. The linear programming problem for finding the flux distribution maximizing a linear objective function can be written in the matrix form: where A and b are defined as follows. If the ith constraint is v j ≤ β j , the ith row of A consists of all zeros except for the jth entry that is 1, and b i = β j . If the ith constraint is α j ≤ v j , the ith row of A consists of all zeros except for the jth entry that is −1, and b i = −α j . A constraint of the type α j ≤ v j ≤ β j is broken into two separate constraints and represented in A and b as above. The inequality between vectors is interpreted as inequalities between the corresponding components, so if the rows of A are denoted by a T 1 , a T 2 , . . . , a T K , the inequality Av ≤ b represents the the problem can be compactly expressed as maximizing c T v in M . The duality principle [4] expresses that any linear programming problem (primal problem) is associated with a complementary linear programming problem (dual problem), and the solutions of the two problems are intimately related. The dual problem associated with problem (5) is where {u 1 , u 2 } is the dual variable. A consequence of the Strong Duality Theorem [4] is that the primal and dual solutions are related via a well-known optimality condition: v is optimal for problem (5) if and only if there exists {u 1 , u 2 } such that Note that each component of u 1 can be positive or zero, and we can use this information to find a set of reactions that are forced to be inactive under optimization, as follows. For any given optimal solution v 0 , Eq. (10) is equivalent to , then the irreversibility constraint is binding, and the reaction is inactive (v i = 0) at v 0 . In fact, we can say much more: we prove the following theorem stating that such a reaction is actually required to be inactive for all possible optimal solutions for a given objective function c T v. Theorem 3. Suppose {u 1 , u 2 } is a dual solution corresponding to an optimal solution of problem (5). Then, the set M opt of all optimal solutions of problem (5) can be written as and hence every reaction associated with a positive dual component is binding for all optimal solutions in M opt .
Sketch of proof. Let v 0 be the optimal solution associated with {u 1 , u 2 } and let Q denote the right hand side of (11). Any v ∈ Q is an optimal solution of problem (5), since straightforward verification shows that it satisfies (8-10) with the same dual solution {u 1 , u 2 }. Thus, we have Q ⊆ M opt . Conversely, suppose that v is an optimal solution of problem (5). Then, v can be shown to belong to H, which we define to be the hyperplane that is orthogonal to c and contains v 0 , i.e., This, together with the fact that v satisfies Sv = 0 and Av ≤ b, from (8), can be used to show that v ∈ Q. Therefore, any optimal solution must belong to Q.
Putting both directions together, we have Q = M opt .
As an example, consider the five-reaction network shown in Fig. 1(a) where the flux v 4 is maximized. The problem can be written in the form of problem (5) with Note that the equality constraint v 1 = 1 is split into two inequality constraints  For this dual solution, we have u 11 = 1 > 0 (corresponding to the constraint v 1 ≤ 1) and u 13 = 1 > 0 (corresponding to the constraint −v 2 ≤ 0), and Eq. (11) in Theorem 3 becomes Note that the constraint v 1 = 1 can be omitted since it is satisfied by any v ∈ M in this example. Once we solve problem (5) numerically and obtain a single pair of primal and dual solutions (v 0 and {u 1 , u 2 }), we can use the characterization of M opt given in Eq. (11) to identify all reactions that are required to be inactive (or active) for any optimal solutions. To do this we solve the following auxiliary linear optimization problems for each i = 1, . . . , N : If the maximum and minimum of v i are both zero, then the corresponding reaction is required to be inactive for all v ∈ M opt . If the minimum is positive or maximum is negative, then the reaction is required to be active. Otherwise, the reaction may be active or inactive, depending on the choice of an optimal solution. Thus, we obtain the numbers n opt + and n opt 0 of internal and transport reactions that are required to be active and inactive, respectively, for all v ∈ M opt . The number of active reactions for any v ∈ M opt is then bounded as n opt + ≤ n + (v) ≤ n − n opt 0 .
We can also use Theorem 3 to further classify those inactive reactions caused by the optimization as due to two specific mechanisms: 1. Irreversibility. The irreversibility constraint (v i ≥ 0) on a reaction can be binding (v i = 0), which directly forces the reaction to be inactive for all optimal solutions. Such inactive reactions are identified by checking the positivity of dual components (u 1i ). 2. Cascading. All other reactions that are required to be inactive for all v ∈ M opt are due to a cascade of inactivity triggered by the first mechanism, which propagates over the metabolic network via the stoichiometric constraints. These inactive reactions occur in addition to the irreversibility/cascading-induced reaction inactivation identified in Sec. 4 for typical (in fact all) steady states.
In general, a given solution of problem (5) can be associated with multiple dual solutions. The set and the number of positive components in u 1 can depend on the choice of a dual solution, and therefore the categorization according to these specific mechanisms is generally not unique. For the example problem of Fig. 1 categorizing v 3 and v 5 under irreversibility, and v 2 under cascading. One can indeed see graphically in the projection onto (v 3 , v 4 , v 5 )-coordinates in Fig. 1(b) that the maximization of v 4 under the stoichiometric constraint v 3 + v 4 + v 5 = 1 (which follows from v 1 = 1 and Sv = 0) forces the irreversible fluxes v 3 and v 5 to be zero, which in turn forces v 2 = 0 [ Fig. 1(c)]. Clearly, one can also characterize the optimal solution space as  The results above are important both because they can be applied to any linear objective function and because a significant fraction of real metabolic reactions are irreversible. In the case of the E. coli (human) reconstructed network, a total of 73.4% (65.6%) of all internal and transport reactions are irreversible. Moreover, a number of other reactions are effectively irreversible because the irreversibility of different reactions in the same pathway constrains them not to run in one of the two directions; this leads to a total of 94.3% (74.9%) of the reactions whose fluxes are either necessarily nonnegative or necessarily nonpositive in all steady-state solutions. In the case of growth-maximizing states for the conditions considered in our numerical experiments, out of all 922 (3328) internal and transport reactions in the reconstructed network, a total of 146 (106) reactions are inactive due to irreversibility constraints, and a total of 114 (293) other reactions are inactive due to a cascade of reaction inactivation; some of the irreversible reactions can be assigned to either the first or the second of these two groups. The bounds provided by Eq. (12) depend on the objective function. In the case of growth-maximizing states, the lower and upper bounds are 273 (113) and 339 (1180), respectively. These numbers should be compared with the number 599 (1579) of reactions that are active in typical, suboptimal states. This clearly shows that optimal states are necessarily constrained to have a smaller number of active reactions, and that this is due to the presence of irreversible reactions in the network. 6. Typical linear objective functions. Another problem of interest concerns the uniqueness of the optimal solutions. While a number of necessary and/or sufficient conditions for this uniqueness are known [28,15], we are not aware of any probabilistic statements in the literature addressing this issue. Since the feasible solution space M is convex, its "corners" can be mathematically formulated as extreme points, defined as points v ∈ M that cannot be written as v = ax + by with a + b = 1, 0 < a < 1 and x, y ∈ M such that x = y. Intuition from the twodimensional case (Fig. 2) suggests that for a typical choice of the objective vector c such that problem (5) has a solution, the solution is unique and located at an extreme point of M . We prove here that this is indeed true in general, as long as the objective function is bounded on M , and hence an optimal solution exists. By definition, it can be written as v = ax + by with a + b = 1, 0 < a < 1 and x, y ∈ M such that x = y. Since v is the only solution of problem (5), x and y must be suboptimal, and hence we have c T x < c T v and c T y < c T v. Then, and we have a contradiction with the fact that v is optimal. Therefore, if M opt consists of a single point, it must be an extreme point of M .
We are left to show that the set of c ∈ B for which M opt (c) consists of multiple points has Lebesgue measure zero. By Theorem 3, for a given c, there exists a set of (14) where the union is taken over all I ⊆ {1, . . . , K} for which Q I contains multiple points. If c is in one of the sets in the union in Eq. (14), the set Q I , being the set of all optimal solutions, is orthogonal to c. Hence, c is in Q ⊥ I , the orthogonal complement of Q I defined as the set of all vectors orthogonal to Q I . Therefore, Because Q I is convex, it contains multiple points if and only if its dimension is at least one, implying that each Q ⊥ I in the union in Eq. (15) has dimension at most N − 1, and hence has zero Lebesgue measure in R N . Since there are only a finite number of possible choices for I ⊆ {1, . . . , K}, the right hand side of Eq. (15) is a finite union of sets of Lebesgue measure zero. Therefore, the left hand side also has Lebesgue measure zero.
Note that growth rate is not a typical objective function. Because this objective function has nonrandom coefficients and involves only a fraction of all metabolic fluxes, the objective vector c is generally perpendicular to a surface limiting the space of feasible solutions. For this reason, the growth-maximizing states are generally not unique. In the case of the E. coli reconstructed network simulated in glucose minimal medium, our numerical calculations indicate that the growth-maximizing solutions form a space that is 26-dimensional. In the case of the human reconstructed network, the corresponding dimension is 494. For other nonlinear objective functions of biological significance in cellular metabolism, we refer to Ref. [24]. Figure 3 shows the results of our numerical experiments for φ f and φ m on the E. coli reconstructed network. In both cases, the maximum (minimum) of the objective function for a given growth rate decreases (increases) as the growth rate increases, and it converges to essentially a single intermediate value for all states that maximize growth rate [ Fig. 3(a,b)]. The average activity, which we determined by randomly sampling the solution space, is essentially constant (it decreases very slowly as the growth rate increases). The sampling of the solution space was performed using the hit-and-run method [26], which is an efficient algorithm to sample highdimensional convex regions. Our implementation of this method is as described in our previous study [7] and involves artificial centering [12]. The observed behavior of the total flux activity and total mass flow activity should be contrasted with metabolic activity as measured in terms of the number of active reactions, which is significantly smaller at growth-maximizing states.
The average number of active reactions in the φ f -maximizing states across different growth rates is just 302 out of a total of 571 that would be active in typical states (the latter too is an average over different growth rates, and is smaller than the number 599 anticipated in Sec. 5 because of additional constraints set to the diverging cycles throughout this section-see below). A similar result holds true for φ m -maximizing states (Table 1). Therefore, the maximizations of total flux and total mass flow in the network also lead to a reduced (rather than increased) number of active reactions compared to typical states [such as those determined by the hit-and-run method in Fig. 3(a,b)]. This number varies very little with growth rate and is essentially undistinguishable from the number of active reactions in growthmaximizing states (see standard deviations in Table 1). While these results concern E. coli, we note that similar trends are observed for the human metabolic network.
If the irreversible reactions are made reversible [ Fig. 3(c,d)], then the number of active reactions increases. For states that minimize φ f and φ m , the number of active reactions jumps to a large number when α i of the irreversible reactions is assigned to be just slightly negative, and then decreases as the irreversibility  (Table  1).
constraints are further relaxed (Table 1). For states that maximize φ f and φ m , the increase in the number of active reactions is by a factor of nearly 2 (Table 1). This number is comparable to the number of active reactions in typical suboptimal states of the original network. Therefore, like in the case of linear objective functions, the reduced number of active reactions found in states that maximize the total flux or the total mass flow is due to the presence of irreversible reactions in the metabolic network.
All simulations presented in this paper are based on a reconstructed metabolic network of E. coli K-12, which represents a further curated version of the iJR904 model [22] in which duplicated reactions have been removed, and on the most Table 1. Number of active reactions in states maximizing or minimizing the total flux, φ f , and the total mass flow, φ m . The relaxation of the irreversibility constraints is implemented by allowing α i to be negative for all irreversible reactions, as indicated in the leftmost column. Each column shows the average and standard deviation calculated over the growth rates considered in Fig. 3. complete reconstructed human metabolic network, generated by applying the same curation to the Homo sapiens Recon 1 model [8]. The E. coli (human) network used consists of 922 (3328) reactions, 901 (1491) enzyme-and transport protein-coding genes, 618 (2766) metabolites, 143 (404) exchange fluxes, and the biomass flux.
For the E. coli network, the simulated medium had limited amount of glucose (10 mmol/g DW-h) and oxygen (20 mmol/g DW-h), and unlimited amount of sodium, potassium, carbon dioxide, iron (II), protons, water, ammonia, phosphate, and sulfate; the flux through the ATP maintenance reaction was set to 7.6 mmol/g DW-h. For the human network, we used a medium with limited amount of glucose (1 mmol/g DW-h) and unlimited amount of oxygen, sodium, potassium, calcium, iron (II and III), protons, water, ammonia, chlorine, phosphate, and sulfate; for the biomass composition, we followed Ref. [25]. In our simulations, 10 −6 mmol/g DW-h was used as a flux threshold to define the set of reactions considered active. A few cycles whose flux or mass flow would diverge in the optimization of the corresponding objective function were assigned the minimum feasible flux in the optimization of φ f and the minimum feasible mass flow in the optimization of φ m (under the constraint of not altering the fluxes of the other reactions). These minimum flux values were also adopted as bounds in our hit-and-run sampling. All numerical calculations were implemented using the COBRA Toolbox [3] and the CPLEX optimization software [11].