Designing Metabolic Division of Labor in Microbial Communities

Understanding how microbes assemble into communities is a fundamental open issue in biology, relevant to human health, metabolic engineering, and environmental sustainability. A possible mechanism for interactions of microbes is through cross-feeding, i.e., the exchange of small molecules. These metabolic exchanges may allow different microbes to specialize in distinct tasks and evolve division of labor. To systematically explore the space of possible strategies for division of labor, we applied advanced optimization algorithms to computational models of cellular metabolism. Specifically, we searched for communities able to survive under constraints (such as a limited number of reactions) that would not be sustainable by individual species. We found that predicted consortia partition metabolic pathways in ways that would be difficult to identify manually, possibly providing a competitive advantage over individual organisms. In addition to helping understand diversity in natural microbial communities, our approach could assist in the design of synthetic consortia.

KEYWORDS division of labor, flux balance analysis, genome-scale stoichiometric models, metabolic networks, microbiome, mixed-integer linear programming, resource allocation, synthetic ecology, microbial communities E ach microbial cell harbors a finite number of metabolic gene functions. The specific assortment of functions in a given organism thus represents the outcome of a trade-off between the cost of expressing different genes and the benefit of expression of those genes under conditions of different environments. This trade-off is considered to be one of the possible drivers of diversity in natural microbial communities, giving rise to metabolically differentiated groups rather than an individual superorganism (1)(2)(3)(4)(5)(6)(7)(8). The emergence of metabolically differentiated subpopulations from isogenic populations has also been documented to occur in a fixed environment (9)(10)(11)(12)(13)(14)(15)(16)(17)(18)(19)(20)(21). The viability of coexisting populations of metabolically differentiated strains or species is often enabled by the exchange of metabolites (7, 11-13, 17, 18, 22-29). For example, initially identical populations of Escherichia coli that evolved on minimal glucose medium have been observed to give rise to a specialized subpopulation of cells that use the acetate secreted as a by-product of glucose fermentation (12,13,17). More broadly, metabolic interactions mediated by the exchange of small molecules help maintain the diversity and stability of natural microbial communities and allow communities to accomplish tasks that are metabolically intensive (5,6,23,30). Moreover, obligate metabolic interdependencies (such as mutualism) are believed to contribute to the high prevalence of unculturability and fastidiousness among natural microbial strains (7,(31)(32)(33)(34)(35).
A recent and increasingly common strategy to study microbial interdependencies is the construction (or evolution) of artificial microbial consortia specifically designed to display obligate mutualism. Current approaches to building synthetic communities of interacting microbes have so far mainly relied on intuition about simple genetic perturbations that would cause organisms to engage in obligate cross-feeding. In these interactions, one strain is unable to synthesize an essential metabolite (e.g., an amino acid) that is supplied via overproduction or leakage by another strain (36)(37)(38)(39)(40)(41)(42)(43). This ensures that the two strains require each other's presence in order to grow. While interesting and valuable, these strategies explore only a small portion of the very large and complex space of possible environmental and organismal modifications; in principle, organisms may have the potential to display complex cross-feeding strategies for multiple metabolites simultaneously or in an environment-dependent manner (44,45). In fact, given the complexity of metabolism and its evolutionary history, it is possible that naturally evolved cross-feeding strategies may involve complex metabolic mutualism beyond single amino acid exchanges (46). In particular, loss of functions in one organism due to compensation by others has been hypothesized to be widespread (47) and may involve multiple genes and complex pathway architectures (29). In the engineering of consortia for specific metabolic engineering tasks, exploring this larger space of possibilities may open up novel strategies for bioproduction.
Surveying the landscape of possible paths for metabolic differentiation leading to obligate mutualism is a combinatorially difficult problem. While future elaborations of existing methods for high-throughput genetic modifications (e.g., multiplex automated genome engineering [MAGE] [14]) may enable a systematic exploration of this space in vivo, computational models can provide a preliminary assessment of the landscape of possible strategies and of how these strategies depend on different constraints on metabolic network complexity. Constraint-based models of metabolic networks, such as flux balance analysis (FBA) (48)(49)(50)(51)(52)(53)(54)(55)(56)(57)(58), can be leveraged specifically to ask questions that cannot be easily addressed experimentally. FBA represents metabolism as a set of biochemical reactions inferred from genome annotations and literature curation and evaluates cellular metabolism as a resource allocation problem. Given a set of biochemical, thermodynamic, and environmental constraints, FBA uses linear programming to determine the distribution of fluxes through a reaction network that satisfies a given optimization objective. Typically, this objective is that of maximizing the flux through a biomass-producing reaction, so FBA determines how a cell should optimally allocate nutrients based on its environment and biochemical capabilities such that the growth rate is maximized. FBA has also been increasingly used to study metabolic interactions in microbial consortia (7, 8, 23, 27-29, 38, 49-56, 59-63), as well as to predict optimal genetic knockouts for metabolite production (48).
Here, we explore how metabolic differentiation emerges from an isogenic population by using a newly developed constraint-based modeling approach which we name "division of labor in metabolic networks" (DOLMN). In particular, using DOLMN, we explore the space of feasible single-strain or multistrain metabolic networks by systematically limiting the number of intracellular and transport reactions in each metabolic model. After introducing the mathematical and integer linear programming formulation of DOLMN, we illustrate its capabilities through an analysis of division of labor based on core carbon metabolism in E. coli (57). We next apply DOLMN to a genome-scale E. coli model (58) and show that metabolically differentiated and interdependent communities are able to exist under stricter reaction constraints than a single, isolated strain and are even able to outcompete the single strain in some cases. Our results broaden knowledge of the scope of possible metabolic interdependencies between metabolically different species, with applications in understanding diversity in natural microbial communities and in designing new artificial consortia.

RESULTS
A method to design division of labor in microbial communities. The problem of metabolic division of labor consists of partitioning a given metabolic network (which we refer to here as the "global network") into subnetworks representing individual organisms (which we also call simply "strains" here) (see minimal example in Fig. 1). Each strain has its own metabolic network that includes intracellular reactions, as well as transport reactions, which determine how it interacts with the environment. Environmental availability of different nutrients is defined by constraints on the exchange reactions, which enable the inflow and outflow of environmental metabolites and by-products. In solving this problem concerning the division of labor, we make specific assumptions that reflect the nature and architecture of metabolic networks across different species as follows. (i) We do not set any specific a priori expectations about the presence of reactions in different strains. Consequently, a reaction from the global network may be selected to appear in one or more strains or may not appear in any strain at all. (ii) We expect each strain's subnetwork to represent a well-connected, fully functional metabolism, so as to be capable of producing that strain's biomass (see Materials and Methods). (iii) Upon simulation of cocultures of multiple coexisting strains, we require that all such strains must have equal growth rates, so that they would be able to stably coexist in a chemostat (8,43,64,65).
To solve this problem, we devised division of labor in metabolic networks (DOLMN), which is formulated as a combinatorial optimization problem. Inputs of DOLMN consist of the global network (encoded in a stoichiometric matrix S and accompanied by upper and lower flux bounds, as in standard FBA formulations; see Materials and Methods); the number of target strains (K); and constraints on the number of intracellular (T IN ) and transport (T TR ) reactions allowed in each strain. Key outputs of DOLMN consist of a binary reaction vector (t) whose elements indicate whether a given reaction is present in a given strain and a continuous flux vector (x) for all reaction rates. Note that there is no specific requirement for two or more strains to end up using different reactions from the global network. A specific solution could entail multiple strains having exactly the same reactions and yet not interacting with each other. We expect division of labor to arise only upon making the T IN or T TR value too small for any individual strain to be able to survive without receiving specific molecular components from a metabolically distinct partner (Fig. 1a). Note that elements of t can switch on or off as a function of the current constraints, irrespective of their state under conditions of different con- represents the constraint on the number of intracellular reactions allowed in each strain. Metabolites X, Y, and Z are required for each network to grow (i.e., produce biomass). (b and c) Two strains can exchange metabolites Y and Z (c), but a single strain can only take up environmental metabolites (b). (a) When T IN ϭ 2, 1-and 2-strain communities perform the same metabolic functions: uptake of metabolite X, conversion of metabolite X to metabolite Y, conversion of metabolite Y to metabolite Z, and creation of biomass from metabolites X, Y, and Z. The strains do not take up or secrete metabolite Y or metabolite Z (indicated as a hollow arrow). When T IN ϭ 1, the presence of an individual strain is no longer feasible because it cannot create metabolite Z (indicated as a hollow circle). The alternative solution, where reaction 1 is knocked out (not shown), is also infeasible because the single strain cannot then create metabolite Y. However, 2-strain communities are still feasible because the strains exchange metabolites Y and Z. If T IN ϭ 1 and the number of transport reactions allowed is constrained to two, then 2-strain communities are no longer feasible (not shown). (b and c) Toy metabolic network and corresponding (community) stoichiometric matrices and reaction binary vectors of 1-strain (b) and 2-strain (c) communities. The value of each stoichiometric coefficient represents the number of moles of each metabolite that participates in a reaction, with the sign indicating if a metabolite is a product (positive) or a reactant (negative). Exchange reactions are represented in black; transport reactions are represented in light green (b), orange (c), or purple (c); and intracellular reactions are represented in dark green (b), orange (c), or purple (c). straints, and that T TR does not differentiate between active transport and passive transport.
DOLMN, described in detail in Materials and Methods, involves the use of mixed integer linear programming (MILP). Our problem is NP (nondeterministic polynomial time) complete. It can be solved exactly for core metabolic network models (i.e., in a global network of ϳ100 reactions), but it requires heuristics and long computational times for genome-scale models (ϳ1,000 reactions). Despite the use of the heuristic speed-up methods, we still obtain a single optimal solution for each value of T TR , T IN , and K.
Metabolic division of labor in E. coli core metabolism. As a first test and illustrative example of DOLMN, we investigated how E. coli core carbon metabolism (57) on minimal glucose medium would be partitioned between two strains (i.e., two trimmed versions of the E. coli core network) for a given limit on the number of allowed reactions (see Fig. 2 and Materials and Methods for details). Besides imposing constraints on the number of reactions allowed in each strain, we further required that both strains have the same growth rate of at least 0.1 h Ϫ1 , effectively simulating stable coexistence in a chemostat (8,43,64,65). Overall, we solved for 156 nontrivial combinations of T TR and T IN , i.e., constraints under which either 1-strain or 2-strain communities could grow (65 nontrivial combinations for a single strain and 91 for two strains).
An interesting outcome of this analysis, obtained for subnetworks containing no more than 11 transport reactions and 26 intracellular reactions, was the discovery of a metabolic strategy in which each of two strains performs half of the tricarboxylic acid (TCA) cycle (Fig. 2). Both strains take up glucose, oxygen, phosphate, and ammonium; secrete carbon dioxide and water; perform glycolysis; and operate the nonoxidative phase of the pentose phosphate pathways (PPP). However, strain x (Fig. 2, orange) takes up four times as much glucose as strain y (Fig. 2, purple); operates the nonoxidative phase of the PPP in the forward direction; and utilizes different reactions in glycolysis. Furthermore, neither strain utilizes the glyoxylate shunt in the TCA cycle; instead, the metabolites oxaloacetate and 2-oxoglutarate split the TCA cycle into roughly two parts. Strain y performs the first part of the TCA cycle, converting oxaloacetate and acetylcoenzyme A (acetyl-CoA) into 2-oxoglutarate and generating NADPH. Meanwhile, strain x generates oxaloacetate, NADH, and ATP through the transformation of 2-oxoglutarate in the second part of the TCA cycle.
Neither of the strains, in this case, was able to perform all needed metabolic functions without the inflow of specific metabolites produced by the partner. In particular, exchange of 2-oxoglutarate and pyruvate was necessary for survival of this 2-strain consortium (Fig. 2). Strain x performs a portion of the TCA cycle in order to produce 2-oxoglutarate, which is converted to biomass (as well as L-glutamate and L-glutamine, which are also converted to biomass). The excess 2-oxoglutarate is utilized by strain y for biomass as well as for the production of pyruvate. Pyruvate, in turn, is converted to biomass, and its excess is utilized by strain x.
This example illustrates how, even for a relatively small network, DOLMN can provide putative division of labor strategies that could not be easily designed manually. DOLMN could be similarly applied to other core metabolic models (66) such as have been generated for a large number of organisms.
A growth landscape illustrates division of labor strategies in E. coli genomescale metabolism. We next applied DOLMN to a much larger global network, namely, genome-scale E. coli metabolism (58). In this case, individual strains found by the algorithm would represent E. coli variants with a reduced set of functionalities. Altogether, we solved for 5,347 combinations of T TR and T IN in which we found a feasible solution, including 738 for a single strain, 2,207 for two strains, and 2,402 for three strains.
To display how growth rates vary as a function of T TR and T IN , we systematically mapped the landscapes of possible 1-, 2-, and 3-strain simulations ( Fig. 3a and b; see also Fig. S1a in the supplemental material). One first observation, consistent with expectations, was that as T IN decreases (for an unconstrained number of transport reactions), individual strains reach a limit beyond which they cannot sustain growth, whereas consortia of two and three strains are still viable. In the example analyzed in Fig. 3, a 1-strain subnetwork needs at least 254 intracellular reactions to grow, whereas 2-strain subnetworks require only 215 intracellular reactions each and 3-strain subnetworks 203 intracellular reactions each (Fig. S2a). Interestingly, the 2-strain and 3-strain subnetworks have approximately the same number of intracellular reactions in common (Fig. S2b).
The observed landscapes display a fundamental nonlinear trade-off between minimizing T IN (intracellular complexity) and minimizing T TR (metabolic exchange). This nonlinearity implies that removing the same number of transport reactions at different points along the frontier of the feasible region can be compensated for by adding different numbers of intracellular reactions. For example, decreasing T TR by 2 at large T TR values can be compensated for by adding a single intracellular reaction (increasing T IN by 1), while removing the same number of transport reactions at small T TR would require a much larger compensation with intracellular reactions ( Fig. 3a and b; see also Fig. S1a). It is important to note that decreasing the T TR value negatively influences growth because it restricts not only each strain's ability to take up metabolites but also its ability to secrete metabolites. If an organism cannot secrete metabolites, it accumulates waste (which results in an infeasible FBA solution). Irrespective of the number of strains in the community, it appears that the E. coli strain subnetworks require at least 9 transport reactions in order to support growth, corresponding to the main elemental sources and other irreducible biomass requirements; this strict bound is illustrated with a gray, shaded region in the (T TR ,T IN ) landscape ( Fig. 3a to c; see also Fig. 4a, c, d, and e). Among other things, these transporters enable the strains to take up carbon, nitrogen, sulfate, and phosphate sources (Fig. S2c).
Further analysis of the landscapes for 1-, 2-, and 3-strain communities also reveals the existence of regions in which division of labor potentially provides a competitive advantage. Given that multiple strains coexisting in a consortium have to share available resources, they tend to grow more slowly than individually growing strains ( Fig. 3a and b). One notable exception is a thin strip at the boundary, in which an individual strain can grow. At this frontier for a single strain, we observed that 2-strain communities can grow more rapidly than 1-strain communities ( Fig. 3c and d). A biologically important implication of this result is the fact that the 2-strain communities would in principle have the chance to collectively outcompete the 1-strain ones. Similarly, 3-strain communities grow faster than 2-strain communities along the boundary in which 2-strain communities can grow (Fig. S1c and f) but only grow faster than 1-strain communities at a single constraint ( Fig. S1b and e). These results suggest that the number of strains that achieve the highest growth rates under a given set of circumstances might naturally increase as environmental constraints tighten. This situation could arise, for example, if the burden of protein cost in the cell were to increase or if selection processes were to gradually come to favor streamlined strains (e.g., as previously observed experimentally and reported [10,11,[14][15][16][17][18][19][20][21]35]).
In examining the union of all intracellular reactions present at a particular constraint, we observed that the number of unique intracellular reactions seen with multistrain communities is larger than the number seen with a single strain and that 3-strain communities sometimes have higher numbers of unique intracellular reactions than  2-strain communities (Fig. 3e). The larger collective metabolic networks (corresponding to the more specialized subnetworks) likely facilitate growth under conditions of strict constraints, particularly below the boundary at which 1-strain communities can no longer exist, and are likely the result of metabolic division of labor. On the basis of these observations and on the results obtained for the core model (Fig. 2), we expect that the capacity of two or more E. coli strains to survive together under conditions of tighter constraints on the number of allowed reactions is due to metabolic division of labor and cross-feeding. We explore this in the next section.
Emergence ). This is due to the fact that, as the number of allowed reactions is reduced, the two strains can grow only if they take distinct metabolic roles (i.e., perform division of labor). Interestingly, despite the large metabolic network rearrangements induced by these constraints, the 2-strain consortia are able to maintain a fairly constant growth rate until they become infeasible (Fig. 3b). These division of labor strategies are also visible in terms of the number of metabolites that the 2-strain communities have to exchange with each other in order to grow ( Fig. 4b; see also In short, multiple-strain communities can grow under stricter constraints on T IN because they distribute metabolic reactions and exchange metabolites ( Fig. S3c and d).
Overall, different 2-strain communities can vary widely in terms of the metabolites being exchanged, with molecules ranging from central carbon compounds such as acetate and pyruvate to amino acids (Fig. 5a).
In order to gain better insight into the metabolic changes that accompany the rise in the number of pairs of obligate mutualistic strains, we reduced the multidimensional space of fluxes using principal-component analysis (PCA) (see Materials and Methods). Clusters in the principal-component space would indicate common metabolic strategies, hardly detectable through visual inspection of the network themselves; paths between these clusters would additionally portray how the strain subnetworks move through this metabolic space as constraints become more stringent. We observed three clusters, indicating three major metabolic strategies ( Fig. 4d; see also Fig. S4). The largest cluster occurs at the origin and corresponds to large T IN values. That is, for fairly unconstrained intracellular metabolism (i.e., when both strains are allowed a large number of intracellular reactions), all strains (for 1-strain and 2-strain simulations) use the same metabolic strategy for core energetic requirements. In particular, they display similar uses of the TCA cycle (Fig. 4c) and of other central carbon metabolism pathways (glycolysis/gluconeogenesis, oxidative phosphorylation; see Fig. S6). These metabolic regimes correspond to standard respiratory metabolism, in line with the expected metabolic fluxes for E. coli grown on minimal glucose medium (57,58,67). However, as T IN becomes more constrained (i.e., as the number of intracellular reactions allowed decreases), the 2-strain subnetworks move away from each other, indicating that they diversify into different metabolic strategies. Note that the specific path that is followed by these networks as T IN decreases is also a function of T TR (Fig. 4d; see also Fig. S4).
To understand the different metabolic strategies associated with different clusters in the PCA, we calculated the Euclidean distance between pathways in each of the two strains in a 2-strain community and mapped it to the (T TR ,T IN ) landscape (see Materials and Methods). A striking outcome of this metabolic distance analysis was the fact that, for a broad range of T TR values (e.g., values ranging from 9 to 29), the 2-strain subnetworks had a large difference in their TCA cycle (Fig. 4c), mirroring the observation reported for the core E. coli network (Fig. 2). This strategy of splitting the TCA cycle was also supplemented by large differences in glycolysis/gluconeogenesis, which were in fact observed along the entire T IN boundary for 2-strain communities (Fig. S6), suggesting that this could represent a generalized version of the acetate utilization phenotype observed in classical evolutionary experiments (8,12,13). It is important to note that Jaccard distance values quantify differences in reaction content (Fig. S5), or in how the metabolic network was modified, whereas Euclidean distance values measure differences in reaction flux magnitude and direction (Fig. S6), or in how the modified network is being utilized.
As indicated by the increase in the number of exchanged metabolites that occurs as constraints become tighter (Fig. 4b), the pairs of metabolically differentiated strains described above can survive due to metabolic cross-feeding. For example, in a 2-strain community, one of the strains could utilize the metabolites available from the environment and could secrete by-products that can enable the other strain to survive. We again used the (T TR ,T IN ) landscape to track how constraints affect the metabolites being exchanged (Fig. 4e; see also Fig. S7 and S8). Different metabolites display drastically different patterns (Fig. S7 and S8): some metabolites are exchanged almost universally (e.g., acetate) whereas others appear only in specific subregions of the landscape (e.g., L-glutamate below the T IN boundary for 1-strain communities). Notably, succinate is shown to be exchanged predominantly at the T IN boundary of 2-strain communities (Fig. 4e), in very close correspondence to the area of the landscape where the TCA cycle is drastically split (Fig. 4c). Furthermore, succinate exchange (Fig. 4e) and the metabolic differentiation of the TCA cycle (Fig. 4c) have a correlation coefficient of 0.63 (Fig. S3f). This suggests that, based on genome-scale simulations, succinate would be one of the key intermediates for the rise of E. coli strains surviving by using complementary halves of the TCA cycle.
Fluxes of metabolic sharing can be strongly correlated. Consortia of obligate symbiotic partners are predicted to emerge in regions of the (T TR ,T IN ) landscape where 1-strain communities are infeasible. As shown above, what makes these consortia viable (i.e., what makes it feasible for the corresponding strains to produce biomass despite the strong restriction on intracellular reactions) is the possibility of metabolite exchange between the 2-strain communities.
We thus sought to perform analyses of the metabolites being exchanged across the whole (T TR ,T IN ) landscape. In the specific setup used for the in silico experiments, among a total of 143 extracellular metabolites, only 37 (25.9%) metabolites were exchanged in at least one simulation of 2-strain communities. Most (78.4%) of these exchanged metabolites were exchanged in at least 10% of all coculture simulations (Fig. 5a). One class of abundantly exchanged molecules is the set of amino acids. These solutions can be viewed as similar to the artificially imposed auxotrophies used to engineer synthetic consortia (39)(40)(41)(42)(43)68). Other frequently exchanged molecules include carboxylic acids (e.g., acetate and pyruvate) and nucleic acids (e.g., thymidine and adenine), which are known to be exchanged in natural communities (12,13,23,61).
Additional insight can be gained by exploring the relationship between exchanged metabolites, i.e., by asking whether we should expect specific pairs of metabolites to be simultaneously exchanged in the same or opposite directions between two organisms. Knowledge of such correlations/anticorrelations may be useful as a strategy for choosing biomarkers (if two organisms exchange A, they are also likely to exchange B), as an indicator of fundamental metabolic trade-offs (X can provide A only if Y provides B), or as a broad suggestion of the existence of unavoidable couplings in the interactions present in a microbial community. Similar analyses of coupling between fluxes, e.g., through linear optimization (69-71), elementary flux modes (72), or Bayesian approaches (73), have become a broadly used way of determining structural network properties that can be helpful for metabolic engineering applications. Here, we sought to estimate correlations between exchange fluxes across an ensemble of networks with different stoichiometries [i.e., all the networks in the (T TR ,T IN ) landscape], making the use of existing algorithms challenging. We thus applied the Spearman correlation to exchange reaction fluxes (Fig. 5b) in order to measure if metabolites are exchanged jointly (both taken up or secreted by a strain [positive ]) or reciprocally (one is taken up and the other is secreted by a strain [negative ]). While Spearman correlation was chosen because we do not expect fluxes to scale linearly with T TR and T IN (Fig. S9c), we also computed Pearson correlations for the same data set (Fig. S9a), obtaining qualitatively similar results. Notably, the correlations described in detail below are substantially different from those that one would observe from applying a Bayesian flux coupling algorithm (73) to the original metabolic network for two coupled E. coli strains (Fig. S9b).
Two major patterns emerged from this analysis. First, by looking at the hierarchically clustered correlation matrix, one can immediately recognize several block structures. The largest block structure seems dominated by amino acid exchange. In particular, two sets of (mostly) amino acids seem to be highly correlated within each set but to be highly anticorrelated across the sets. The first set includes L-arginine, L-histidine, L-threonine, and uridine, which are all correlated with each other, whereas the second set includes acetate, L-alanine, L-isoleucine, L-leucine, L-tryptophan, L-proline, ornithine, and L-serine. Interestingly, these anticorrelated block sets do not seem to map trivially to different precursor pathways (e.g., the upper versus lower glycolysis pathways), suggesting that other metabolic trade-offs may determine these patterns. One can also observe a second block structure dominated by amino acids that are all correlated, containing D-alanine, L-lysine, L-asparagine, L-phenylalanine, L-tyrosine, pyruvate, and-to a lesser extent-fumarate, 2-oxoglutarate, putrescine, and L-valine.
The second significant outcome of this correlation matrix is related to the TCA cycle-splitting result described above. In particular, succinate and fumarate are anticorrelated ( ϭ Ϫ0.67), indicating reciprocal exchange. These two metabolites are intermediates of the TCA cycle and are involved in sequential steps, suggesting that this anticorrelation corresponds to the metabolic division of labor strategy characterized by splitting of the TCA cycle (Fig. 2). This is further confirmed by the fact that the region in the (T TR ,T IN ) landscape where succinate is exchanged matches very closely with the region in which the 2-strain communities show very distinct forms of use of the TCA cycle ( Fig. 4c and e) and that the exchange profile of succinate is correlated to the metabolic differentiation of the TCA cycle ( Fig. S3e and f).

DISCUSSION
Although this study was purely computational, the data provide a new conceptual framework and new predictions, including specific testable modifications, and enable us to perform analyses that are currently beyond current experimental capabilities. The phenotypic space that we explore involves 5,000 in silico experiments on the full E. coli network, across 1-, 2-, and 3-strain communities, under conditions of multiple constraints on the number of transport and intracellular reactions. For each organism's model, we take into account large numbers of variables (1,075 reactions and 761 metabolites in each strain). Efficient algorithms therefore represent a key step toward exploring possible strategies for division of labor in synthetic microbial communities and toward understanding how metabolic specialization may arise in natural microbial consortia.
The spontaneous emergence of division of labor in natural microbial systems is still a poorly understood process. One possible mechanism for this type of interdependence is the Black Queen hypothesis, on the basis of which division of labor could arise in complex microbial communities through the loss, instead of the gain, of functions that can be performed by "leaky" partners (29,47). Our approach bears some analogies to the Black Queen hypothesis, in that it identifies network solutions that are reduced relative to those of an initial large network, generating networks of metabolitemediated interdependencies. However, in our optimization algorithm, we assume that all strains are simultaneously constrained by the same maximal number of reactions allowed and thus that reductions in gene numbers occur in parallel in coexisting strains, generating complementary metabolic capabilities.
Our results may be relevant for microbial ecology of natural communities, as well as for the study of synthetic microbial consortia. Ongoing efforts in synthetic ecology are mostly focused on engineering metabolic dependencies by making one strain unable to produce a terminal biomass precursor (e.g., an amino acid) which is then provided by another strain. Although this strategy has yielded new insight into how microbes interact within communities, it may not reflect the possible complexity of natural metabolic interactions. Our approach has the capacity to uncover a much broader set of potential opportunities for obligate cross-feeding, including the exchange of metabolites that are part of core metabolic processes, or "deep symbiosis" between organisms.
The split TCA cycle that we present in Fig. 2 is an example of this potential deep symbiosis, which can be related to existing metabolic diversity across microbial taxa. The TCA cycle is traditionally described as a method to generate energy, but it is also essential for the production of essential precursor metabolites for cellular biosynthesis. The classic, cyclical form focuses on energy generation, but branched variants emphasize biosynthetic precursor production. These branched forms have been observed in bacteria and are the result of either an incomplete TCA cycle (66,(74)(75)(76)(77)(78)(79) or differential regulation (76,78,80,81). The enzyme 2-oxoglutarate dehydrogenase, which catalyzes the conversion of 2-oxoglutarate to succinyl-CoA, is often lacking in organisms with an incomplete TCA cycle (75)(76)(77)(78)(79). Furthermore, 2-oxoglutarate dehydrogenase is transcriptionally repressed in E. coli under anaerobic conditions (78,80,81) and has been found to show reduced metabolic flux in E. coli evolution experiments (82,83). Due to the inability of these organisms to generate succinyl-CoA from 2-oxoglutarate, they instead reduce oxaloacetate to succinyl-CoA "in reverse." Moreover, the phosphoenolpyruvate (PEP)-pyruvate-oxaloacetate node (also known as the anaplerotic node) is the metabolic link between the TCA cycle and glycolysis/ gluconeogenesis and acts as a switch to control how carbon flux is distributed through central carbon metabolism (84).
Interestingly, the TCA cycle splitting that we observed in the model predictions occurred at the metabolites 2-oxoglutarate and oxaloacetate, both of which are used to create various amino acids. 2-Oxoglutarate is used to produce the amino acids glutamate, glutamine, proline, and arginine, whereas the amino acids aspartate, asparagine, methionine, threonine, isoleucine, and lysine can be produced from oxaloacetate. The biological significance of a split TCA cycle is also manifested in the patterns of secretion and uptake reported in the literature. In particular, both 2-oxoglutarate and pyruvate (the two exchanged metabolites) have known transporters in E. coli (85,86). These metabolites are often observed in the extracellular environment (87)(88)(89), indicating that they are available for utilization. Moreover, cross-feeding of 2-oxoglutarate has been observed between Saccharomyces cerevisiae and lactic acid bacteria (24) and in "Chlorochromatium aggregatum" (25), a phototrophic consortium of the green sulfur bacterium Chlorobium chlorochromatii and a proteobacterium (Comamonadaceae). Consequently, there is evidence that some of the strategies representing the division of labor identified by our approach are similar to the pathway configurations of existing consortia.
Our predictions were initially produced under the assumption that the different perturbations applied by our method correspond to genetic modifications. However, they could equally be interpreted as being a consequence of instances of gene downregulation instead of gene loss (16). In other words, all the solutions found by DOLMN may in principle manifest themselves in the form of phenotypic differentiation within a population of cells. In a complex multicellular system (such as the human body), this could occur in the form of division of labor among cell types in different tissues, whereas in clonal populations of microbes, this could imply phenotypic variation due to heterogeneous gene expression (90)(91)(92). Single-cell studies of microbial physiology (16), aided by genome-scale models of metabolism, could help unravel both genomic and transcriptional variability potentially associated with division of labor in the microbial world.
Experimental testing of the proposed division of labor strategies in E. coli may prove challenging due to the multiplicity of gene deletions that would have to be simultaneously performed, although conceptually driven complex redesign of metabolic networks has been successfully implemented experimentally before (93,94). Recently developed technologies, such as multiplex automated genome engineering (MAGE) (95), conjugative assembly genome engineering (CAGE) (96), and the clustered regularly interspaced short palindromic repeat (CRISPR)/Cas system (97), could in principle facilitate such an endeavor. Still, a challenge of implementing multiple targeted knockouts experimentally is the chance of encountering high-order epistatic interactions between genes that are difficult to predict computationally and that may result in nonviable strains. Thus, instead of implementing all of the genetic perturbations at once, it may be advisable to engineer increasingly complex interactions involving gradual modifications of different reactions and pathways. For example, rather than deleting genes, one could consider engineering promoters to reduce the flux through each reaction and potentially let the strains evolve in the laboratory. Alternatively, one could perform proteome perturbations through protein overexpression or energy dissipation (98).
Besides experimental testing, one could ask whether assortments of reactions similar to the ones predicted by DOLMN for our mutualistic E. coli strains can be found in existing organisms. For example, natural coexisting microbes may have identified, through long-term adaptation, metabolic strategies close to those predicted by DOLMN. This issue could be addressed by comparing the presence/absence of different metabolic enzymes in our predicted mutualistic strains to enzyme presence/absence profiles across microbial genomes coexisting in specific environments (99). A broader, related issue is whether the overall assortment of enzymes across individual species in complex communities is predictable on the basis of fundamental principles. Through a computationally improved future version of DOLMN, one could ask whether an ecosystem-level metabolic network could be correctly partitioned into metabolic networks that are representative of major observed taxa.
Conclusion. We computationally explored the possible ways in which sets of metabolic reactions can be distributed among interacting microbial strains, with the goal of better understanding the trade-off between metabolic self-reliance and mutualistic exchange. The mathematical problem of designing metabolically viable organisms and communities from a global set of possible reactions is a very difficult one. The approach, heuristics, and examples illustrated in this work show, however, that solutions identified by our algorithm are feasible and biologically interpretable. Although direct experimental testing of our predictions would require further preliminary computational work (e.g., complementing the current analysis with dynamic FBA and kinetic modeling) and laborious generation of modified strains, it is important that instances of division of labor similar to what we predicted may be found from further scrutiny of known symbiotic relationships (100,101) and experimental evolution data (102). Thus, in addition to illustrating putative nontrivial avenues for engineering communities of codependent strains (synthetic ecology design), our approach could be viewed as a step toward addressing a broader, overarching issue, namely, whether fundamental design principles can help improve understanding of microbial ecosystem diversity and the metabolic network structure of individual organisms in natural communities. Specifically, one could imagine that abundant horizontal gene transfer and long-term selection processes in ecosystems may have acted over geological time to efficiently allocate genes into mutually dependent organisms, very much in the manner in which our algorithm does. With increasing computational power and further optimized algorithms, it may become possible to extend our approach to a larger number of organisms, with the potential of providing a general theoretical scaffold for understanding how environments shape division of labor strategies and microbial diversity.

Flux balance analysis (FBA).
To mathematically formulate FBA, let S denote the stoichiometric matrix of dimensions m ϫ n, where m is the number of metabolites and n is the number of metabolic fluxes. Metabolic fluxes are defined as vector x, where x lb and x ub are the lower and upper bounds, respectively, of the metabolic fluxes. These bounds are implied by empirical evidence of irreversibility or by nutrient availability in the growth medium. The cellular objective is expressed as a vector of weight coefficients for each reaction (e.g., biomass), denoted by c, and the optimal objective value is a scalar value corresponding to Z opt . The FBA problem is formulated as follows: where 0 is the vector of all zeroes and the prime indicates transposition. Community-level flux balance analysis. In order to perform FBA capturing all reactions spanning an entire microbial community, we introduce a "universal stoichiometric matrix," also denoted by S, which expresses the stoichiometric coefficients of all metabolic reactions in the community irrespective of the organism they belong to, as previously described (63). Specifically, S ʦ ‫ޒ‬ MϫN , where M ϭ M e ϩ M i represents the number of distinct metabolites and N ϭ N e ϩ N t ϩ N i represents the number of distinct reactions (see Fig. S10a in the supplemental material). The distinct M metabolites consist of two types: M e extracellular metabolites and M i intracellular metabolites. There are 3 different types of reactions: N e extracellular reactions, N t transport reactions, and N i intracellular reactions. The availability of nutrients (extracellular metabolites) from the environment is encoded in the extracellular reactions, and intracellular reactions encode each organism's metabolism. Organisms use transport reactions to move metabolites between their intracellular compartment, which is unique to each organism in the community, and the extracellular environment, which is shared by all organisms in the community.
A method for metabolic division of labor. We first reformulate the universal stoichiometric matrix S to construct putative stoichiometric matrices for each species in the community. In particular, we construct a community stoichiometric matrix S c , whose structure is shown in Fig. S10b. The block matrices S e , S t1 , S t2 , and S i in S c are consistent with those in S. Organisms in the community share the same nutrients and extracellular reactions. Because there are K organisms in the community, we replicate the block [S t2 , S i ] that includes transport reactions and intracellular reactions K times and diagonally arrange them in S c . Similar compositions of stoichiometric matrices had used in previous work on community-level flux balance modeling (45,103,104).
After the intracellular metabolites are obtained via the transport reactions, intracellular reactions take place inside each organism. This construction leads to the community stoichiometric matrix S c ʦ ‫ޒ‬ M c ϫN c , where M c ϭ M e ϩ K ϫ M i and N c ϭ N e ϩ K ϫ (N t ϩ N i ). Notice that S c has 1 block column for extracellular reactions (N e columns) and K block columns (of dimension N t ϩ N i ), one for each organism, including all transport and intracellular reactions.
To capture design choices, we introduce a binary putative vector t ϭ (t 1 ,. . ., t N c ), where t j ʦ ͕0,1͖ and j ϭ 1,. . ., N c is a binary variable, indicating whether the j-th reaction is included in the corresponding organism (Fig. S10b). With t and S c available, we can partition S c to K individual matrices, S k , by removing column j with t j ϭ 0.
The problem of identifying t can now be formulated as the following MILP problem: where diag(x) denotes a diagonal matrix whose main diagonal consists of the elements of vector x, R is a regularization matrix, and t min and t max are appropriately defined constant vectors. Specifically, by appropriately defining R, t min , and t max , we can impose constraints on the number of internal and transport reactions for each organism. The first optimization problem. Suppose there are K organisms. The upper bounds on the number of active transport reactions and intracellular reactions in each organism are T TR and T IN , respectively. We let x biom k denote the flux of the biomass reaction for each organism k ϭ 1,. . ., K. We also let TR k and IN k denote the index sets of the transport reactions and intracellular reactions for each organism k, respectively. For example, suppose for organism k the transport reactions have indices 4 and 6 and the internal reactions have indices 2, 5, and 9. Then, TR k ϭ {4,6} and IN k ϭ ͕2,5,9͖. Both of these sets are subsets of the entire reaction index set {1,. . ., N c } for the community. The MILP (as shown in problem 2) takes the following form: Let x*,t* denote an optimal solution of the MILP problem above. We note that solving such a large-scale MILP problem, which involves hundreds or thousands of integer variables, is computationally expensive. Our experiments suggest that solving problem 3 for a community of two E. coli core models can be done relatively quickly (on the order of minutes or hours). On the other hand, solving the problem for a community model of two iJR904 E. coli models is very time-consuming. We employ certain methods to speed up finding an optimal solution. One method leverages the fact that instances with similar values for T TR and T IN have similar sets of active reactions. Specifically, as we decrease the values of T TR and T IN , we use the sets of active reactions corresponding to larger T TR and T IN values to generate feasible solutions that are offered as putative solutions to the MILP solver. This tends to drastically decrease solution times. A complementary approach uses a decomposition idea. In particular, for solving problems involving a community of K organisms, we can use solutions for K Ϫ 1 organisms and append solutions for the additional organism. This generates feasible solutions, and it is possible to search for an effective feasible solution by varying the way the K-organism community is decomposed into a (K Ϫ 1)-organism community and an additional organism.
The second optimization problem. In order to reduce levels of redundant fluxes in transport and intracellular reactions, a second optimization problem is introduced where the integer variables are fixed to the optimal solution t* of the first-stage problem (problem 3) and where the biomass fluxes are also set to the optimal values obtained by the first-stage problem (problem 3). Specifically, we use the following equations: The analysis of the DOLMN output is performed in several MATLAB scripts. All analyses are split between the core and full iJR904 E. coli model. The scripts dolmn1a_parse_iJR904.m and dolmn1b_parse_core.m apply the function algorithm2models to all of the DOLMN outputs for the full and core iJR904 E. coli models, respectively. The function algorithm2models parses C into individual metabolic models, calculates the exchange flux of each individual strain (instead of the community exchange flux), and identifies exchanged metabolites. The scripts dolmn2a_summary_iJR904.m and dolmn2b_summary_core.m restructure the data for plotting, calculate the Jaccard distance and exchange flux correlations, and perform standard PCA (MATLAB function pca). Interpolations for the constraint landscapes are performed in dolmn2a_summaryInterp_iJR904.m and dolmn2b_summaryInterp_core.m. The only exception is the flux coupling analysis, which is performed separately. All main text figures were plotted using dolmn3_figures_main.m, and all supplementary figures were plotted using dolmn4_figures_supp.m.
All data (raw and analyzed), functions, and scripts can be found in the GitHub repository (https:// github.com/segrelab/dolmn). Analysis was performed with MATLAB 2017b. MILP problems were solved by the use of Gurobi 7.0 (105).
E. coli models. E. coli core (57) and genome-scale iJR904 (58) models were retrieved from the BiGG database (106). The models were downloaded as .mat files in the COBRA (COnstraints-Based Reconstruction and Analysis) format (107). Stoichiometric matrices were reformatted as a community stoichiometric matrix S c , as previously described, and the results are shown in Fig. S10b. Reaction and metabolite names were reordered to correspond with the community stoichiometric matrix.
Calculating net metabolite exchange flux from transport reaction flux. Many metabolites have multiple transport reactions, which means that the net flux of a metabolite into or out of an organism (and thereby into or out of the environment) is typically represented by exchange reactions. However, in community flux balance analysis, organisms share exchange reactions (Fig. S10b). In order to determine if a strain subnetwork took up or secreted a metabolite in multistrain communities, we had to calculate the net metabolite exchange flux within each strain subnetwork. The calculated net metabolite exchange flux is the sum of all transport reactions (determined as follows): x e k ϭ S t1,k x t k (5) where x e k is the exchange flux of organism k, x t k is the transport flux of organism k, and S (t1,k) is the portion of the stoichiometric matrix for extracellular metabolites and transport reactions (see Fig. S10). Calculating metabolic differentiation. Two metrics were used to measure metabolic differentiation based on intracellular reactions. Jaccard distance measures differences in reaction content and quantifies diversification at the level of network topology. In contrast, Euclidean distance evaluates differences in the magnitude and direction of reaction fluxes and therefore illustrates how the network topology is being utilized differently between strains. Both Jaccard and Euclidean distance metrics were used because, at the extreme, strains could have kept the same reactions (could have a zero Jaccard distance) but could have used the reactions differently (could have a nonzero Euclidean distance), as shown in Fig. S3a and b. The built-in MATLAB function pdist2 was used to calculate Jaccard and Euclidean distances with the "jaccard" and "euclidean" metrics, respectively.
Identifying exchanged metabolites. Metabolites are exchanged if one strain secretes the metabolite (has a positive exchange flux) and the other takes it up (has a negative exchange flux). Metabolites that were part of the medium (e.g., hydrogen ions) were not considered to be exchanged even if one strain secreted the metabolite and the other took it up. We created the MATLAB function identifyEx-changedMets to determine which metabolites, if any, are exchanged between strains.
Principal-component analysis. Principal-component analysis (PCA) was performed on the intracellular flux values of 1-and 2-strains using the built-in MATLAB function pca. The biomass flux was excluded from the intracellular reactions and was instead used to normalize the intracellular flux values.
Correlation and hierarchical clustering of exchange reaction flux. The MATLAB function corr was used to calculate the Spearman and Pearson correlations of exchange reaction fluxes, normalized by biomass flux, providing information on how metabolites are exchanged together. The Spearman correlation of exchange reaction fluxes was hierarchically clustered by the inner squared Euclidean distance (Ward's method) using the MATLAB functions linkage and dendrogram.
Flux coupling analysis. Flux coupling analysis was performed using the Bayesian metabolic flux analysis (73) MATLAB function bfba, which evaluates reaction fluxes as distributions instead of a single value. We sampled FBA solutions with 200 samples of 5 independent Markov chain Monte Carlo methods for a total of 1,000 flux vector samples, which represent all possible flux configurations compatible under the given constraints of the use of minimal glucose medium. We used thinning and accepted only every 1,000th flux sample to reduce the correlation of successive Markov chain Monte Carlo samples. Solutions were obtained with Gurobi (105) as the solver, the Gibbs sampler, and a 0.01 steady-state error value on mass balance. Flux coupling was indicated with flux sampling covariances, as in the function plotfluxcov.
Data availability. All data generated and analyzed in this study and the corresponding codes are available in the GitHub repository (https://github.com/segrelab/dolmn).