Main

Changes in the composition of the human gut microbiota have been associated with the development of chronic diseases including type 2 diabetes, obesity, and colorectal cancer1. Gut bacterial functions, such as synthesis of amino acids and vitamins2, breakdown of indigestible plant polysaccharides3, and production of metabolites involved in energy metabolism4, have been linked to human health. The use of 'omics approaches to study human microbiome communities has led to the generation of enormous data sets whose interpretations require systems biology tools to shed light on the functional capacity of gut microbiomes and their interactions with the human host5.

In order to infer the metabolic repertoire of a gut metagenome data set, researchers usually map sequenced genes or organisms onto metabolic networks derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG)6, and functional annotations from KEGG ontologies7. However, this approach cannot identify the contribution of each bacterial species to the metabolic repertoire of the whole gut microbiome, nor can it infer the effects of different gut microbial communities on host metabolism.

A technique that can bridge this gap is constraint-based reconstruction and analysis (COBRA)8 using genome-scale metabolic reconstructions (GENREs) of individual human gut microbes. GENREs are assembled using the genome sequence and experimental information9. These reconstructions form the basis for the development of condition-specific metabolic models whose functions are simulated and validated by comparison with experimental results. The models can be used to investigate genotype–phenotype relationships10, microbe–microbe interactions11, and host–microbe interactions11. Numerous tools can be used to automatically generate draft GENREs but such models contain errors12 and are incomplete. Manual curation of draft reconstructions is time consuming because it involves an extensive literature review and experimental validation of metabolic functions9.

To provide an extensive resource of GENREs for human gut microbes, we developed a comparative metabolic reconstruction method that enables any refinement to one metabolic reconstruction to be propagated to others. This accelerates reconstruction and improves model quality. We generated AGORA, which includes 773 gut microbes, comprising 205 genera and 605 species. All reconstructions were based on literature-derived experimental data and comparative genomics. The metabolic predictions of two AGORA reconstructions and their derived metabolic models were validated against experimental data.

Results

Metabolic reconstruction pipeline

We devised a comparative metabolic reconstruction method (Fig. 1a,c), which is analogous to the comparative microbial genome annotation approach13 that has enabled accelerated annotation by propagation of refinements to one genome to others. First, we downloaded draft GENREs using Model SEED14 and KBase (US Department of Energy Systems Biology Knowledgebase, http://kbase.us). In both platforms, the genome sequence of an organism is automatically annotated and a metabolic reconstruction is assembled based on the identified metabolic functions. Gaps in the draft reconstruction are automatically filled, building a metabolic reconstruction whose condition-specific models can carry flux through a defined biomass objective function. We refined the draft reconstructions using rBioNet15 and performed quality control and quality assurance (QC/QA) tests, including the verification of reaction directionality and mass and charge balance (Online Methods and Supplementary Note 1) to ensure that the reconstructions meet the quality standards set by Thiele and Palsson (2010)9. We expanded the reconstructions by refining gut-microbiota-specific and central metabolic subsystems, and curated all of the reconstructions by reference to 236 publications, two reference books (Supplementary Table 1), and comparative genomics analyses (Online Methods and Supplementary Table 2). Anaerobic growth was enabled for all genome-scale models (Fig. 1e) because the human gut is usually anaerobic or microaerobic16. We tested the metabolic capabilities of each model, as defined by published data, at the genus, species, and strain level. During the generation and validation of each model, solutions to QC/QA problems or failure of the metabolic capability tests of a model were used to improve the quality of others. The propagation of such refinements among all the models was facilitated because all microbes share the same human gut environment.

Figure 1: The reconstruction refinement pipeline and properties of the metabolic models, derived from the reconstructions.
figure 1

(a) The refinement pipeline starts with an automatic draft reconstruction generation from the online platforms Model SEED14 and KBase (Department of Energy Systems Biology Knowledgebase (KBase), http://kbase.us), followed by the translation of reaction and metabolite identifiers to match those of the human metabolic reconstruction, Recon 2.04 (ref. 22). The reconstructions are then tested for QC/QA measures and 162 metabolic functions are curated based on available knowledge and genomic evidence. A microbial reconstruction can be converted into a condition-specific model by the application of condition-specific constraints. All AGORA reconstructions are available at http://vmh.life. (b) Boxplots of the number of genes, reactions, and metabolites for reconstructions belonging to the 13 phyla captured in AGORA (Supplementary Table 5). Dashed lines represent the average over all 773 reconstructions. The number of reconstructions per phylum is shown above the phylum name. Whiskers show the minimum and maximum values. Values below Q1 – 1.5 × IQR and above Q3 + 1.5 × IQR are plotted as outliers. Q1: first quartile, Q3: third quartile, IQR: interquartile range. (c) Differences in the gap-filling process of manually curated reconstructions and AGORA. In AGORA, gap filling of a certain pathway in one reconstruction is propagated to all N reconstructions that share the same gap and should perform the metabolic function in question. For manually curated reconstructions, every gap is filled with organism-specific reactions based on the available organism-specific data and experimental validations. (d) Gene essentiality accuracy predicted by the draft reconstructions, the AGORA reconstructions, and four published reconstructions (Supplementary Note 2) when compared against in vitro data sets48. (e) Table showing the number of AGORA and draft models that grow anaerobically on rich medium and the number of carbon sources, and fermentation products captured by draft reconstructions and AGORA reconstructions. (f) The number of stoichiometrically- and flux-consistent reactions in each draft versus the corresponding AGORA reconstruction. (g) Change in number of stoichiometrically and flux consistent reactions. (h) Comparison of the predictive potential of the draft reconstructions and the AGORA reconstructions. The sensitivity (true-positive rate) of known fermentation products, carbon sources, and growth requirements captured by the corresponding metabolic models.

Our pipeline increased both the proportion and total number of stoichiometrically and flux-consistent reactions17, that is, mass balanced and admitting a nonzero steady state flux (Fig. 1f,g, Supplementary Table 3 and Supplementary Figs. 1 and 2 ). The refinement process increased the predictive potential of AGORA models compared with the draft metabolic models; the AGORA models predicted gene essentiality more accurately than the draft models (Fig. 1d), and sensitivity to carbon source, fermentation product, and nutrient requirement data was greatly increased in the AGORA models with an average sensitivity of 1.00 ± 0.02 compared to 0.06 ± 0.09 for the draft models (Fig. 1h and Supplementary Table 4).

Features of reconstructions

The 773 AGORA reconstructions contained an average of 771 ± 262 genes, 1,198 ± 241 reactions, and 933 ± 139 metabolites (Fig. 1b and Supplementary Table 5). We found that taxonomic classes containing well-studied organisms, such as Gammaproteobacteria, had larger sets of genes and reactions than reconstructions of other classes. An AGORA reconstruction uses a generalized microbial biomass reaction, which summarizes the fractional contribution of a biomass precursor (e.g., amino acids and lipids) to the synthesis of a new cell, as provided in the draft reconstructions from Model SEED and KBase. The biomass reactions were not curated, because species-specific information is required for such refinements (Supplementary Note 2). Qualitative growth predictions are not affected by generalized biomass equations, whereas information on species- and condition-specific cellular composition is required for accurate quantitative prediction18. Nonetheless, the predicted average microbial doubling time of 2.3 h on a Western diet under anaerobic conditions (Fig. 1b and Supplementary Table 6) was close to reported doubling times of single microbes in the mouse gastrointestinal tract19.

When comparing eight AGORA reconstructions with the published genome-scale metabolic reconstructions for the same species, we found that the main differences in reaction content were in lipopolysaccharide biosynthesis and transport pathways (Supplementary Tables 7 and 8). Cell wall and lipopolysaccharide structures are species-specific and cannot easily be derived from gene annotations alone20,21 (Supplementary Note 2). The curation of such pathways requires experimental data that are currently not available for AGORA organisms. AGORA models were equivalent to published GENREs in terms of capturing gene essentiality, as reported in the literature (Fig. 1d). The AGORA models have been curated for carbon source utilization and fermentation product secretion; they outperformed the published models for those functions, as measured by a sensitivity analysis of the carbon source uptake and fermentation product secretion of both seven published models and AGORA models (Supplementary Fig. 3). Until now, manually curated reconstructions have been refined to fit certain applications and, consequently, curated to different extents. As they are not comparable in scope, they may not be a good choice for investigating microbe interactions. In contrast, all AGORA reconstructions have been curated for the same metabolic pathways (Fig. 1a).

We determined the overlap of microbial metabolism with human metabolism. Two cellular compartments are common between human and microbes, the cytosol and extracellular space. To compare the metabolic functions, we decompartmentalized the human metabolic reactions by placing all metabolites that occur in an organelle (e.g., mitochondria) compartment in the human metabolic reconstruction Recon 2 (ref. 22), into the cytosol, and removed duplicate reactions, resulting in 6,256 unique metabolic reactions. Collectively, the AGORA reconstructions account for 3,192 unique metabolic reactions, 695 of which are shared with Recon 2, including 162 (23%) exchange reactions. We found that 89% (5,561/6,256) of human metabolic reactions were unique to the human reconstruction, and 78% (2,495/3,192) of AGORA metabolic reactions were unique to the microbial reconstructions.

Metabolic diversity of AGORA reconstructions

The variety of AGORA reconstructions is shown in Figure 2a. To prove that our method produced metabolically distinct reconstructions for each organism, we computed the metabolic distance of every reconstruction pair (298,378 pairs; Supplementary Table 9) using the Jaccard distance between the reaction lists from each reconstruction. Metabolic distances range from zero to 1 with identical reconstructions having a metabolic distance of zero and completely dissimilar reconstructions having a metabolic distance of 1. As expected, taxonomically related bacteria shared more reactions than taxonomically distant bacteria (Supplementary Fig. 4). Bacilli and Clostridia had high metabolic distances, reflecting the metabolic and phenotypic differences between these two classes (Supplementary Fig. 4). Notably, low metabolic distances were only observed within a class, but not between members of a phylum or between taxonomic classes. Overall, the average metabolic distance was 0.48, which is consistent with other reports of metabolic and functional distances between microbes based on the presence of KEGG enzymes and KEGG orthology annotations23,24. Metabolic pathway enrichment was detected at different taxonomic levels (Fig. 2b); for example, plant polysaccharide degradation was mainly present in Bacteroidia, O-glycan degradation in the genera Akkermansia spp., Bifidobacterium spp., and Bacteroides spp., and methane metabolism was unique to four archaea. As expected, lipopolysaccharide biosynthesis was found only in Gram-negative bacteria. Only 69 reactions were common to all 773 reconstructions.

Figure 2: Taxonomic and metabolic diversity of the 773 genome-scale metabolic reconstructions.
figure 2

(a) Taxonomic tree of the 773 organisms showing the diversity of the AGORA resource. Colors correspond to the taxonomic classes. The tree was created using GraPhlAn49. (b) Hierarchical clustering of the average ratio of reactions per subsystems found in the reconstructions of the 25 taxonomic classes. The reconstructions are ordered based on phyla and taxonomic classes.

To assess whether the known functional diversity of the gut microbiota was captured in our models, we tested for the uptake of 74 different carbon sources and for the secretion of 18 fermentation products (Fig. 3) using flux variability analysis25. The known distribution of short-chain fatty acid production in the gut was well-represented in our models (Fig. 3); most models fermented sugars into short-chain fatty acids and organic acids, with acetate, succinate, formate, lactate, propionate, and ethanol being the most commonly produced metabolites. As expected, butyrate was secreted by the Fusobacteria models26 and by many Firmicutes models27, and methane secretion was specific to the Euryarchaeota. Carbon source utilization capabilities were found to be in agreement with the literature28. Notably, only certain genera (e.g., Bacteroides spp. and Akkermansia spp.) can utilize diet- and host-derived polysaccharides (Fig. 3).

Figure 3: Carbon source uptake and fermentation product secretion capabilities in AGORA.
figure 3

Number of AGORA models in each phylum that can consume the different carbon sources and secrete the tested fermentation products. The total number of models in each phylum is reported in parentheses. The models' capabilities to consume or secrete the different metabolites were determined using flux variability analysis.

Validation of AGORA model predictions

We demonstrated the predictive capability of AGORA models using two models, Bacteroides caccae ATCC 34185 (a fiber-degrading symbiont common in microbiomes of Western individuals29,30) and Lactobacillus rhamnosus GG (LGG), a common human probiotic strain31. No chemically defined medium has been reported for B. caccae. We predicted that B. caccae should be able to grow on DMEM 6429 defined culture medium supplemented with vitamin K, hemin, and arabinogalactan under anaerobic conditions. In the laboratory, B. caccae was cultured on this medium in a flask under anaerobic conditions (Online Methods, Supplementary Note 3 and Supplementary Table 10). This medium would not support growth of LGG, according to our in silico predictions, and growth was unstable in flask cultures using this medium. The reported chemically defined growth medium for LGG contains all amino acids and most vitamins32 but our in silico predictions suggested that these could be supplemented by growing LCG with B. caccae (Fig. 4). When modeled in silico in co-culture in DMEM 6429 defined culture medium supplemented with vitamin K, hemin, and arabinogalactan, both bacteria grew (Fig. 4d and Supplementary Table 11). We predicted that B. caccae would supply LGG with alanine, asparagine, and nicotinic acid, while LGG would provide lactate to B. caccae (Fig. 4d). The addition of alanine and nicotinic acid to the defined medium was sufficient in silico to enable the single growth of the LGG model (Fig. 4c). Using gas chromatography–mass spectrometry (GC-MS)-based metabolomic analysis, we confirmed the secretion of numerous metabolites by the two strains grown individually, including alanine secretion by B. caccae (Fig. 4a and Supplementary Fig. 5), thus supporting the predicted cross-feeding.

Figure 4: Comparison of in vitro experiments and in silico simulations for Bacteroides caccae ATCC 43185 and Lactobacillus rhamnosus GG ATCC 53103.
figure 4

(a) Both microbes were grown anaerobically on DMEM 6429 medium supplemented with hemin, vitamin K, and arabinogalactan. The composition of fresh medium and spent medium after cultivation was determined using GC-MS (Supplementary Fig. 5). Only statistically significant metabolite uptake and secretion is shown. A paired t-test was performed for statistical significance. **P < .005 and *P < 0.01. The same medium composition was used for in silico simulations, and uptake and secretion capabilities were predicted using flux variability analysis. (b) In silico single culture fluxes of B. caccae on DMEM medium without and with arabinogalactan. Growth rates (h−1) and predicted uptake and secretion fluxes (mmol/gDW/h) of major exchanged metabolites are shown in blue without arabinogalactan and in purple with arabinogalactan. (c) In silico single culture fluxes of LGG on DMEM medium without and with arabinogalactan. The in silico medium was supplemented with L-alanine and nicotinic acid, which were predicted to be essential for growth. Growth rates and uptake and secretion fluxes are depicted as described for b. (d) In silico co-culture fluxes of B. caccae and L. rhamnosus on DMEM medium without and with arabinogalactan. Growth rates, and uptake and secretion fluxes are depicted as described for panel b.

Pairwise interactions of models

We computed the pairwise growth interactions ('co-growth') of every pair of microbes in the AGORA resource (298,378 pairs). Each model was grown in silico on its own and as part of a pair with every other model on two different diets with and without oxygen (Online Methods and Supplementary Tables 9 and 12). Under all conditions, the most commonly predicted pairwise co-growth relationships were parasitism or commensalism (Fig. 5a). Consistent with one previous in silico study33, the presence of oxygen resulted in a decrease in commensalism and mutualism, especially for Gammaproteobacteria (Supplementary Fig. 6). Competitive and amensal interactions increased in the presence of oxygen (Fig. 5a). The effects of a typical Western diet and a diet high in fibers, such as arabinogalactan and xylan, on pairwise interactions among models were also evaluated. The high fiber diet led to a higher proportion of commensal and mutualistic interactions (Fig. 5a), which, using flux balance analysis34, was found to be due to cross-feeding of metabolites between species. Low-fiber diets are thought to modulate the microbiota composition by depleting commensal microbe-to-microbe interactions35. Based on hierarchical clustering of the ratio of pairwise interaction type per condition per taxonomical family (Fig. 5b), our simulations predict that the carbohydrate fermentation capacity defines the type of interaction between microbes when considering the family level. Cluster 1 contained saccharolytic microbes that do not produce butyrate and respiratory bacteria and was enriched in commensal interactions. Clusters 2a and 2b were enriched in butyrate and lactate fermenters and generalists related to Bacillaceae, respectively. Both subclusters were mostly negatively affected by parasitic interactions. Cluster 3 contained the majority of asaccharolytic and proteolytic families, which mainly benefitted from the pairwise interactions. The effect of carbohydrate fermentation capacity holds true for the genus level (Supplementary Fig. 6). Even though most interactions were observed over a wide range of metabolic distances, positive interactions, in which the growth rates of one or both microbes are neutral or enhanced in the pairwise simulation, occurred only among metabolically distant organisms (Supplementary Fig. 7), in agreement with previous computational studies23,24.

Figure 5: Pairwise interactions of all AGORA metabolic models.
figure 5

(a) The number and percentage of pairs that exhibit each of six different interaction types during in silico simulations under four different conditions; Western and high-fiber diets in the presence and absence of oxygen. Competition: both microbes' in silico growth rates (Supplementary Table 9) are slower in the paired simulation compared with each microbe's in silico mono-culture growth rate on the same diet (Supplementary Table 12). Parasitism: one microbe grows faster (Taker) in the paired simulation while the other microbe grows slower (Giver). Amensalism: one microbe grows slower (Affected) in the paired simulation while the other microbe's growth rate is unaffected (Unaffected). Neutralism: both microbes' growth rates are unaffected in the paired simulation. Commensalism: one microbe grows faster (Taker) in the paired simulation while the other microbe's growth is unaffected (Giver). Mutualism: both microbes grow faster in the paired simulation. (b) Hierarchical clustering (Euclidean distance) of the ratio of pairwise interaction types per condition (i.e., diet and oxygen presence) on the taxonomic family level. See a for a description of the interaction types. Three main clusters were identified, each enriched in microbes with different carbohydrate fermentation capabilities, belonging to the clustered families.

Integrating metagenomics and 16S rRNA with AGORA

We tested whether AGORA can be used to analyze metagenomic data. We retrieved strain-resolved metagenomic data from 149 healthy individuals from the human microbiome project (HMP; Fig. 6a)36. AGORA microbes mapped to, on average, 91% of the strains in the HMP individuals with comparable reaction diversity for all individuals (Fig. 6b), highlighting that AGORA is representative of the human gut microbiota.

Figure 6: Metabolic diversity of individual human gut microbiomes.
figure 6

The average number of unique reactions was computed based on 1,000 random orders of metabolic reconstruction additions. Dark and light gray areas correspond to one and two s.d., respectively. (a) Human samples of HMP, which have been mapped onto a set of reference genomes, were used to determine how many AGORA organisms are overlapping with each individual. (b) Mapped AGORA strains per HMP sample with their corresponding collective reaction diversity and the percentage of mapped reads they capture for three taxonomic groups. In AGORA, all microbial reconstructions were needed to achieve full metabolic diversity (i.e., 3,192 unique reactions). Also shown are the number of reconstructions needed to capture 75%, 90%, and 95% of the full reaction diversity. To obtain 75% and 95% of the unique reactions, 12 and 123 reconstructions, respectively, were sufficient on average. (c) 16S rRNA results from Claesson et al. (2012)37 ('ELDERMET') were used to determine how many pan-species reconstructions of AGORA are overlapping with each individual. (d) Mapped AGORA pan-species reconstructions per ELDERMET sample with their corresponding collective reaction diversity (i.e., 3,188 unique reactions). Also shown are the number of reconstructions needed to capture 75%, 90%, and 95% of the full reaction diversity. The metabolic repertoire of the individual mapped microbiomes was analyzed by principal coordinate analysis in terms of the metabolic distance between the reaction sets of the mapped pan-species reconstructions. The presence and absence of methanogens and Gammaproteobacteria per individual mapped microbiome is shown.

We then mapped published species-resolved 16S rRNA data of 164 elderly and 13 young individuals ('ELDERMET')37 onto AGORA (Fig. 6c,d). The 210 ± 39 species present in each individual mapped to 108 ± 16 AGORA pan-species reconstructions, which are the union of reactions from all strain-specific reconstructions in AGORA of one species. Two clusters were observed (Fig. 6d). The cluster with more reactions was characterized by the presence of Gammaproteobacteria (Fig. 6d). The corresponding reconstructions contain on average more reactions compared to other taxonomic classes (Fig. 1b). Principal coordinate analysis, using each individual microbiota's metabolic reaction set, revealed that the clusters separated owing to reactions associated with glycerophospholipid and cell wall metabolism (Supplementary Table 13), consistent with the Gram-negative nature of Gammaproteobacteria. The second principal coordinate was mainly associated with the presence of methanogenesis reactions that are unique to the methanogenic archaea.

Discussion

AGORA reconstructions were assembled using a comparative metabolic reconstruction approach that speeded up curation and provided knowledge-driven refinement of gut-specific metabolic microbial functions that were not present in the draft reconstructions (Fig. 1a). Our pipeline included extensive QC/QA and curation against available knowledge, which is not done by pipelines that automatically generate GENREs. The resulting models significantly outperformed the draft models in correctly capturing gene essentiality in addition to known carbon sources, fermentation products, and essential nutrients (Fig. 1d,h). Even though AGORA reconstructions do not cover all of the species-specific aspects of manually curated reconstructions (e.g., lipopolysaccharide biosynthesis), the performance of each of eight AGORA models was on par with that of their previously published, manually curated models of the same strain (Fig. 1d and Supplementary Fig. 3). This resource of reconstructions helps to address the need for literature-curated GENREs to help to analyze gut metagenomic data38.

GENREs have previously been used to design chemically defined growth media using an iterative in silico, in vitro, and metabolomics approach39. The predicted growth of B. caccae in extended DMEM medium supplemented with arabinogalactan under anaerobic conditions was validated. However, the in vitro growth of LGG was unstable, but our in silico co-culture simulations suggest that its growth could be enhanced by the presence of B. caccae. This example highlights how metabolic models can serve as a starting point for generating experimentally testable hypotheses.

Bacteria in ecosystems engage in complex trophic webs based on interspecies interactions40, which cannot be inferred from microbial abundance41. We explored pairwise interactions between gut microbiome models under four conditions. We found that central metabolic traits were predicted to define co-growth interactions as a function of diet composition and oxygen availability (Fig. 5 and Supplementary Figs. 6 and 7). Inflammation of the digestive tract can disrupt the intestinal cell barrier, thereby increasing the oxygen level in the normally anaerobic intestine42 and reducing species variety in the gut microbiome43. A high-fiber diet might protect against the depletion of positive microbe–microbe interactions caused by the presence of oxygen. Negative interactions have been found to dominate stable microbial communities44. Based on our simulations, we propose that a small number of positive interactions may be sufficient to maintain a healthy microbial community.

Metabolic models have been used to map and analyze omics data for single organisms45. Here, we report that AGORA enables such analyses for a much larger set of human gut microbial communities, as most of metagenomic sequence reads and 16S rRNA data from a typical microbiome can be mapped onto our models (Fig. 6), resulting in metabolically diverse microbiota reconstructions that can be used to construct and simulate individual-specific metabolic models. The metabolic overlap between the human metabolic reconstruction and AGORA was enriched in exchange reactions, which supports the hypothesis that co-evolution driven by cross-feeding between the host and the microbes has occurred.

AGORA reconstructions are especially suited to studies in which multiple reconstructions are coupled together to simulate microbial interactions. However, the existence of consistent biases, owing to the semi-automated model generation means that additional strain-specific refinements may be necessary in order to use AGORA models in organism-specific applications, for example, bioengineering. AGORA reconstructions are missing non-dietary microbial functions, such as xenobiotic metabolism, which have not yet been extensively studied in human gut microbes. Because the focus of the curation has been on gut microbial functions, the resource is best suitable for studies on dietary effects on the human microbiota.

One open question in microbiome research is which functions microbes carry out and how those functions interface with host metabolism and affect host phenotypes. About a quarter of AGORA metabolic reactions were also present in Recon 2 (ref. 22), the human metabolism reconstruction, highlighting the complementarity of host and microbial metabolisms. So far, a handful of studies have attempted to predict the metabolic effect of different microbiomes on host metabolism based on topological network approaches46,47, which cannot address functional links in the human-gut microbiota axis. AGORA enables superior simulations to address mechanistic questions about host–microbe co-metabolism. We envisage the combination of AGORA models into strain-resolved microbial community models and predictions of how those community models interact with human metabolic models22 could be used to systematically investigate host–microbiome interactions11.

Methods

Access to AGORA reconstructions.

AGORA is an ongoing resource of GENREs and that keeps abreast of human gut microbial knowledge and can be easily accessed via the Virtual Metabolic Human (VMH) database website (http://vmh.life), which allows querying each reconstruction's content and data gathered from the literature search performed in this study. Any AGORA reconstruction can be downloaded in SBML format from the VMH website on the “Downloads” page. The website also hosts the human metabolic reconstruction, Recon 2 (ref. 22). New content can be added to an AGORA reconstruction manually or automatically, for example, using rBioNet15, which is compatible with the COBRA toolbox8 and ensures all QC/QA measures defined by the community as described by Thiele and Palsson (2010)9. The “Feedback” tab provides contact information to the VMH developers regarding improvements to any resources available on the VMH webpage, where additions to any resource will be upheld by the VMH developers.

Genome selection and draft reconstructions.

In a previous study50, we retrieved 301 draft reconstructions from Model SEED14. Based on lists of human gut microbes reported by Qin et al. (2010)30 and by Rajilić-Stojanović and de Vos (2014)51, we obtained 472 additional draft reconstructions from Model SEED and KBase (Department of Energy Systems Biology Knowledgebase, http://kbase.us), both of which use the RAST annotation server52 to annotate the genomes and build the draft metabolic networks14. All reconstructions were downloaded in SBML format and imported into Matlab (Mathworks, Inc., Natick, MA, USA) using the COBRA Toolbox8. Each reconstruction was refined using the rBioNet extension15 to the COBRA Toolbox. We manually translated reaction and metabolite names into the VMH nomenclature (Supplementary Tables 14 and 15).

In silico simulations.

All simulations were performed in Matlab using the COBRA Toolbox8 (https://opencobra.github.io/) and the linear programming solver CPLEX (IBM, Inc.) through the Tomlab interface (Tomlab, Inc.).

The curation process.

Note that we refer to a model, which was derived from the corresponding reconstruction, whenever simulations under a specified condition were carried out.

Reaction directionalities. To ensure consistency with published reconstructions, the direction of each reaction in a draft reconstruction was set in agreement with the VMH database. This curation prevented fluxes from occurring in thermodynamically unfavorable directions. In several cases, the change of directionality resulted in blocked reactions and/or resulted in a nonzero flux through the biomass objective function (BOF)18. For example, the reaction alpha,alpha-trehalose-phosphate synthase (VMH ID: TRE6PS) has been reported to be irreversible53, but was reversible in each draft reconstruction and was required for the synthesis of UDP-glucose. We corrected this by adding the enzyme that synthesizes UDP-glucose (UTP-glucose-1-phosphate uridylyltransferase, VMH ID: GALU) after setting the reaction TRE6PS to be irreversible. In a similar manner, we manually identified solutions to the other blockages and added appropriate corrections to the reconstructions. Note that when an AGORA model is used to represent a bacterium within a particular part of the intestine, context-specific parameters (temperature, pH, ionic strength, cytoplasmic electrical potential difference, and metabolite concentrations) should be used when checking for consistency between the direction of each reaction and those obtained by context-specific thermodynamic estimates54 (Supplementary Note 1).

Anaerobic growth. The majority of intestinal microbes are strict or facultative anaerobes, while strict aerobes colonizing the gut are rare16. A total of 192 draft-reconstruction-derived models could not carry flux through the BOF on anaerobic rich medium (all exchange reactions open, aside from the oxygen exchange reaction). Anaerobic growth was enabled by adding oxygen-independent reactions for gene products known to be functional under anaerobic conditions. For example, L-aspartate oxidase (EC 1.4.3.16) functions under both aerobic and anaerobic conditions55; therefore the anaerobic, fumarate-using L-aspartate oxidase reaction (VMH ID: ASPO5) was added to those reconstructions containing the oxygen-using L-aspartate oxidase reaction (VMH ID: ASPO6).

Removal of infeasible flux loops. Futile cycles are sets of reactions that result in thermodynamically infeasible fluxes and are a common problem in reconstructions with many transport reactions9. Each draft model had an unfeasibly high export flux of protons from the cytosol, resulting in biologically implausibly high ATP production (average flux of 933 ± 229 mmol/gDW/h in the absence and of 933 ± 227 mmol/gDW/h in the presence of oxygen, on Western diet, Supplementary Table 12). We identified futile cycles by constraining each reaction flux to zero individually and computed the flux through the ATP demand reaction using flux balance analysis (FBA)34. No flux was forced through the BOF. Each deleted reaction lowering the ATP demand flux was inspected manually and replaced by an appropriate irreversible reaction in all reconstructions containing that futile cycle. If such change prevented the model from producing biomass, the change was reversed and another reaction eliminating the futile cycle was identified (Supplementary Fig. 8). After the curation, the average ATP production flux was 19 ± 13 mmol/gDW/h in the absence and 38 ± 23 mmol/gDW/h in the presence of oxygen.

Curation of fermentation pathways. An extensive literature search on the distribution and the structure of fermentation pathways in the considered gut microbes was performed on the genus level and, where possible, on the species level (Supplementary Note 4). Drawing on information from two books and 112 publications, we curated 28 fermentation pathways, of which 20 were either absent from or nonfunctional in all draft reconstructions. Published information on fermentation products was available for 765 out of 773 reconstructed microbes (Supplementary Table 1).

Curation of carbon source utilization pathways. A thorough literature search was performed to assess the carbon source utilization on a species and strain level (Supplementary Note 4). Pathways for 95 distinct carbon sources were added to the reconstructions based on evidence from two books56,57 and 199 publications (Supplementary Table 1). The draft reconstructions captured 35 of the 95 carbon sources but did not contain any of the 31 oligo- and polysaccharides (Fig. 1e). Published information about carbon source utilization capabilities was available for 731 out of 773 reconstructed microbial strains (Supplementary Table 1). To verify the functionality of new fermentation and carbon source utilization pathways, flux variability analysis (FVA)25 was performed on each refined model with all exchange reactions open and no flux forced through the BOF.

Comparative genomics analysis. We used the results of two previous studies on respiratory reductases58 and B-vitamin synthesis pathways59. We extended the previous analyses to the 612 of our 773 organisms that were available in PubSEED60. In the case of missing annotations, a similarity search was performed with a BLAST algorithm implemented in PubSEED (cutoff = e−20). In ambiguous situations, phylogenetic trees and genomic context for the corresponding proteins were analyzed as follows. The neighbor-joining approach implemented in PubSEED was used (default parameters) to construct the phylogenetic trees. Analysis of the genomic context was done using the tools available in PubSEED. Incorrect and inaccurate PubSEED annotations were edited and the corresponding genes were added to the relevant subsystems. Reactions were formulated for all pathways as needed and added to the reconstructions. Additionally, gene-protein-reaction associations (GPRs) of the corresponding reactions were corrected based on the refined organisms' gene annotations (Supplementary Note 5). If a reaction was present in a draft reconstruction but the associated gene was not identified in the organism by comparative genomics, the reaction was removed if the deletion did not disable growth on rich medium.

Respiratory pathways and quinone biosynthesis. The presence of biosynthetic pathways for three main respiratory quinones (ubiquinone, menaquinone, and 2-demethylmenaquinone) as well as for proton-driven ATP synthases were genomically analyzed (Supplementary Table 16 and Supplementary Note 6). Quinone-dependent reactions were added in agreement with the repertoire of quinones synthesized by the corresponding organism61. If no known quinone biosynthesis pathways were found in a genome, we assumed that the organism utilizes extracellular menaquinone and added the corresponding transport and exchange reactions62. All annotations are available in PubSEED60 (http://pubseed.theseed.org/, under the subsystems “Respiration HGM”, “Respiration HGM New”, “Quinones biosynthesis HGM”, “Quinones biosynthesis HGM New”, “ATP Synthases HGM New”, and “ATP Synthases HGM”).

B-vitamin biosynthesis. Eight B-vitamins (i.e., biotin, cobalamin (B12), folate, niacin, pantothenate (B6), pyridoxine, riboflavin, and thiamin) were considered. Based on the genomic predictions and available experimental data, biosynthesis pathways for the B-vitamins were curated in the reconstructions (Supplementary Tables 16,17,18). For the cases where genomic predictions contradicted published data, the experimental data on growth requirements was used for the curation of our reconstructions. If a vitamin biosynthesis pathway was present in a reconstruction that should not synthesize the vitamin based on genomic and experimental data, we added a transport and exchange reaction for the vitamin and removed the reaction(s) that had been gap-filled by the Model SEED or KBase pipelines.

Central metabolic pathways. In order to close gaps and improve gene-protein-reaction associations in central metabolic pathways, a comparative genomic analysis was performed for (i) glycolysis and gluconeogenesis, (ii) the pentose phosphate pathway, (iii) the Entner-Doudoroff pathway, (iv) the citric acid cycle and the glyoxylate shunt, (v) the biosynthesis of purine and pyrimidine nucleotides, (vi) amino acid biosynthesis pathways, and (vii) the N-acetylglucosamine utilization for polysaccharide biosynthesis (Supplementary Table 2). In total, one to 96 (median 34) reactions participating in the 28 central metabolic pathways, represented by 20 PubSEED subsystems, were added to 627 reconstructions.

Nutrient requirements. To ensure that each model could grow on biologically plausible in silico growth media, manual curation of in silico growth requirements was systematically performed by (i) removing unlikely growth requirements, (ii) gap-filling biomass precursor biosynthesis pathways based on comparative genomics (see above), and (iii) curating and validating the models against experimentally determined growth requirements reported in the literature. A preliminary defined medium was identified serving as a starting point for curation of the nutrient requirements (Supplementary Note 7). Microbial growth requirements for 64 nutrients, including amino acids, vitamins, and nucleobases, were identified for 244 bacteria from one book57 and 72 peer-reviewed papers (Supplementary Table 1). False-positive predictions (nutrients that were reported to be nonessential in literature, but required for in silico growth) and false-negative predictions (nutrients that were reported to be essential, but nonessential in silico) were inspected and eliminated where possible (Supplementary Note 7). The curation resolved 413 false-positive and 245 false-negative predictions for 64 metabolites in 244 microbial reconstructions, increasing the prediction sensitivity from 0.32 to 0.68 and specificity from 0.92 to 0.98. All compounds identified as essential for at least one reconstruction after the curation were added to the in silico diets (Supplementary Table 12).

Leak test. We ensured that no metabolite in a model could be produced from nothing, which would indicate mass or charge imbalanced reactions. Therefore, the lower bounds for all exchange and sink reactions were set to zero to prevent any influx of metabolites. A demand reaction was then added for each metabolite in the reconstruction and maximized. A metabolite could be produced from nothing if the objective value was greater than zero and the responsible reaction(s) was corrected.

Mycoplasma and Ureaplasma sp. The reconstructions of Mycoplasma pneumoniae, Mycoplasma hominis, Ureaplasma parvum, and Ureaplasma urealyticum, which do not contain a cell wall57 required further refinement to the respective BOF. We removed cell wall components from the BOF of all four reconstructions. Since the Mycoplasma genus requires cholesterol for growth57, we added cholesterol to the BOF substrate lists of both Mycoplasma sp. reconstructions as well as transport and exchange reactions for cholesterol. To the two Ureaplasma sp. reconstructions, urea transport and exchange reactions were added as this genus requires urea57.

Organization of the pipeline. The reconstruction curation, validation, and content expansion steps described above were integrated into one pipeline (Fig. 1a). As all draft reconstructions stemmed from Model SEED or KBase, issues in one draft reconstruction were thus systemic for a subset or all reconstructions, and could be corrected in a consistent manner by propagating curation and QC/QA insights gained for one draft reconstruction to the remainder (Fig. 1c). This 'comparative reconstruction' approach allowed for the curation of hundreds of draft reconstructions at once.

Stoichiometric and flux consistency. The rank of a stoichiometric matrix is an objective measure of the comprehensiveness of a reconstruction as it represents the number of linearly independent constraints on a steady-state reaction flux. The matrix rank was computed with numerical linear algebra. A set of stoichiometrically consistent (mass balanced) reactions is mathematically defined by the existence of at least one, such that, where is a vector of the molecular mass of molecular species and is a stoichiometric matrix. We computed the largest stoichiometrically consistent subset of each draft and AGORA stoichiometric matrix using numerical linear optimization17. We say a matrix S is net flux consistent if there exist matrices and such that, where each row contains at least one nonzero entry. This condition ensures that a reaction admits a nonzero net flux in some flux distribution. Flux consistency was tested using numerical linear optimization63 as described in Fleming et al.17.

Gene sequence acquisition. We retrieved the nucleotide gene sequence of each microbe using the Perl API from the Model SEED and KBase web interface. We expanded these files with the appropriate gene sequences for respiration, quinone biosynthesis, B-vitamin synthesis, and central metabolic pathways were retrieved from the web interface of the PubSEED platform60. The final compiled gene sequences for each organism can be found in the VMH database.

Diet definitions.

We defined two different diets, a Western diet and a high fiber diet (Supplementary Table 12). The diets varied in fat, simple sugar, starch, and fiber content. Additionally, both diets contained amino acids, vitamins, minerals, water, methanol64, and other metabolites, each of which was required for a nonzero biomass reaction flux in at least one of the 773 models (Supplementary Note 7).

Metabolic distance.

The metabolic distance (MD) between two microbes was calculated using the Jaccard distance, such that MD = 1–|RiRj|/|RiRj|, where Ri is the list of reactions from reconstruction i and Rj is the list of reactions present in reconstruction j. MD of 1 means that the two reconstructions share no reactions, and MD of zero means that the two reconstructions have identical reaction lists.

Carbon source and fermentation phenotypes.

We set the lower bound of the BOF in a model to 0.001 h−1 to ensure its minimum growth, performed flux variability analysis (FVA)25, and inspected minimal and maximal possible flux values through the exchange reactions. A model was considered to be able to take up a carbon source if the minimal possible flux through a carbon source exchange reaction was negative (≤−10−6 mmol/gDW/h). Similarly, a model was considered to be able to secrete a fermentation product if the maximum possible flux through the fermentation product exchange reaction was positive (>10−6 mmol/gDW/h). Even though several amino acids can be used as carbon sources they were excluded from this analysis because amino acids are directly required in the BOFs.

Gene essentiality analysis.

Gene essentiality data were available for six AGORA microbes (Fig. 1d)48. We retrieved the gene sequences for the six draft reconstructions from the Model SEED and KBase platforms. Published reconstructions of the same strains were available for four of these microbes65,66,67,68. Their gene sequences were retrieved from the NCBI database69. We used the BLAST search option of the database (http://www.essentialgene.org/)48 to identify the in vitro–validated essential genes for each microbe. The in silico gene essentiality analysis was carried out under rich medium conditions for the draft, AGORA, and published models, by constraining the flux through all reactions associated with a deleted gene to 0 mmol/gDW/h and maximizing the BOF. A gene was considered essential in silico if its deletion resulted in a biomass reaction flux of zero.

Pairwise model simulations.

Pairwise simulations were performed on every possible pair of the 773 AGORA metabolic reconstructions (298,378 pairs). Microbial models were paired by introducing a common lumen compartment, as described elsewhere33, in which each model could secrete or from which it could take up metabolites. Dietary compounds were added to the lumen and byproducts were removed. To prevent biologically implausible solutions, in which microbes benefit the paired microbe without producing any biomass, coupling constraints were added to the joined models65. Briefly, all reactions in a model were stoichiometrically coupled to its BOF, thereby enforcing a nonzero flux through the BOF if reaction fluxes were nonzero. Using FBA, growth of the microbial pair was maximized under both diets (Western and high-fiber diet), aerobically and anaerobically (Supplementary Table 9). The maximal possible BOF of the individual microbe in the pair was determined by inactivating all reactions belonging to the other paired microbe. A minimal microbial BOF flux for each microbe of 0.001 h−1 was enforced. A model was considered to grow faster in the co-growth simulation when its paired growth rate was more than 10% higher than the individual growth rate of the same microbe under the same condition (Supplementary Table 12). A model was considered to grow slower in the co-growth simulation when its paired growth rate was more than 10% lower than the individual growth rate of the same microbe under the same condition.

Random order microbial assembly.

We selected randomly a reconstruction to obtain its reaction list and appended it to the growing unique reaction list until all 773 strain-resolved reconstructions were considered once (Fig. 6b). We monitored the number of reactions added by each new reconstruction. We repeated this procedure 1,000 times. On the species level, we randomly selected a species and obtained the unique reaction list of all strain-resolved reconstructions belonging to that species. We appended the list of reactions until all 605 species captured by AGORA were considered once (Fig. 6d). We monitored the number of reactions added by each new species. We repeated this procedure 1,000 times.

Comparison with Recon 2.

The human metabolic reconstruction, Recon 2.04 (ref. 22), containing 5,063 metabolites and 7,440 reactions, was retrieved from http://vmh.life. A decompartmentalized Recon was created by placing all reactions occurring in one of the seven intracellular compartments into the cytosol. The extracellular compartment was retained. Duplicate reactions and reactions with identical sets of metabolites occurring on both sides of the chemical equation were removed, resulting in 6,256 unique metabolic reactions.

Metagenomic and 16S rRNA analysis.

Information on metagenomic reads from gastrointestinal tract samples mapped onto a set of reference genomes was downloaded (http://hmpdacc.org/) for 149 healthy US individuals aged 18–40 (ref. 5). We matched the strain names of the mapped reference genomes to 245 AGORA reconstructions (Supplementary Table 19) and identified the unique reaction set (i.e., metabolic diversity) for each individual. Using the read depth information, we calculated the read number covered by AGORA.

For the ELDERMET37 data, annotated species lists for 177 individuals from processed 16S rRNA data were retrieved from MG-RAST70 (http://metagenomics.anl.gov/linkin.cgi?project=154). An individual's unique reaction set was obtained by mapping the reported species onto the pan-species reconstructions (Supplementary Table 19). Principal coordinate analysis was performed on the metabolic distance between each individual's reaction set.

In vitro cell cultures.

B. caccae ATCC 43185 and L. rhamnosus GG ATCC 53103 (LGG) were precultured for 20 h in Brain Heart Infusion Broth (BHIS; Sigma), supplemented with 1% hemin under anaerobic conditions and shaking at 37 °C (Supplementary Note 3). After washing and resuspending in 10 ml of 0.9% w/v NaCl solution, they were inoculated in DMEM 6429 supplemented with 1% hemin and 3.33% vitamin K, with or without arabinogalactan (Sigma; 9.4 g/l), and maintained under anaerobic conditions. B. caccae and LGG were cultured for 33 and 44 h on average, respectively. Cells were harvested for cell counting by centrifugation (4,700g) and 750 μL aliquots of supernatant were removed for subsequent metabolite extraction. The aliquots were snap-frozen and placed at −80 °C until dedicated analysis. The cultures were confirmed using 16S rRNA sequencing.

Metabolomic analysis.

The extraction and GC-MS measurement of short-chain fatty acids was based on a protocol from Moreau et al.71 (Supplementary Note 8 and Supplementary Table 20). Extracellular polar metabolites from the supernatant samples and external concentration curves for each compound of interest were extracted applying a liquid–liquid extraction (Methanol/Water). GC-MS analysis was performed using an Agilent 7890A GC coupled to an Agilent 5975C inert XL Mass Selective Detector (Agilent Technologies). All GC-MS chromatograms were processed using MetaboliteDetector software, v3.020151231Ra72 (Supplementary Note 8). In addition, absolute quantitative values for lactic acid, glutamine, glutamic acid, and glucose were acquired using a 2950D Biochemistry Analyzer (YSI) (Supplementary Note 8).

In silico simulations of B. caccae and LGG.

The single models as well as the pairwise model of B. caccae and LGG were subjected to an in silico medium mimicking the supplemented DMEM 6429 medium without and with arabinogalactan (Supplementary Table 11). Single and combined growth was predicted as described above. The potential of B. caccae and LGG to consume and produce metabolites in silico was computed using FVA25 and compared with GC-MS measurements. To identify the cross-feeding between the two species at optimal growth, FBA was performed with simultaneous growth as the objective function while minimizing the sum of internal fluxes.

Data availability.

Data sharing is not applicable to this article as no data sets were generated or analyzed during the current study.