Interactions between Culturable Bacteria Are Predicted by Individual Species’ Growth

ABSTRACT Predicting interspecies interactions is a key challenge in microbial ecology given that interactions shape the composition and functioning of microbial communities. However, predicting microbial interactions is challenging because they can vary considerably depending on species’ metabolic capabilities and environmental conditions. Here, we employ machine learning models to predict pairwise interactions between culturable bacteria based on their phylogeny, monoculture growth capabilities, and interactions with other species. We trained our models on one of the largest available pairwise interaction data sets, containing over 7,500 interactions between 20 species from two taxonomic groups that were cocultured in 40 different carbon environments. Our models accurately predicted both the sign (accuracy of 88%) and the strength of effects (R² of 0.87) species had on each other’s growth. Encouragingly, predictions with comparable accuracy could be made even when not relying on information about interactions with other species, which are often hard to measure. However, species’ monoculture growth was essential to the model, as predictions based solely on species’ phylogeny and inferred metabolic capabilities were significantly less accurate. These results bring us one step closer to a predictive understanding of microbial communities, which is essential for engineering beneficial microbial consortia.
IMPORTANCE In order to understand the function and structure of microbial communities, one must know all pairwise interactions that occur between the different species within the community, as these interactions shape the community’s structure and functioning. However, measuring all pairwise interactions can be an extremely difficult task, especially when dealing with large, complex communities. Because of that, predicting interspecies interactions is a key challenge in microbial ecology. Here, we use machine learning models in order to accurately predict the type and strength of interactions. We trained our models on one of the largest available pairwise interaction data sets, containing over 7,500 interactions between 20 different species that were cocultured in 40 different environments. Our results show that, in general, accurate predictions can be made, and that the ability of each species to grow on its own in the given environment contributes the most to predictions. Being able to predict microbial interactions would put us one step closer to predicting the functionality of microbial communities and to rational microbiome engineering.

Reviewer #2 (Comments for the Author): The manuscript "Interactions between culturable bacteria are predicted by individual species growth" by Nestor et al. aims at building a predictive model for pairwise interactions between bacteria. The predictive model is based on a rather unique collection of 7,500 experimental measurements of pairwise growth of 20 species grown on various carbon sources that was built as part of an earlier study (Kehe et al. 2021). The current study is a natural continuation, using these data as a training set for a predictive model. The authors first demonstrate the performance of their model and then analyze the contribution of each of the different features to that performance. The importance of this work is in providing a guideline for the design of strategies for the culturing of currently uncultured species. Moreover, such a predictive model and the accompanying statistical analysis are of considerable importance for understanding the key aspects of community engineering and for the design of synthetic consortia. Though the text is generally well written, several parts are not sufficiently clear and additional information would be useful. Specific requests for clarification are detailed below.
1. Tables showing the distribution of positive/negative interactions and the TP/TN/FP/FN for each category for the different methods used for the one-way interactions, as well as similar information regarding the distribution of the different types of pairwise interactions, will provide a clearer description of the data as well as a straightforward way of estimating how well the model performs for different interaction types. Whereas the text analyzes the effect of various features on prediction capacity, the effect of the type of interaction on model performance should also be discussed (at least the type/directionality of interactions).
2. The score of each feature's contribution to the model is not clearly explained. Each simulation has its own SHAP value (Figure 3), but how was the feature's score determined? Also, the color bar is indicative of the SHAP value of its simulation; however, it is located across the y axis, which is confusing.
3. Models that were trained using information regarding each species' inferred metabolic pathways did not achieve higher prediction accuracy than models that used only phylogenetic information. This is explained by the pathways being inferred from the 16S sequences using PICRUSt rather than being independent of phylogeny. However, the metabolic PCs were inferred from experimental performance rather than from phylogenetic data, yet they performed worse. Do the metabolic PCs make an independent contribution to prediction quality?
The work by Nestor et al. uses a machine learning model to predict interspecies interactions from bacterial monoculture data, among other organism-specific traits. The authors leverage an extensive set of pairwise and monoculture data to characterize their model, as well as to test the contribution of different data sources on its performance. A thorough comparison with additional models is also presented. Overall, this is a clearly written and timely article that adds to the growing body of literature on machine learning methods for predicting microbiome behavior. Importantly, it also underscores how monoculture growth data is essential for correct predictions of interactions. There are a few points that in my view require further clarification, which are outlined below along with some suggestions on how the manuscript may be improved.
1. The authors use outputs from principal component analyses as features in their model (introduced in lines 77 and 349). I believe the authors are referring to Figure S9, though this figure is not referenced anywhere in the manuscript.
2. The authors contextualize their work by discussing the use of metabolic models (among other methods) and that these methods rely solely on genomic information (beginning at Line 41). While many studies indeed use models generated automatically from genome annotations, it is known that experimental curation is an important step in generating reliable quantitative predictions (PMID: 20057383), and has been applied extensively in model generation (PMID: 31391098, 28266498, 29692801, 21483480). As this is in line with the authors' observations on the essentiality of species monoculture growth, I would recommend rephrasing this statement.
3. Line 71: "in order to predict how species affect each other:" please specify what is being affected.
4. Line 73: Please change "growth" to "growth yield."
5. Line 79: Please change to "principal components."
6. Line 96: For a general audience, a description of tree-based models (and specifically of ensemble models like XGBoost) would be very useful.
7. Line 110: "most errors in sign prediction occurred for effects whose strength was close to 0." This is very interesting. Were there inconsistencies in interaction sign for these examples in the original dataset? Are there other factors generally associated with these weaker interactions?
8. Line 139: Please provide citations for "which were previously shown to be predictive of interactions."
9. Figure 4: A calculation of statistical significance between the NRMSE distributions would be useful in more fully comparing each model's performance.
10. The model's predictions are based on species' growth yields in mono- and co-culture. While this is appropriate for this experimental system that contained limiting amounts of resources, natural microbiomes are open systems subject to nutrient cycling and periodic resource replenishment. I understand this study is framed as a proof of concept, but it would be valuable for the authors to comment on whether they might expect other ecologically relevant quantities (e.g., growth rate) to have similar degrees of predictive power in a non-laboratory setting. This could help guide the selection of features for future models intended for more complex microbiomes.
11. Line 335: For those not familiar with the original dataset, please clarify the degree to which each species was represented in the mono- and co-culture datasets.
12. Line 361: "highest accuracy for qualitative predictions": is this appropriate given the imbalanced nature of the dataset (i.e., more negative interactions)?
13. Figure S2B: The total N should be specified or the figure should be normalized to the total number of effects.
We thank the reviewers for their comments. We have revised the manuscript based on these comments (changes are highlighted within the revised manuscript document). In addition, we have added a short paragraph to the discussion regarding limitations of our models concerning natural communities. Below we provide a detailed point-by-point reply to the Reviewers' comments.
Note: line numbers correspond to the PDF file.
Editor's comments: Please add a brief discussion on the limitations of the approach when concerning natural communities whereby only a limited subset of species can be cultured, and potential artefacts of laboratory cultivations.
Thank you, this is indeed important to discuss. We have added a paragraph regarding the limitations of our work when considering natural communities; it is quoted in full in our reply to comment 10 below.
The work by Nestor et al. uses a machine learning model to predict interspecies interactions from bacterial monoculture data, among other organism-specific traits. The authors leverage an extensive set of pairwise and monoculture data to characterize their model, as well as to test the contribution of different data sources on its performance. A thorough comparison with additional models is also presented. Overall, this is a clearly written and timely article that adds to the growing body of literature on machine learning methods for predicting microbiome behavior. Importantly, it also underscores how monoculture growth data is essential for correct predictions of interactions. There are a few points that in my view require further clarification, which are outlined below along with some suggestions on how the manuscript may be improved.
1. The authors use outputs from principal component analyses as features in their model (introduced in lines 77 and 349). I believe the authors are referring to Figure S9, though this figure is not referenced anywhere in the manuscript.
Thank you for pointing out this omission. We have added a reference to Figure S9 (Note that it is Figure S1 in the revised manuscript) in lines 98 and 399.
2. The authors contextualize their work by discussing the use of metabolic models (among other methods) and that these methods rely solely on genomic information (beginning at Line 41). While many studies indeed use models generated automatically from genome annotations, it is known that experimental curation is an important step in generating reliable quantitative predictions (PMID: 20057383), and has been applied extensively in model generation (PMID: 31391098, 28266498, 29692801, 21483480). As this is in line with the authors' observations on the essentiality of species monoculture growth, I would recommend rephrasing this statement.
Thank you for the detailed comment, this is an important clarification. We have rephrased the statement: "These approaches are appealing since they rely primarily on genomic information". (Line 61) In addition, we have added the following sentences to the discussion to emphasize the contribution of monoculture growth yield measurements to genome-based approaches: "The ability of the affected species to grow in monoculture in a given carbon environment was the feature that contributes the most to prediction. This is consistent with the fact that methods such as metabolic modelling, which are primarily based on genomic information, require refinement using experimental measurements such as monoculture growth (38-42)".
3. Line 71: "in order to predict how species affect each other:" please specify what is being affected.
We have rephrased this statement to clarify that species' growth is being affected:

"In order to predict how species affect each other's growth, we have used additional information, beyond the interspecific interactions, regarding the species' phylogeny and their monoculture yield in each of the 40 carbon environments". (Lines 88-90)
4. Line 73: Please change "growth" to "growth yield."
5. Line 79: Please change to "principal components."
Thank you, we made the changes suggested in both comments (4 + 5).
6. Line 96: For a general audience, a description of tree-based models (and specifically of ensemble models like XGBoost) would be very useful.
While a detailed introduction to tree-based models is beyond the scope of this manuscript, we agree that it is useful to mention the main concepts and ideas behind models such as XGBoost.
To do so, we have added the following short description in the text:

"Briefly, XGBoost uses an ensemble of decision trees, where each tree makes predictions by iteratively splitting the data based on individual features. XGBoost implements a form of gradient boosting such that each new tree focuses on the samples where the previous trees had the highest error rates. XGBoost is widely used since it provides accuracy and efficiency even on large datasets containing many features (34)". (Lines 117-122)
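The boosting idea described above can be illustrated with a minimal sketch. This is not XGBoost itself and not the code used in the manuscript: it is a toy gradient-boosting loop for squared error, in which each new one-split tree ("stump") is fitted to the residuals left by the trees before it. The function names, the single feature, and the data values are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Pick the threshold on a single feature that minimizes squared error.

    Requires at least two distinct values in xs.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val, right_val = mean(left), mean(right)
        sse = sum((r - left_val) ** 2 for r in left)
        sse += sum((r - right_val) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_val, right_val = stump
    return left_val if x <= threshold else right_val


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Fit an ensemble of stumps, each one trained on the current residuals."""
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # Residuals are largest where the ensemble so far is most wrong,
        # so the next stump concentrates on those samples.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Hypothetical toy data: one feature (x) and one target (y).
xs = [0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0]
ys = [0.02, -0.05, -0.3, -0.6, -0.9, -1.1, -1.2, -1.3]

model = boost(xs, ys)
mse = mean([(y - predict(model, x)) ** 2 for x, y in zip(xs, ys)])
print(f"training MSE after boosting: {mse:.4f}")
```

Real implementations such as XGBoost add regularization, deeper trees, second-order gradient information, and many engineering optimizations that this sketch leaves out.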
7. Line 110: "most errors in sign prediction occurred for effects whose strength was close to 0." This is very interesting -were there inconsistencies in interaction sign for these examples in the original dataset? Are there other factors generally associated with these weaker interactions?
The true value of the effect (which the model is trying to predict) is the median of all replicates of the same pair of species in the tested environment. We chose to classify an interaction as negative if the effect is less than or equal to 0. However, there is often variability between replicates, and when the median effect is close to 0 there might be replicates with a negative effect and others with a positive effect. Therefore, in these cases there is indeed some uncertainty about the true value of the effect that is being predicted.
Weak interactions are common between species that both grow very poorly in monoculture. In these cases, both species typically grow poorly both in monoculture and in coculture, making the measured effects more susceptible to measurement noise. Beyond that, we could not identify specific factors that are associated with these weaker interactions.
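The labelling rule described in this reply can be sketched as follows. The rule itself (median over replicates, negative if the median is less than or equal to 0) is taken from the text above; the function name and the replicate values are hypothetical and serve only to show how replicates of a weak interaction can disagree in sign.

```python
from statistics import median


def summarize_effect(replicate_effects):
    """Return (median effect, sign label, whether replicates disagree in sign).

    An effect is labelled negative when the median is <= 0, as stated above.
    """
    med = median(replicate_effects)
    label = "negative" if med <= 0 else "positive"
    has_positive = any(e > 0 for e in replicate_effects)
    has_non_positive = any(e <= 0 for e in replicate_effects)
    return med, label, has_positive and has_non_positive


# Hypothetical replicate measurements of one-way effects.
weak = [-0.04, 0.03, -0.01, 0.02]    # median close to 0, mixed signs
strong = [-0.9, -1.1, -0.8, -1.0]    # clearly negative, consistent sign

for name, replicates in (("weak", weak), ("strong", strong)):
    med, label, mixed = summarize_effect(replicates)
    print(name, round(med, 3), label, "mixed signs" if mixed else "consistent")
```

For the weak example the label depends on a median that sits very close to the threshold, which is the situation in which sign predictions are least certain.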
8. Line 139: Please provide citations for "which were previously shown to be predictive of interactions."
A citation was added to the sentence: "Surprisingly, using information regarding species' predicted metabolic pathways, which were previously shown to be predictive of interactions (28), instead of information regarding monoculture growth did not improve the predictive ability over a model that only used the species' phylogeny (Fig. S6)".
9. Figure 4: A calculation of statistical significance between the NRMSE distributions would be useful in more fully comparing each model's performance.
We have performed Tukey's HSD test and found that the increase in NRMSE due to the removal of monoculture data is indeed statistically significant. For the novel carbon environments (Fig. 4D), removing the coculture data results in a smaller, yet also statistically significant, increase in the NRMSE. We added the results of Tukey's HSD to Figure 4, and included more detailed information regarding both tests in a new supplementary table (Table S3).

Figure 4. The accuracy of predicting one-way effect depends strongly on the availability of [...]. Qualitatively similar results were also found for predictions of effect sign (Fig. S7). P values for subplots B and C were calculated using Tukey's HSD test (see Table S3 at 10.6084/m9.figshare.21856578 for additional information regarding the P values).

10. The model's predictions are based on species' growth yields in mono- and co-culture. While this is appropriate for this experimental system that contained limiting amounts of resources, natural microbiomes are open systems subject to nutrient cycling and periodic resource replenishment. I understand this study is framed as a proof of concept, but it would be valuable for the authors to comment on whether they might expect other ecologically relevant quantities (e.g., growth rate) to have similar degrees of predictive power in a non-laboratory setting. This could help guide the selection of features for future models intended for more complex microbiomes.
Thank you for the detailed comment, this is indeed important to discuss. As noted in our response to a similar comment from the Editor, we have added a paragraph regarding the limitations of our work when considering natural communities: "While our results demonstrate that bacterial interactions are predictable under simple laboratory conditions, it is still not clear to what extent this predictability extends to nonlaboratory settings and to natural communities. First, many microbes are hard to isolate and culture in the lab (45,46), and therefore monoculture growth yield will typically not be available for many environmental microbes. Since monoculture growth yield was the most informative feature for interaction predictions, not having this information would likely significantly reduce predictability. Moreover, the number of available resources natural communities are exposed to is larger than what our models were trained on, which were simple environments containing a single carbon source. This may make predicting interactions more challenging as species may occupy different niches and both grow well in monoculture without negatively influencing each other's growth. Lastly, our model predicts pairwise interactions, and does not account for the presence of "higher order" interactions (47). Therefore, even if our model accurately predicts the interactions between all species pairs of a natural community, the presence of additional species in the environment may modify these pairwise interactions".  The predictive power of machine learning models for other quantities, such as growth rate, is still unknown, but it would be very interesting to examine in the future. We speculate that, similar to interactions, it will be possible to accurately predict additional ecological parameters in simple laboratory conditions, but that predicting these parameters in natural settings will face similar challenges as the ones discussed above for predicting interactions.
11. Line 335: For those not familiar with the original dataset, please clarify the degree to which each species was represented in the mono-and co-culture datasets.
For monoculture, the dataset included all the species grown in all carbon environments. The coculture data include ~93% of all possible combinations of species and carbon environments. Combinations that were not included are those for which the original dataset included fewer than 4 replicates. We added these details to the Methods section, and additional information regarding the specific missing coculture data was added as supplementary text (Table S5).
12. Line 361: "highest accuracy for qualitative predictions": is this appropriate given the imbalanced nature of the dataset (i.e., more negative interactions)?
This is a very important comment, thank you. Indeed, the hyperparameters of all models should be tuned according to a score that is more appropriate for imbalanced data. We retrained all our models to maximize the MCC (Matthews correlation coefficient) and updated all results using the newly trained models. Two of the threshold models (monoculture growth threshold and metabolic distance threshold) were also retrained to maximize the MCC (rather than the accuracy) on the training set. Using the MCC for cross-validation had only a small effect on model performance (the MCC of the test set changed from 0.636 to 0.658). In addition, the type of the best-performing model (XGBoost) did not change. We have updated the manuscript accordingly.
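To illustrate why the MCC is better suited than accuracy to an imbalanced dataset, here is a small sketch using the standard MCC formula. The class counts are hypothetical and are not taken from the manuscript; returning 0 when the denominator vanishes is a common convention, adopted here as an assumption.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is 0 when any marginal total is 0.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical test set: 80 negative effects and 20 positive effects.
# Classifier A labels everything as negative.
always_negative = dict(tp=0, tn=80, fp=0, fn=20)
# Classifier B recovers half of the positives at the cost of 5 false positives.
finds_some = dict(tp=10, tn=75, fp=5, fn=10)

for name, counts in (("always negative", always_negative),
                     ("finds some positives", finds_some)):
    print(f"{name}: accuracy={accuracy(**counts):.2f}, MCC={mcc(**counts):.2f}")
```

The classifier that always predicts the majority class reaches a high accuracy while carrying no information about the minority class, and its MCC reflects that.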
13. Figure S2B: The total N should be specified or the figure should be normalized to the total number of effects.
We have added the total N to the figure (Note that this is Figure S3B in the revised manuscript).
Reviewer #2 (Comments for the Author): The manuscript "Interactions between culturable bacteria are predicted by individual species growth" by Nestor et al. aims at building a predictive model for pairwise interactions between bacteria. The predictive model is based on a rather unique collection of 7,500 experimental measurements of pairwise growth of 20 species grown on various carbon sources that was built as part of an earlier study (Kehe et al. 2021). The current study is a natural continuation, using these data as a training set for a predictive model. The authors first demonstrate the performance of their model and then analyze the contribution of each of the different features to that performance. The importance of this work is in providing a guideline for the design of strategies for the culturing of currently uncultured species. Moreover, such a predictive model and the accompanying statistical analysis are of considerable importance for understanding the key aspects of community engineering and for the design of synthetic consortia. Though the text is generally well written, several parts are not sufficiently clear and additional information would be useful. Specific requests for clarification are detailed below.
1. Tables showing the distribution of positive/negative interactions and the TP/TN/FP/FN for each category for the different methods used for the one-way interactions, as well as similar information regarding the distribution of the different types of pairwise interactions, will provide a clearer description of the data as well as a straightforward way of estimating how well the model performs for different interaction types. Whereas the text analyzes the effect of various features on prediction capacity, the effect of the type of interaction on model performance should also be discussed (at least the type/directionality of interactions).
Examining the effect of the interaction type on prediction accuracy is indeed an interesting point. Thank you for this suggestion. We found that positive effects are harder to predict and that mutualisms (+/+) are particularly challenging: they are more often classified as parasitisms (+/-) than as mutualisms. We have included these results in the main text: "Moreover, it appears that predictions involving positive effects were less accurate: true negative effects were correctly classified more frequently than positive ones (95% true negative rate vs 66% true positive rate, Fig. S4A) and effects classified as negative were more likely to be true than effects classified as positive (90% negative predictive value vs 80% positive predictive value, Fig. S4A