Machine Learning Reveals Time-Varying Microbial Predictors with Complex Effects on Glucose Regulation

ABSTRACT The incidence of type 2 diabetes (T2D) has been increasing globally, and a growing body of evidence links type 2 diabetes with altered microbiota composition. Type 2 diabetes is preceded by a long prediabetic state characterized by changes in various metabolic parameters. We tested whether the gut microbiome could have predictive potential for T2D development during the healthy and prediabetic disease stages. We used prospective data of 608 well-phenotyped Finnish men collected from the population-based Metabolic Syndrome in Men (METSIM) study to build machine learning models for predicting continuous glucose and insulin measures in a shorter (1.5 year) and longer (4 year) period. Our results show that the inclusion of the gut microbiome improves prediction accuracy for modeling T2D-associated parameters such as glycosylated hemoglobin and insulin measures. We identified novel microbial biomarkers and described their effects on the predictions using interpretable machine learning techniques, which revealed complex linear and nonlinear associations. Additionally, the modeling strategy carried out allowed us to compare the stability of model performance and biomarker selection, also revealing differences in short-term and long-term predictions. The identified microbiome biomarkers provide a predictive measure for various metabolic traits related to T2D, thus providing an additional parameter for personal risk assessment. Our work also highlights the need for robust modeling strategies and the value of interpretable machine learning. IMPORTANCE Recent studies have shown a clear link between gut microbiota and type 2 diabetes. However, current results are based on cross-sectional studies that aim to determine the microbial dysbiosis when the disease is already prevalent. In order to consider the microbiome as a factor in disease risk assessment, prospective studies are needed. Our study is the first study that assesses the gut microbiome as a predictive measure for several type 2 diabetes-associated parameters in a longitudinal study setting. Our results revealed a number of novel microbial biomarkers that can improve the prediction accuracy for continuous insulin measures and glycosylated hemoglobin levels. These results make the prospect of using the microbiome in personalized medicine promising.

epidemic of T2D and improve public health, an understanding of the first stages of this disease is necessary for preventive actions. Recently, the bacterial communities residing in our intestines have become a topic of interest as a potential way to prevent the development of glucose dysregulation. The microbiome has been shown to modulate a variety of physiological functions, such as gut permeability, inflammation, glucose metabolism, and fatty acid oxidation, supporting an important role of the microbiome in the pathophysiology of T2D (2).
Numerous studies have already reported changes in the gut microbiome in subjects with T2D or prediabetes compared to healthy individuals (3)(4)(5). Although there is information that the abundance of bacteria such as Roseburia and Bifidobacteria is altered in subjects with T2D (2), compelling evidence that supports the use of gut microbiome as a predictive tool for T2D is lacking, as a majority of the findings are based on crosssectional studies. However, in order to assess the microbiome as a prognostic tool for T2D, prospective studies are needed.
T2D is a heterogeneous disease with multiple pathophysiological pathways involved (6). Thus, in order to fully understand the role of the microbiome in the risk of T2D, a case-control design might not be sufficient. As the progression of the disease is a continuous process, detailed data about metabolic outcomes such as continuous glucose and insulin measurements could help to unravel the disease mechanisms involving the microbiome.
Together with heterogeneity in the first stages of T2D, the gut microbiome itself is known to be highly personalized (7,8). Variability in continuous metabolic outcomes and gut microbiome lead to difficulties in reproducing the results obtained and raises the need for robust modeling strategies. Machine learning methods have been shown to capture various complex association patterns from different data types. Although machine learning has become popular in microbiome studies as well, the ability of the algorithms to provide robust results remains unclear (9,10).
We now report the application of a random forest algorithm on microbiome data to predict multiple continuous metabolic outcomes that influence the development of T2D in a longitudinal study setting. We identify microbial biomarkers for the metabolic outcomes and describe their effects on the predictions using interpretable machine learning techniques. In addition, we show that there are significant differences in the identified biomarkers between long and short follow-up periods. We also show how the modeling procedure significantly influences the results.

RESULTS
Study design. We used prospective data of well-phenotyped Finnish men collected from a population-based Metabolic Syndrome in Men (METSIM) study. A comprehensive machine learning strategy was implemented to identify microbial biomarkers and their effect on numerous metabolic traits. A graphical overview of the study design and modeling procedure is shown in Fig. 1. Random forest models were trained to predict the metabolic outcomes of interest in the follow-up using the baseline microbiome (MB), metabolic outcomes (MO), and additional covariates (CoV) such as body mass index and age as predictors. To evaluate the effect of the microbiome, models including microbial predictors were compared to models excluding microbial predictors. In order to assess the temporal changes in biomarker selection and predictive performance, independent prospective models were trained for the 18-month and 48-month follow-up period. To evaluate the model generalizability and stability, model training was repeated 200 times with a different train-test split made each run. Permutation feature importance metrics were used to identify microbial biomarkers. Finally, accumulated local effects methodology was used to plot the effect of the microbial biomarkers for predicting the corresponding metabolic trait.
Model stability and generalizability. In the first step, we tested whether we could improve the prediction of metabolic outcomes using microbiome data as an additional predictor. The human gut microbiome is known to be highly variable and personalized (7,8). Thus, estimating the robustness of the predictive models is essential. The problem with microbiome data based on our experience is that the performance of the model might be highly dependent on the initial data split for training and test sets. The models were run 200 times with different initial splits to assess the impact of the data split. Table 1 summarizes the obtained results.
These results highlight the variability in performance estimates occurring due to the data split. Out of 200 data splits, the number of models that took advantage of using microbial predictors varies around 100, which implies that the data split plays an important role in the outcome. Our results suggest that for the 18-month time frame, microbiome as a predictor can improve the prediction accuracy for secretion index, glycosylated hemoglobin (HbA1c), and 2-h insulin levels. For secretion index, models including microbial predictors outperformed simpler models in 61% of the cases, for 2-h insulin in 70.5% of the cases, and for HbA1c in 64.5% of the cases. The  Remarkably, the variation in differences in RMSE between the model including microbial predictors and the model excluding microbial predictors over the 200 runs is large. Due to the high variability, the potential of improvement in prediction accuracy when microbiome data are used remains largely unclear.
Novel predictive microbial biomarkers for metabolic outcomes. In order to find microbial markers that are predictive for the metabolic outcomes, average feature importance scores over 200 runs were compared. Figure 2 shows the average importance score of the top 50 microbial predictors for metabolic outcomes that took advantage of using microbial predictors. It can be seen that certain microbial predictors significantly stand out for each metabolic outcome and time frame combination.
For an 18-month time frame ( Fig. 2A, Table S2), the most important microbial predictors for 2-h insulin include genus Methanobrevibacter and numerous genera from the phylum Firmicutes, such as [Ruminococcus] torques group, UC5-1-2E3, Subdoligranulum, and Christensenellaceae R-7 group. Predictors for HbA1c are the genus Ruminiclostridium 5, the genus Paraprevotella, an unclassified member of the family Muribaculaceae, and members of Clostridiales vadinBB60 group. An unclassified member of the family Muribaculaceae together with Papillibacter and Oscillospira are significant predictors for secretion index.
For the 48-month time frame (Fig. 2B, Table S3), top predictors for 2-h insulin include uncultured Rhodospirillales and UC5-1-2E3. Distinguishable genera according to the average importance score are also Family XIII AD3011 group, Shuttleworthia and Odoribacter. Significant predictors for fasting insulin are uncultured Rhodospirillales, uncultured Prevotellaceae, and the genus Alistipes. For secretion index, the genus Enterorhabdus together with Asteroleplasma prove to be the most important predictors, with Family XIII AD3011 group slightly standing out.
There is overlap in the most important microbial markers found for predicting different metabolic outcomes. In the 18-month follow-up period, unclassified Muribaculaceae is a significant predictor for secretion index and HbA1c. For the 48-month follow-up period, Family XIII AD3011 group is a predictor for secretion index and 2-h insulin, and uncultured Rhodospirillales is an important predictor for fasting insulin and 2-h insulin. Additional overlap can be seen among the top 10 microbial predictors according to the average permutation importance score (Tables S2 and S3).
Interpreting the effect of microbial biomarkers on the predictions. Together with finding the relevant biomarkers, understanding how they influence the predictions is necessary. This task is complicated for most of the machine learning algorithms, which is why they are considered "gray-box" or "black-box" methods. Recently, much attention has been put into explaining the predictions of such models. Here, we implemented accumulated local effect (ALE) plots that aim to describe the effect of a certain predictor on the metabolic outcome independently of the remaining predictors (11). Accumulated local effect plots for the previously highlighted most significant microbial biomarkers are shown in Fig. 3. Accumulated local effect plots for the top 10 microbial predictors are shown in Fig. S1 and S2. In most cases, ALE plots show nonlinear associations between a microbial predictor and metabolic outcome of interest. Although large variability in the effect estimates between the different data splits can be seen, the shape of the effect stays relatively stable for all microbial predictors.
Considering the 18-month time frame (Fig. 3A, Fig. S1), higher centered log-ratio (CLR)transformed abundances of genera from the Lachnospiraceae family-[Ruminococcus] torques group and UC5-1-2E3-lead to higher predictions for 2-h insulin. High CLRtransformed abundances of the genera Subdoligranulum, Methanobrevibacter, and Christensenellaceae R-7 group lower the predictions for 2-h insulin. For HbA1c, higher CLR-transformed abundance of Ruminiclostridium 5 leads to higher predictions. In contrast, high CLR-transformed abundances of bacteria from the family Muribaculaceae, members of Clostridiales vadinBB60 group, and Paraprevotella reduce the levels of HbA1c. For secretion index, the prediction might depend on the presence-absence of the unclassified genus from the family Muribaculaceae, because the ALE plot stays relatively stable after an initial decrease from the minimum values of CLR-transformed abundances. High CLR-transformed abundances of Oscillospira and Papillibacter decrease the predictions for secretion index.
Considering the 48-month follow-up period (Fig. 3A, Fig. S2), high CLR-transformed abundances of the genera Firmicutes Family XIII AD3011 group, Odoribacter, and unclassified Rhodospirillales lead to lower predictions for 2-h insulin. In contrast, extremely high CLR-transformed values of the genus UC5-1-2E3 lead to higher predictions. Shuttleworthia seems to show a presence-absence effect, as the drop from the lowest CLR-transformed values lowers the predictions for 2-h insulin. For fasting insulin, higher CLR-transformed abundances of unclassified Rhodospirillales and Alistipes lower the predictions. In contrast, high CLR-transformed abundances of an unclassified genus from the Prevotellaceae family leads to higher predictions for fasting insulin. Interestingly, extremely low values of Alistipes lead to higher predictions for fasting insulin than when Alistipes levels are within 2.5% and 97.5% quantiles. A similar effect for the genus Asteroleplasma on secretion index can be seen as extremely high CLR-transformed abundance of Asteroleplasma leads to drastically higher predictions. The genus Enterorhabdus might show presence-absence effects, with the presence of Enterorhabdus leading to decreased predictions. Lastly, high CLR-transformed abundance of the genus Family XIII AD3011 group leads to higher predictions.
Comparison of microbial predictors in different time points. Independently modeling the two scenarios with various follow-up times allowed us to compare the most relevant predictors to see if the effect and choice of microbial biomarkers remains the same. Considering metabolic outcomes that the microbiome data helped to predict, only one microbial predictor for the same metabolic outcome was shared (Fig. 3B). The genus UC5-1-2E3 from the Lachnospiraceae family was found to be among the top predictors for 2-h insulin in the 18-month and 48-month time frame. Among the top 10 predictors for each target variable, Escherichia-Shigella was also shared for 2-h insulin ( Fig. S1 and S2).
The shape of the effect for UC5-1-2E3 stays relatively stable, with extreme values for the genus showing higher predictions for both follow-up periods. This suggests that the genus UC5-1-2E3 could be considered a robust biomarker for predicting 2-h insulin. Nevertheless, all other genera from the top microbial predictors were specific for a certain time frame.

DISCUSSION
We used machine learning to predict multiple metabolic outcomes (continuous glucose and insulin measures, HbA1c) over time periods of various lengths using the gut microbiome as a predictive measure. Furthermore, the modeling strategy carried out allowed us to understand the variability in performance estimates and biomarker selection. We described how high variability and personalization of the human gut microbiome leads to large variations in the performance estimates. We showed that microbial predictors can improve the prediction accuracy for continuous insulin measures and glycosylated hemoglobin in addition to conventional risk factors, additionally highlighting differences in short-and long-term cases. Finally, we identified microbial biomarkers that contribute to the improved performance and described their effect on the outcome.
Most of the current studies describing the role of bacteria in diabetes have been case-control studies, with diabetes being a binary trait defined by setting a cutoff to some continuous glucose measure (3,4,12). Type 2 diabetes, however, is a disease preceded by a long-lasting prediabetic state, and the development of the disease is a continuous process (13). Detailed phenotyping is definitely a strength of this study, as it allows us to study the first stages of disease progression. Our results suggest that bacteria provide a means for predicting changes in insulin secretion and insulin response to glucose intake. A causal effect of microbiome-produced short-chain fatty acids (SCFA) has been confirmed with respect to various insulin measures, primarily insulin secretion (14). Figure S3 shows that 2-h insulin levels first increase in subjects with prediabetes, defined by the WHO classification, as a compensatory mechanism to keep glucose levels in the normal range. Thus, 2-h insulin values are among the first indicators for the development of diabetes. Therefore, our results provide valuable insight into the potential application of the microbiome as a predictive measure for T2D and highlight the need for detailed phenotyping in order to fully understand the role of the microbiome in this disease.
Recently, Gou et al. (12) used a similar interpretable machine learning strategy and found bacteria that effectively differentiated type 2 diabetes cases from healthy controls in the Chinese population. Additionally, they built a microbiome risk score (MRS) and showed the causal role of identified bacteria on diabetes development after fecal microbiota transplantation to mice. The microbial predictors found do not show significant overlap with our findings. Only Alphaproteobacteria found by Gou et al. can be considered overlapping. We found one taxon from the class Alphaproteobacteria-an uncultured genus from the order Rhodospirillales-to predict fasting insulin and 2-h insulin in a 48-month time frame. We found a higher CLR-transformed abundance of an unclassified Rhodospirillales genus decreasing type 2 diabetes risk, which is consistent with the findings of Gou et al. Multiple reasons might explain the observed inconsistencies. Importantly, our study was specifically designed to find prospective predictors for continuous measures. Another possible difference is the cohort structure. Our study included men exclusively, compared to 33.1% in Gou et al. The effect of sex on the gut microbiome is not clear but cannot be ruled out (15,16). Also, the metagenomic analyses of European women and Chinese subjects have shown differences, which is why geographic differences in microbiome are also a possibility (3,4).
Rhodospirillales, one of the strongest predictors in the current study, was found to be predictive for fasting and 2-h insulin in a 48-month follow-up. The order Rhodospirillales consists of bacteria that are known to produce acetic acid (17), which has been shown to improve insulin sensitivity (18,19). Several other detected microbial predictors have been previously described elsewhere as being associated with T2D or glucose regulation. Zhou et al. (20) showed that the genus Odoribacter was negatively associated with steady-state plasma glucose, which is consistent with our results for predicting 2-h insulin. Krych et al. carried out a study on mice and identified Muribaculaceae (previously classified as S24-7) to be protective against T2D (21), which corresponds to the protective effect for HbA1c seen in our study.
Previously inconsistent associations have also been reported. We found a higher CLR-transformed abundance of Alistipes to predict lower values for fasting insulin, which is not consistent with the results obtained by Wu et al. (22), who showed positive associations with type 2 diabetes. Subdoligranulum has been found to be enriched in type 2 diabetes cases (23), which is inconsistent with our results, as higher CLR-transformed abundance predicts lower values for 2-h insulin. Similar to the work by Gou et al. (12), the main reasons behind these inconsistencies are likely study design and population structure. We note that microbiome composition has been shown to be associated with numerous factors not considered confounders in our study due to data availability (24,25). Thus, inconsistencies with previously reported results and added predictive value for metabolic traits could be explained by uncontrolled covariates.
We are not aware of any population with a similar follow-up period and where microbiome data are available and an oral glucose tolerance test has been carried out at the baseline and at the follow-up. Therefore, we could not replicate our findings in other populations using similar study design.
How machine learning techniques can best utilize microbiome data is still an open question (26). Therefore, the true potential of the gut microbiome for predicting T2D remains unknown. Additionally, taking the compositional nature of microbiome data into account is crucial for all types of analysis and machine learning applications (26). Previous studies have shown the advantage of using log-ratio transformations for overcoming the limitations of working with compositional data. For example, Quinn and Erb (27) and Tolosana-Delgado et al. (28) showed how centered log-ratio (CLR)-transformed data can outperform raw proportions. Moreover, Tolosana-Delgado et al. (28) showed how pairwise log-ratio transformation can greatly outperform CLR transformation when a random forest algorithm is used. Thus, novel methods and strategies for handling compositionality might substantially improve the prediction accuracy for continuous metabolic outcomes. Also, shotgun metagenomics can provide more accurate taxonomic resolution than 16S sequencing (29).
It needs to be highlighted that even in the best scenario, when predicting 2-h insulin values in the 18-month scenario, incorporating microbial predictors did not improve prediction accuracy in 29.5% of the data splits. The high variability in performance estimates shows the necessity for robust modeling strategies to achieve reliable and generalizable performance. Our data clearly indicate that conventionally used 10-fold cross-validation might not be sufficient to obtain generalizable models when sample sizes stay relatively small compared to the number of microbial features. Taken together, the variability of model performance estimates in microbiome studies can be large and needs to be given attention in order to gain a proper understanding of the predictive ability of the microbiome. To reduce the variability in performance estimates and increase prediction accuracy, future studies can take advantage of combining shotgun sequencing with the bestperforming modeling strategies, including additional microbiome-associated covariates, and integrating clinical and microbiome data from multiple time points as predictors.
Conclusions. In summary, our findings provide a clear indication that the microbiome, together with conventional risk factors, can be used to predict multiple metabolic outcomes. The detailed clinical characterization and longitudinal study design of the METSIM cohort make it particularly useful for understanding host-microbiome relationships. We have identified a number of novel microbial biomarkers which could predict metabolic traits associated with the prediabetic state. Our data provide a significant resource for further studies to determine the causal relationship of the identified biomarkers to the progression of T2D. Therefore, the prospect of using the microbiome in personalized medicine is promising.

MATERIALS AND METHODS
Study population and characterization. METSIM (Metabolic Syndrome in Men) is a randomly selected cohort of men from eastern Finland, aged 45 to 73 years, who have been carefully phenotyped Aasmets et al.
for different metabolic traits such as T2D, hypertension, and obesity. We investigated a subset of the METSIM cohort that took part in the METSIM follow-up study and from whom stool samples were collected (n = 608). The data resource consists of samples taken from three time points-at baseline (baseline of METSIM 5-year follow-up study), at 18-month follow-up, and at 48-month follow-up. At each time point, the subjects went through a 1-day outpatient visit, during which they provided blood samples after an overnight fast, and various parameters such as height, weight, and blood pressure were measured and an oral glucose tolerance test (OGTT) was performed. Additionally, at the baseline visit, the subjects were interviewed about their history of diseases and drug usage. The full study protocol and data resources are described in Laakso et al. (30). All subjects have given written informed consent, and the study was approved by the Ethics Committee of the University of Kuopio and was in accordance with the Helsinki Declaration.
In contrast to case-control studies, continuous "metabolic outcomes" (MO) were used as target variables in the modeling framework. The advantage of using continuous metabolic outcomes is that the phenotype is more distinct and there are no borderline cases with similar abilities of handling glucose as there likely are in the case-control setting (6). In total, two glucose measures, two insulin measures, glycosylated hemoglobin, and three calculated glucose regulation indices were considered (Fig. 1). Glycosylated hemoglobin (HbA1c), fasting insulin, 2-h insulin, fasting glucose, and 2-h glucose were measured according to the study protocol (30). The Matsuda insulin sensitivity index was calculated according to reference 31. The insulin secretion index was calculated as Secretion index = AUC Insulin(0-30min) /AUC Glucose(0-30 min) , where the area under curve (AUC) was calculated using the trapezoidal formula. The disposition index was calculated as Disposition index = Secretion index · Matsuda. The Matsuda insulin sensitivity index and the insulin secretion index have been previously shown to be best estimates for insulin sensitivity and insulin secretion in the METSIM cohort (32). Summary statistics for metabolic outcomes and additional covariates considered predictors in the machine learning models are shown in Table S1.
Microbiome data collection, sequencing, and data processing. Stool samples were collected at the baseline visit during the evaluation at the University of Kuopio Hospital and immediately stored at 280°C. Microbial DNA was extracted using the PowerSoil DNA isolation kit (MoBio Laboratories, Carlsbad, CA, USA) following the manufacturer's instructions. The fecal microbiota composition was profiled by amplifying the V4 region of the 16S rRNA gene with 515F and 806R primers as previously described (33). PCR products were quantified with a Quant-iTTM PicoGreen double-stranded DNA (dsDNA) assay kit (Thermo Fisher). Samples were combined in equal amounts (;250 ng per sample) into pools and purified with the UltraClean PCR clean-up kit (MoBio). Sequencing was performed on an Illumina HiSeq 3000 instrument.
Raw demultiplexed data were imported into the open-source software QIIME2 version 2019.7 using the q2-tools-import script with the CasavaOneEightSingleLanePerSampleDirFmt input format (34). DADA2 software was used for denoising (35). DADA2 uses a quality-aware model of Illumina amplicon errors to attain an abundance distribution of sequence variance, which has a difference of a single nucleotide. The q2-dada2-denoise-single script was used to truncate the reads at position 123; trimming was not applied. Chimera removal was done with the "consensus" filter, in which chimeras are detected in each sample individually, and sequences established as chimeric in a certain fraction of samples are removed. After the denoising step, amplicon sequence variants (ASVs), equivalent to operational taxonomic units (OTUs), were aligned using MAFFT (36), and the phylogeny was constructed with FastTree (37). Taxonomy assignment was done using the q2-feature-classifier with the pretrained naive Bayes classifier based on reference reads from the SILVA 16S V3-V4 v132_99 database with a similarity threshold of 99%. Seven samples did not pass quality control during the sequencing process and were removed from further analysis.
The average number of reads per sample was 1,351,289, and samples with less than 100,000 reads were excluded from further analysis. The rest of the samples were aggregated to the genus level, which resulted in 553 genera. An additional filtering procedure was carried out to include only the most common genera for the prediction task. Genera that appeared in at least 50% of the samples were included in the final modeling task, 172 in total.
Due to the nature of sequencing, read counts are uninformative and must be considered relative to the total sum of reads for a given sample (38). In order to compensate for the compositional nature of the data, centered log-ratio (CLR) transformation was used as first proposed by Aitchison (39): Zero replacement was carried out using the R package zCompositions (40). Random forest implementation and statistical analysis. For modeling, we used samples with microbiome data available at the study baseline that did not include missing values on any of the metabolic parameters considered. In addition, subjects who had reimbursement for drug treatment of diabetes were excluded. This resulted in 529 participants for the 18-month follow-up visit and 482 participants for the 48-month follow-up visit.
All random forest models were implemented in R using the caret package and fast implementation of the random forest algorithm named ranger (41). Data sets were repeatedly split in a 75/25 ratio to training/test data sets, respectively, using a different seed each time. Models were tuned on training data using 10-fold cross-validation and random hyperparameter search with 100 hyperparameter combinations.
Performance of the models was evaluated on the test data set using root-mean-square error (RMSE). In the case of random forest models, out-of-bag (OOB) error is also widely used to evaluate model performance. Although using out-of-bag error for evaluation can increase the sample size for model training, it has been shown that in some cases the OOB error is largely overestimated and unreliable (42). Thus, for robust estimates, test data were used for evaluation. Permutation feature importance was used for selecting the microbial biomarkers. For explaining the obtained random forest models, accumulated local effects (ALE) plots were implemented using the R package DALEX (43). ALE plots aim to describe the effect of a certain predictor on the metabolic outcome independently of the remaining predictors (11).
A one-tailed binomial test was carried out to test whether the probability of the model including microbial predictors outperforming the model excluding microbial predictors is greater than 0.5. The Bonferroni correction was applied to assess significance (eight metabolic outcomes and two time points; P , 0.05/16). Data availability. Individual-level 16S RNA sequencing data are available in the Sequence Read Archive (SRA) under BioProject number PRJNA644655. All remaining phenotype data in this study are available upon request through application to the METSIM data access committee. R codes used for the analysis are available at https://doi.org/10.5281/zenodo.4422486.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.