Gut metabolome and microbiota signatures predict response to treatment with exclusive enteral nutrition in a prospective study in children with active Crohn’s disease

Background Predicting response to exclusive enteral nutrition (EEN) in active Crohn’s disease (CD) could lead to therapy personalization and pretreatment optimization. Objectives This study aimed to explore the ability of pretreatment parameters to predict fecal calprotectin (FCal) levels at EEN completion in a prospective study in children with CD. Methods In children with active CD, clinical parameters, dietary intake, cytokines, inflammation-related blood proteomics, and diet-related metabolites, metabolomics and microbiota in feces, were measured before initiation of 8 wk of EEN. Prediction of FCal levels at EEN completion was performed using machine learning. Data are presented with medians (IQR). Results Of 37 patients recruited, 15 responded (FCal < 250 μg/g) to EEN (responders) and 22 did not (nonresponders). Clinical and immunological parameters were not associated with response to EEN. Responders had lesser (μmol/g) butyrate [responders: 13.2 (8.63–18.4) compared with nonresponders: 22.3 (12.0–32.0); P = 0.03], acetate [responders: 49.9 (46.4–68.4) compared with nonresponders: 70.4 (57.0–95.5); P = 0.027], phenylacetate [responders: 0.175 (0.013–0.611) compared with nonresponders: 0.943 (0.438–1.35); P = 0.021], and a higher microbiota richness [315 (269–347) compared with nonresponders: 243 (205–297); P = 0.015] in feces than nonresponders. Responders consumed (portions/1000 kcal/d) more confectionery products [responders: 0.55 (0.38–0.72) compared with nonresponders: 0.19 (0.01–0.38); P = 0.045]. A multicomponent model using fecal parameters, dietary data, and clinical and immunological parameters predicted response to EEN with 78% accuracy (sensitivity: 80%; specificity: 77%; positive predictive value: 71%; negative predictive value: 85%). Higher taxon abundance from Ruminococcaceae, Lachnospiraceae, and Bacteroides and phenylacetate, butyrate, and acetate were the most influential variables in predicting lack of response to EEN. 
Conclusions We identify microbial signals and diet-related metabolites in feces, which could comprise targets for pretreatment optimization and personalized nutritional therapy in pediatric CD.


Introduction
Treatment with exclusive enteral nutrition (EEN) induces clinical remission in 80% of children with active Crohn's disease (CD), but fewer patients show normalization of gut inflammatory biomarkers, such as fecal calprotectin (FCal), at treatment completion [1,2]. There is currently strong interest in stratified or personalized medicine, particularly for conditions in which response to therapy is variable, such as EEN in CD. The evolution of machine learning and the progress of high-throughput sequencing have begun to answer important questions in the etiology and management of inflammatory bowel disease (IBD), particularly through the integration of multiple parameters such as disease phenotype, blood and immune function markers, and the intestinal microbiota and its metabolites [3][4][5][6]. Such technologies and systems biology may help predict therapeutic outcomes and lead to a novel understanding of the mechanisms underpinning disease pathogenesis. Integrating this approach into routine clinical care could ultimately allow patient stratification to guide treatment decisions and pretreatment optimization, and therefore a more efficacious and cost-effective approach to patient care.
The literature exploring predictive factors of EEN response is sparse and mostly focused on clinical parameters such as disease phenotype [2] and disease severity during the initial period of treatment [7]. Hence, the objective of this study was to analyze, for the first time, to our knowledge, an extensive set of pretreatment factors as predictors of response to EEN. We included disease phenotype and characteristics, anthropometry, dietary intake, routine disease markers, inflammatory cytokines, plasma inflammation-related proteomic markers and diet-related bacterial metabolites, and the metabolome and microbiota in feces, all before EEN initiation. Although widely applied in clinical practice, disease activity indices, such as the weighted pediatric Crohn's disease activity index (wPCDAI) [8], only very broadly correlate with histological and endoscopic activity [9,10]. We therefore chose FCal, which is more sensitive to detecting endoscopic activity in IBD [9,11] and thus might serve as a more appropriate biomarker for assessing response to EEN in the absence of endoscopy and as a potential "treat-to-target" biomarker.

Subjects
Children with active CD receiving EEN (Modulen IBD) for 8 wk were recruited prospectively at the Royal Hospital for Children, Glasgow, between October 2014 and May 2017. The clinical outcomes of the entire patient cohort were published previously [1]. In the current study, we included only participants who completed EEN and provided paired fecal samples at treatment initiation and completion (Supplemental Table 1 and Supplemental Figure 1). Exclusion criteria included use of antibiotics and probiotics 1 mo prior to EEN initiation, and concomitant use of other induction therapy during EEN. Patients who were administered antibiotics and probiotics during their course of EEN were subsequently excluded too. Clinical and anthropometric parameters, dietary intake, cytokines, inflammation-related blood proteomics, fecal diet-related metabolites, and metabolome and microbiota parameters were explored as predictors of response to treatment with EEN.

Disease characteristics and clinical parameters
Data on routine blood inflammatory markers [for example, C-reactive protein (CRP)], FCal, demographics, anthropometry, wPCDAI, Bristol stool chart score, and disease phenotype were collected prospectively [12]. FCal was measured in house at the end of the study (Calpro). The primary outcome utilized for prediction of EEN response was normalization of FCal concentration (250 μg/g plus 10% to account for in-house measurement assay inter- and intravariation) at EEN completion.

Dietary intake
Prior to initiation of EEN, the intake of macronutrients, fiber, and energy was estimated using the Scottish collaborative food frequency questionnaire (FFQ) for children [13]. The 148 food items in the FFQ were grouped under 16 food groups with frequency of portion consumption per day. Macronutrient intake was expressed as a percentage of total energy intake, except for fiber, which was expressed as g/1000 kcal/d. Food group intake was standardized as portions/1000 kcal/d.
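The energy standardization described above can be sketched in a few lines of Python. This is an illustration only: the function name and the example values are hypothetical and are not taken from the study data.

```python
def per_1000_kcal(amount_per_day: float, energy_kcal_per_day: float) -> float:
    """Express a daily intake (portions or grams) per 1000 kcal of energy intake."""
    if energy_kcal_per_day <= 0:
        raise ValueError("energy intake must be positive")
    return amount_per_day / (energy_kcal_per_day / 1000.0)


# Hypothetical child: 1.1 portions/d of a food group at 2000 kcal/d
standardized = per_1000_kcal(1.1, 2000.0)
print(f"{standardized:.2f} portions/1000 kcal/d")
```

Standardizing by energy intake makes food group frequencies comparable between children with very different total intakes.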

Plasma inflammatory cytokines and inflammation-related proteomics
The absolute concentration of 19 cytokines and relative concentration of 92 inflammation-related proteomic markers were measured in plasma with the Meso Scale Diagnostics platform (Meso Scale Diagnostics) and Olink assays (Olink Proteomics), respectively.

Diet-related bacterial metabolites and fecal metabolomics
The entire bowel motion was collected fresh in an empty, disposable container. Immediately following defecation, an anaerobic sachet (Anaerocult A, Merck) was placed above the recipient container to reduce oxygen concentration, and the samples were placed in a cool bag along with icepacks. Within 4 h of defecation, samples were transported to the laboratory and homogenized using mechanical kneading with a blender, and aliquots were stored for downstream analyses. Aliquots for the measurement of short-chain fatty acids (SCFAs, stabilized with NaOH 1M) and total (stabilized with zinc acetate 0.11M) and free sulfide (stabilized with NaOH 1.25M) were stored at −20 °C, whereas aliquots intended for LC-MS, proton nuclear magnetic resonance (1H NMR) metabolomics, and microbiota analysis were stored at −80 °C. In feces, the concentration of short-chain (SCFA), branched-chain, and medium-chain fatty acids was measured using gas chromatography coupled with a flame ionization detector (GC) (Supplemental Methods), ammonia was measured on the day of sample collection with an automated analyzer (HI 96715, Hanna Instruments), and free and total sulfide were measured in stored samples with colorimetric assays [14]. Fecal pH and fecal water content (%) were also measured [15]. 1H NMR metabolomics of fecal samples was performed using a 500 MHz spectrometer and a one-dimensional nuclear Overhauser effect spectroscopy pulse sequence with water suppression (Supplemental Methods). Comparisons were carried out in annotated and quantified metabolites [16].

Fecal microbiota
Genomic DNA was extracted from feces within 2 mo of sample collection, as described previously and in Supplemental Methods [17]. Total bacterial load in feces was measured by qPCR (forward primer: CGG TGA ATA CGT TCC CGG and reverse primer: TAC GGC TAC CTT GTT ACG ACT T) using Taqman chemistry [14]. The V4 region of the 16S rRNA gene was sequenced (MiSeq) in fecal samples using 2 × 250-bp paired-end reads. The V4 region was amplified (forward primer: GTGCCAGCMGCCGCGGTAA and reverse primer: GGACTACHVGGGTWTCTAAT) using fusion Golay adapters barcoded on the reverse strand. Barcoded amplicons were purified using the Zymoclean Gel DNA Recovery Kit (D4001, Zymo Research).

Bioinformatics
Operational taxonomic units (OTUs) were constructed from the raw 16S rRNA sequencing data at a similarity of 97% using the VSEARCH pipeline (https://github.com/torognes/vsearch/wiki/VSEARCHpipeline) [18]. Paired reads were merged and quality filtering was performed with a maximum expected error value of 0.5. Sequences longer than 275 bp or shorter than 225 bp were discarded, reads were dereplicated across all samples, and singleton sequences were filtered out. Chimeras were identified and eliminated using the VSEARCH implementation of the UCHIME de novo algorithm after preclustering at 98%. The UCHIME reference-based chimera detection method was then applied using the "Gold" ChimeraSlayer database [19]. OTUs were generated by clustering the remaining sequences at 97%. Taxonomic classification to genus level was performed using the Ribosomal Database Project Naïve Bayes Classifier algorithm in conjunction with the SILVA (version 123) database [20,21].
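The length filtering, dereplication, and singleton removal steps can be illustrated with a minimal Python sketch. It is not the VSEARCH implementation used in the study; the function name and the toy reads are hypothetical, and only the 225 to 275 bp window comes from the text above.

```python
from collections import Counter


def filter_and_dereplicate(seqs, min_len=225, max_len=275, min_copies=2):
    """Keep reads within [min_len, max_len], collapse identical reads,
    and drop sequences seen fewer than min_copies times (singletons)."""
    kept = [s for s in seqs if min_len <= len(s) <= max_len]
    counts = Counter(kept)
    return {seq: n for seq, n in counts.items() if n >= min_copies}


# Toy reads: one sequence seen twice, one singleton, one too long
reads = ["A" * 250, "A" * 250, "C" * 250, "G" * 300, "G" * 300]
unique = filter_and_dereplicate(reads)
print({seq[:5] + "...": n for seq, n in unique.items()})
```

Only the sequence that passes the length window and occurs at least twice survives; chimera removal and clustering at 97% would follow in the actual pipeline.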
In plasma proteomic and fecal metabolome analysis, data below the limit of detection were replaced by half of the minimum detected value. For multivariate analysis, the fecal metabolite concentrations from 1H NMR analysis, initially expressed as μg/g of wet fecal sample, were further normalized by total sum, log-transformed, and scaled using unit variance scaling, whereby the data are mean-centered and divided by the standard deviation for each variable.
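The preprocessing chain described above can be sketched in Python as follows. Several details are assumptions made for illustration and are not specified in the text: the minimum is taken per metabolite, the logarithm is base 10, and the sample standard deviation (n − 1 denominator) is used. The function name and the toy matrix are hypothetical.

```python
import math
import statistics


def preprocess(matrix):
    """matrix: list of samples (rows), each a list of metabolite values,
    with None marking values below the limit of detection."""
    n_vars = len(matrix[0])
    # 1. Replace undetected values with half the minimum detected value
    #    of that metabolite (per-variable minimum is an assumption).
    filled = [row[:] for row in matrix]
    for j in range(n_vars):
        detected = [row[j] for row in matrix if row[j] is not None]
        half_min = min(detected) / 2.0
        for row in filled:
            if row[j] is None:
                row[j] = half_min
    # 2. Total sum normalization within each sample.
    normed = [[v / sum(row) for v in row] for row in filled]
    # 3. Log transformation (base 10 is an assumption).
    logged = [[math.log10(v) for v in row] for row in normed]
    # 4. Unit variance scaling per variable: mean-center, divide by SD.
    #    A constant column would have SD 0 and must be removed beforehand.
    for j in range(n_vars):
        col = [row[j] for row in logged]
        mean, sd = statistics.mean(col), statistics.stdev(col)
        for row in logged:
            row[j] = (row[j] - mean) / sd
    return logged


# Toy data: 3 samples x 3 metabolites, one value below detection
scaled = preprocess([[10.0, 5.0, None], [20.0, 2.0, 4.0], [5.0, 8.0, 6.0]])
print([[round(v, 2) for v in row] for row in scaled])
```

After scaling, every metabolite has mean 0 and standard deviation 1, so high-concentration metabolites do not dominate the multivariate models.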

Statistical analysis
Statistical analyses were carried out with Excel (version 2310, Microsoft Corporation) and R statistical software (version 4.2.3, R Foundation) with RStudio (version 2023.03.0, Posit Software). For nonparametric data, Mann-Whitney U tests were used for all comparisons between groups, and Spearman's rank correlation test was performed for correlations. Continuous variables were summarized with medians and IQR, and categorical parameters were presented by counts and frequencies.
Fecal microbial community structure was analyzed using the vegan R package, and responders and nonresponders were compared in terms of α diversity, using the Chao1 richness estimate and Shannon diversity index, and β diversity, using Bray-Curtis dissimilarity, nonmetric multidimensional scaling, and β dispersion [22]. For taxon abundance analysis, we normalized the dataset using total sum scaling normalization combined with centered log-ratio transformation. Low-abundance features were removed by retaining only taxa that accounted for >0.01% of all reads. All microbial diversity and taxon abundance analyses were carried out at the OTU level.
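The diversity metrics and transformation named above can be written out as a short Python sketch. The study used the vegan R package; the functions below are standalone illustrations, not vegan's code. The bias-corrected form of Chao1 and the pseudocount used to handle zeros before the log-ratio are assumptions, and all function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def clr(counts, pseudocount=0.5):
    """Total sum scaling followed by centered log-ratio transformation.
    The pseudocount (an assumption) avoids taking the log of zero."""
    shifted = [c + pseudocount for c in counts]
    total = sum(shifted)
    logs = [math.log(v / total) for v in shifted]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


sample_a, sample_b = [1, 1, 2, 2, 5], [0, 3, 2, 0, 6]
print(shannon(sample_a), chao1(sample_a), bray_curtis(sample_a, sample_b))
print(clr(sample_b))
```

Centered log-ratio values sum to zero within a sample, which removes the dependence on sequencing depth before taxon abundances are compared between groups.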
The plasma inflammation-related proteomic profile was evaluated using principal components analysis (PCA). Separation between groups was assessed with the use of permutation analysis of variance (ANOVA) tests on Euclidean distance matrices. Discriminant proteins were identified with the use of Mann-Whitney U tests. Results of differential analysis for all datasets were corrected for multiple testing with the Benjamini-Hochberg method. P values below 0.05 were considered statistically significant. For multivariate analysis of the 1H NMR metabolome data, PCA and Orthogonal Projections to Latent Structures Discriminant Analysis ordination were applied to identify differences between groups using MetaboAnalystR. Levels of individual metabolites were compared between groups using Mann-Whitney tests.
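The Benjamini-Hochberg correction mentioned above can be illustrated with a minimal Python sketch. The function name and the example P values are hypothetical; the study performed this step in R.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that each adjusted value is
    # the minimum of p * m / rank over all equal-or-larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values from four univariate comparisons
raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 3) for q in benjamini_hochberg(raw)])
```

The procedure controls the false discovery rate, which is less conservative than family-wise corrections when many features are tested at once, as in omics datasets.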

Random forest modeling
For the prediction of EEN response from the various datasets, we generated random forest (RF) models, which use a machine learning algorithm widely applied for classification and prediction purposes on multiomics data [23]. RF analysis was performed using the R package randomForest [24], separately for microbiota, 1H NMR metabolome, cytokines, inflammation-related proteomics, SCFA, dietary intake, routine clinical datasets, and inflammatory biomarkers (for example, CRP and FCal) at EEN initiation. Variable optimization was applied using the FeatureTerminatoR R package, which minimizes the number of variables in the model without reducing model performance. For all models, 50,000 decision trees were grown and the number of candidate variables at each split was set to the default. To account for class imbalances, the data were stratified by response type. The importance of each feature in the model was assessed as the mean decrease in Gini impurity index, which reflects how much splits on the variable of interest reduce node impurity across the trees of the forest, with the Gini impurity representing the probability that a specific sample will be classified incorrectly when labeled randomly. Model significance was determined after running a permutation test 1000 times using the rf.significance function in the rfUtilities R package [25]. Finally, a receiver operating characteristic (ROC) curve was plotted, and the AUC was calculated with the R package pROC. In multicomponent analysis, all datasets were combined in a single RF model, and missing data were replaced by the corresponding variable medians.
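The Gini impurity that underlies the variable importance measure can be illustrated with a short Python sketch. The functions are hypothetical helpers, not part of the randomForest package, and the example split is invented for illustration; only the class balance of 15 responders and 22 nonresponders comes from the study.

```python
def gini_impurity(class_counts):
    """Gini impurity 1 - sum(p_k^2) for a node with the given class counts."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def impurity_decrease(parent, left, right):
    """Weighted decrease in Gini impurity when a parent node is split in two."""
    n = sum(parent)
    weighted_children = (
        (sum(left) / n) * gini_impurity(left)
        + (sum(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Root node with the cohort's class balance: 15 responders, 22 nonresponders
root = [15, 22]
print(round(gini_impurity(root), 3))

# Hypothetical split on one variable (counts are not taken from the study)
print(round(impurity_decrease(root, [12, 5], [3, 17]), 3))
```

In a random forest, decreases of this kind are accumulated for each variable over every split that uses it and averaged over all trees, so variables that repeatedly produce purer child nodes rank highest.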

Ethics
The study was approved by the West of Scotland Research Ethics Committee (14/WS/1004) and registered at clinicaltrials.gov (NCT02341248). All patients/carers provided informed consent. All authors had access to the study data, and reviewed and approved the final manuscript.

Participant characteristics and clinical parameters
Of the 66 children with active CD, 54 completed EEN (Supplemental Figure 1). For 37 of 54 (69%) patients, FCal was measured at both baseline and EEN completion, and these children were included in the current study. At EEN initiation, all patients had FCal levels of >250 μg/g. Of these, 15 of 37 (41%) displayed FCal levels of <250 μg/g after EEN and were classified as responders. The rest of the patients, with FCal levels of >250 μg/g, were classified as nonresponders (n = 22/37, 59%). There were no differences in disease characteristics between patients included in this study and those in the complete cohort (Supplemental Table 1). Pretreatment wPCDAI, FCal, anthropometry, disease phenotype, use of immunosuppressants, and routine inflammatory biomarkers in blood were not different between the 15 responders and the 22 nonresponders (Table 1). An RF model using participants' characteristics and clinical parameters failed to differentiate between responders and nonresponders (permutation test P = 0.158).

Pretreatment dietary intake in responders and nonresponders
Treatment responders reported a higher baseline intake of confectionery and ice cream products compared with nonresponders [median (IQR), responders: 0.55 portions/1000 kcal/d (Q1: 0.38, Q3: 0.72) compared with nonresponders: 0.19 portions/1000 kcal/d (Q1: 0.01, Q3: 0.38); P = 0.045] (Table 2). No other significant differences were observed in macronutrient or food group intake between the 2 groups, apart from a nonsignificant trend (P = 0.09) toward higher fiber intake in nonresponders (Table 2). An RF model, using the intakes of food groups (portions/1000 kcal) and macronutrients (% of energy intake) as input variables, predicted response with an accuracy of 67%, sensitivity of 55%, specificity of 75%, positive predictive value (PPV) of 60%, and negative predictive value (NPV) of 71% (permutation test P = 0.006) (Figure 1). The dietary variables that contributed the most to the model were confectionery, which was higher in responders, and fruit, starch, and fiber, which were all higher in nonresponders.
influential OTUs, which all had a higher abundance in nonresponders than responders, were assigned to Bacteroides, Lachnospiraceae, Ruminococcaceae, and Anaerococcus (Figure 3E).

Multicomponent prediction of responses to EEN
Last, a multicomponent RF model was generated by including all of the study's clinical parameters and omics datasets, except for SCFA measured with GC, to avoid duplicating data also measured with 1H NMR. The final model yielded an accuracy of 78%, sensitivity of 80%, specificity of 77%, PPV of 71%, and NPV of 85% (permutation test P = 0.001) (Figure 4).
The most influential variables in this model were OTUs from Lachnospiraceae, Ruminococcaceae, and Bacteroides, together with phenylacetate, butyrate, and acetate, all of which were higher in nonresponders. Conversely, OTUs from Acidaminococcus and Collinsella, which were higher in responders, were the most influential predictors of response to EEN. The concentration of the 3 differential metabolites (that is, butyrate, acetate, and phenylacetate) correlated significantly with the relative abundances of several differential OTUs from the same model and, most importantly, retained the direction of their effect in predicting responses to EEN (Supplemental Figure 4).
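The performance metrics reported for the multicomponent model follow from a 2 × 2 confusion matrix, as the Python sketch below illustrates. The cell counts are an assumption: they are inferred here as one set of counts consistent with the reported percentages and the group sizes (15 responders, 22 nonresponders), and are not stated in the study. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard metrics from a 2 x 2 confusion matrix,
    treating response to EEN as the positive class."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Inferred counts (assumption): 12 of 15 responders and 17 of 22
# nonresponders classified correctly.
metrics = classification_metrics(tp=12, fn=3, tn=17, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

With these counts, the rounded values match the reported accuracy of 78%, sensitivity of 80%, specificity of 77%, PPV of 71%, and NPV of 85%.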

Discussion
The present study found that pretreatment dietary, microbiota, and metabolomic gut signatures can predict, with high accuracy, 8 of 10 patients who will show an FCal response to EEN, thus closing the gap that routine clinical parameters were unable to fill. Pretreatment butyrate, acetate, and phenylacetate concentrations were higher in nonresponders, almost double those of responders. These findings are counterintuitive, because butyrate is associated with anti-inflammatory pathways in intestinal mucosa [26], and the levels of several butyrate-producing species have been consistently reported to be reduced in CD [27]. It is unlikely that the increased pretreatment levels of fecal acetate and butyrate in nonresponders stem from diminished absorption because of more extensive inflammation in colonocytes, as the same strength of association was not observed across all SCFA measured and neither FCal levels nor wPCDAI differed between the 2 study groups. Previous research found a decrease in fecal butyrate during EEN [14,15], but it remains unclear whether this is simply an epiphenomenon of the lack of fiber in EEN feed composition [28], or if it is causally involved in its mechanisms of action [29]. In a recent study, fecal butyrate levels were positively associated with FCal levels during early food reintroduction post-EEN [30], and another research group showed that unfermented β-fructans exacerbated inflammation in certain patients with active IBD [31]. This unexpected positive relationship between fiber intake and SCFA production at treatment initiation and colonic inflammation at completion of EEN requires further investigation. Phenylacetate was also a strong predictor of FCal responses to EEN, in both single and multicomponent models. Phenylacetate is a catabolic product of phenylalanine and other aromatic compounds, which is further metabolized by selected species including Escherichia coli. Its exact role in CD has not been described, but it might comprise a biomarker of metabolism of certain pathobionts, such as adherent and invasive E. coli, which have been implicated extensively in the pathogenesis of CD [32]. The fact that only diet-related bacterial metabolites and organisms that are major fiber fermenters in the gut differentiated between responders and nonresponders in the current study further underlines the importance of dietary factors in the management of active CD and possibly its underlying etiology.
Patients with higher microbiota diversity in feces benefited the most from EEN. Reduced bacterial richness is a consistent finding in fecal and mucosal samples of patients with CD compared with healthy controls [27] and has been correlated with the extent of intestinal inflammation, assessed using FCal, in pediatric CD [33]. In contrast, in another study, no differences in α diversity metrics were observed between patients who had a >50% decrease in FCal levels and those who did not [34], suggesting that specific organisms may be more important than crude metrics of community structure. Indeed, in multicomponent analysis using machine learning, the influence of microbiota richness in predicting response to EEN became less important and certain bacterial taxa became more important predictors. Although the microbial signals identified could have prognostic value to screen patients whose gut inflammation will improve with EEN, and therefore personalize treatment options, their role in the primary disease pathogenesis is difficult to decipher within the current study. Randomized controlled trials in which FCal responses to EEN are measured after dietary or pharmacological manipulation, prior to EEN initiation, of the prognostic microbial species and metabolites identified in the present study could address such mechanistic questions. Based on the findings of the present study, a potential intervention might be a diet low in precursors of phenylacetate production, such as a diet low in phenylalanine or protein. Nonetheless, common species previously associated with the pathogenesis of CD [27], such as Akkermansia muciniphila, E. coli, and Veillonella, were not identified as predictors of EEN response in this study. It is also possible that we have observed different subsets of patients with different underlying microbial origins of CD, despite their similar immunity and disease phenotype. One subset may comprise patients in whom disease is driven by E. coli pathobionts, which produce phenylacetate, and in whom EEN works by reducing their abundance; another subset may comprise patients in whom disease is driven by other microbes, such as butyrate-producing Ruminococcaceae, for which EEN works by depleting the fiber substrate they require for growth.
The main limitations of the current study include the small sample size, meaning that statistical power may have been low for some analyses, particularly as the original study was designed to test different primary outcomes, and the lack of independent replication in larger studies. The results of this study may also be relevant only to the local Scottish population of children with CD, treated for 8 wk with Modulen IBD. Hence, replication of the current findings in a cohort of patients of different ethnic background and with use of other EEN feeds may be required before the study findings can be generalized. Using a multiomics approach, we identified pretreatment microbial species and diet-related metabolites associated with improvement in colonic inflammation during EEN. Should these microbial signals be replicated in independent multicenter research, this would open opportunities for personalized nutritional therapy in CD. Important diet-related metabolites identified here can be measured quickly and noninvasively, and can be further modified with dietary, pharmacological, or other microbiome-modifying treatments prior to an EEN course.

TABLE 2
Pretreatment dietary intake in responders and nonresponders to treatment with exclusive enteral nutrition

FIGURE 1
Random forest classification between responders and nonresponders using the intakes of macronutrients, fiber, and food groups from food frequency questionnaires. Starch was reported as percentage energy intake and the other variables per 1000 kcal/d. The top 15 variables with the highest mean decrease in Gini impurity and ROC curve are shown. Non-RS, fecal calprotectin nonresponder; ROC, receiver operating characteristic; RS, fecal calprotectin responder.

FIGURE 4
Random forest classification between responders and nonresponders using a multicomponent model with fecal OTUs, 1H NMR metabolites, Chao1 richness estimate of the fecal microbiome, cytokine, proteomic, diet-related metabolites, dietary intake, disease, and routine clinical datasets. The top 15 variables with the highest mean decrease in Gini impurity and ROC curve are shown. 1H NMR, proton nuclear magnetic resonance; non-RS, fecal calprotectin nonresponder; OTU, operational taxonomic unit; ROC, receiver operating characteristic; RS, fecal calprotectin responder.