Novel Biomarkers, Including tcdB PCR Cycle Threshold, for Predicting Recurrent Clostridioides difficile Infection

ABSTRACT Traditional clinical models for predicting recurrent Clostridioides difficile infection do not perform well, likely owing to the complex host-pathogen interactions involved. Accurate risk stratification using novel biomarkers could help prevent recurrence by addressing the underutilization of effective therapies (i.e., fecal transplant, fidaxomicin, bezlotoxumab). We used a biorepository of 257 hospitalized patients with 24 features collected at diagnosis, including 17 plasma cytokines, total/neutralizing anti-toxin B IgG, stool toxins, and PCR cycle threshold (CT) (a proxy for stool organism burden). The best set of predictors for recurrent infection was selected by Bayesian model averaging for inclusion in a final Bayesian logistic regression model. We then used a large PCR-only data set to confirm the finding that PCR CT predicts recurrence-free survival using Cox proportional hazards regression. The top model-averaged features were (probabilities of >0.05, greatest to least): interleukin 6 (IL-6), PCR CT, endothelial growth factor, IL-8, eotaxin, IL-10, hepatocyte growth factor, and IL-4. The accuracy of the final model was 0.88. Among 1,660 cases with PCR-only data, cycle threshold was significantly associated with recurrence-free survival (hazard ratio, 0.95; P < 0.005). Certain biomarkers associated with C. difficile infection severity were especially important for predicting recurrence; PCR CT and markers of type 2 immunity (endothelial growth factor [EGF], eotaxin) emerged as positive predictors of recurrence, while type 17 immune markers (IL-6, IL-8) were negative predictors. In addition to novel serum biomarkers (particularly, IL-6, EGF, and IL-8), the readily available PCR CT may be critical to augment underperforming clinical models for C. difficile recurrence.

A major challenge of Clostridioides difficile infection (CDI) is its tendency to reinfect despite antibiotic treatment. C. difficile colonizes the gut by producing spores that survive treatment and serve as a reservoir for recurrent growth of vegetative cells and reinfection, often related to microbiota disruption from anti-C. difficile or other antibiotics (1). Most recurrent CDI episodes are due to the previous strain (2)(3)(4) and typically occur within 2 to 8 weeks after the initial infection. Recurrence risk following an initial CDI episode is approximately 20% (5), a risk which increases upon subsequent infections (6), resulting in approximately 170,000 recurrent episodes of C. difficile infection annually in the United States (7).
Several new treatments are available that effectively prevent recurrent CDI, namely, fidaxomicin (8), bezlotoxumab (9), and fecal microbiota transplant (FMT) (10). Unfortunately, uptake of these interventions is poor (11) despite their adoption by consensus guidelines for CDI management (12), likely owing to their high costs (e.g., bezlotoxumab costs >$4,000 per vial [13] and the cost of fidaxomicin is 3× that of vancomycin [14]) and the lack of well-defined risk strata for recurrent C. difficile infection.
Accurately predicting which C. difficile infections will recur at the time of infection using conventional clinical factors has proven difficult. Existing clinical-only tools that attempt to stratify recurrence risk among CDI patients perform poorly on external validation, including at least 2 models that performed worse than chance (i.e., area under the receiver operating characteristic curve [AUROC], 0.42 to 0.43) (15).
Increasing evidence suggests that the host immune response is a critical factor that dictates future outcomes in CDI. For example, effective T-helper type 17 immune cells (defined by their interleukin 17 production) are required for the development of immunity against recurrent C. difficile infection (16). In addition, neutrophil-mediated inflammation, promoted by IL-8, is considered deleterious, while type 2 (or eosinophil-mediated) immunity is protective against severe C. difficile infection. Similarly, the C. difficile stool organism burden (which can be inversely approximated using the quantitative tcdB gene PCR cycle threshold [CT] [17]; i.e., low CT equals a high burden) has shown mixed associations with C. difficile severity (18); however, the utility of PCR CT for predicting recurrent C. difficile infection is unknown. In this study, we examined representative markers of C. difficile pathogenesis, including immune cytokines, anti-toxin antibodies, PCR CT (as a marker of stool organism burden), and stool toxins (A/B and binary toxin) (19) for predicting recurrent C. difficile infection using Bayesian machine learning.

RESULTS
Biomarker data are summarized in Table 1, before preprocessing, with sample means separated based on patients who died, developed recurrent infection, or survived without recurrence. The posterior marginal probability density functions for the means of individual predictors, separated by 90-day outcome (recurrent CDI, CDI-associated mortality, recurrence-free survival) are shown in Fig. 1. Variables whose distributions overlapped substantially (e.g., soluble ST-2 receptor [sST-2], IL-1β, anti-toxin B IgG) would not, by themselves, be useful for predicting that outcome. Lower age, PCR CT, IL-6, IL-8, and IL-17A and higher IL-16, EGF, and CCL-5 appeared to provide the best univariate class separation for recurrent infection (versus recurrence-free survival and death).
(Table 1, footnote a) Sample means are shown for raw (unscaled) parameter data in the subset of 163 observations with all available features, separated by patients who died (n = 15), developed recurrent infection (n = 21), or survived without recurrence (n = 127). Two-sided 95% confidence intervals (CIs) for the means were calculated using the Student's t-distribution. For purposes of this analysis, 3 patients who both recurred and died within 90 days were factored as having died. Stool toxins and toxin B neutralization capacity were factored as 0 (negative) or 1 (positive), so their means represent the proportions of patients with positive results. WBC, white blood cell count; MIF, macrophage migration inhibitory factor; EGF, endothelial growth factor; HGF, hepatocyte growth factor; TNF-α, tumor necrosis factor-α.
The top 8 features, each with a probability of ≥0.05 based on model averaging, were used to train the final Bayesian logistic regression model using all 257 observations, including 32 recurrent CDI events (graphical model shown in Fig. S1 in the supplemental material).
FIG 1 Posterior marginal probability density functions for the means of all variables (following standardization), given recurrence-free survival, death, or recurrent CDI within 90 days. We assume here that individual variables have Gaussian (normal) distributions with unknown means. Probabilities for the sample mean were then calculated over a range of possible values for each feature and class using Bayes' theorem and plotted to represent a probability distribution, with its integral (or area under the curve) approximately equal to 1. The y axis can be interpreted as the relative likelihood that the mean of the sample was equal to the corresponding x value. For purposes of this analysis, 3 patients who both recurred and died within 90 days were factored as having died.
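The Fig. 1 densities (a Gaussian likelihood with an unknown mean, evaluated over a range of candidate values and normalized to unit area) can be illustrated with a simple grid approximation. This is a minimal sketch, not the authors' code: the flat prior, the known unit standard deviation (plausible only because the variables were standardized), the grid limits, and the sample values are all assumptions made for illustration.

```python
import math

def posterior_mean_density(samples, grid, sigma=1.0):
    """Grid-approximate posterior density of a Gaussian mean.

    Assumes a flat prior over the grid and a known standard deviation
    (both are illustrative assumptions, not taken from the paper).
    """
    # Log-likelihood of the data for each candidate mean on the grid.
    log_like = [
        sum(-0.5 * ((x - mu) / sigma) ** 2 for x in samples)
        for mu in grid
    ]
    peak = max(log_like)  # subtract the peak for numerical stability
    unnorm = [math.exp(v - peak) for v in log_like]
    step = grid[1] - grid[0]
    area = sum(unnorm) * step  # Riemann-sum approximation of the integral
    return [u / area for u in unnorm]

# Hypothetical standardized biomarker values for one outcome class.
samples = [-0.8, -0.2, -0.5, -1.1, -0.4]
grid = [-3 + 0.01 * i for i in range(601)]  # candidate means from -3 to 3

density = posterior_mean_density(samples, grid)
mode = grid[density.index(max(density))]
print(f"posterior mode of the class mean: {mode:.2f}")
```

Repeating this for each outcome class and overlaying the curves gives the kind of picture described in the caption: classes whose curves overlap heavily are poorly separated by that variable alone.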
Markov chain Monte Carlo (MCMC) sampling (Metropolis-Hastings algorithm) was implemented to sample the posterior distributions of the coefficients using 30,000 samples, discarding the initial 15,000 samples as burn-in. Good convergence was seen with all coefficients (see trace plots in Fig. S2). The results of the final fitted Bayesian multivariable logistic regression model are shown in Table 3. Higher levels of IL-6, PCR CT, IL-8, and HGF and lower levels of EGF, eotaxin, IL-10, and IL-4 were independently associated with lower adjusted odds of recurrent CDI. However, no single feature was a statistically significant independent predictor in the presence of the other features. Sampling estimates of the posterior probabilities of individual patients are shown in Fig. 2. The AUROC of the final model was 0.66 (the ROC curve is shown in Fig. S3). To visualize the isolated effects of individual biomarker fluctuations on the risk of recurrence, we plotted posterior predictive regression lines for each biomarker while holding the other predictors constant at their standardized means (Fig. 3).
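For readers unfamiliar with the sampling step, the following is a minimal random-walk Metropolis-Hastings sketch for a Bayesian logistic regression, written in plain Python rather than the pymc3 package the study used. The Normal(0, 2) priors, the proposal step size, the shortened chain (3,000 draws with 1,500 burn-in instead of 30,000 and 15,000), and the toy data are all illustrative assumptions.

```python
import math
import random

def log_posterior(beta, X, y, prior_sd=2.0):
    """Bernoulli-logit log-likelihood plus independent Normal(0, prior_sd) log-priors."""
    lp = sum(-0.5 * (b / prior_sd) ** 2 for b in beta)
    for row, label in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log p(y | eta) = y*eta - log(1 + exp(eta)), evaluated stably
        lp += label * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp

def metropolis_hastings(X, y, n_samples=3000, burn_in=1500, step=0.3, seed=0):
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y)
    draws = []
    for i in range(n_samples):
        # Symmetric Gaussian random-walk proposal.
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:  # keep only post-burn-in draws
            draws.append(beta)
    return draws

# Hypothetical data: intercept column plus one standardized predictor
# (for example, a standardized CT), with y = 1 meaning recurrence.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -1.0, 1.0, 0.0]
y = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0]
X = [[1.0, x] for x in xs]

draws = metropolis_hastings(X, y)
slopes = sorted(d[1] for d in draws)
slope_mean = sum(slopes) / len(slopes)
ci_low = slopes[int(0.025 * len(slopes))]
ci_high = slopes[int(0.975 * len(slopes))]
print(f"slope: {slope_mean:.2f} (95% credible interval {ci_low:.2f} to {ci_high:.2f})")
```

A negative posterior slope in this toy example corresponds to lower odds of the outcome as the predictor increases; in practice, convergence would still need to be checked with trace plots, as the authors did.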
For the PCR CT analysis, archived PCR data were available for 1,660 (of 2,126 total) hospitalized cases of C. difficile infection that occurred between November 2013 and April 2021 among 1,412 individual patients, of whom 250 (15.1%) suffered a recurrent episode of infection within 180 days. The PCR CT measurements ranged from 17.7 to 37.0 cycles. A Kaplan-Meier curve depicting recurrent C. difficile infections grouped by PCR CT quartiles is shown in Fig. 4. According to a univariate Cox regression, the hazard ratio for PCR CT was 0.95 (interpreted as a 5% relative lower risk of recurrent infection for each 1 standardized unit increase in PCR CT), which was statistically significant (95% confidence interval, 0.93 to 0.98; P < 0.005). The AUROC for PCR CT alone for recurrent CDI within 90 days was 0.60 (Fig. S4). At the optimal Youden cutoff of ≤27.2 cycles, PCR CT achieved a sensitivity of 0.70, a negative predictive value of 0.89, a specificity of 0.43, and a positive predictive value of 0.18.
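The cutoff-based metrics reported above follow from a 2-by-2 table at each candidate threshold, with the Youden index J = sensitivity + specificity - 1 used to choose the cutoff. The sketch below shows that calculation in plain Python on made-up CT values; the data and function names are hypothetical, and the rule "CT at or below the cutoff predicts recurrence" is assumed from the direction of the reported association.

```python
def safe_ratio(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def metrics_at_cutoff(ct_values, recurred, cutoff):
    """Treat CT <= cutoff as a positive prediction and summarize performance."""
    tp = fp = tn = fn = 0
    for ct, event in zip(ct_values, recurred):
        predicted = ct <= cutoff
        if predicted and event:
            tp += 1
        elif predicted and not event:
            fp += 1
        elif not predicted and event:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),
    }

def youden_cutoff(ct_values, recurred):
    """Scan observed CT values and return the cutoff maximizing the Youden index."""
    best_cutoff, best_j, best_metrics = None, -1.0, None
    for cutoff in sorted(set(ct_values)):
        m = metrics_at_cutoff(ct_values, recurred, cutoff)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cutoff, best_j, best_metrics = cutoff, j, m
    return best_cutoff, best_metrics

# Hypothetical CT values and recurrence outcomes (1 = recurred).
ct = [18.0, 22.5, 24.0, 26.0, 27.2, 29.0, 31.0, 33.5, 35.0, 36.5]
recurred = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

cutoff, result = youden_cutoff(ct, recurred)
print(cutoff, result)
```

Because the positive and negative predictive values depend on prevalence, the balanced toy data here produce values very different from those in a cohort with roughly 15% recurrence.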

DISCUSSION
The eight features identified as important by model averaging for predicting recurrent CDI can be classified into three groups: bacterial burden (PCR CT), type 17 immunity (IL-6, IL-8, IL-10), and type 2 immunity (EGF, eotaxin, HGF, IL-4). Considering the parameters with the largest effects on the posterior probability of recurrent infection (i.e., IL-6, PCR CT, EGF, IL-8), bacterial burden and type 2 immunity clearly emerged as positive predictors and type 17 immunity as a negative predictor of recurrent CDI. This is in contrast to our understanding that type 17 immunity is deleterious (20, 21) and type 2 immunity protective (22)(23)(24) against severe disease and C. difficile-associated mortality.
CDI recurrence is thought to represent a disruption in the process of inflammation, intestinal damage/repair, bacterial clearance, and the return to homeostasis. We hypothesize that type 17 effectors function as a double-edged sword, clearing C. difficile infection and reducing the risk of recurrence at the price of damaging intestinal tissue and contributing to disease severity (25). We also hypothesize that type 2 immunity, generally considered anti-inflammatory and a driver of epithelial healing (26,27), may reduce early gut damage but at the cost of not fully clearing the infection and increasing the opportunity for recurrence. Although we showed that early (within 48 h of diagnosis) type 2 immunity portends future recurrence, late type 2 activity is characteristic of a healed gut (27), with limited probability of recurrence (e.g., increased type 2 activity was shown 60 days after successful FMT) (28).
Interestingly, the second highest probability factor for predicting recurrent CDI was not an immune biomarker, but the PCR CT. The negative coefficient for PCR CT suggests that a higher stool organism burden is associated with a higher risk of recurrence. Using a much larger, PCR-only data set, we found that CT performed remarkably well as a sole predictor of recurrent infection, possibly as a downstream indication of the balance between type 17 and type 2 immunity. PCR CT information is not widely accepted for use in the context of CDI management (12) but may be clinically underutilized. To date, low C. difficile PCR CT (reflecting high stool organism genomic equivalents) has been shown to predict toxin A/B enzyme immunoassay (EIA) (29) and cell cytotoxicity neutralization positivity (30), increased C. difficile-associated diarrhea/pain (31), longer duration of diarrhea (32), hypervirulent ribotype 027 (33), and disease severity/mortality (33,34) (although other studies have failed to correlate CT with disease severity [35,36]). Garvey et al. previously reported an association between low CT and treatment failure requiring a change of therapy or recurrent infection within 30 days (37), but to our knowledge, PCR CT has not been appreciated as an important predictor of recurrent infection. Quantitative PCR is used by >70% of U.S. hospitals (38), and because CT values are calculated for all clinical PCR results at CDI diagnosis, CT could theoretically be employed relatively easily by itself or alongside other markers to better define risk. Other traditional markers for severe CDI (age, white blood cell count [WBC], stool toxin A/B positivity [39]) were found to be unimportant for recurrence alongside the other biomarkers.
We also showed that the presence of stool binary toxin (which is produced by 027 and other hypervirulent strains) was not associated with recurrent infection; this observation runs counter to one theory that increasing rates of recurrence are related to the emergence of hypervirulent strains (6).
C. difficile toxin B alone has emerged as the main virulence factor in CDI. Previous studies have demonstrated that patients who lack a robust anti-toxin B IgG response following infection are more likely to develop recurrent disease (40)(41)(42); however, these findings have not always been consistent, likely due to the heterogeneity of methods and early versus late timing when measuring humoral immunity (43,44). We showed that the early (within 48 h) anti-toxin B IgG response (perhaps reflecting prior colonization or more advanced infection) was nonpredictive for recurrence, but it may be a strictly late marker of protection. Similarly, bezlotoxumab (a monoclonal antibody against toxin B) did not significantly affect early outcomes (symptom duration, severity, mortality) in a randomized placebo-controlled trial (mean time to antibody, 3 days from start of treatment) (45), and subsequent trials failed to demonstrate consistent benefits of anti-toxin antibodies in treating acute infection (46).
Unlike predicting CDI severity, where a multitude of clinical severity scoring systems exist with reasonable performance, such as ATLAS (47,48), there are currently no well-established tools to predict the recurrence of C. difficile infection. We used a naive Bayesian approach to evaluate a rich set of novel predictors shown to be important in CDI pathogenesis (19,20). The performance of our biomarker-based model for C. difficile recurrence was comparable to the best-performing clinical-based model described by D'Agostino et al. (AUROC, 0.64) (49); however, there is clearly room for further improvement. Bayesian model averaging allowed us to objectively quantify the probabilities of individual predictors to include in our final model. One issue with traditional (frequentist) methods of prediction is that individual point estimates of risk can be unknowingly overconfident. We used Bayesian inference because it allows researchers and clinicians to better account for uncertainty when forecasting C. difficile recurrence. Credible intervals of our posterior estimates varied considerably from patient to patient, highlighting the importance of a probabilistic approach to understanding patient outcomes.
This study has limitations. Our list of measured serum cytokines was chosen based on our knowledge of C. difficile immunopathogenesis but is not exhaustive. In the Bayesian logistic regression model, no single predictor was statistically significant in the presence of the other predictors, suggesting that all the biomarkers must be present or that there was an insufficient sample size. The rates of recurrent C. difficile infection in both cohorts (32/257 [12.8%], biorepository; 250/1,660 [15.1%], PCR only) were relatively low compared with some reports in the literature, which could be explained by incomplete capture of recurrent cases (which required patients to be diagnosed within the same health system laboratory). It is also possible that some recurrent CDI episodes detected by PCR were in fact reflective of asymptomatic C. difficile colonization (which ranges from 3% to 21% in the general population [50]). While our institution has a history of robust diagnostic stewardship and clinical decision support to reduce inappropriate C. difficile testing that could help to minimize this issue (51,52), it remains an important caveat. Missing data were primarily due to insufficient discarded stool or sera (or CT data that failed to archive, in the case of the PCR analysis), which was presumably introduced randomly; however, missing data bias could not be excluded. Finally, all performance measures were within-sample and do not guarantee generalizability; further studies are needed to validate PCR CT alone and alongside other novel biomarkers before they can be adopted for routine clinical use.

MATERIALS AND METHODS
Stool and cytokine measurements. Details of the UVA CDI biorepository and serum cytokine and stool toxin assays have been previously published (19,20). At the University of Virginia Health (645-bed, tertiary-care academic hospital) between 2013 and 2016, discarded stool and sera were collected from hospitalized adult patients within 48 h of a diagnosis of CDI, excluding those diagnosed with CDI in the preceding 90 days. Serum cytokine measurements were obtained using Luminex assays (R&D Systems, Minneapolis, MN), with the exceptions of sST-2, IL-23, and IL-17A, which were measured using an enzyme-linked immunosorbent assay (ELISA). Stool toxin A/B and binary toxin were detected using qualitative ELISA (TechLab, Blacksburg, VA). Basic clinical, PCR, and outcome data (i.e., age, white blood cell count, mortality, recurrence) were collected retrospectively from the electronic medical record.
Anti-toxin B humoral immunity. Anti-toxin B antibodies were quantified using ELISA, whereby 96-well plates were coated overnight with 1 mg/mL of purified toxin B (TechLab). The plates were incubated and washed, and goat anti-human IgG and Fcγ fragment-specific antibody (Jackson Laboratory; 109-035-098) were added, followed by TMB substrate (Thermo Scientific; 34028). Optical densities were read at 450 nm, and endpoint titers were interpolated using a cutoff optical density of 0.1. The antitoxin neutralization capacity was determined using a cell cytotoxicity neutralization assay. Serially diluted sera (1/40 to 1/2,560) were incubated with purified toxin B (TechLab; diluted to 1 ng/mL) for 1 h at 37°C; the mixtures were then incubated with CHO-K1 cells (ATCC CCL-61) for 24 h and inspected by light microscopy to determine cytopathic effects. Neutralization capacity was scored as positive if any serum dilution protected 100% of CHO-K1 cells.
Bayesian logistic regression.
PCR CT and recurrence-free survival analysis. For the PCR-only analysis, all inpatient qualitative PCR results coupled with available quantitative archived cycle threshold values were collected from the clinical microbiology laboratory GeneXpert PCR machine (Cepheid, Sunnyvale, CA) between November 2013 and April 2021 (in addition to follow-up positive recurrent CDI results within at least 180 days). Hospitalized index CDI episodes were defined as the first available positive PCR result accompanied by anti-C. difficile treatment (with oral vancomycin, metronidazole, or fidaxomicin) within the ensuing 10 days, and recurrent infection was defined as a repeat positive PCR result occurring >10 days after the index infection. Hospital medication administration and mortality data were collected from the electronic medical record. A Cox proportional hazards regression model was developed to measure the effect of PCR CT on days to recurrent CDI, whereby death was treated as a censoring event. Analyses were performed using Python version 3.9.13 and the following packages: Statsmodels (Bayesian model averaging) (53), GraphViz (graphical model) (54), pymc3 (MCMC sampling) (55), Scikit-learn (performance metrics, ROC analysis) (56), and lifelines (Cox regression, Kaplan-Meier analysis) (57).
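To make the censoring convention concrete, the sketch below computes a Kaplan-Meier recurrence-free survival curve in plain Python, treating death or end of follow-up as censoring, and shows how a per-unit hazard ratio compounds over several units under the proportional hazards assumption. It does not use the lifelines package the study relied on; the follow-up times, event flags, and function names are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each time with at least one recurrence.

    events[i] is True for a recurrence and False for a censored
    observation (death or end of follow-up without recurrence).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        n_at_t = sum(1 for d in durations if d == t)
        recurrences = sum(1 for d, e in zip(durations, events) if d == t and e)
        if recurrences:
            # Product-limit step: multiply by the fraction without recurrence at t.
            survival *= 1 - recurrences / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= n_at_t
    return curve

def relative_hazard(hazard_ratio, delta):
    """Hazard multiplier for a delta-unit increase, given a per-unit hazard ratio."""
    return hazard_ratio ** delta

# Hypothetical days to recurrence or censoring for 10 index episodes.
durations = [14, 30, 30, 45, 60, 90, 120, 180, 180, 180]
events = [True, True, False, True, False, True, False, False, False, False]

curve = kaplan_meier(durations, events)
for day, surv in curve:
    print(f"day {day}: recurrence-free survival {surv:.3f}")

# A per-unit hazard ratio of 0.95 compounds to about 0.77 over 5 units.
print(round(relative_hazard(0.95, 5), 3))
```

Censoring deaths in this way assumes that patients who died would otherwise have had the same recurrence risk as those still under observation, which is a modeling choice rather than a certainty.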
Ethics statement. Collection of patient samples (biorepository cohort) and analysis of clinical data (including from the CT data cohort) were approved by the University of Virginia Institutional Review Board (IRB-HSR 16926 and 20082, respectively), with waivers of informed consent.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 1 MB.
diagnostic tests for C. difficile toxins. All other authors report no conflicts of interest relevant to this article.