Differences in outcome according to Clostridium difficile testing method: a prospective multicentre diagnostic validation study of C difficile infection

Summary

Background
Diagnosis of Clostridium difficile infection is controversial because of many laboratory methods, compounded by two reference methods. Cytotoxigenic culture detects toxigenic C difficile and gives a positive result more frequently (eg, because of colonisation, which means that individuals can have the bacterium but no free toxin) than does the cytotoxin assay, which detects preformed toxin in faeces. We aimed to validate the reference methods according to clinical outcomes and to derive an optimum laboratory diagnostic algorithm for C difficile infection.

Methods
In this prospective, multicentre study, we did cytotoxigenic culture and cytotoxin assays on 12 420 faecal samples in four UK laboratories. We also performed tests that represent the three main targets for C difficile detection: bacterium (glutamate dehydrogenase), toxins, or toxin genes. We used routine blood test results, length of hospital stay, and 30-day mortality to clinically validate the reference methods. Data were categorised by reference method result: group 1, cytotoxin assay positive; group 2, cytotoxigenic culture positive and cytotoxin assay negative; and group 3, both reference methods negative.

Findings
Clinical and reference assay data were available for 6522 inpatient episodes. On univariate analysis, mortality was significantly higher in group 1 than in group 2 (72/435 [16·6%] vs 20/207 [9·7%], p=0·044) and in group 3 (503/5880 [8·6%], p<0·001), but not in group 2 compared with group 3 (p=0·4). A multivariate analysis accounting for potential confounders confirmed the mortality differences between groups 1 and 3 (OR 1·61, 95% CI 1·12–2·31). Multistage algorithms performed better than did standalone assays.

Interpretation
We noted no increase in mortality when toxigenic C difficile alone was present. Toxin (cytotoxin assay) positivity correlated with clinical outcome, and so this reference method best defines true cases of C difficile infection. A new diagnostic category of potential C difficile excretor (cytotoxigenic culture positive but cytotoxin assay negative) could be used to characterise patients with diarrhoea that is probably not due to C difficile infection, but who can cause cross-infection.

Funding
Department of Health and Health Protection Agency, UK.


PCR-ribotyping
PCR-ribotyping was performed on all isolates following the protocol of the Clostridium difficile Ribotyping Network of England and Northern Ireland.³

Statistical methods
For the sample size calculation for the laboratory assessment, we assumed a testing-algorithm sensitivity of 90% and specificity of 99·5%, with 4·5% of samples positive; on this basis, 8000–10 000 specimens would estimate sensitivity to within 3% and specificity to within 0·2%.
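The precision targets can be checked with a short Python sketch. It assumes, because the text does not say, that "within 3%" and "within 0·2%" are half-widths of normal-approximation (Wald) 95% confidence intervals in absolute percentage points; the function names are illustrative, not taken from the study.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Half-width of a normal-approximation 95% CI for a proportion p estimated from n subjects."""
    return z * math.sqrt(p * (1 - p) / n)


def expected_precision(n_samples, prevalence=0.045, sensitivity=0.90, specificity=0.995):
    """Return (sensitivity half-width, specificity half-width) for a given number of specimens."""
    n_pos = n_samples * prevalence          # expected reference-positive specimens
    n_neg = n_samples * (1 - prevalence)    # expected reference-negative specimens
    return wald_half_width(sensitivity, n_pos), wald_half_width(specificity, n_neg)


for n in (8000, 10000):
    sens_hw, spec_hw = expected_precision(n)
    print(f"n={n}: sensitivity +/-{sens_hw:.1%}, specificity +/-{spec_hw:.2%}")
```

Under these assumptions the sensitivity half-width is about 3·1% at 8000 specimens and 2·8% at 10 000, and the specificity half-width about 0·16% and 0·14%, broadly matching the stated targets.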
We assessed each test and algorithm by the area under the receiver operating characteristic curve (AUROC). For a randomly sampled pair consisting of one reference-negative and one reference-positive sample, the AUROC is the probability that the test (t) ranks the true positive as more likely to be infected than the true negative (ie, tp > tn). For each test algorithm, we estimated the AUROC from 1000 bootstrap samples of randomly selected record pairs (tp, tn); the proportion of pairs in which tp > tn is the AUROC. To test the significance of the difference between two AUROCs, we used the distribution of their difference over 2000 bootstrap samples. A bootstrap sample size of 1000 was chosen to give stable estimates, with a standard error within 0·1% of the estimate.
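The pair-based AUROC and the bootstrap comparison can be sketched in Python. This is a minimal illustration under our own assumptions, not the study's code: ties between tp and tn count as one half (the usual Mann-Whitney convention; the text mentions only tp > tn), the "1000 bootstrap samples" are read as 1000 randomly drawn (tp, tn) pairs, and two assays are compared by a paired bootstrap over records with a two-sided percentile p-value. All function names are hypothetical.

```python
import random


def exact_auroc(pos_scores, neg_scores):
    """AUROC as P(tp > tn) over every (positive, negative) pair; ties count 1/2."""
    wins = 0.0
    for tp in pos_scores:
        for tn in neg_scores:
            if tp > tn:
                wins += 1.0
            elif tp == tn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sampled_auroc(pos_scores, neg_scores, n_pairs=1000, seed=0):
    """Estimate the AUROC from n_pairs randomly selected (tp, tn) record pairs."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(n_pairs):
        tp, tn = rng.choice(pos_scores), rng.choice(neg_scores)
        if tp > tn:
            wins += 1.0
        elif tp == tn:
            wins += 0.5
    return wins / n_pairs


def bootstrap_auroc_difference(ref, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap comparison of AUROC(a) - AUROC(b) against one reference.

    ref holds 0/1 reference results; scores_a and scores_b hold the two
    assays' results on the same samples. Returns the observed difference
    and a two-sided percentile p-value.
    """
    rng = random.Random(seed)
    n = len(ref)

    def auroc_diff(idx):
        pos = [i for i in idx if ref[i] == 1]
        neg = [i for i in idx if ref[i] == 0]
        a = exact_auroc([scores_a[i] for i in pos], [scores_a[i] for i in neg])
        b = exact_auroc([scores_b[i] for i in pos], [scores_b[i] for i in neg])
        return a - b

    observed = auroc_diff(range(n))
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        # Resamples lacking either reference class have no defined AUROC.
        if any(ref[i] == 1 for i in idx) and any(ref[i] == 0 for i in idx):
            diffs.append(auroc_diff(idx))
    share_le_zero = sum(d <= 0 for d in diffs) / n_boot
    share_ge_zero = sum(d >= 0 for d in diffs) / n_boot
    # A p-value of 0 means "below 1/n_boot" at this number of resamples.
    p_value = min(1.0, 2.0 * min(share_le_zero, share_ge_zero))
    return observed, p_value
```

Resampling whole records, rather than each assay separately, keeps the two assays paired on the same specimens, which the difference test needs because every assay was run on the same samples.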

Additional results
The toxin EIA 1 and Xpert assays were not used first line in the testing phase, and so represent a smaller, partially selected dataset (table 1); toxin EIA 1 was used during the testing phase at one site (n=2558) because it was the routine test there. In the training phase (n=6753), 389, 559, and 704 samples were positive by cytotoxin assay (CTA), cytotoxigenic culture (CC), and nucleic acid amplification test (NAAT), respectively.
Episodes with missing clinical or mortality data were more likely to involve patients who were female, were older, and had been in hospital for longer (p<0·0001): median age 74 vs 68 years, 62% vs 53% female, and median length of stay at testing 6 vs 5 days (episodes with missing vs complete data, respectively).
Because some patients were tested more than once, we checked for within-patient correlation in the results. We did this by multilevel analysis, using a logistic regression model in which the outcome was the result of one of the reference tests (cytotoxigenic culture or cytotoxin assay). The random-effects model invariably fitted the data significantly better than the standard (single-level) logistic regression model, with large intraclass correlation coefficients (ICCs). To assess fit, we compared −2 times the difference in log-likelihoods between the models against the χ² distribution, as advocated by Twisk.⁴ When we used the deduplicated dataset, from which repeat samples from the same episode had been removed, the ICC was no longer significant. Repeating the analyses on the deduplicated set of episodes (within a 28-day window) did not lead to important changes in the results, although the standard errors were slightly larger. A real-world clinical laboratory will receive multiple samples from each patient, which was a further reason for presenting, in the main table of the manuscript, the results for all samples as received by the laboratory.
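The likelihood-ratio comparison and the ICC can be illustrated with a minimal Python sketch. It assumes a random-intercept logistic model that adds one variance parameter to the single-level model (so 1 degree of freedom), and computes the ICC on the latent scale as σ²u / (σ²u + π²/3), a common convention for logistic models; the text does not state which ICC definition was used. The log-likelihoods would come from whatever software fitted the models, and the function names and example values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_single_level, loglik_random_intercept):
    """Compare a single-level logistic model with its random-intercept extension.

    The statistic -2 * (LL_reduced - LL_full) is referred to a chi-square
    distribution with 1 degree of freedom (the one added variance parameter).
    """
    stat = -2.0 * (loglik_single_level - loglik_random_intercept)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


def latent_icc(random_intercept_variance):
    """Latent-scale ICC for a logistic random-intercept model.

    The level-1 variance of the standard logistic distribution is pi^2 / 3.
    """
    return random_intercept_variance / (random_intercept_variance + math.pi ** 2 / 3)


# Hypothetical log-likelihoods and variance, for illustration only.
stat, p = likelihood_ratio_test(-2510.4, -2391.7)
print(f"LR = {stat:.1f}, p = {p:.3g}, ICC = {latent_icc(2.5):.2f}")
```

Because the random-intercept variance is tested at the boundary of its parameter space (it cannot be negative), the χ²(1) p-value is conservative and some authors halve it; whether such an adjustment was applied here is not stated.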

Inter-laboratory variation
The monthly quality assurance samples yielded no discordant results between laboratories. During the training phase, AUROC analysis showed that the performance of each assay varied across the laboratories (table 4). The largest inter-site variation was seen with toxin EIA 1 (coefficient of variation across sites 8·08% and 7·35% with CTA and CC as the reference, respectively). The correlation between the positivity rates of the assays remained fairly consistent over the course of the study (figure 4). Time-series plots of each assay's positivity rate by site showed the same stacking pattern of the assays and confirmed that intra-site and inter-site variability mirrored that seen in the study as a whole (data not shown).
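As an illustration of the inter-site statistic, a coefficient of variation can be computed from per-site AUROC values as below. This assumes the CV is the sample standard deviation of the site-level AUROCs divided by their mean; the site values in the example are made up and are not study data.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    if len(values) < 2:
        raise ValueError("need at least two sites")
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical AUROCs for one assay at four sites (illustrative only).
site_aurocs = {"site A": 0.86, "site B": 0.91, "site C": 0.95, "site D": 0.88}
cv = coefficient_of_variation(list(site_aurocs.values()))
print(f"inter-site CV = {cv:.2f}%")
```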
Previous studies have often been single-centre, so they could not determine inter-laboratory variation in the performance of commercial assays, and they have been subject to variable strain distributions, which may introduce bias. In this study, AUROC analysis showed inter-site variation (table 4). For example, had testing been performed only at St George's (n=1593), toxin EIA 1 would have had the highest AUROC of all single assays in the testing phase with CTA as the reference, in contrast with the overall results. The reasons for this variability are probably multiple and could include differences in the prevalence of PCR-ribotypes.⁵ However, only ten PCR-ribotypes accounted for 63% of all study isolates, and PCR-ribotypes 014 and/or 015 were among the three most common types at each site. Comparing the positivity rates of each assay at each centre showed the same trends in variability between sites. Because strain distributions were broadly similar across sites, this suggests that factors intrinsic to the assays are likely to be affecting performance.