Comparison of metagenomic and traditional methods for diagnosis of E. coli enteric infections

ABSTRACT Diarrheagenic Escherichia coli, collectively known as DEC, is a leading cause of diarrhea, particularly in children in low- and middle-income countries. Diagnosing infections caused by different DEC pathotypes traditionally relies on the cultivation and identification of virulence genes, a resource-intensive and error-prone process. Here, we compared culture-based DEC identification with shotgun metagenomic sequencing of whole stool using 35 randomly drawn samples from a cohort of diarrhea-afflicted patients. Metagenomic sequencing detected the cultured isolates in 97% of samples, revealing, overall, reliable detection by this approach. Genome binning yielded high-quality E. coli metagenome-assembled genomes (MAGs) for 13 samples, and we observed that the MAG did not carry the diagnostic DEC virulence genes of the corresponding isolate in 60% of these samples. Specifically, two distinct scenarios were observed: diffusely adherent E. coli (DAEC) isolates without corresponding DAEC MAGs appeared to be relatively rare members of the microbiome, which was further corroborated by quantitative PCR (qPCR), and thus unlikely to represent the etiological agent in 3 of the 13 samples (~23%). In contrast, ETEC virulence genes were located on plasmids and largely escaped binning in associated MAGs despite being prevalent in the sample (5/13 samples or ~38%), revealing limitations of the metagenomic approach. These results provide important insights for diagnosing DEC infections and demonstrate how metagenomic methods can complement isolation efforts and PCR for pathogen identification and population abundance.

IMPORTANCE Diagnosing enteric infections based on traditional methods involving isolation and PCR can be erroneous due to isolation and other biases, e.g., the most abundant pathogen may not be recovered on isolation media. By employing shotgun metagenomics together with traditional methods on the same stool samples, we show that mixed infections caused by multiple pathogens are much more frequent than traditional methods indicate in the case of acute diarrhea. Further, in at least 8.5% of the total samples examined, the metagenomic approach reliably identified a different pathogen than the traditional approach. Therefore, our results provide a methodology to complement existing methods for enteric infection diagnostics with cutting-edge, culture-independent metagenomic techniques, and highlight the strengths and limitations of each approach.


Gene Primer sequence (5'-3') Size

Pathogen genes were considered detected when sequencing depth was above 0.1 for short-read mapping. If at least half of the pathogenic genes of a single type were identified but their coverages were below 0.1, or if fewer than half were below 0.1 with some reads at 0.1 and above (mixed), the pathogen was considered detected at low coverage and is marked with an "LC" superscript (see Table S4 for numbers, 39). Single genes with coverages below 0.1 for any pathotype were checked with recruitment plots, and calls were made based on coverage. For DAEC detection in the metagenome, 3 or more genes had to be recovered for the pathotype to be considered "detected". In the isolate, DAEC was considered detected if more than 3 genes were recovered at an average depth of 0.1 or above. qPCR metrics for each pathotype tested per sample are also shown where available. Cells with "NT" indicate that the marker was not tested; "ND" indicates that the marker was not detected via qPCR.
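The calling rules in this legend can be read as a small decision procedure. The sketch below is one possible reading of them, not the custom scripts used in the study. The function name, the label strings, the placeholder gene names, and the `min_genes` parameter are all hypothetical. The legend states the cutoff both as "above 0.1" and as "0.1 or above"; the sketch assumes the inclusive form. It also assumes that a single gene below the cutoff is routed to manual review (recruitment plots) rather than called automatically.

```python
from typing import Dict, Optional

DEPTH_THRESHOLD = 0.1  # depth cutoff stated in the legend (inclusive here by assumption)


def call_pathotype(
    depths: Dict[str, Optional[float]],
    n_marker_genes: int,
    min_genes: int = 1,
) -> str:
    """Return a detection call for one pathotype in one sample.

    depths: hypothetical mapping of marker gene -> read depth (None = no reads).
    n_marker_genes: number of marker genes defined for the pathotype.
    min_genes: minimum number of recovered genes required for any call
               (the legend requires 3 or more for DAEC in metagenomes).
    """
    # Genes with at least some read support.
    recovered = [d for d in depths.values() if d is not None and d > 0]
    if len(recovered) < min_genes:
        return "not_detected"

    passing = [d for d in recovered if d >= DEPTH_THRESHOLD]
    low = [d for d in recovered if d < DEPTH_THRESHOLD]

    if passing and not low:
        return "detected"
    if passing and low:
        # Mixed case: some genes at or above the cutoff, some below.
        return "detected_LC"
    if len(low) == 1:
        # Single low-coverage gene: inspect recruitment plots by hand.
        return "manual_review"
    if 2 * len(low) >= n_marker_genes:
        # At least half of the marker genes recovered, all below the cutoff.
        return "detected_LC"
    return "not_detected"


# Example calls with made-up depths and placeholder gene names.
print(call_pathotype({"gene_a": 2.5, "gene_b": 1.2}, n_marker_genes=3))
print(call_pathotype({"gene_a": 0.5}, n_marker_genes=5, min_genes=3))
```

The second call mirrors the situation in which a single DAEC gene exceeds the cutoff but too few genes are recovered, so the pathotype is not called. The isolate rule for DAEC (more than 3 genes at an average depth of 0.1 or above) would need a separate check on the mean depth and is not covered here.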

Supplementary Table 4 - Read coverages of pathogenic genes from isolates and metagenomes
Comparison of pathogenic gene recovery at the read level for isolate and metagenome pairs. Trimmed short-read sequences from isolate WGS and metagenomic shotgun sequencing were searched against the pathogen gene reference sequence files using Magic-BLAST, then filtered using custom Python scripts (see Methods). All recovered genes are reported in the table at untruncated depth, with "-" indicating no recovery at the read level. Read detection used the same criteria as described in Table S2.
In one case, isolate B68_1 had a single DAEC gene recovered at a depth above 0.1, but this was not considered "detected" since more than 3 genes were required. This, and other DAEC cases where the detection criteria were not met, are marked with asterisks. See Table S2 for details on pathogen genes.

Supplementary Table 2 - Pathogenic diarrheagenic E. coli genes used to assign pathotypes in the bioinformatic analyses

Table 3 - E. coli isolate PCR, isolate WGS, and metagenome sequence pathotype identification, and qPCR results in copies/ng DNA for the 35 samples used for full comparisons.

E. coli isolate ID MGID Isolate pathotype PCR result Pathotype of pathogen genes mapped to isolate assembly Pathotype of isolate short reads mapped to pathogen genes Pathotype of metagenome short reads mapped to pathogen genes

Metagenome read coverage of pathogen genes Isolate read coverage of pathogen genes

Supplementary Table 6 - Comparison of isolate genomes and MAGs in the 13 samples with both diarrheal E. coli isolates and high-quality E. coli MAGs.
Isolate pathotype, metagenome pathotype based on read mapping, and MAG pathotype are provided for comparison. ANI values between MAG and isolate indicate relatedness between MAG-isolate pairs (% ANI). Pathogen gene coverages in the metagenome and isolate are also provided for comparison. The rpoB clonality section shows relatedness of MAGs and isolates based on rpoB. 100% rpoB ANI between a MAG and an isolate indicates a fully clonal pair. The "CRR" labels in the final two columns stand for "competitive read recruitment" and indicate the coverages of MAG-isolate pairs in the metagenome from a competitive read recruitment assay.

Summary of rpoB clonality and abundance between MAG and isolate Summary of pathotype analysis and abundance between MAG and isolate

Supplementary Table 7 - Primers and annealing temperatures used for qPCR assays and their source references.

Table 8 - Summary of qPCR assay performance.