Integrated Meta-omics Reveals a Fungus-Associated Bacteriome and Distinct Functional Pathways in Clostridioides difficile Infection

Our data suggest a potential role for fungi in the most common nosocomial bacterial infection in the United States, introducing the concept of a transkingdom interaction between bacteria and fungi in this disease. We also provide the first direct measure of microbial community function in Clostridioides difficile infection using patient-derived tissue samples, revealing antibiotic-independent mechanisms by which C. difficile infection may resist a return to a healthy gut microbiome.

antibiotic-associated diarrhea due to other causes (1)(2)(3)(4). Our group has previously observed (5) that CDI is associated with the enrichment of microbial taxa with both mucinolytic activity as well as with the capacity to produce xenobiotic compounds with possible antibiotic-like effects, all of which may perpetuate the dysbiotic state and, thus, CDI itself. Based on work focused on cooccurrence network modeling, our group has also observed (5) a possible transkingdom interaction in CDI between fungal and bacterial consortia, while CDI-negative control patients with diarrhea demonstrated virtually no fungal enrichment. Additionally, predictive metagenomics tools revealed the enrichment of pathways in CDI associated with lipopolysaccharide and arachidonic acid synthesis as potential proinflammatory mediators, as well as revealing enrichment of pathways associated with glycan and xenobiotic biosynthesis, further explaining why C. difficile is selectively advantaged to exploit new environmental niches created through compositional and functional perturbations introduced in dysbiotic states.
Previous studies on CDI have generally been restricted to the amplification of small-subunit RNA (16S rRNA gene) to identify the bacterial communities enriched in CDI. Utilization of internal transcribed spacer (ITS) rRNA sequencing as a target for fungal community profiling is usually omitted from these studies, leading to a paucity of data regarding the role of the mycobiome in CDI (6). In addition, to date there has been limited application of metagenomics (MG) and no application of metatranscriptomics (MT) to the study of CDI in humans, in part due to their high cost. MG allows for the sequencing of genes beyond 16S rRNA, enabling a description of the underlying genetic potential of a given microbial community, while MT analysis reveals functionally active bacteria within an ecosystem. When combined, these techniques identify the organisms, and their activities, most important to a disease state (7).
In this study, we provide the first published merged omics data comparing inpatients with diarrhea, with and without CDI, by using whole-metagenome shotgun sequencing and metatranscriptomics. We discuss several novel functional pathways by which the CDI disease state may persist, independent of issues of antibiotic resistance. Additionally, we examine a role for fungal organisms in CDI.

RESULTS
Description of study population. The mean age of CDI patients was 65.3 Ϯ 17 years, while non-CDI subjects had a mean age of 60 Ϯ 18 years (P ϭ 0.32 by t test). There was no difference in gender or the mean number of chronic comorbidities harbored by patients in either cohort (P Ͼ 0.05), with the most common chronic medical conditions being systemic hypertension, non-insulin-dependent diabetes mellitus, and coronary artery disease. There was also no difference in the incidence of antibiotic use prior to stool samples being collected, either in terms of presence or absence of antibiotics or in terms of number of antibiotics (P Ͼ 0.05). Metadata can be found in Fig. S1 in the supplemental material. Bacterial and fungal community profiling is described in the supplemental material (Text S2).
Insignificant differences in fungal and bacterial species richness and evenness observed between CDI ؉ and CDI ؊ cohorts. Considering the 16S data set, bacterial community species richness and evenness was not found to be significantly different between CDI ϩ and CDI Ϫ cohorts (observed richness, P ϭ 0.076; Heip's evenness, P ϭ 0.841) despite an observed lower average species richness within CDI ϩ samples (CDI ϩ observed species mean, 93.408 Ϯ 23.12; CDI Ϫ observed species mean, 121.76 Ϯ 58. 26). A significant decrease in alpha diversity within CDI ϩ samples was observed using the PD whole-tree metric (P ϭ 0.015). Fungal community species richness and evenness were not significantly different between CDI ϩ and CDI Ϫ cohorts (observed richness, P ϭ 0.26; Heip's evenness, P ϭ 0.112). A summary of alpha diversity comparison statistics can be found in Table 1.
Significantly different bacterial and fungal community structures exist within CDI ؉ and CDI ؊ individuals. Principal coordinate analysis (PCoA) plots revealed significant differences in bacterial and fungal community composition between CDI ϩ and CDI Ϫ cohorts. This is demonstrated by distinct clustering between the disease states ( Fig. 1A and B) and was significant considering both 16S and ITS data sets (16S analysis of similarity [ANOSIM], P ϭ 0.022; ITS ANOSIM, P ϭ 0.038). Multivariate association with linear model (MaAsLin) analysis of bacterial abundance data is summarized in Table 2 and revealed a total of seven significantly enriched bacterial biomarkers and no significantly enriched fungal biomarker within the CDI Ϫ cohort (Tables 2 and 3). Within the CDI ϩ cohort, nine bacterial and two fungal significantly enriched biomarker taxa were identified. Bacterial taxa, including Faecalibacterium and Collinsella, were enriched within the CDI Ϫ cohort. Bacterial taxa within the Clostridiaceae, Peptostreptoccocaceae, and Enterococcus were identified as significantly enriched biomarkers within the CDI ϩ cohort. We identified the fungal genera Byssochlamys and Helotiales as significantly enriched within the CDI ϩ cohort.
Network analysis reveals negative interactions between fungi and commensal, butyrate-producing bacteria within the gut of CDI ؉ individuals. Bipartite networks for CDI ϩ individuals displayed negative cooccurring relationships between fungi and commensal gut bacteria (Fig. 2). Candida and Byssochlamys were present at relatively high abundances within CDI ϩ individuals and formed strong negative relationships with several bacterial taxa, including Coprococcus, Blautia, and Comamonadaceae, as well as an unassigned fungus. The same unassigned fungus also had a strong positive  S2). A single positive interaction between an unassigned fungus and Bacteroides was observed within the CDI Ϫ cohort and marks the only transkingdom relationship within the network. Also, there is a positive interaction between two Candida species that do not have any relationship with any of the commensal gut bacteria. Metatranscriptomic data reveal greater differentiation between CDI ؉ and CDI ؊ individuals than did metagenomic comparisons. To visualize overall differences in metagenome and metatranscriptome functional gene profiles between disease states, partial least-squares discriminant analysis (PLS-DA) was conducted. Counts per million (CPM)-normalized reads per kilobase KEGG orthology (KO) counts from respective metagenome and metatranscriptome data sets were used to create a PLS-DA model comparing CDI status, which resulted in clustering between disease states for both data sets ( Fig. 3A and B). Greater differentiation between CDI ϩ and CDI Ϫ cohorts was observed within the metatranscriptome PLS-DA model than in the model generated based on metagenome data. The metatranscriptomic PLS-DA model yielded a superior area under the receiver operating characteristic curve (AUROC) measure (AUROC of 1.0) than the metagenomic data set (AUROC of 0.93), indicating an increased ability for the model to differentiate between CDI ϩ and CDI Ϫ states when considering metatranscriptome expression data compared to metagenomic gene abundance information.
Biomarker analysis reveals increased expression of biofilm formation and quorum-sensing gene pathways within CDI ؉ individuals. Linear discriminant analysis effect size (LEfSe) analysis of metagenomic abundance data and metatranscriptomic expression data revealed bacterial and functional gene pathway biomarkers within CDI ϩ and CDI Ϫ individuals. Considering the metagenomic data set, ten significantly enriched (linear discriminant analysis [LDA] Ͼ 2.0; P Ͻ 0.05) biomarker taxa were identified within the CDI ϩ cohort, including Clostridioides difficile, Escherichia coli, an unclassified Peptostreptococcaceae member, and the Enterobacteriaceae (Fig. S3). Akkermansia munciniphila, Faecalibacterium prausnitzii, Coprococcus, Alistipes shahii, Collinsella, and the Verrucomicrobiaceae were significantly enriched within the CDI Ϫ co-  hort. Comparable results between the 16S comparison and increased taxonomic resolution were observed in the metagenomic LEfSe analysis between CDI ϩ and CDI Ϫ individuals. For example, within the 16S data set, we classified the enrichment of Clostridioides difficile with confidence at the Peptostreptococacceae taxonomic rank, whereas within the metagenomic data set we are able to obtain species-level classification of Clostridioides difficile enrichment. The Faecalibacterium result was confirmed by shotgun metagenomics as Faecalibacterium prausnitzii and is enriched within the CDI Ϫ cohort for both 16S and metagenomic data sets. The metatranscriptome data set yielded 10 and six active biomarker taxa within the CDI ϩ and CDI Ϫ cohorts, respectively (Fig. S4). Escherichia coli, Clostridium clostridioforme, and the Enterobacteriaceae were among the enriched active bacteria within the CDI ϩ cohort (LDA Ͼ 2.0; P Ͻ 0.05). Within the CDI Ϫ cohort, the only active taxa identified as significantly enriched belonged to the Verrucomicrobia phylum.
To define functional pathways driving shifts in metagenome profiles and metatranscriptome expression data between CDI ϩ and CDI Ϫ cohorts, MaAsLin analysis was conducted on summarized KEGG expression data to identify associations between enriched bacterial and functional gene pathways and collected patient metadata. While considering the metagenome data set, 31 significantly enriched summarized functional KEGG pathways were identified within CDI ϩ and CDI Ϫ cohorts (Table 4 and Fig. S5). Phosphotransferase systems (2.16-fold increase in CDI ϩ individuals), two-component systems (1.43-fold increase in CDI ϩ individuals), flagellar assembly (2.52-fold increase in CDI ϩ individuals), ABC transporters, and bacterial chemotaxis were the top five level 3 gene pathway biomarkers of CDI ϩ status within the metagenome data set. N-glycan biosynthesis, carbon fixation pathways in prokaryotes, aminoacyl tRNA biosynthesis (1.28-fold increase in CDI ϩ individuals), citrate cycle-tricarboxylic acid (TCA) cycle, and protein processing in endoplasmic reticulum were observed as the top five biomarker pathways of CDI Ϫ individuals.
Within the metatranscriptome data set, we identified 29 enriched KEGG pathways ( Table 5). ABC transporters (1.44-fold increase in CDI ϩ individuals), two-component system (1.85-fold increase in CDI ϩ individuals), flagellar assembly (5.11-fold increase in CDI ϩ individuals), phosphotransferase system (1.57-fold increase in CDI ϩ individuals), and ascorbate and adorate metabolism (3,47-fold increase in CDI ϩ individuals) were the  top five expressed gene pathways identified within the CDI ϩ cohort. Additional enriched pathways, including quorum sensing (1.19-fold increase in CDI ϩ individuals), bacterial chemotaxis (3.66-fold increase in CDI ϩ individuals), linoleic acid metabolism (12.19-fold increase in CDI ϩ individuals), and biofilm formation-Pseudomonas aeruginosa (3.31-fold increase in CDI ϩ individuals), were also observed within the CDI ϩ expression data. Protein processing in the endoplasmic reticulum, valine, leucine, and isoleucine biosynthesis, histidine metabolism, one-carbon pool by folate, and carbon fixation in photosynthetic organisms (1.55-fold increase in CDI Ϫ individuals) were the top five expressed level 3 KEGG pathways identified as enriched within the CDI Ϫ cohort.
Both metatranscriptomic and metagenomic analyses yielded similar results regarding identified biomarker taxa within CDI ϩ and CDI Ϫ individuals. Taxa within the Enterobacteriaceae and Clostridiaceae were identified to be enriched within the CDI ϩ group for both data sets, whereas the CDI Ϫ group yielded enrichment of the Verrucomicrobia taxa for both the metagenome and metatranscriptome. When considering functional KEGG pathways associated with CDI ϩ individuals, 12 pathways are common to both the metatranscriptomic and metagenomic analysis. More specifically, the two-component system and the phosphotransferase system were among the top five gene pathway biomarkers of CDI ϩ status within the metagenome and metatranscriptome data set. Only the metatranscriptome revealed association of elevated quorumsensing gene expression with CDI ϩ status, as this pathway was not significantly differential considering the metagenome data described alone.  To better understand the taxa contributing to enriched CDI ϩ expression pathways, a taxa contribution plot was generated for selected level 3 enriched KEGG pathways (Fig. 4). The majority of genes found mapped to the bacterial chemotaxis, flagellar assembly, and linoleic acid metabolism pathways within CDI ϩ individuals originated from the E. coli genome (52%, 85%, and 80%, respectively). Pseudomonas aeruginosa was also observed to have distinct contributions to bacterial chemotaxis (22%), biofilm formation (39%), and flagellar assembly pathways (13%). Quorum sensing, signal transduction, sulfur relay system, and two-component systems were predominantly mapped to unclassified bacteria (47%, 42%, 22%, 41%, respectively). Additionally, Alistipes finegoldii and Escherichia coli yielded substantial contributions to quorumsensing genes within CDI ϩ individuals (23% and 11%, respectively). When specifically visualizing Clostridioides expression data of gene pathways within CDI ϩ and CDI Ϫ cohorts, an increase in expression of genes related to two-component systems (4.76fold increase in CDI ϩ individuals) and bacterial secretion systems (2.27-fold increase in CDI ϩ individuals) were observed within CDI ϩ individuals (Fig. S7). Further, an increase in expression of genes related to biofilm formation (2.45-fold increase in CDI ϩ individuals) within the clostridia of CDI ϩ individuals were observed.
To visualize differences in average gene expression between CDI ϩ and CDI Ϫ cohorts, Pathview plots were generated (Fig. 5A to E). CDI ϩ patients demonstrated enriched two-component systems, linoleic acid metabolism, quorum sensing, flagellar assembly, and biofilm formation pathways. Of note, an increased average expression of two-component osmotic upshift response genes was observed within CDI ϩ individuals; this included increased average expression of the envZ osmolarity sensor kinase (3.84-fold increase in CDI ϩ individuals) (Fig. 5A). Within the linoleic acid metabolism pathway, phospholipase pldA, involved in the conversion of lecithin to linoleic acid, was observed to be overexpressed within CDI ϩ individuals (7.10-fold increase in CDI ϩ individuals) (Fig. 5B). An increase in average expression of genes within the qseC and qseE (10.69-and 39.81-fold increase, respectively) quorum-sensing pathway was observed within the CDI ϩ cohort (Fig. 5C). Increased average expression of flagellar assembly genes fliD, flgE, flgD, flgB, flgC, fliH, fliQ, fliR, flgM, fliJ, fliS, fliT, and motB was observed within the CDI ϩ cohort compared to levels in CDI Ϫ samples (Fig. 5D). Average expression of genes related to biofilm formation was observed to be elevated within the CDI ϩ cohort, including curli fimbriae biosynthesis genes csgB (40.8-fold increase in CDI ϩ individuals) and csgA (52.42-fold increase in CDI ϩ individuals) (Fig. 5E).

DISCUSSION
Our study demonstrates that CDI ϩ patients have distinct compositional and functional elements that distinguish them from CDI Ϫ subjects, with CDI characterized by a significant enrichment of fungal taxa not observed in C. difficile-negative diarrheal patients. The lack of any difference in preoperative antibiotic use between the cohorts suggests that the enrichment of fungal organisms in CDI is not simply an epiphenomenon caused by antibiotics with fungi coincidentally filling spatial niches left unoccupied by diminished bacterial populations. This finding lends further support to the concept that the dysbiosis associated with CDI has an important contribution from fungal organisms. Direct measurement of gut bacterial community function through metatranscriptomic profiling revealed multiple enriched pathways in CDI involving biofilm formation, bacterial chemotaxis, flagellar assembly, quorum-sensing proteins, xenobiotic production, and various bacterial two-component systems. As discussed below, these data suggest that the dysbiotic state of CDI is associated with community functions that collectively resist a return to a healthy microbial community structure. This information could provide another explanation for the high persistence and recurrence rates of CDI through mechanisms independent of antibiotic resistance. Many of these enriched functional pathways can be traced to E. coli and to Pseudomo- nas, adding to the current literature that describes these common Gram-negative pathogens having an important role in CDI. Our work provides an evaluation of a subject complementary to prior publications regarding bile acid and carbohydrate metabolism in CDI (8)(9)(10)(11)(12)(13). We characterize the microbial community structure that characterizes disease-specific features of the dysbiosis in CDI, and in particular, we describe the frequent cooccurrence of several fungal organisms that contribute to that dysbiosis. We also describe the first metatranscriptomic data derived from humans with CDI, providing for the first time several mechanisms that potentially can explain the frequent clinical observation of treatment failures in the medical care of CDI.
The adherence of C. difficile to colonic mucosa contributes both to growth of the organism (14) in the large intestine and intoxication of the host (15). Compositional and functional changes were observed in this study that would contribute to mucosal adherence of C. difficile. The loss of commensal organisms provides a selective advantage to C. difficile by creating spatial niches serving as an early contributor toward C. difficile colonization. Additionally, our data indicate that E. coli and Pseudomonas aeruginosa are majority contributors to pathways dedicated to biofilm formation, suggesting a previously unappreciated sessile microbial biofilm community that may be important in CDI. The role of biofilms in bacterial infections is well known, although the role of biofilms in CDI is poorly studied, including a dearth of information regarding the regulatory mechanisms for biofilm production by C. difficile. The presence of a biofilm may contribute to treatment failures with C. difficile-directed antibiotics; for example, subinhibitory doses of metronidazole have been demonstrated (16) to induce the production of biofilm by C. difficile. Based on multivariate association with linear model (MaAsLin) analysis, biofilm production in this study was simultaneously related both to the use of antibiotics as well as to CDI ϩ status, indicating that antibiotics in general, and CDI in particular, represent two related but distinct environmental cues for biofilm production (Table S1). Whether C. difficile-directed antibiotics also serve as an environmental cue promoting biofilm formation by E. coli and Pseudomonas is unknown but warrants further study as a potential contributor to treatment failures.
The enrichment of potentially pathogenic fungal organisms with bacteria known to be associated with human disease states introduces a view of CDI suggesting the potential for a complex transkingdom interaction between bacteria, fungi, and the human host, one mediated in part by quorum sensing. CDI subjects in this study demonstrated significant pathway enrichment for qseC, encoding a sensor histidine kinase (17) that serves as part of a bacterial two-component system functioning as a bacterial adrenergic receptor. qseC allows for bacterium-host hormonal signaling, with bacteria sensing the production of host norepinephrine as well as allowing for prokaryote-eukaryote signaling through autoinducer-3 (AI-3). In a study focused on enterohemorrhagic Escherichia coli (EHEC) and Citrobacter rodentium, Moreira and colleagues (18) provided the first description of host stress hormones (norepinephrine and epinephrine), which directly affect gut physiology through colonic adrenergic receptors, also having a direct effect on bacterial gene expression, with qseC and qseE being necessary for C. rodentium successfully colonizing the murine intestine. This only recently appreciated cross talk by which fight-or-flight host hormones signal differential expression of virulence pathways in gut bacteria adds a new dimension to understanding how host stress and inflammatory responses affect the course of gut infections.
These data and recent publications (5,19) by our group suggest that there is a fungus-associated bacteriome involved in CDI. The consistency over several studies with which a significant enrichment of fungal organisms is observed in human subjects with CDI, but which is not observed in control groups with C. difficile negative diarrhea, suggests that important transkingdom interactions between bacteria and fungi are present in CDI. There is evidence from studies on the rhizosphere (20), on coculture systems (21), and on human subjects with Crohn's disease (22) indicating that fungi are frequently associated with a particular and environment/disease-specific bacteriome, and that the cooccurrence of these fungal and bacterial organisms is more than a coincidental occupying of the same spatial niche. These prior studies have provided Fungal Bacteriome and Multi-omics in CDI evidence that fungi, when viewed as hosting an associated bacterial community, help to not only protect bacteria from the effects of antibiotics (20) but also potentially allow obligate anaerobic bacteria to grow (21) under aerobic conditions. The mechanisms by which this occurs have not been elucidated, and further, the potential benefits to fungi in this relationship are not clear. In fact, one prior study suggested that p-cresol production by C. difficile in culture inhibits hypha formation by C. albicans, which prevents Candida from forming a biofilm, suggesting an exploitative relationship between C. difficile and C. albicans.
When combined with what is already known about CDI, the metatranscriptomic data from the present study provides novel insights that may be valuable for understanding how microbial functional pathways contribute to the development of CDI. The use of antibiotics decreases bacterial density and diversity, creating new spatial and metabolic niches that favor the mucosal adherence and population expansion of C. difficile. A potential enhancement to this process is the production of biofilm by C. difficile as well as by E. coli and Pseudomonas, which in part may be a response not just to the antibiotics which helped to cause CDI but even those directed at C. difficile itself. At some point thereafter, this is followed by C. difficile intoxication promoted due to multiple factors, including C. difficile reaching a stationary growth phase (23) as well as due to quorum-sensing proteins (24), the latter of which may involve pathways by which host stress hormones induce expression of bacterial virulence pathways (17). These changes increase inflammation, which further favors the CDI disease state (25) through toxin-dependent and toxin-independent pathways. This leads to osmotic changes in the colon due to loss of colonocyte barrier function, making mucosal adherence more difficult for many bacteria and reducing a large proportion of the remaining and diminished bacterial population to a planktonic state, with their resources diverted toward functions such as osmotic regulation and run-and-tumble locomotion and away from population recovery. Concurrent with all of this is the enrichment of pathways for xenobiotic compound production, from bacterial and possibly fungal sources, potentially reinforcing a dysbiosis favoring CDI. Although Fig. S6 in the supplemental material describes the pathways with the strongest enrichment, it does not provide an exhaustive description of the pathogenesis of CDI, including the influence of the host inflammasome on this disease. As part of our team's ongoing evaluation of these data, these pathways are currently a focus of animal research by our group to define their role in CDI.
This is the first tiered study of CDI using a combination of amplicon sequencing and matched metagenomics and metatranscriptomics. A smaller sample size is a limitation with this study, one reflecting cost constraints with metagenomics and metatranscriptomics. Our stringent inclusion criteria and the absence of differences in antibiotic exposure between those with and without CDI help to strengthen our conclusions despite this limitation. There is evidence that fungi are enriched in CDI, suggesting that there is a transkingdom interaction between fungi and bacteria in this disease; if this is demonstrated in further studies, it would introduce the concept of a fungus-associated bacteriome in CDI. A potential role for E. coli and Pseudomonas in CDI is also provided in this study, especially in terms of biofilm production, and the roles of host stress response as well as inflammation are further described as gut-related factors leading to the creation of niches potentially favoring C. difficile. In concert, these data describe mechanisms by which the causal dysbiosis for CDI may resist reversal, providing a possible, new explanation of CDI treatment failures. Future studies by the authors will investigate the roles of E. coli and Pseudomonas in CDI using animal models, as well as investigating whether the addition of antifungal therapy to C. difficile-directed antibiotics improves treatment success.

MATERIALS AND METHODS
The Institutional Review Board of The Pennsylvania State Milton S. Hershey Medical Center approved this study. Each patient consented using an IRB-approved consent form prior to the collection of their stool sample.

Patients.
A total of 49 inpatients admitted to the first author's institution at the time of sample accrual for the treatment of various medical ailments were enrolled in this study between March 2017 and June 2017. Patients who were at least 18 years of age were eligible, with no upper age limit. As with prior studies from our group (14), patients receiving chemotherapy within 60 days of enrollment, those with inflammatory bowel disease, those with a history of a positive C. difficile test within 60 days of potential enrollment, those empirically started on C. difficile-directed antibiotics prior to stool testing, and those within 30 days of a mechanical bowel preparation were ineligible for inclusion.
Also similar to previous studies from our group (19), only diarrheal stools were collected for analysis in this study. As part of their routine clinical care, each patient with a clinical suspicion for CDI had a stool sample sent by their treating physician to PSHMC Clinical Microbiology for C. difficile testing using a commercially available nucleic acid amplification test designed to detect a highly conserved sequence within the tcdA gene. C. difficile positive and negative stool samples were preserved, after clinical testing, in a -80°C freezer until patients consented for inclusion in this study. Table 6 provides a description of the number of stools subjected to amplicon sequencing (16S and ITS rRNA gene amplicon sequencing), as well as shotgun metagenomics and metatranscriptomics sequencing (described in the supplemental material). Briefly, fecal DNA extracts were subject to 16S rRNA gene and ITS2 Illumina tag PCR, pooled in equimolar ratios, gel purified, and sequenced on the Illumina MiSeq (16S rRNA libraries) and NextSeq (ITS libraries) platforms. Bacterial (16S rRNA gene) and fungal (ITS) sequences were quality filtered, clustered into operational taxonomic units (OTUs), and normalized using both the USEARCH and QIIME pipelines. Alpha and beta diversity calculations, as well as multivariate statistics, were performed as described in the supplemental material (Text S1). Fecal DNA was subjected to metagenomics library preparation using the Illumina Nextera XT kit. RNA extracted from fecal samples was converted to double-stranded cDNA, and libraries were prepared using the NuGEN Ovation kit. Samples that yielded detectable concentrations of high-integrity RNA and DNA were selected to maximize the number of matched metagenome/metatranscriptome samples for this study. Metagenomics and metatranscriptomics libraries were quantified, pooled, purified, and sequenced on the Illumina Hiseq4000 platform.
16S rRNA gene data processing. Forward and reverse reads were merged using USEARCH7 (26) with a minimum overlap set to 200 bp. Using USEARCH7, paired sequences were quality filtered at a maximum expected error of 0.5% and were subsequently truncated at a length of 249 bp. Filtered reads maintained an average Phred Q score of 40.5 postfiltering. OTUs were picked de novo using the UPARSE algorithm (26) at 97% similarity. Taxonomy was assigned using UCLUST within QIIME 1.9.1 (27) using the Greengenes 16S rRNA gene database (13-5 release, 97%) (28). Results were compiled into a biological observation matrix (biom) format OTU table with a total count of 3,768,584 after singleton removal. All 49 samples produced the minimum number of quality-filtered sequences (Ͼ5,000 sequences per sample) for downstream analysis. A range of 12,010 to 178,904 sequences per sample was observed. Alpha diversity rarefaction curves were created within the QIIME 1.9.1 package (27) using an unrarified OTU table. Multiple rarefactions were performed on the 16S rRNA OTU table from all samples using a minimum depth of 100 sequences to a maximum depth of 12,010 sequences, with a step size of 794 for 20 iterations. Rarefactions then were collated and compared between disease states considering observed species, Chao1, PD whole-tree, and Heip's evenness diversity metrics. Alpha diversity comparisons were conducted using a two-sample t test and nonparametric Monte Carlo permutations (n ϭ 999) within QIIME-1.9.1. 16S rRNA OTU tables were normalized using metagenomeSeq's cumulative sum scaling (CSS) algorithm (29) for beta diversity and biomarker analysis. Beta diversity analyses were performed using a weighted UniFrac distance matrix and visualized within a three-dimensional principal coordinate analysis (PCoA) plot in EMPeror (29,30). Analysis of similarity (ANOSIM) tests for significance were calculated within QIIME 1.9.1 to determine significance of clustering between disease cohorts.
Bipartite cooccurrence networks of bacterial and fungal taxonomy within C. difficile-positive (CDI ϩ ) and -negative (CDI Ϫ ) samples were constructed as described in Lamendella et al. (19) with the following changes in protocol. For a taxon node to be included in the network, it needed to occur in at least 50% of the samples and have a Spearman's rho threshold of at least 0.90. Spearman's correlations were paired with two dissimilarity measures, Bray-Curtis and Kullback-Leibler, to calculate the distances between nodes. An edge selection of 75 was utilized for all generated networks. All three statistical measures ensured minimal significant correlations due to outliers or error in data composition. To adjust for multiple testing corrections, the Benjamini-Hochberg correction was used to adjust P values in the last network processing step. Bacterial and fungal taxa were labeled down to the lowest identified taxonomic ranking. Taxonomic nodes were sized by relative abundance and colored by phylum.
ITS data processing. Single-end ITS sequences were quality filtered at an expected error of less than 0.5% using USEARCH v7 (26). After quality filtering, reads were analyzed using the QIIME 1.9.1 software package (16). Of the 49 processed ITS libraries, 38 samples yielded at least 5,000 sequences for downstream processing. A total of 17,789,158 sequences were obtained after quality filtering and chimera analysis. Open-reference OTUs were picked and chimeras were removed using the UPARSE algorithm within USEARCH at 97% identity, and taxonomy assignment was performed against the UNITE database using BLAST within QIIME-1.9.1. OTUs with taxonomy assigned were organized into a BIOMformatted OTU table, which was then summarized within QIIME 1.9.1. Alpha diversity analyses were conducted within the QIIME 1.9.1 sequence analysis package using an unrarified OTU table. Multiple rarefactions were conducted on sequences across all samples to a maximum rarefaction depth of 11,990 sequences, with 20 iterations at each step with a step size of 790. Alpha diversity was then collated and compared between disease states considering observed species richness, Chao1, and Heip's evenness metrics. PCoA plots and ANOSIM tests for significance were generated from a weighted Jaccard distance matrix made within QIIME 1.9.1 from a CSS-normalized OTU table (29). Metagenomic and metatranscriptomic taxonomic/functional gene profiling and comparative analysis. Taxonomic classification and bacterial species relative abundances were calculated with MetaPhlAn2 using default settings (31). Unassigned taxa were discarded for downstream taxonomic comparisons, and species consisting of less than 0.01% of identified marker sequences across all samples were discarded to reduce data noise. Taxonomic output from all metagenomes and metatranscriptomes were merged into two respective .tsv tables for downstream analysis of each data set. Functional gene annotation and quantification of filtered sequence data were conducted using HUMAnN2 (http:// huttenhower.sph.harvard.edu/humann2; v.0.9.9) against the Uniref90 functional gene database (32). Default HUMAnN2 settings were utilized for functional gene annotation. Generated reads per kilobase (RPK) counts of Uniref90 annotations were regrouped as KEGG orthologies (KO) and underwent CPM normalization to account for differences in sequencing depth between samples.
CPM-normalized KO counts were grouped into KEGG pathways within HUMAnN2 using a custom python script for LDA effect size (LEfSe) plotting. For each respective metagenome and metatranscriptome data set, relative abundances of taxonomic profiles and functional pathways were multiplied by 1 million and formatted as described in Segata et al. (33). Comparisons were made with "disease" as the main categorical variable ("class"). Alpha levels of 0.05 were used for both the Kruskal-Wallis and pairwise Wilcoxon tests. LDA scores greater than 2.0 are displayed for taxonomy and functional (KEGG) pathways. Resulting taxonomic and functional gene pathway biomarkers between CDI Ϫ and CDI ϩ individuals were then plotted in cladogram structures for the metagenome and metatranscriptome data sets, respectively. CPM-normalized KEGG orthology pathway tables were stratified by bacterial taxa within HUMAnN2 and were subsequently summarized by bacterial taxa using a suite of custom python scripts and were imported into an R environment. The R packages ggplot2 (34) and plotly (35) were then utilized to produce visualizations displaying the eight most abundant bacterial contributors, based on the total CPM normalized gene count, to each pathway.
Multivariate association with linear models (MaAsLin) was conducted to find associations between clinical metadata of interest, including CDI status as well as antibiotic treatment status and identified metatranscriptome functional pathways. Functional pathway expression data underwent relative abundance normalization prior to MaAsLin analysis as required by the protocol (38). Enrichment results were displayed at a false discovery rate (FDR)-corrected P value (q value) of Ͻ0.40.
Additional sample preparation and bioinformatics methodologies are detailed in the supplemental material.
Data availability. For all sequencing results, an archive is publicly available at NCBI BioProject ID PRJNA478949 under the SRA accession number SRP151803.