Several meta-analyses of gene expression profiles from IBD patients, based on microarray datasets, have already identified dysregulated expression of genes encoding several inflammatory factors and RNA-binding proteins3–5. Nevertheless, these studies focused on a limited number of genes and lacked both whole-transcriptome and metatranscriptome profiling, the latter having recently emerged as a successful approach for uncovering novel gut-populating microbial entities6.
Therefore, to provide a wider picture of the whole transcriptome and metatranscriptome at different tissue/cell levels in both UC and CD patients, publicly available RNA-Seq datasets were collected and analyzed. Because the datasets came from 26 independent studies, we anticipated an experiment-dependent bias, which was counteracted with a well-established batch correction algorithm applied according to source study and tissue of origin. Of note, we also tried to batch correct for the different library construction strategies used, but their variance was already fully explained by the source study. The meta-analysis served as the core for designing the IBD TaMMA web app (Supplementary Fig. 1a), which allows quick access to differential gene expression and gene ontology functional enrichment results across the different conditions (https://ibd-meta-analysis.herokuapp.com/; full guide available at https://ibd-tamma.readthedocs.io/). Sample dispersion in the UMAP, readily obtainable through the IBD TaMMA platform, shows clustering by tissue of origin but not by source study (Fig. 1a and Fig. 1b), indicating successful data harmonization. Consistently, housekeeping gene expression levels were comparable across the different tissues and conditions (Supplementary Fig. 1b).
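To make the idea of harmonizing studies within a tissue concrete, the following is a minimal sketch and not the batch correction algorithm used in this work, which is not named in this section. It assumes log-scale expression values for a single gene and applies a simple location adjustment: each study's mean within a tissue is shifted onto the overall mean of that tissue, so that between-study offsets are removed while between-tissue differences are preserved. The function name `center_batches`, the record keys, and the example values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_batches(samples):
    """Remove per-study offsets within each tissue for one gene.

    `samples` is a list of dicts with the hypothetical keys 'study',
    'tissue' and 'value' (log-scale expression). Returns the adjusted
    values in the same order as the input.
    """
    by_tissue = defaultdict(list)
    by_study_tissue = defaultdict(list)
    for s in samples:
        by_tissue[s["tissue"]].append(s["value"])
        by_study_tissue[(s["study"], s["tissue"])].append(s["value"])

    tissue_mean = {t: mean(v) for t, v in by_tissue.items()}
    study_tissue_mean = {k: mean(v) for k, v in by_study_tissue.items()}

    # Subtract the study-specific mean, then add back the tissue-wide mean.
    return [
        s["value"] - study_tissue_mean[(s["study"], s["tissue"])] + tissue_mean[s["tissue"]]
        for s in samples
    ]


# Illustrative (made-up) values: two studies profiling the same tissue,
# with study B shifted upward relative to study A.
example = [
    {"study": "A", "tissue": "colon", "value": 1.0},
    {"study": "A", "tissue": "colon", "value": 3.0},
    {"study": "B", "tissue": "colon", "value": 5.0},
    {"study": "B", "tissue": "colon", "value": 7.0},
]
print(center_batches(example))
```

After such an adjustment, samples from different studies of the same tissue share a common mean, which is the kind of behavior the UMAP clustering by tissue rather than by source study is meant to reflect. Established methods additionally model variance differences and count distributions, which this sketch deliberately omits.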
IBD TaMMA pinpoints strong differential gene expression among UC, CD, and healthy (control) groups in the ileum, colon, and rectum, as shown in Fig. 1c. Of note, IBD-specific proinflammatory signatures were confirmed. Indeed, the expression levels of Tumor Necrosis Factor-alpha (TNFα), Interferon-gamma (IFNG), Interleukin 12B (IL12B), Integrin alpha 4 (ITGA4), and Integrin beta 7 (ITGB7), already known as drivers of chronic inflammation and thus exploited as therapeutic targets for IBD patients7, were dysregulated in IBD-derived intestinal tissues compared with healthy tissues (Fig. 1d and Supplementary Fig. 1c). Likewise, the S100A8 and S100A9 transcripts, which encode the two subunits of the fecal biomarker calprotectin8, and the recently emerged S100A12 (ref. 8) were increased in intestinal samples from CD and UC patients compared with healthy controls (Fig. 1e and Fig. 1f). These results are well in line with most studies reporting these molecules as biomarkers of inflammation in patients with IBD8.
Metatranscriptomics performed on IBD and healthy stools paralleled previous metagenomics analyses, confirming the Bacteroidetes and Firmicutes phyla, followed by the Actinobacteria and Proteobacteria, as the main colonizers of the fecal microbiota9 (Fig. 1g, upper bars). IBD TaMMA also showed that IBD and healthy intestinal samples are colonized by the same phyla, although in different proportions (Fig. 1g, lower bars). Moreover, decreased intestinal microbiota diversity, a well-known feature of IBD pathogenesis1, was confirmed in IBD stools compared with healthy stools (Fig. 1h), and was paralleled by decreased diversity in the colon and ileum from UC patients and in the colon from CD patients (Fig. 1i and Fig. 1j). Interestingly, the CD ileum showed increased microbiota diversity compared with the other groups (Fig. 1j), providing a novel insight into the disease location-dependent microbiota composition in CD patients. Of note, IBD TaMMA also confirmed virome dysbiosis, with the expansion of Caudovirales in both pediatric IBD and UC samples10,11 (Fig. 1k), as well as increased levels of the Herpesviridae family in IBD-derived samples and of the Hepadnaviridae family in UC ileum compared with healthy samples, as previously reported12,13 (Supplementary Fig. 1d and Supplementary Fig. 1e).
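As an illustration of how phylum proportions and a diversity value can be derived from classified read counts, the sketch below computes relative abundances and the Shannon index. This section does not state which diversity metric was used, so the choice of the Shannon index is an assumption, and the function names and read counts are hypothetical values chosen only for demonstration.

```python
import math


def relative_abundance(counts):
    """Convert a {taxon: read_count} mapping into proportions summing to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one taxon must have a positive count")
    return {taxon: c / total for taxon, c in counts.items()}


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one count must be positive")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Made-up read counts per phylum for two hypothetical samples.
even_sample = {"Bacteroidetes": 250, "Firmicutes": 250,
               "Actinobacteria": 250, "Proteobacteria": 250}
skewed_sample = {"Bacteroidetes": 850, "Firmicutes": 100,
                 "Actinobacteria": 30, "Proteobacteria": 20}

print(relative_abundance(skewed_sample))
print(shannon_diversity(list(even_sample.values())))
print(shannon_diversity(list(skewed_sample.values())))
```

Under this metric, a sample dominated by a single phylum scores lower than one in which the same phyla are evenly represented, which is the sense in which two groups can share the same colonizers yet differ in diversity.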
It is noteworthy that, during the analysis, most of the reads that did not map to the human genome also failed to be classified by metatranscriptomics profiling and were therefore considered NGS dark matter (Methods). Although their analysis goes beyond the scope of this work, these data were made available through a dedicated submission, as we think they can contribute to the understanding of genes and microbial entities that are not yet known and may be the subject of future investigations (e.g., the discovery of new microbial entities).
Taken together, this evidence establishes IBD TaMMA as a reliable platform that confirms well-known features of IBD pathogenesis and serves as a useful open-source tool for developing further insights into IBD pathogenesis.