Faecal microbiome and metabolic signatures in rectal neuroendocrine tumors

Background: The prevalence of rectal neuroendocrine tumors (RNET) has increased substantially over the past decades. Little is known on mechanistic alteration in the pathogenesis of such disease. We postulate that perturbations of human gut microbiome-metabolome interface influentially affect the development of RNET. The study aims to characterize the composition and function of faecal microbiome and metabolites in RNET individuals. Methods: We performed deep shotgun metagenomic sequencing and untargeted liquid chromatography-mass spectrometry (LC-MS) metabolomic profiling of faecal samples from the discovery cohort (18 RNET patients, 40 controls), and validated the microbiome and metabolite-based classifiers in an independent cohort (15 RNET participants, 19 controls). Results: We uncovered a dysbiotic gut ecological microenvironment in RNET patients, characterized by aberrant depletion and attenuated connection of microbial species, and abnormally aggregated lipids and lipid-like molecules. Functional characterization based on our in-house and Human Project Unified Metabolic Analysis Network 2 (HUMAnN2) pipelines further indicated a nutrient deficient gut microenvironment in RNET individuals, evidenced by diminished activities such as energy metabolism, vitamin biosynthesis and transportation. By integrating these data, we revealed 291 robust associations between representative differentially abundant taxonomic species and metabolites, indicating a tight interaction of gut microbiome with metabolites in RNET pathogenesis. Finally, we identified a cluster of gut microbiome and metabolite-based signatures, and replicated them in an independent cohort, showing accurate prediction of such neoplasm from healthy people. Conclusions: Our current study is the first to comprehensively characterize the perturbed interface of gut microbiome and metabolites in RNET patients, which may provide promising targets for microbiome-based diagnostics and therapies for this disorder.

All samples were sequenced on an Illumina platform in PE150 mode at Novogene Bioinformatics Technology (Beijing, China). Adapter was trimmed and low-quality reads were filtered using trimmomatic-0.39. Then, host sequences were removed by aligning sequencing reads back to host genome reference (hg38) using soap2 (version 2.20) when sequence identity exceeds 90% [1].
Taxonomic profiling of the metagenomic samples was performed using MetaPhlAn2 [2] which uses clade-specific markers to provide pan-microbial (bacterial, archaeal, viral and eukaryotic) quantification at species-level. MetaPhlAn2 was run with default parameters.
At the same time, the high-quality reads were aligned to the updated gut microbiome gene catalog [3] using SOAP2 (version 2.20) with a threshold of more than 90% identity and 95% reads length. Gene abundance profile was calculated as previously described [3]. Next, the relative abundances of KEGG (Kyoto Encyclopedia of Genes and Genomes) orthologous (KOs) groups were summed up from the relative abundances of their respective genes to obtain functional profile.

Bioinformatic analysis
Alpha diversity was measured by observed counts and Shannon index at gene, phylum, genus and species level, respectively, with an in-house Perl script.
Bray-Curtis distance was calculated using python module scipy (1.5.1). Principal component analysis (PCA) was analyzed using R package FactoMineR and factoextra.
Principal coordinates analysis (PCoA) was used for visualizing beta diversity using the Bray-Curtis distance matrix data in R with ggplot2. R packages vegan and ggplot2 was used to analyze and visualize NMDS using Bray-Curtis distance. PERMANOVA was also analyzed using R package vegan with permutation times of 999. LEfSe (Linear discriminant analysis Effect Size) was performed with LEfSe (version 1.0) software to determine the features most likely to explain differences between groups.
Random forest model was built to find biomarkers most likely to be related to BCS status with R package randomForest and pROC was used to perform ROC analysis on random forest models.
Differentially enriched KEGG pathways/modules were identified according to their reporter score from the Z-scores of individual KOs as previously described [4,5].
Briefly, a one-tail Wilcoxon rank-sum test was performed on all the KOs that occurred in more than five samples and adjusted for multiple testing using the Benjamin-Hochberg procedure. The Z-score for each KO was then calculated. Z-score of pathway/module background distribution was corrected and used as the final report score for evaluating the enrichment status. A report score of ≥ 1.96 (95% confidence according to normal distribution) could be used as a detection threshold for significantly differentiating pathways.

HUMAnN2 analysis
Functional profiling was performed by HUMAnN2 [6]. Sample reads are mapped against this database to quantify gene presence and abundance on a per-species basis.
A translated search is then performed against a UniRef-based protein sequence catalogue for all reads that fail to map at the nucleotide level. The result are abundance profiles of gene families (UniRef90s), stratified by each species contributing those genes, and which can then be summarized to higher-level gene groupings such as ECs or KOs. The QE HFX mass spectrometer was used for its ability to acquire MS/MS spectra on information-dependent acquisition (IDA) mode in the control of the acquisition software (Xcalibur, Thermo). In this mode, the acquisition software continuously evaluates the full scan MS spectrum. The ESI source conditions were set as following:

Data preprocessing and annotation
The raw data were converted to the mzXML format using ProteoWizard and processed with an in-house program, which was developed using R and based on XCMS, for peak detection, extraction, alignment, and integration. Then an in-house MS2 database (BiotreeDB) was applied in metabolite annotation. The cutoff for annotation was set at 0.3.

Correlation between genus and species in each group by FastSpar
Microbial association (genus and species level) in each group was determined by FastSpar, a fast and parallelizable implementation of the SparCC algorithm with an unbiased P-value estimator [7]. We selected the significantly different genus and species between RNET and control groups (p < 0.05). FastSpar has been widely used to estimate the correlation values from compositional data. Significant co-occurrence and co-excluding interaction (FastSpar correlation scores roh < −0.2 or roh > 0.2, p < 0.05) were visualized and analyzed using igraph. We calculated the degree, betweenness and strength of each node to estimate its importance to the network.

Statistical analysis
We performed Wilcoxon rank-sum test (two-tailed) for the difference of α diversity and permutation multivariate analysis of variance (PERMANOVA) test for the difference of β diversity between the two compared groups. Spearman correlation analysis between differential enriched features were performed.   (D) ANOSIM test was applied to compare microbial structure dissimilarity between and within groups (two-sided wilcoxon rank-sum test).     Table S1 Participants' clinical information in the discovery cohort.

Table S5
Bacterial β-diversity adjusted by clinical parameters.

Table S6
Metabolic bray-Curtis similarities adjusted by clinical parameters.

Table S7
Microbial and metabolic based classifiers adjusted by clinical parameters.