Tumour Microbiome-Based Subtypes of Colorectal Cancer Correlate with Clinical Variables

Long-term dysbiosis of the gut microbiome has a significant impact on the development, progression and aggressiveness of colorectal cancer (CRC) and may explain part of the observed heterogeneity of the disease from phenotypic, prognostic and response to treatment perspectives. Although the shifts in gut microbiome in the normal-adenoma-carcinoma sequence have been described, the landscape of microbiome within CRC and its associations with clinical variables remain under-explored. We performed 16S rRNA gene sequencing of paired tumour tissue, adjacent visually-normal mucosa and stool swabs of N=186 patients with stage 0-IV CRC to describe the tumour microbiome and its association with clinical variables and to derive tumour microbial subtypes. We new genera belonging Based we into three subtypes. The were associated grade, sidedness and one Further, we inspected the associations of with in The primary clinical variables predominantly while the of local and distant metastases stool microbiome. markers of CRC patients’ survival and prognosis. We found that CRC microbiome is strongly correlated with clinical variables, but these associations are dependent on the microbial environment (tumour mucosa, normal mucosa, stool). Our study thus identifies limitations of the usage of microbiome composition as marker of CRC progression, suggesting the need of combining several sampling sites (e.g. stool and tumour swabs).


Introduction
Colorectal cancer (CRC) is the third most frequent cancer worldwide, and the second leading cause of cancer mortality in Europe [1]. At the same time, it is a heterogeneous disease, both at the phenotypic level and from a prognostic and response to treatment perspective. The current standard treatments are limited and remain ineffective for a large group of CRC patients because of lack of adequate patient selection, resulting in unneeded toxicity and elevated cost due to the over-treating of patients that do not benefit [2,3]. Recent research shows that gut microbiota may have an important role in colorectal tumour initiation and progression [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22].
Several studies showed that bacteria adherent to colorectal adenomas or carcinomas were different from bacteria adherent to healthy gut mucosa [8,11,12] due to the altered tumour microenvironment with decreased pH and modified metabolic conditions resulting from hypoxia and the onset of necrosis [23]. Gut microbiota can promote colon cancer development or change the tumour invasion potential through (i) immunomodulation [10,24,25,26] or (ii) metabolic activity, via the production of specific toxins inducing DNA damage responses. Overall, the evidence of microbiome importance in colon cancer development is so overwhelming that a bacterial driver-passenger model for colorectal cancer development and progression has been suggested [27] as an alternative to the broadly accepted driver-passenger mutational adenoma-carcinoma model. Additionally, gut microbiota seems to play an important role also in response to anti-cancer therapy [28].
Despite the observation of several consistent patterns of gut microbial disruption between healthy and CRC cases, a full characterization of the microbial landscape in CRC is still missing. The published studies vary greatly in terms of techniques, specimen origin and sample size, thus hampering any integrative analysis. Most studies compared diseased and healthy subjects, and the few that try to characterize microbial composition within the CRC patients suffer from a small sample size. The specimens used in most studies are stool [4,6,7,15,17,18,20,21,22] or mucosa samples from colonoscopy biopsies [11,13,15] or after resection [6,12,16,19]. Stool microbiota sampling has the advantage of being non-invasive, allowing its use for screening and follow-up studies. There have been efforts to incorporate information on the tumour-associated microbiome in order to improve the accuracy of the existing patient CRC prognostic scores [18] or to develop a new screening/prognostic model [29]. The results of two different meta-analyses showed that the accuracy of predicting diseased state was about 0.8, similar to occult blood test results, the main non-invasive clinical test for this type of cancer [30,31]. However, the microbial composition in stool only partially reflects the situation in tumour mucosa, a trend consistent across different nationalities of the subjects, techniques of sampling or sequencing methodology [32].
The microbiota adherent to the mucosal tissue differs from the faecal microbiota in its needs for oxygen and nutrient types [33,34], therefore the information derived from stool may be insufficient for capturing tumour-microbe interactions consistent with the disease prognosis. The relevance of the tumour mucosa microbiome assessment for screening purposes is dependent not only on the co-presence of the bacteria in both tumour mucosa and stool, but also on its association with relevant clinical parameters in both sample types. Additionally, studying the (dis)similarity of bacterial composition between tumour and visually-normal mucosa from the same individual may provide hints regarding the changes in microenvironment which have occurred favoring the growth of certain species and shed some light on the underlying tumour-immune-microbe interactions and metabolic pathways.
Recently, two studies provided a comparison of bacterial composition in both tumour tissue and visually-normal tissue, as well as the bacterial composition of stool samples from the same patients [30,31]. Liu et al. [30] showed that the bacterial communities in both tumour tissue and visually-normal tissue were similar, but the study was largely underpowered (N = 8 individuals) and did not answer the question of the clinical relevance of this similarity. Other studies associated the microbiome on tumour or in stool with clinical variables [31,35,36] but had a similar disadvantage in terms of statistical power (N = 25, N = 30 and N = 53 individuals, respectively).
The above results were mostly species-centric in the sense that they compared the abundance of microbial species between the groups of interest individually. However, a broader view is needed to account for lesser known species coupled with a larger sample size allowing for capturing enough of the inter-tumour heterogeneity thus better understanding the possible effects of bacteria on tumour growth, aggressiveness or response to therapy. In our study, we take a microbial community-centric approach to provide a comprehensive description of CRC tumour microbiome based on 16S rRNA sequencing. We analyse three sample types (tumour mucosa, visually-normal mucosa, stool) from N = 186 individuals with stage 0-IV colorectal cancer.
Our study has a dual nature, both exploratory and confirmatory. We explore and interpret the landscape of tumour mucosa associated microbiome with respect to clinical variables and microbial composition of paired adjacent visually-normal mucosa and paired stool samples. Benefitting from a larger sample size, we advance the state-of-the-art knowledge by reporting previously unseen associations.

Patients and Specimens
All specimens were collected at Masaryk Memorial Cancer Institute (Brno, Czech Republic) during the years 2015-2019 from unselected patients newly diagnosed with CRC. The stool samples were collected from untreated patients before the scheduled surgery. Patients performed the collection at home on the morning of their hospitalization for the surgery and brought the samples to the hospital, where they were immediately frozen at -80 °C until further processing. Swabs from tumour and visually-normal mucosa were collected within 30 minutes of the tumour resection at the department of pathology. If possible, the normal tissue swab was taken at least 20 cm proximally to the tumour. The swabs were then stored immediately in a freezer at -20 °C and without unnecessary delay transferred to -80 °C until further processing. All samples, including stool, were collected using DNA-free cotton swabs (Deltalab, Spain).
Overall, we analysed N = 505 samples from N = 186 patients with CRC. Of these, there were 133 triplets (all three sample types from the same patient) and 53 mucosa duplets (swab from tumour and visually-normal mucosa from the same patient).
The study was approved by the ethical committee of Masaryk Memorial Cancer Institute. All patients gave written informed consent in accordance with the Declaration of Helsinki prior to participating in the study.

DNA extraction, PCR amplification and sequencing of 16S rRNA gene
The DNA extraction was performed using DNeasy® PowerSoil® Isolation kit (QIAGEN, Germany) according to the manufacturer's instructions. Extracted DNA was used as a template in amplicon PCR to target the V4 hypervariable region of the bacterial 16S rRNA gene. The 16S metagenomics library was prepared according to the 16S Metagenomic Sequencing Library Preparation protocol (Illumina, USA) with some deviations described below. Each PCR was performed with HotStarTaq Master Mix Kit (QIAGEN, Germany) in triplicate, with the primer pair consisting of Illumina overhang nucleotide sequences, an inner tag and gene-specific sequences [37,38]. The Illumina overhang served to ligate the Illumina index and adapter. Each inner tag, i.e. a unique sequence of 7-9 bp, was designed to differentiate samples into groups. Primer sequences and PCR cycling conditions are summarized in Table S1. After PCR amplification, triplicates were pooled, and the amplified PCR products were determined by gel electrophoresis. PCR clean-up was performed with Agencourt AMPure XP beads (Beckman Coulter Genomics). Samples with different inner tags were equimolarly pooled based on fluorometrically measured concentration using Qubit® dsDNA HS Assay Kit (Invitrogen™, USA) and microplate reader (Synergy Mx, BioTek, USA). Pools were used as a template for a second PCR with Nextera XT indexes (Illumina, USA). Differently indexed samples were quantified using the qPCR kit KAPA Library Quantification Complete Kit (Roche, USA) and LightCycler 480 Instrument (Roche, USA) and equimolarly pooled according to the measured concentration.
The prepared libraries were checked with a 2100 Bioanalyzer Instrument using the High Sensitivity D5000 Screen tape (Agilent Technologies, USA) and concentration was measured with qPCR shortly prior to sequencing. The final library was diluted to a concentration of 8 pM and 20% of PhiX DNA (Illumina, USA) was added. Sequencing was performed with the MiSeq reagent kit V2 (500 cycles) using a MiSeq instrument according to the manufacturer's instructions (Illumina, USA).

Data analysis
Preprocessing and quality control
Forward and reverse paired-end reads were demultiplexed, and barcodes and primers were trimmed. The DADA2 denoising algorithm [39] was applied separately to forward and reverse reads that passed the quality and length filter and did not contain N's. Reads were merged using the fastq-join method [40]. In the next step chimeras were detected with the function removeBimeraDenovo in DADA2. Chimeric sequences were subsequently excluded from the analysis and an Amplicon Sequence Variant (ASV) table was created.
The number of reads after quality filtering and chimera removal ranged from 2968 to 239116, with a median of 44277 and a mean of 52546 reads per sample. The number of reads did not differ between the sample types (paired Wilcoxon test, Figure S1).

Taxonomy assignment
Taxonomy was assigned to each ASV based on SILVA 123 reference database [41] using the algorithm UCLUST [42] in QIIME [43]. BLAST algorithm [44] was used to identify the species and all taxa with the maximum identity and minimum e-value were selected for each ASV. The observed species metric and the Chao1 and Shannon index were used to estimate alpha diversity for each sample in QIIME. Beta diversity was computed in QIIME using both weighted and unweighted UniFrac metrics [45].
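As an illustration of the three alpha diversity metrics named above, the following minimal sketch computes them from a vector of per-taxon read counts for one sample. It is not the QIIME implementation: the function names are hypothetical, the Chao1 estimator is written in its bias-corrected form, and the Shannon index uses a base-2 logarithm; both choices are assumptions about the defaults rather than settings stated in the text.

```python
import math

def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed variant).

    Uses the number of singletons (taxa seen once) and doubletons
    (taxa seen twice) to estimate the number of unseen taxa.
    """
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return observed_richness(counts) + unseen

def shannon(counts):
    """Shannon index with a base-2 logarithm (assumed base)."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in proportions)

# Toy sample: read counts for six taxa, one of them absent.
toy_counts = [10, 5, 1, 1, 2, 0]
richness = observed_richness(toy_counts)
chao1_estimate = chao1(toy_counts)
shannon_index = shannon(toy_counts)
```

The observed richness only counts what was sequenced, whereas Chao1 adds a correction driven by rare taxa, which is why the two can diverge in shallowly sequenced samples.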

Sample and taxa filtering
We filtered out the ASVs that were unassigned at phylum level and all the ASVs belonging to the phylum Cyanobacteria.
Only the taxa present in at least 3 samples of the same sample type and at the same time represented by at least 9 reads were kept for further analysis to account for possible contaminations. The threshold of 9 reads represents approximately 0.3% taxa abundance in the sample with the lowest number of reads (2968). This filtering step resulted in discarding 42-58% of taxa at each taxonomic level (Table S2).
All comparisons between the three sample types were performed on triplet samples from 133 patients, totaling N = 399 samples for the analysis. For analysis of tumour and visually-normal mucosa pairs, we used paired tumour and visually-normal mucosa swabs from 186 patients (totaling 372 samples). For analyses within sample types, we used all the available samples (N = 186 for tumour mucosa swabs, N = 186 for visually-normal tissue mucosa swabs and N = 133 for stool).
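The taxa filtering rule described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function name and data layout are hypothetical, and the read threshold is interpreted here as the total number of reads of a taxon summed over all samples, which is an assumption because the text does not say how the 9 reads are counted.

```python
def filter_taxa(counts_by_taxon, sample_types, min_samples=3, min_reads=9):
    """Keep taxa that pass a prevalence and a read-count threshold.

    counts_by_taxon: dict mapping taxon name -> list of read counts,
                     one entry per sample.
    sample_types:    list of sample type labels aligned with the counts.

    A taxon is kept if it is present (count > 0) in at least
    `min_samples` samples of one sample type AND has at least
    `min_reads` reads in total (assumed interpretation).
    """
    kept = {}
    for taxon, counts in counts_by_taxon.items():
        if sum(counts) < min_reads:
            continue
        presence_per_type = {}
        for count, sample_type in zip(counts, sample_types):
            if count > 0:
                presence_per_type[sample_type] = (
                    presence_per_type.get(sample_type, 0) + 1
                )
        if any(n >= min_samples for n in presence_per_type.values()):
            kept[taxon] = counts
    return kept

# Toy table: three tumour swabs followed by three stool samples.
toy_types = ["tumour"] * 3 + ["stool"] * 3
toy_table = {
    "genus_A": [4, 3, 2, 0, 0, 0],  # 3 tumour samples, 9 reads: kept
    "genus_B": [5, 5, 0, 5, 0, 0],  # never 3 samples of one type: dropped
    "genus_C": [1, 1, 1, 0, 0, 0],  # only 3 reads in total: dropped
}
filtered = filter_taxa(toy_table, toy_types)
```

Requiring prevalence within a single sample type, rather than across all samples pooled, avoids discarding taxa that are genuinely restricted to one environment such as the mucosa.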
Non-metric Multidimensional Scaling (R vegan package [50]) over Aitchison distance matrices (R coda.base package [51]) was used to analyse tumour microbial heterogeneity and β-diversity. To estimate the contribution of clinical traits to microbiome β-diversity, permutational multivariate analysis of variance for distance matrices (adonis function of the R vegan 2.5.4 package [50]) with 999 permutations was used. To assess the differences between the sample types in alpha diversity we used a paired non-parametric two-sided Mann-Whitney U test. We applied a non-parametric approach to identify differences in microbial composition between sample types and the associations between microbial relative abundance and clinical variables. For non-parametric analysis, the Friedman test with paired Wilcoxon test and rank regression were used (R package Rfit [52]). The drop in dispersion test was used to produce overall p-values for rank regression models. The Cochran Q test was used for analysis of differences in presence of genera across sample types (analysis of triplets). Benjamini-Hochberg correction for multiple hypothesis testing was applied [53]. Results were considered significant at FDR < 0.1. The adjusted p-values are referred to as q-values. Visualization was performed with gplots 3.0.1.1, ggplot2, ComplexHeatmap 1.17.1, and circlize 0.4.8 packages [54][55][56][57].
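To make the q-value definition concrete, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic illustration of the procedure, not the code used in the study; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    Each p-value is multiplied by m / rank, where m is the number of
    tests and rank is its position in ascending order; a running
    minimum taken from the largest p-value downwards keeps the
    adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# Toy example with four tests (made-up p-values).
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_q = benjamini_hochberg(toy_p)
significant = [q < 0.1 for q in toy_q]  # threshold used in the text
```

With the FDR < 0.1 threshold, a result is reported when its q-value, not its raw p-value, falls below 0.1, so the number of discoveries depends on how many tests were run together.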
When comparing the significance of discoveries in stool vs mucosa and stool vs visually-normal tissue, the dataset used was the one comprising the 133 patients with all three sample types. However, for mucosa samples, the full set of 186 was exploited to maximize the statistical power.
For each clinical variable (or combination of variables), we only tested genera present in at least 10 samples in one clinical group (or combination of groups). We do emphasize that we approached this statistical testing from the point of view of a pilot discovery study.
Due to the known association between tumour grade and location [58] (confirmed also in our data, p < 0.001, Fisher's exact test), we investigated the associations of microbiome with grade and tumour location in a model with the interaction between covariates compared to a model without interaction. To ensure a more balanced design, we considered three categories for the location: right and transverse colon; left; and rectosigmoid and rectum.
While we consider only associations with FDR < 0.1 to be statistically significant, we also report the unadjusted results with p < 0.05 for the purposes of hypothesis confirmation by other studies.

Data Access
The data were uploaded to the European Nucleotide Archive under accession number PRJEB35990.

Results
In our effort to describe the tumour microbial landscape, we explored the differences in microbiome abundance, diversity, the presence/absence of the species and the proportion of samples with the respective genera in different sample types across patient groups defined by clinical variables (Table 1). β-diversity analysis by NMDS performed on all sample types showed that tumour location was the factor with the highest influence on total microbiome composition for all sample types, while tumour histological grade affected only tumour and visually-normal samples, which had similar microbial profiles (Text S1, Figure S2, Figure S3).

Microbial categorization according to sample type
There was no significant difference between the read counts across different sample types (paired analysis of sample triplets, see Methods).
Overall, in all the 505 samples we identified 5553 ASVs, of which 4920 ASVs were found in the 133 triplet samples. QIIME assigned species to only 50 ASVs, hence we also performed a manual BLAST search against the SILVA database (Table S3).
For further analysis, however, we mainly operated on higher taxonomic levels. After the taxa filtering step (Table S2) (Table S4). Inclusion of the additional 53 duplets (tumour mucosa and visually-normal mucosa swabs) resulted only in slight differences at the genus level: the identified taxa remained the same; what changed was their unique presence in some sample types (Text S1).
While most of the genera were found in all three sample types, their incidence and abundance across sample types varied greatly, mainly between mucosal samples and stool, both in overall and pairwise comparisons (Text S1). Fourteen genera (Stomatobaculum, Pseudoramibacter, Pelomonas, Pasteurella, Mycoplasma, Kingella, Johnsonella, Helicobacter, Deinococcus, Centipeda, Bergeyella, Actinobacillus, Abiotrophia and an unassigned genus from family Comamonadaceae) were detected only in mucosal (tumour and visually-normal) samples (Figure S4).
We further analysed the pairwise incidence of the 268 genera across sample types using Cochran's Q test and subsequent pairwise McNemar's tests and found that 128 genera varied significantly across sample types (analysis of 133 triplets, Text S1, Fig. 2A, Table S5).
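For readers unfamiliar with Cochran's Q test, the statistic for one genus can be sketched as below. The input is a presence/absence table with one row per patient and one column per sample type. This is a generic illustration with hypothetical function names and toy data, not the analysis code of the study; the p-value helper is only valid for three sample types, where the chi-square reference distribution has 2 degrees of freedom.

```python
import math

def cochran_q(rows):
    """Cochran's Q statistic for matched binary observations.

    rows: list of tuples of 0/1, one tuple per patient, one entry per
          sample type (e.g. tumour, visually-normal mucosa, stool).
    """
    k = len(rows[0])
    column_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    n = sum(row_totals)
    denominator = k * n - sum(t * t for t in row_totals)
    if denominator == 0:
        # Every patient has the genus in all or in none of the sample
        # types, so there is no variation to test.
        return 0.0
    numerator = (k - 1) * (k * sum(c * c for c in column_totals) - n * n)
    return numerator / denominator

def chi2_sf_two_df(statistic):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)

# Toy presence/absence table: six patients, three sample types.
toy_rows = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 0)]
toy_q_stat = cochran_q(toy_rows)          # 6.5 for this toy table
toy_p_value = chi2_sf_two_df(toy_q_stat)
```

A significant Q only says that the incidence differs somewhere among the sample types, which is why pairwise McNemar's tests are needed afterwards to locate the difference.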
To categorize the microbial genera based on their preferred environment we compared their abundance across sample types using the Friedman rank sum test. Out of the 268 genera, 104 differed significantly in abundance across the sample types (Table 2, Fig. 1). Based on these results, we defined five microbial categories. The first is based solely on the results of the tumour vs stool comparison: tumour genera (46 genera, more abundant in tumours compared to stool). Additionally, within the category of tumour genera, we defined mucosa genera (41 genera, enriched also in visually-normal mucosa compared to stool) and tumour-specific genera (18 genera of the tumour category, additionally enriched in tumours compared to visually-normal mucosa). Fifty genera that were significantly more abundant in stool compared to tumours and visually-normal mucosa form the group of stool genera. The fifth category was defined as the no-difference genera (164 genera, no difference across any of the sample types) (Text S1).

Table 2. Total counts of genera found significantly differentially abundant across the three sample types (133 triplets), divided into categories according to their enrichment in different sample types, and top 10 significant genera for each category (see Table S6 for all genera). Abbreviations: "TtoS": tumour to stool, "VNtoS": visually-normal to stool, "TtoVN": tumour to visually-normal.
We performed a detailed literature search, which revealed that tumour genera consisted predominantly of oral bacteria, many known as oral pathogens. The presence of some genera on tumour mucosa has not previously been reported or associated with CRC (Table S8).

Microbiome and clinical variables
In the next step, we assessed the association of microbiome abundance with the clinical parameters and interpreted the results based on our microbial categorization. The results for each clinical variable are summarized in Table 3 (Figure S6, Figure S7, Figure S8, Figure S9). Some of these associations were found also in visually-normal mucosa (analysis of 186 samples).
In tumour mucosa, three genera, Fusobacterium, Campylobacter and Mogibacterium, significantly increased with tumour grade regardless of primary tumour side. Campylobacter also showed lower abundance in the mucosa of left-sided tumours. Of these, in the visually-normal mucosa adjacent to these tumours, none remained associated with high tumour grade.
Leptotrichia was significantly increased in advanced grade tumours (2 or 3 compared to 1); although the interaction model was selected according to the drop in dispersion test, none of the interaction coefficients was significant. Prevotella and Selenomonas both had significantly increased abundances in the grade 3 right-sided tumours. Prevotella was also increased in the stool of patients with grade 3 rectosigmoid/rectum tumours. Lachnoclostridium was associated with grade 2 in all tumours except for rectosigmoid/rectum.
In contrast, Lachnospira, Ruminiclostridium 6, Gemella, [Eubacterium] ventriosum group, Methanobrevibacter, an uncultured bacterium from Opitutae vadinBB60 group family, Ruminococcaceae UCG-010, Victivallis and an uncultured species and an Incertae Sedis genus from Lachnospiraceae family were significantly enriched mainly in left-sided (for some including rectosigmoid/rectum) low-grade tumours. Lachnospira increased in abundance in grade 3 tumours of the rectosigmoid/rectum. In visually-normal mucosa, the same association was observed for Gemella, while Lachnospira was only significantly enriched in left, rectosigmoid and rectum location and [Eubacterium] ventriosum group only in rectum. Ruminiclostridium 6 remained enriched in the stool of patients with low-grade left-sided tumours and grade 2 right-sided tumours.
Erysipelatoclostridium, Holdemania, Selenomonas 3, Selenomonas 4 and Fretibacterium were increased in mucosa of the right-sided and transverse tumours. The same associations were found only for Fretibacterium and Selenomonas in the visually-normal mucosa. Fusicatenibacter, Christensenellaceae R-7 group and Ruminococcaceae UCG-013 were increased in mucosa of the left-sided, rectosigmoid and rectal tumours, Coprococcus 1 and Family XIII AD3011 group in the mucosa of left-sided tumours and Bifidobacterium in the mucosa of rectosigmoid and rectal tumours. Similar associations were found in visually-normal mucosa for Bifidobacterium.
For Flavonifractor and Odoribacter, although the model itself was significant in the drop in dispersion test, none of the coefficients was significant alone.

Tumour stage and microbiome
When comparing early (0-II) and advanced (III-IV) stages, we identified increased abundance of Peptoclostridium and Fusobacterium associated with advanced stage and increased abundance of Parabacteroides, Lachnospiraceae FCS020 group and Tyzzerella 4 associated with early stage tumours. Of these, only Peptoclostridium was associated with an advanced stage in visually-normal tissue. None of these genera were significant after the FDR correction (p < 0.05, FDR > 0.1). In stool, we found a different set of genera associated with advanced tumour stage: increased Streptococcus, Peptococcus and Akkermansia and decreased Dorea and Ruminiclostridium (p < 0.05, FDR > 0.1) (Figure S10, Figure S11).

Patients with advanced T stages (pT 3-4) were characterized by significant increase in abundance of
The presence of metastases (local or distant) at the time of diagnosis was predominantly associated with changes in stool microbiome. Except for an uncultured genus from the Erysipelotrichaceae family, none of these associations were significant after FDR correction.

Tumour CRC microbial subtypes
We performed hierarchical clustering of patients based on the relative abundance of the 46 tumour genera in the tumour mucosa samples. As a distance measure between patients we used Aitchison's distance, and Euclidean distance for the clustering of genera (see Methods).
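The distance between two patients used for this clustering can be illustrated with a short sketch: the Aitchison distance is the Euclidean distance between centred log-ratio (CLR) transformed compositions. The code below is not the coda.base implementation; the function names are hypothetical, and the pseudocount added to handle zero counts is an assumption, since the text does not state how zeros were treated.

```python
import math

def clr(composition, pseudocount=0.5):
    """Centred log-ratio transform of one sample.

    A pseudocount (assumed value) is added so that zero counts have a
    finite logarithm; with pseudocount=0 the input must be strictly
    positive.
    """
    logs = [math.log(value + pseudocount) for value in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(sample_a, sample_b, pseudocount=0.5):
    """Euclidean distance between the CLR-transformed samples."""
    clr_a = clr(sample_a, pseudocount)
    clr_b = clr(sample_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))

# Toy abundance vectors over the same three genera for two patients.
patient_1 = [120, 30, 0]
patient_2 = [15, 60, 45]
distance = aitchison_distance(patient_1, patient_2)
```

Because the transform works on log-ratios, multiplying all counts of a sample by a constant leaves the distance unchanged when no pseudocount is used, which makes it suitable for compositional data such as relative abundances obtained at different sequencing depths.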
Based on the tumour-mucosa microbial composition we observed three major subtypes of tumours (TS1-TS3), that could further be divided into seven groups (a-h) (Fig. 3). The bacteria clustered into six groups M1-M6 (Table S8). The seven minor groups a-h are reflecting profiles of certain individual species, such as Sutterella, Peptoclostridium, Flavonifractor, Coprobacter, Aggregatibacter, Granulicatella, Hungatella, Alloprevotella or Slackia. We associated these subtypes with clinical variables (Table 1).
The M1 group and M2 group are represented by typical gut microbiome members. The M1 group consists of the five most common and most abundant genera Fusobacterium, Lachnoclostridium, Bacteroides, Escherichia-Shigella and one uncultured genus from family Lachnospiraceae. All tumours contain at least three of these bacteria, most tumours (78.5%) all five. These bacteria have high co-occurrence across sample types (Fig. 2A, fourth panel) represents 54% (100) of tumours and is mostly missing the M3-5 bacterial groups as well as most of the high-grade related species, containing in median TS3 is thus characterized by increased proportion of low grade tumours (15% grade 1). In TS3 microbial subtype, right-sided and left-sided tumours are equally represented.

Discussion
Carcinogenesis of colorectal cancer is a complex process with a unique set of somatic molecular changes that can be caused by different factors. Some studies correlated the dysbiosis of gut microbiome with the development of colorectal cancer in the healthy mucosa-adenoma-carcinoma sequence or focused on elaborating the concrete role of selected bacterial species in gut (colorectal) pathogenesis progression [4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. In contrast, we characterized the heterogeneity of the microbiome in the ongoing disease through comprehensive description of microbial communities, as a first step for in-depth targeted studies.
The comparison of three sampling sites provided us with insights into the preferred environment of the observed species. The resulting microbial categories served for focusing further analyses and interpretation of our findings. We decided to base our characterization of colorectal cancer microbial landscape on bacteria with increased abundance in tumour mucosa compared to stool, specifically the 46 tumour genera, to filter out potential stool contaminants. With a median of 58.6%, the tumour genera represented an important fraction of total microbiome found on tumour mucosa. We consequently defined CRC tumour microbial subtypes and associated them with clinical variables. Naturally, we extended our search for clinical associations with microbiome to all available sample types.
For ten genera previously associated with stool of CRC patients we report for the first time their presence on tumour mucosa (see below).
Both the unsupervised NMDS analysis and supervised comparison of microbial composition between sample types confirmed previously reported observations from 16S rRNA sequencing studies of much smaller sample size (31, 27 and 8 individuals) [8,11,30] or a qPCR of specific bacteria [16], that the tumour mucosal microbiome is dominated by mucosa-associated bacteria and that these species are at the same time associated also with non-cancerous (visually-normal) mucosa. It is, however, debatable whether the non-tumour, visually-normal tissue (however distant from the tumour) from the surgically removed segment can be considered healthy or is possibly already influenced by the bacteria initiating CRC development. Consistent with the bacterial driver-passenger model as proposed by Tjalsma et al. [27], our mucosa genera could be considered as candidates for potential drivers, while tumour-specific genera could be potential passengers.
In further agreement with the driver-passenger model, our analysis revealed that tumours harbour a diverse community of opportunistic pathogens mainly of oral origin (26 of 46 tumour genera). Increasing evidence suggests that oral bacteria can migrate to the colon and cause infections and inflammation [59,60].
Most importantly, we newly identified genera of both oral and gut origin, not previously associated with colorectal cancer, with increased abundance in the tumour lesions: Selenomonas 3, Selenomonas 4, Aggregatibacter, Actinobacillus, Bergeyella, Phocaeiola, Defluviitaleaceae UCG-011, an uncultured species from Veillonellaceae family, and an uncultured species from boneC3G7 at the family level (BLAST hit Fusobacterium necrophorum) (oral origin) and Tyzzerella 4, Massilia, and an unassigned genus from Peptostreptococcaceae family (gut origin). Some of these newly associated genera contain species that are known human pathogens causing infections of mucosal or other tissues. Selenomonas, Phocaeiola and the newly established genus Aggregatibacter have been associated with periodontal disease [75][76][77]. The Bergeyella and Actinobacillus genera comprise animal (zoonotic) pathogens causing infections in humans through animal bites [78,79] and a few human pathogens causing endocarditis (B. cardium) or respiratory infections (A. hominis) [80,81]. In our study both genera were detected only in mucosal swabs of visually-normal tissues or tumours. For other genera we newly associated with CRC, their potential involvement in CRC is not so obvious. The tumour-specific genera of Defluviitaleaceae might influence CRC through the metabolism of butyrate [82]. Butyrate is critical for intestinal microbial balance and colon health and the atrophy of butyrate metabolism was associated with the pathogenesis of various colonic diseases [83]. The association of Tyzzerella 4 from the Lachnospiraceae family with CRC can be due to its increased occurrence in patients with higher cardiovascular risk (CVR) factor scores [84], which are associated also with CRC [85]. The presence of Massilia in CRC patients has also not been described previously, but it was detected in patients with pancreatic cancer [86].
Indeed, some of the tumour genera were shown to be involved in the process of carcinogenesis, for some, a mechanism of action was described, but their role as bacterial drivers or passengers within the existing disease remains unknown.
Correlating the tumour microbiome with clinical variables of tumour progression bears the promise of offering viable hypotheses on the role of bacteria in the progression of the disease. The identification of such associations within visually-normal mucosa or with the stool microbial composition has a significant potential in derivation of screening biomarkers. Currently, the associations between clinical variables and gut microbial composition are understudied and only few efforts addressed the topic, and on rather limited cohorts.
It is known that right and left-sided colorectal tumours are different in terms of genetic stability and prognosis [87] and, given the spatial variation in microbiome's taxonomic structure in different parts of gut [36], the effect of tumour localisation on grade-microbiota interaction should be estimated. So far, few studies have associated microbiome of CRC patients with the tumour grade using different approaches (real-time PCR, 16S rRNA sequencing) but without considering the interactions between tumour sidedness and grade. A study of F. nucleatum's DNA presence in formalin-fixed paraffin-embedded tumour slides showed that the proportion of this species increased with tumour histological grade [88]. Wu et al. [36] report 15 genera associated with tumour mucosa of moderately and poorly differentiated tumours compared to well differentiated tumours. In our study, the sample size allowed the inclusion of an interaction term, thus providing a finer estimation of differences in microbiome composition with respect to tumour grade. We reported 50 genera associated with grade and/or side for all tissue types studied, in total 66 significant associations at FDR < 0.1. We confirmed previously reported high tumour grade associations of Fusobacterium, Campylobacter and Mogibacterium in CRC tumour mucosa [36,88]. Prevotella and Selenomonas were associated with high grade (3), but only in the right-sided/transverse tumours.
Almost all these microorganisms belong to tumour genera, except for Methanobrevibacter (stool genera) and Mogibacterium, which showed no difference in abundance between the sample types. Mogibacterium is a known oral pathogen, but the mechanism of its action remains unknown [89].
In our analysis, we observed potentially beneficial effects of the increased abundance of 20 stool genera significantly associated with left location, namely decreasing tumour grade with increased abundance, e.g. Bifidobacterium, Ruminococcaceae UCG-010 and Victivallis in tumour mucosa; Porphyromonas, Lachnospiraceae UCG-005 and Gelria in stool. So, while these genera are significantly depleted in mucosal samples compared to stool, a significant difference in abundance between locations remained. Bifidobacterium was previously shown to have anti-carcinogenic effects [73,90,91,92,93]. Two tumour genera (Lachnospira, Gemella) and Methanobrevibacter showed a similar association with left-sided and low-grade tumours. Except for Bifidobacterium, none of these genera were previously reported to be associated with tumour grade.
We can only speculate whether the prolonged exposure of tumour mucosa to predominantly stool bacteria, which is mechanistically related to tumours in the distal part of the colon (left-sided or with onset in the rectosigmoid and rectum), can have potentially harmful or beneficial effects, or whether any associations are mostly due to the well-known molecular differences between right- and left-sided tumours [32,63,94].
In the study of Pu et al., Fusobacterium, Corynebacterium, Enterococcus, Neisseria, Porphyromonas and Schlegelella were more abundant and Oribacterium, Desulfovibrio, Clostridiales and Lactobacillus were less abundant in the invasive cancer group [31]. Our work partially confirms these findings. In addition to Fusobacterium, we also identified Campylobacter to be increased in the mucosa of advanced T stage tumours. In contrast, we found the genera Corynebacterium, Enterococcus and Neisseria to be enriched in early stage tumours (p < 0.05, FDR > 0.1). This discrepancy could be caused by a different T grouping strategy in our work, the absence of T4 tumours in the study of Pu et al. and the much larger sample size in our study (N = 186 compared to N = 25).
Early detection of local and distant metastases remains one of the most important tasks in cancer management. We confirmed a previously published increase of Akkermansia and Porphyromonas in the stool of patients with local metastases (p < 0.05, FDR > 0.1) [82]. Moreover, in our work, the stool abundance of an uncultured genus from the family Erysipelotrichaceae was directly associated with distant cancer metastases (FDR < 0.1), whereas in the study of Han et al. a close taxonomic relative of this genus, Erysipelotrichaceae incertae sedis, was linked with the presence of local metastasis. The family Erysipelotrichaceae is associated with inflammation-related intestinal disorders, including inflammatory bowel diseases and Crohn's disease, and has immunogenic properties [95]. In addition, the abundance of these bacteria in stool was shown to be positively correlated with the blood level of TNF-α in human immunodeficiency virus-infected individuals [96], and the TNF-α level itself is directly linked with the rate of metastasis occurrence in colon cancer [97]. Based on this, we can speculate that Erysipelotrichaceae-associated inflammation in the gut of CRC patients could contribute to the rate of cancer metastasis.
It is also interesting to note that the occurrence of metastases is mainly associated with shifts in stool microbiome composition. On the one hand, this raises the possibility of microbiome-based non-invasive metastasis diagnostics in colorectal cancer, or of monitoring patients at risk. On the other hand, it is still under discussion whether these changes are specific, or whether this alteration of the stool microbial community reflects the changes in overall health status in the presence of a metastasis and the cancer progression itself. For example, it is known that non-colonic malignancies, such as breast cancer and lung cancer, are also accompanied by shifts in the gut microbiome [98][99][100].
While summarising the results, we identified a common list of bacteria associated with tumour progression measured according to different criteria (histological grade, TNM stage, and T, M, N separately). Five bacterial genera, Peptoclostridium, Fusobacterium, Campylobacter, Streptococcus and Akkermansia, were linked with three or more progression criteria for at least one specimen type, among them .
The screening potential of individual bacteria was assessed by pairwise analysis of the incidence of all genera across sample types. The relevance of the observed differences is context dependent. On-tumour microbes with significant clinical associations and with no difference in incidence across sample types are ideal candidates for stool-based screening studies or stool-based prognostic and predictive classifiers. In our study these were 13 tumour genera, some of which were previously associated with CRC, such as Lachnoclostridium, Streptococcus, Bacteroides, Escherichia-Shigella, Coprobacter, Slackia and Sutterella. Interestingly, Fusobacterium, the most studied bacterium in the context of CRC, the quantification of which was previously suggested for stool-based screening for advanced carcinoma [101], is present in only 63% of stool samples of patients with CRC. At the same time, in more than 31% of patients with Fusobacterium present on tumour mucosa we failed to detect it in the stool. In fact, most of the tumour-specific genera, if present on tumour mucosa, were not identified in the stool of the same patients in more than 50% (e.g. Haemophilus, Campylobacter, Gemella, Parvimonas, Leptotrichia, Solobacterium, Howardella, Hungatella) or 80% (e.g. Selenomonas, Selenomonas 3, Eikenella, Aggregatibacter, Massilia, Neisseria) of cases. Given that these genera prefer the mucosal environment over stool, such associations are not entirely surprising. It needs to be emphasized that, although these genera are often significantly associated with clinical parameters, their stool screening potential remains poor and they are better candidates for screening of colonoscopy biopsy samples.
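The paired incidence comparison described above can be sketched for a single genus as follows. This is a minimal illustration, not the analysis pipeline used in the study: the function names and the six-patient presence/absence vectors are hypothetical, and presence is assumed to be already called per sample.

```python
def incidence(present):
    """Fraction of samples in which the genus was detected."""
    return sum(present) / len(present)


def missed_in_stool(tumour_present, stool_present):
    """Among patients with the genus on tumour mucosa, the fraction
    whose paired stool sample lacks it. Returns None when no patient
    is tumour-positive."""
    paired = [(t, s) for t, s in zip(tumour_present, stool_present) if t]
    if not paired:
        return None
    return sum(1 for _, s in paired if not s) / len(paired)


# Hypothetical presence/absence calls for one genus in six patients;
# position i in both lists refers to the same patient.
tumour = [True, True, True, True, False, False]
stool = [True, False, False, True, True, False]

print(f"stool incidence: {incidence(stool):.2f}")
print(f"tumour-positive but stool-negative: {missed_in_stool(tumour, stool):.2f}")
```

A genus with a high value of the second quantity would be a weak stool-based marker even when its tumour-mucosa associations are strong.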
The three tumour-mucosa-based microbial subtypes, which we derived from patterns of similarity in the abundance of the tumour genera, were associated with tumour grade, location and stage. Interestingly, the tumour microbial subtypes differ not only in the type of tumour genera they host, but also in the count of potentially pathogenic genera correlated with high grade and stage. Of the 13 genera related to high grade or high stage, TS1 tumours had a median of 9 (69%), TS2 of 7 (54%) and TS3 of 5 (38%), thus differing in what we could call "microbial pathogens burden".
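As a rough illustration of the burden measure, the sketch below counts how many genera from a reference panel are detected in one sample and reproduces the reported percentages from the median counts. The panel entries are placeholder names, since the 13 genera are not listed in this section, and the helper function is hypothetical.

```python
def pathogen_burden(detected_genera, reference_panel):
    """Count panel genera detected in a sample and the corresponding
    fraction of the panel."""
    panel = set(reference_panel)
    count = len(panel & set(detected_genera))
    return count, count / len(panel)


# Placeholder names standing in for the 13 grade/stage-related genera.
panel = [f"genus_{i:02d}" for i in range(1, 14)]

# Hypothetical sample hosting four panel genera and one unrelated genus.
sample = ["genus_01", "genus_04", "genus_07", "genus_13", "other_genus"]
count, fraction = pathogen_burden(sample, panel)
print(count, f"{fraction:.0%}")  # 4 31%

# Median counts per subtype as reported in the text.
for subtype, median_count in {"TS1": 9, "TS2": 7, "TS3": 5}.items():
    print(subtype, f"{round(100 * median_count / len(panel))}%")
# TS1 69%
# TS2 54%
# TS3 38%
```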
This subtyping could reflect differences in tumour biological properties linked with cancer progression: malignant tumours with active growth and cell and tissue atypia, through disruption of the mucus layer and dysregulation of local immunity, provide more favourable conditions for the expansion of aggressive microbial consortia and the homing of unconventional (oral) species. Moreover, given that bacteria-supported carcinogenesis has been demonstrated in animal models [10,102], the growth of pathogenic bacteria leads to additional dedifferentiation of tumour cells, forming a pathogenetic loop. It is to be emphasized that the proposed subtypes based on tumour genera are not comprehensive with respect to all the genera present on the tumour, and other genera can still have important effects. As discussed above, the presence of some stool genera on the tumour mucosa of left-sided tumours was associated with low grade.
It remains to be further investigated whether the subtypes could improve the prediction of patients' survival and prognosis. We can speculate that a high microbial pathogens burden could worsen not only tumour progression, but potentially also the patient's condition after surgical resection and during and after chemotherapy, since these genera are also present in visually-normal mucosa.
One possible complication, metachronous colorectal tumour, could potentially be associated with alteration of the visually-normal mucosa microbiome. Metachronous cancer, i.e. a tumour occurring 6 months after the resection of the primary tumour, is traditionally linked with hereditary conditions such as familial adenomatous polyposis, but it is also detected in patients above age 65 with sporadic cancer [103]. Given that tumour-related genera also reside on visually-normal mucosa, they could initiate dysplastic changes and malignant transformation of the tissue. Limited evidence of a link between mucosal microbiota and the growth of metachronous adenomas was demonstrated by Liu et al. [104]. On the other hand, it has been shown that the microbiome can interact with and metabolise chemotherapeutic drugs, which leads to modulation of their activity and toxicity [105]. In light of the above, modification of the gut microbiome after surgical removal of colorectal cancer might be considered as an additional treatment step to prevent tumour recurrence and modulate chemotherapy effectiveness and toxicity.

Conclusion
In our study, by analysing 505 samples from N = 186 patients, we extended the current characterization of the colorectal cancer microbiome in several ways. Thanks to the large sample size, we identified bacterial genera that were not previously associated with CRC tumour mucosa, with clinical variables or with colorectal carcinoma at all. These genera should be studied in more detail to describe their mechanism of interaction with the disease.
By focusing on microbial community analysis, in contrast to classical microbiome-centered approaches, we were able to identify co-occurring species and 3 major tumour-microbial subtypes that correlate with clinical variables, mainly grade, location and TNM staging. The subtypes also differ in what we describe as microbial pathogens burden, i.e. the number of pathogenic species correlated with increased grade and stage that are present on tumour mucosa, although the concept can be defined with respect to all three environments (tumour mucosa, visually-normal mucosa and stool).
It is well known that the gut microbial composition changes with dietary patterns and lifestyle, which could be region-based [106]. More studies of similar or larger sample size, from different geographical locations, are needed to derive robust and generalizable patterns.
We make the full data available, including clinical variables, as a first step towards building a data corpus that could support such investigations. The technology chosen was high-throughput, fitting the purpose of microbial community-based analysis. We did perform sequence matching for the identified ASVs against the SILVA database; however, being aware of the limitations, we provide these results solely as supplementary information without discussing them here in detail.
Sampling the microbiome at three complementary sites allowed the study of several environments, leading to the definition of novel microbial categories with multiple implications. Our study shows that the associations with clinical variables found for the tumour mucosal or adjacent visually-normal mucosa microbiome are rarely preserved in the microbial composition of stool and vice versa. While tumour histological grade, stage and location are reflected in the corresponding mucosal microbiome, the presence of lymph node or distant metastases influences mainly the stool microbiome. It seems that the mucosa and stool microbiomes are complementary with respect to their effects on disease progression. Tumour-mucosa biopsies from colonoscopy might need to be coupled with stool sampling for efficient screening or diagnostic purposes.

Declarations
Ethics approval and consent to participate
The study was approved by the ethical committee of Masaryk Memorial Cancer Institute. All patients gave written informed consent in accordance with the Declaration of Helsinki prior to participating in the study.

Consent for publication
Not applicable.

Availability of data and material
Sequencing data were uploaded to the European Nucleotide Archive under accession number PRJEB35990.

Competing interests
The authors declare that they have no competing interests.

Funding
The work was supported by the Czech Ministry of Health (AZV 16-31966A).
