Assembly of novel microbial genomes from gut metagenomes of rhesus macaque (Macaca mulatta)

ABSTRACT Rhesus macaque (RM, Macaca mulatta), as an important model animal, commonly suffers from chronic diarrheal disease, challenging the breeding of RMs. Gut microbiomes play key roles in maintaining intestinal health of RMs. However, it is still unclear about more features of gut microbiome as responsible for intestinal health of RMs. In this study, we performed de novo assembly of metagenome-assembled genomes (MAGs) based on fecal metagenomes from chronic diarrheal RMs and asymptomatic individuals. In total of 731 non-redundant MAGs with at least 80% completeness were reconstructed in this study. More than 97% MAGs were novel genomes compared with more than 250,000 reference genomes. MAGs of Campylobacter and Helicobacteraceae from RM guts mainly carried flagella-associated virulence genes and chemotaxis-associated virulence genes, which might mediate motility and adhesion of bacteria. Comparing to MAGs of Campylobacter from humans, distributions and functions of these MAGs of Campylobacter from RMs exhibited significant differences. Most members of Bacteroidota, Spirochaetota, Helicobacteraceae, Lactobacillaceae and Anaerovibrio significantly decreased in guts of chronic diarrhea RMs. More than 92% MAGs in this study were not contained in 2,985 MAGs previously reported from other 22 non-human primates (NHPs), expanding the microbial diversity in guts of NHPs. The distributions and functions of gut microbiome were prominently influenced by host phylogeny of NHPs. Our results could help to more clearly understand about the diversity and function of RMs gut microbiome.


Introduction
Due to many populations and the similarity of genetic evolution to human, rhesus macaque (RM, Macaca mulatta) generally serves as an important model animal to reveal physiology of human, research clinical disease and develop new drugs. [1][2][3] Therefore, RMs with huge needs are widely raised in captivity as model animal. However, many captive RMs usually suffer from chronic diarrhea with high morbidity and mortality to pose serious challenges in breeding and management of RMs. 4 Chronic diarrhea of RMs is prominently manifested by long-term and recurrent diarrhea. 5 Previous studies indicated that chronic diarrhea disease of RMs was accompanied by significant changes of gut microbial composition and metabolic function. 6,7 The significant decrease of carbohydrate-active enzyme (CAZyme) genes and significant increase of antibiotic resistance genes (ARGs) were found in guts of chronic diarrhea RMs comparing with asymptomatic RMs. 7 However, these changes of microbial abundance and diversity might not mean that emergence and continuation of diarrhea disease are caused by the increased/decreased microbiome. In addition, taxonomic annotation of gut microbiome based on metagenomic short reads might lose many important microbial taxa due to databases or algorithms. 6 The abundance changes of different genes are also difficultly located to concrete microbial species. It is still unclear about more features of gut microbiome as responsible for chronic diarrhea of RMs. Therefore, functional exploration of gut microbiome at genome level is necessary to better understand the important roles of gut microbiome in RMs.
Although some isolates were obtained from guts of RMs in several previous studies, 7,8 the number of these isolates was negligible in guts of RMs with innumerable microorganisms. For now, bacterial genomes were mainly obtained by conventional culture in laboratory. However, the culture of gut microorganisms in artificial culture medium is still difficult, although many microorganisms have been indiscriminately cultured and were obtained draft genomes. [9][10][11][12] Metagenomic binning is an effective method for reconstruction of genomes from rare community members. The metagenome-assembled genomes (MAGs), the microbial draft genomes assembled from metagenomic reads, have been largely reconstructed from gastrointestinal tracts of animals, such as cow rumen, 13 chicken cecum, 14 murine gut. 15 These MAGs from gastrointestinal tracts of animals vastly promoted the understanding about diversity and function of microbiome. In addition, metagenomic assembly could obtain a mass of novel draft genomes. 16 These novel MAGs have expanded the life tree to further enlarge the understanding of unknown microorganisms' characteristics and functions, in particular non-culturable species. 17 Manara et al. 18 reconstructed 2,985 MAGs from 22 non-human primate (NHP) species. These MAGs immensely expanded the understanding of diversity and function for NHP gut microbiome, but didn't involve in MAGs from RMs. The gut microbial composition and diversity, in particular specific microbial taxa, could be partly influenced by host phylogeny. 19,20 The host specificity of NHPs could reportedly shape the different gut microbial composition. [21][22][23] Heritable difference of gut microbiome at genome level might stochastically evolve in adapting of diverse gut environment. 24,25 These differences of gut microbiome might be presented not only by gut microbial composition but also by genome variation. Therefore, these gut microbial genomes from NHPs could enhance the understanding of hostmicrobiome coevolution.
In this study, we sought to reconstruct MAGs of RMs (including asymptomatic RMs and chronic diarrhea RMs) based on fecal metagenomic reads to explore the diversity and function of gut microbiome at genome level. We finally obtained 731 non-redundant MAGs from fecal metagenomes of RMs, including many novel strains and novel species comparing with reference genomes, which further expanded genomes sets of NHPs' gut microbiome. Therefore, our results might help to more clearly understand about the gut microbiome of RMs, in particular chronic diarrhea individuals, more importantly, and provide new insights in prevention for diarrhea of RMs.

Reconstruction of 731 MAGs from fecal metagenomes of RMs
Metagenomic binning of 29 fecal metagenomes of RMs (including 11 chronic diarrhea RMs and 18 asymptomatic RMs) from our previous study 7 were performed. After preliminarily completed metagenomic binning, we obtained 7,082 raw bins from fecal metagenomes of RMs. After removing redundant bins with ANI>99% between genomes, we further obtained 731 non-redundant MAGs with estimated completeness>80% and estimated con-tamination<10%, 93 MAGs of which were from coassembly binning and 638 MAGs from singlesample binning. Of these MAGs, 354 MAGs had completeness>90% and contamination<5% (defined as high-quality draft genomes 26 ), 145 MAGs had completeness>95% and contamina-tion<5%, and 24 MAGs had completeness>97% and none contamination (Supplementary Figure  S1a and Supplementary Table S1).

The functional characterization of 731 MAGs
To further compare the functional features, we predicted and annotated the genes of these MAGs. Contigs' numbers in these 731 MAGs ranged from 3 to 375 (Supplementary Figure S2a and Supplementary Table S1). Total length of MAGs' contigs ranged from 745,063bp to 4,276,366bp with average length of contigs from 5219.6bp to 498,8330bp. Numbers of predicted genes in MAGs ranged from 720 to 4,080, 88.1% of which ranged from 1,300bp to 2,700bp. rRNA genes of most MAGs cannot be identified, while 5S rRNA genes of 393 MAGs, 16S rRNA genes of 115 MAGs and 23S rRNA genes of 60 MAGs were identified from genome sequences. 47 MAGs included full-length 16S rRNA genes, while 68 MAGs only carried fragment of 16S rRNA genes.
The virulence-associated genes in genome of each MAG were predicted based on VFDB database. Total 289 virulence genes were identified in MAGs. Notably, the MAGs of Campylobacteraceae, Helicobacteraceae, Escherichia, Haemophilus and Haemophilus_A enriched many virulence-associated genes ( Figure 2b).

The genomes of Campylobacter and Helicobacteraceae
Previous studies reported that Campylobacter infection, in particular C. coli and C. jejuni, was associated with diarrhea of RMs. 8,27,28 In this study, we reconstructed 11 MAGs of Campylobacter not assigned to any species level (classified to 2 SGBs, 4 from SGB_80 and 7 from SGB_81). Four MAGs (all belonging to SGB_80) were significantly more abundant in guts of asymptomatic RMs than those in guts of chronic diarrhea RMs (p < 0.05), while other 7 MAGs (all belonging to SGB_81) were more abundant in guts of chronic diarrhea RMs, 3 MAGs of which were significantly more abundant (p < 0.05) (Supplementary Table S5).
Manara et al. 18 also reconstructed 3 MAGs of Campylobacter from other NHPs. To identify genetic relationship of the 14 MAGs of Campylobacter, we downloaded 37 reference genomes of Campylobacter from NCBI, and constructed a phylogenetic tree of these genomes (Supplementary Figure S3a). The 11 MAGs of RM and one MAGs of Macaca fascicularis were clustered to 2 clades corresponding with taxonomic a b labels of SGBs. And they were more closely related to C. hyointestinalis, C. fetus and C. iguaniorum, but were distant to C. coli and C. jejuni. We annotated virulence genes in 14 MAGs of Campylobacter from RMs and other NHPs, and reference genomes of C. coli, C. jejuni, C. fetus, C. hyointestinalis, C. iguaniorum and C. lanienae. A total of 133 virulence genes were identified in these genomes (Figure 3a). C. jejuni, C. coli and one MAG of other NHPs harbored most virulence genes, but majority of virulence genes were not detected in other MAGs and other reference genomes. Most MAGs carried flaA or flaB gene, as the major flagellin genes. The virulence genes of RM's MAGs were enriched in flagella-associated genes, such as fliI, fliG, fliP, fliN and fliM. In addition, chemotaxis-associated genes, such as cheV3, cheA and cheY, were also detected in these MAGs.
To further compare differences of Campylobacter genomes between NHPs and humans, 301 Campylobacter MAGs from humans were obtained from previous study. 29 These human MAGs were mainly from oral cavities and guts of humans with westernized diet, whilst only 9 MAGs were from oral cavities and guts of humans with non-westernized diet. We constructed a phylogenetic tree of these Campylobacter genomes ( Figure 3b). All MAGs from RMs and one MAGs from Macaca fascicularis were clustered in an independent clade, and 2 other NHP MAGs were close to them. They were significantly different from MAGs of humans in phylogenetic tree. We also functionally annotated genes of these MAGs based on UniRef50 database. NDMS plot exhibited that these MAGs at functional level were obviously separated based on sampling sites, mainly oral cavity and gut (Figure 3c). We further compared functional differences of all MAGs from guts. NDMS plot exhibited that MAGs from RMs at functional level were significantly separated with MAGs from humans ( Figure 3d). Helicobacter may be an important factor in development of chronic diarrhea of RMs. 28,30 However, Helicobacter was commonly also prevalent in guts of asymptomatic RMs and significantly decreased in guts of diarrheal RMs. 6,27 In this study, we retrieved 15 MAGs of Helicobacteraceae, 6 of which belonged to Helicobacter_B, 1 to Helicobacter_C and 8 not assigned to genus level. All MAGs were recovered from fecal metagenomes of asymptomatic RMs. These MAG of Helicobacteraceae were scarcely detected in fecal metagenomes of chronic diarrhea RMs (only 1 MAG was detected), but were prevalent in fecal metagenomes of asymptomatic RMs.
In addition, 20 MAGs of Helicobacteraceae were also retrieved from metagenomes of other NHPs, 18 including 12 of which belonged to Helicobacter_B, 2 to Helicobacter_C and 6 not assigned to genus level. Of these 20 MAGs, 15 MAGs of Helicobacteraceae were recovered from fecal metagenomes of Macaca fascicularis. We downloaded 44 reference genomes of Helicobacter from NCBI, and constructed a phylogenetic tree of them and MAGs of Helicobacteraceae from RMs and other NHPs (Supplementary Figure S3b). These MAGs were mainly placed in 2 clusters. All 18 MAGs assigned to Helicobacter_B were most closely related to Helicobacter macacae. 6 MAGs from RMs and one MAG from Macaca fascicularis were clustered with reference genome of Helicobacter macacae. Notably, 16 of these 18 MAGs were retrieved from RM and Macaca fascicularis. Most of MAGs not assigned to genus level were closely related to Helicobacter winghamensis and Helicobacter pullorum. Only five MAGs of Helicobacteraceae were obtained from metagenomes of humans, 29 as could suggest that the species of Helicobacteraceae seemed to be rare in human guts comparing with NHPs' guts.
We also annotated the virulence genes in all MAGs of Helicobacteraceae (Figure 3e). flaA or flaB genes were detected in most MAGs. Similarly, virulence genes of these MAGs were enriched in flagella-associated genes, such as fliS, flgG, flgC, flaB, fliG, fliP and fliQ, and chemotaxisassociated genes, such as cheY, cheA and cheV1.

Abundances of MAGs in fecal metagenomes of RMs
To compare compositional differences of gut microbiome between asymptomatic RMs and chronic diarrhea RMs, we quantified and compared abundances of the 731 MAGs in fecal metagenomes of RMs. The α-diversity of these MAGs including Shannon index and Simpson indexes were significantly higher in guts of asymptomatic RMs than chronic diarrhea RMs (Figure 4a). Based on Bray-Curtis distance of abundances of MAGs in fecal metagenomes of RMs, PCoA plot exhibited that samples of chronic diarrhea RMs and asymptomatic RMs were significantly divided (Figure 4b). The abundances of all MAGs in the fecal metagenomes of RMs exhibited that samples of chronic diarrhea RMs and asymptomatic RMs were clearly divided in heatmap (Figure 4c). The abundances of 425 MAGs in fecal metagenomes of chronic diarrhea RMs and asymptomatic RMs exhibited significant different (Wilcoxon rank sum test, p < 0.05) (Supplementary Table S5). Most members of the phylum Bacteroidota (such as these families UBA932, Paludibacteraceae, Tannerellaceae and P3, as well as some members of Bacteroidaceae), most members of the family Helicobacteraceae, most members of the family Lactobacillaceae (such as Lactobacillus, Ligilactobacillus and Limosilactobacillus), most members of the family Selenomonadaceae (Anaerovibrio) and most members of the phylum Spirochaetota (such as Brachyspira and Treponema_D) were significantly more abundant in guts of asymptomatic RMs than those in guts of chronic diarrhea RMs. Most obviously, several taxa of these MAGs, such as Helicobacteraceae, Anaerovibrio of Selenomonadaceae and Brachyspira of Brachyspiraceae, were hardly detected in fecal metagenomes of chronic diarrhea RMs, but were prevalent in fecal metagenomes of asymptomatic RMs.
We re-identified the taxonomic labels of 2,985 MAGs of other NHPs using GTDB-Tk. Of these genera of other NHP MAGs, the genus Prevotella had most MAGs (n = 301). In addition, 36  studies 6,31-34 revealed the gut microbial composition of RMs, but it is necessary to develop a deeper understanding of RMs' gut microbiome at genomes level, especially chronic diarrhea individuals. The reconstruction of microbial genomes based on metagenomic sequences greatly expands the number and diversity of microbial genomes, in particular uncultivable strains. 26 Although Manara et al. 18 retrieved 2,985 MAGs from 22 NHPs, none MAG from RMs was obtained due to lack of RMs' metagenomes. Therefore, in this study, we reconstructed total 731 MAGs from 29 RMs' fecal metagenomes and further expanded the number and diversity of microbial genomes from NHPs. About 80% MAGs in the study belonged to these phyla Firmicutes_A, Bacteroidota, Firmicutes and Firmicutes_C. These phyla Firmicutes_A, Firmicutes and Firmicutes_C belonged to the phylum Firmicutes in NCBI taxonomy. The taxonomic distribution of these reconstructed MAGs accorded with the abundance of microbial composition in RMs' gut according to previous studies based on metagenomic sequencing 6,7 or 16S rRNA gene amplicon sequencing. [31][32][33][34] It also indicated that genomes of high-abundance microbes had a higher recovery ratio based on metagenomes assembly. For low-abundance microbes, coassembly is an effective means to reconstruct lowabundance MAGs. 35 The co-assembly improved the recovery ratio of low-abundance MAGs in this study, and increased reconstructed potential of novel genomes.
The GTDB contains about 250,000 bacterial and archaeal genomes and provides a standardized genome-based taxonomy, 36 which efficiently helps to identify potential novel genomes and taxa. Comparing with reference genomes of GTDB, more than 95% MAGs were non-redundant genomes and more than 1/3 MAGs belonged to potential novel species in our study. Notably, about half of non-redundant genomes belonged to the class Clostridia and Bacteroidia, which played key roles in degrading complex carbohydrates. 37 For instance, several taxa of Clostridia, such as Lachnospiraceae and Ruminococcaceae, were reportedly capable of degrading complex carbohydrates to produce short-chain fatty acids (SCFAs), maintaining the colonic health of host. 38,39 The annotation of CAZymes also indicted more abundant GHs in these taxa, which could imply the great potential in converting polysaccharide to SCFAs. 40 In addition, previous study demonstrated that ARGs were enriched in guts of chronic diarrhea RMs. 7 The high-abundance ARGs might be closely associated with frequent usage of antibiotics for treatment of diarrhea. In this study, we found that members of Escherichia in guts of RMs were main carriers of ARGs. Therefore, monitoring and control of ARGs might need focus on members of Escherichia, especially Escherichia coli, regarded as an important indicator of pathogens. These MAGs provided new understanding for specific functions of gut microbiome of RMs.
Diarrhea disease could greatly alter the composition and decrease diversity of gut microbiome. 41 In this study, based on MAGs abundance in guts, we also demonstrated that chronic diarrhea significantly reduced the diversity of gut microbiome. Several taxa of MAGs showed significant difference between chronic diarrhea RMs and asymptomatic RMs. Most taxa were more abundant in guts of asymptomatic RMs than chronic diarrhea RMs. Of these taxa with significant difference, Bacteroidota mainly consisted of the members that are adept at polysaccharide degradation, due to carrying large numbers of CAZymes genes in genomes. 42,43 In these MAGs of Bacteroidota in gut of asymptomatic RMs might be favorable for degradation of the indigestible polysaccharide. Comparing with chronic diarrhea RMs, these asymptomatic RMs had more lactic acid bacteria, such as Lactobacillus johnsonii, Lactobacillus amylovorus and Ligilactobacillus animalis. Lactic acid bacteria play probiotic roles in gut of host by means of competition of adhesion in epithelial cells with pathogens, producing antimicrobial peptides and SCFAs, and stimulating host defenses. 44,45 Several species of Lactobacillus have been demonstrated probiotic functions in alleviating diarrhea and decreasing incidence of diarrhea. 46,47 It suggested that gut of RMs was an important resource pool of probiotics. In these taxa with significant difference, Helicobacteraceae, Brachyspira and Anaerovibrio were prevalent in guts of asymptomatic RMs, but were hardly detected in guts of chronic diarrhea RMs. The Anaerovibrio was considered as the important species assisting in lipid hydrolysis to fatty acids. 48 Several species of Brachyspira were prevalent in NHPs and normally didn't induce inflammatory response. 49,50 These bacteria might be prevalent in guts of asymptomatic RMs, but the decrease of these taxa could be caused by antibiotic usage of chronic diarrhea RMs.
The species of Campylobacter, in particular C. jejuni and C. coli, were commonly considered as critical pathogens in leading intestinal disease of human. 51,52 Although C. coli and C. jejuni were also reportedly associated with diarrhea of RMs, 8,27,28 all MAGs of Campylobacter from RMs were obviously distant to C. coli and C. jejuni in phylogenetic tree. MAGs of RMs were also different from MAGs of human. Most MAGs of Campylobacter from RMs carried flaA and flaB genes encoding the major flagellin FlaA and FlaB as important structures of flagella, which plays an important role in motility of bacteria. 53 In addition, the chemotaxis-associated genes, such as cheA, cheD, cheV and cheY, mediated the motile sense and directional move. 53 The flagella of Campylobacter contributed the motility but also invasion. 54 fliQ, fliG and fliP encoded type 3 secretion system (T3SS), medicating secretion of virulence proteins. 55 However, the key genes, such as ciaB and ciaC, encoding Campylobacter invasion antigens (Cia) was absent in MAGs of RMs. Comparing with reference genomes of C. coli and C. jejuni, adhesion-associated genes, such as cadF and capA, invasion associated genes, such as ciaB and ciaC, and toxin-associated genes, such as cdtA, cdtB and cdtC, were absent in MAGs of RMs. It might suggest the weaker virulence in these MAGs of RMs comparing with C. coli and C. jejuni. However, due to flagella-associated genes and chemotaxis-associated genes, these Campylobacter spp. not belonging to C. coli and C. jejuni still might cause diarrhea of RMs and should be paid attention to.
Helicobacter cinaedi was reportedly common in guts of captive RMs. 56 However, we didn't obtain the genomes of Helicobacter cinaedi in this study. These MAGs of Helicobacteraceae were also distant to Helicobacter cinaedi. MAGs of Helicobacteraceae also carried the flaA or flaB genes. Similarly, flagella-associated genes and chemotaxis-associated genes were found in MAGs of Helicobacteraceae from RMs and other NHPs. This also suggested that these species of Helicobacteraceae possessed the important characters of motility and adhesion. 57 We also found most MAGs of Helicobacteraceae, in particular Helicobacter macacae, were found in guts of RMs and Macaca fascicularis. Therefore, Helicobacter macacae might be more prevalent in guts of Macaca monkey. The species of Helicobacteraceae were only found in guts of asymptomatic RMs and absent in guts of chronic diarrhea RMs. Previous study also found that Helicobacter macacae was lost in guts of chronic diarrhea RMs. 27 In general, Campylobacter and Helicobacter were typically colonized in gut mucosa. The pathological gut of chronic diarrhea RMs might be unfavorable for adhesion of them. Alternatively, antibiotic treatment of chronic diarrhea RMs might reduce abundances of Campylobacter and Helicobacter. Unfavorable adhesion and antibiotic usage might be main causes that Campylobacter and Helicobacter were decreased in guts of chronic diarrhea RMs.
Comparing with 2,985 MAGs of other NHPs, 18 more than 90% MAGs were non-redundant genomes and about half genomes represented different species. These unique genomes suggested that RMs' gut harbored many unique microbes comparing with guts of other NHPs. Of these unique species, Lachnospiraceae and Ruminococcaceae play key role in degrading carbohydrate to produce short-chain fatty acids (SCFAs). 38,39 Interestingly, most MAGs belonging to same strains or same species were found in MAGs of Macaca fascicularis. It could imply that host phylogeny plays an important role in shaping gut microbial composition. 22 We found that Prevotella, belonging to Bacteroidetes, were dominant taxa in gut of NHPs. Prevotella was associated with plant-rich diets of host. 58 The prevalence of Prevotella in guts of most NHPs might be associated with highfiber diets of NHPs. We found that distribution and function of Prevotella were strong correlated with host phylogeny. The differences of Prevotella distribution and function were significant exhibited between higher clades instead of species of NHPs. It might suggest that adaptive evolution of Prevotella in guts of NHPs might earlier appear than existing phylogenetic evolution of NHPs.
In summary, we performed de novo assembly of 731 MAGs including many novel genomes, covering the shortage of gut microbial genomes from RMs. We found that members of Campylobacter and Helicobacteraceae might play potential roles in causing intestinal diarrhea of RMs due to flagella-associated virulence genes mediating motility and adhesion of bacteria. Unique gut microbial genomes in RMs suggested obvious difference in gut microbiome between RMs and other NHPs, and also further expanded the diversity of gut microbial genomes in NHPs. We revealed the distributions and functions of gut microbes in NHPs were prominently influenced by host phylogeny of NHPs. Taken together, our data strengthened the functional understanding of RM gut microbiome at genome level, in particular chronic diarrhea individuals, more importantly, and provide new insights in prevention for diarrhea of RMs. The data also strengthened the functional understanding of NHP gut microbiome at genome level and indicated coevolution between host and gut microbiome in NHPs.

Reconstruction of metagenome-assembled genomes
Metagenomic binning of 29 fecal metagenomes of RMs from our previous study 7 were performed. The study was approved by the Ethics Committee of College of Life Sciences, Sichuan University (No. 20210308001). The 29 fecal metagenomes of RMs were from 18 asymptomatic RMs and 11 chronic diarrhea RMs (Supplementary Table S6). Raw metagenomic reads were trimmed and filtered to remove adapters, low-quality reads and hosts' sequences using Trimmomatic 59 and Bowtie2 60 as KneadData pipeline (https://github.com/biobakery/knead data). The assembly of single sample were performed using MEGAHIT 61 with the optionmin-contig-len 300. To retrieve low-abundance bins as much as possible, co-assembly of diarrheal and asymptomatic samples were separately performed using MEGAHIT with the optioncontinue -kmin-1pass -k-list 27,37,47,57,67,77,87 -min-contig-len 1000. The co-assembly of asymptomatic samples was performed in two batches with 9 samples per batch.
Bowtie2 was used to map metagenomic reads back to these single-sample assemblies and coassemblies. SAMtools 62 converted SAM file to BAM file. Metagenomic binning of singlesample assembly and co-assembly was performed using MetaBAT2 63 with the option -minContigDepth 2 -minContigLength 2000. Raw bins were dereplicated at default threshold of 99% average nucleotide identity (ANI) using dRep 64 with the option dereplicate_wf -comp 80 -con 10 -str 100 -strW 0. Bins were also dereplicated using dRep at threshold of 95% ANI to obtain SGBs. The ANI between genomes was calculated by FastANI. 65 The completeness and contamination of all bins were evaluated by CheckM. 66

Prediction of taxonomy label of MAGs and quantification of MAGs in metagenomes
The putative taxonomy label of MAGs was predicted using GTDB-Tk 67 with the 'classify_wf ' function. GTDB-Tk annotation is based on Genome Taxonomy Database (GTDB) taxonomy. 36 We also obtained NCBI taxonomy of MAGs using script "gtdb_to_ncbi_majority_vote.py" (available from https://github.com/ Ecogenomics/GTDBTk/). The putative taxonomy label of MAGs based on GTDB taxonomy was determined by the lowest taxonomy level of classification result of GTDB-Tk. These MAGs were also analyzed using MAGpy, 68 which uses a Snakemake pipeline based on freely available bioinformatics softwares, including CheckM, Prodigal, 69 PfamScan, 70 Sourmash, 71 Phylo PhlAn 72 and DIAMOND BLASTP. 73 The phylogenetic tree of MAGs was constructed by PhyloPhlAn based on more than 400 most conserved proteins from microbial genomes, and was drawn by Figtree (https://github.com/ram baut/figtree) and GraPhlAn. 74 The abundance of MAGs in metagenome was quantified using Salmon 75 with "quant_bins" module of MetaWRAP 76 to estimate the abundance of each scaffold in each sample, and then compute the average abundance of bin. Metagenomic reads were aligned using BWA MEM 77 and mapping ratios of reads were calculated using SAMtools.

The function annotation of MAGs
The genes of MAGs were predicted and translated to amino acid sequence by Prodigal. These nonredundant gene sets at proteins level were constructed using CD-HIT 78 with similarity of 99%, 95% and 90%. These non-redundant genes were aligned to UniProt TrEMBL, UniRef90 and UniRef50 database using DIAMOND, as well were aligned to eggNOG database 79 using eggNOG-Mapper. 80 CAZymes genes were identified using hmmscan 81 based on dbCAN database. 82 ARGs were identified using RGI 83 based on comprehensive antibiotic resistance database (CARD). 83 KEGG orthologys (KOs) were annotated using KofamKOALA. 84 The rRNA genes of MAGs were predicted using barrnap (https://github.com/tsee mann/barrnap). Virulence-associated genes were annotated using DIAMOND to align to virulence factor database (VFDB) 85 at thresholds of 70% similarity and e-value<1e-5. MAGs were functionally annotated based on Uniref50 using DIAMOND. The non-metric multidimensional scaling (NMDS) plots were drawn using the metaMDS function based on Jaccard distance.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by Sichuan Science and Technology Program (No. 2023NSFSC1935), the National Natural Science Foundation of China (No. 32070413) and Key R&D Program of Sichuan Province (No. 22ZDZX0011).

Authors' contributions
SY and ZF wrote the manuscript. SY, ZF, JiaweiL, XW and YL performed the bioinformatics analyses. BY and MH revised the manuscript. AZ and JingL designed and supervised the study. All authors read and approved the final manuscript.

Data availability statement
The data that support the findings of this study are openly available in CNGB Sequence Archive (CNSA) of China National GeneBank DataBase (CNGBdb) with accession number CNP0001810 at https://db.cngb.org/.

Ethics
This study was approved by the Ethics Committee of College of Life Sciences, Sichuan University (No. 20210308001).