Phylogenomic Analyses of Snodgrassella Isolates from Honeybees and Bumblebees Reveal Taxonomic and Functional Diversity

ABSTRACT Snodgrassella is a genus of Betaproteobacteria that lives in the gut of honeybees (Apis spp.) and bumblebees (Bombus spp). It is part of a conserved microbiome that is composed of a few core phylotypes and is essential for bee health and metabolism. Phylogenomic analyses using whole-genome sequences of 75 Snodgrassella strains from 4 species of honeybees and 14 species of bumblebees showed that these strains formed a monophyletic lineage within the Neisseriaceae family, that Snodgrassella isolates from Asian honeybees diverged early from the other species in their evolution, that isolates from honeybees and bumblebees were well separated, and that this genus consists of at least seven species. We propose to formally name two new Snodgrassella species that were isolated from bumblebees: i.e., Snodgrassella gandavensis sp. nov. and Snodgrassella communis sp. nov. Possible evolutionary scenarios for 107 species- or group-specific genes revealed very limited evidence for horizontal gene transfer. Functional analyses revealed the importance of small proteins, defense mechanisms, amino acid transport and metabolism, inorganic ion transport and metabolism and carbohydrate transport and metabolism among these 107 specific genes. IMPORTANCE The microbiome of honeybees (Apis spp.) and bumblebees (Bombus spp.) is highly conserved and represented by few phylotypes. This simplicity in taxon composition makes the bee’s microbiome an emergent model organism for the study of gut microbial communities. Since the description of the Snodgrassella genus, which was isolated from the gut of honeybees and bumblebees in 2013, a single species (i.e., Snodgrassella alvi), has been named. Here, we demonstrate that this genus is actually composed of at least seven species, two of which (Snodgrassella gandavensis sp. nov. and Snodgrassella communis sp. nov.) are formally described and named in the present publication. We also report the presence of 107 genes specific to Snodgrassella species, showing notably the importance of small proteins and defense mechanisms in this genus.


RESULTS AND DISCUSSION
Species delimitation within the Snodgrassella genus. We studied the diversity within the genus Snodgrassella using extensive phylogenomic and average nucleotide identity (ANI) analyses (Fig. 1). Highly conserved genes were selected to perform the phylogenomic analyses. Multiple events of HGT affecting Snodgrassella genomes have been reported (2,31,32), and such events may be damaging for inferring species phylogenies (36)(37)(38). To minimize interference of HGT, we used shared Neisseriaceae core genes only, by incorporating 35 non-Snodgrassella bacteria from this family into our data set. We further imposed a strict unicopy presence onto core genes in order to avoid artifacts linked to paralogous sequences, which also have been reported to be deleterious to phylogenomic studies (39,40). Finally, we enforced high geometric and functional indices (0.8 for both), resulting in the selection of genes with less than 20% gap insertions and 20% substitutions, respectively. Our phylogenomic data set was thus composed of 254 genes that were structurally and functionally conserved within the family Neisseriaceae. Our analyses of the Neisseriaceae phylogeny confirmed with 100% bootstrap support that the genus Snodgrassella is monophyletic within the family Neisseriaceae (see Fig. S1 in the supplemental material) (21,41). Its nearest-neighbor taxon was Populibacter corticis (40). Prior to the description of P. corticis in 2017 (41), the lineage formed by Stenoxybacter acetivorans and Snodgrassella spp. was suggested to represent a gut-specific clade within the Betaproteobacteria, based on the observation that also S. acetivorans was isolated from an insect (i.e., termite gut sample) (21). The description of P. corticis, an organism isolated from bark tissue of poplar canker, appeared to disrupt this image, yet, more should be known about the latter organism before the hypothesis of a gut-specific clade within the Betaproteobacteria is abandoned. In the rest of the present study, we used P. corticis CFCC 13594 T as an outgroup for studying Snodgrassella phylogeny.
Our amino acid-and nucleotide-based phylogenomic and leave-one-out analyses all revealed a clear separation between Snodgrassella isolates from A. mellifera and Bombus spp., with full support values in bootstrap and leave-one-out analyses ( Fig. S2 and S3). The 75 Snodgrassella strains formed three clades. Those isolated from Asian honeybees, referred to as Apis group1, formed a monophyletic group separated from the two other clades with full support in both bootstrap and leave-one-out analyses ( Fig. S2 and S3). The same three clades were reported by Powell et al. (42) using minD as a single phylogenetic marker gene. Yet, the latter study did not use an outgroup to root the obtained phylogeny. Using 254 core genes and P. corticis CFCC 13594 T as an outgroup, we demonstrated that Apis group1 is the first diverging group of the Snodgrassella phylogeny ( Fig. S2 and S3). The separation into three lineages was confirmed by removing P. corticis CFCC 13594 T from the phylogenomic analyses in order to check the presence of a longbranch attraction artifact (Fig. S4). The Snodgrassella phylogeny showed shorter branch lengths in Snodgrassella isolates from Apis mellifera (here referred to as Apis group2) than those in isolates from Bombus spp. (Fig. 2). The latter represented five lineages ( Fig. 2 and Table 1). The first, Bombus group1, was composed of four isolates from Bombus pensylvanicus (all collected in the United States) and was fully supported by the bootstrap and leaveone-out analyses ( Fig. S2 and S3). The second, Bombus group2, consisted of a single isolate from Bombus nevadensis (United States). The three remaining groups were fully supported in the bootstrap and leave-one-out analyses as well and were referred to as Bombus group3, which was composed of three isolates from Bombus appositus (United States), Bombus group4, which was composed of two isolates from Bombus pascuorum and Bombus lapidarius (Belgium), and Bombus group5, which comprised 17 isolates collected from 11 Bombus FIG 1 Graphical abstract. We used 9 newly sequenced Snodgrassella genomes and 67 (including the outgroup) public genome assemblies to reconstruct the phylogeny of the genus Snodgrassella with 254 core genes, all functionally and structurally conserved in the Neisseriaceae family. We used independent phylogenetic methods (bootstrapping, leave-one-out analysis, and taxon sampling) to assess the robustness of our phylogenomic trees. Combined with an average nucleotide identity analysis, our results indicated new species delimitations within this genus. We further investigated specific gene contents within the species groups and their functional importance.
Phylogenomic and Functional Analyses of Snodgrassella mSystems species in Belgium and the United States ( Fig. 2 and Table 1) and which included the isolates wkB12 and wkB29, reported by Kwong and Moran (21). This topology was confirmed with sparser taxon sampling, after removing 40 strains with dRep (43) and computing another large phylogenomic analysis (Fig. 3). The branching patterns of the seven groups were the same in all amino acid-and nucleotide-based trees inferred, except for Bombus group2. Bombus group2 appeared as a sister group of Bombus group3 and Bombus group4 in all analyses (with a bootstrap of ,70), except in the nucleotide-based phylogenomic analysis, in which Bombus group2 branched with Bombus group1 (Fig. S3).
To quantify the taxonomic divergence between the observed phylogenomic groups, we performed ANI analyses. The latter are generally used for species delimitation in bacterial taxonomy (44), with ANI values of about 95 to 96% corresponding to the species delineation threshold (45,46). The ANI analysis (Fig. 2, part ANI) revealed values within each of the phylogenomic groups (except Bombus group2, for which only a single genome was available) that were consistently above 95% ANI, while ANI values between genome sequences of different groups were consistently below 87%. Therefore, each of the seven phylogenomic groups corresponded to a distinct Snodgrassella species in modern bacterial taxonomy. These seven species also corresponded with the seven Snodgrassella species defined in the GTDB database (https://gtdb.ecogenomic.org/searches?s=al&q=Snodgrassella) (47) (Fig. 2, GTDB part). Additional Snodgrassella species likely exist as gut symbionts in stingless bees and Bombus species from as yet poorly examined continents, as suggested by their SSU rRNA phylogeny (10). The S. alvi strain wkB2 T clustered within Apis group2, and by taxonomic convention, the name S. alvi should therefore be restricted to organisms from Apis group2. All other phylogenomic groups detected in the present and in earlier studies represent novel Snodgrassella Host species were taken from biosample metadata in the NCBI portal, except for the 9 newly sequenced genomes, for which the metadata were taken from reference 15. (B) ANI heat map. ANI values were computed with fastANI (74). A triangular matrix was constructed according to pairwise distances. Colors associated with ANI values are given in the fastANI key. (C) GTDB hits. The blue squares on the right were determined using GTDBtk on the genomes of the associated strains (77). Asterisks indicate the strains retained by dRep after dereplication.

Phylogenomic and Functional Analyses of
Phylogenomic and Functional Analyses of Snodgrassella mSystems species. Below we propose to formally name Bombus group4 as Snodgrassella gandavensis, with LMG 30236 (=CECT 30450) as the type strain, and Bombus group5 as Snodgrassella communis, with LMG 28360 (=CECT 30451) as the type strain. The formal naming of the remaining phylogenomic groups (i.e., Apis group1, Bombus group1, Bombus group2, and Bombus group3) can be done upon characterization and public deposit of reference cultures to conform to standard practices in bacterial taxonomy (44). It has been suggested that the transmission mode of Snodgrassella between workers and larvae is mainly vertical within a colony (27,32,48). Although for most of the groups (i.e., Snodgrassella species) detected in the present study, only a limited number of isolates or genomes were available, our data showed that indeed several Snodgrassella species occur in more than one Bombus species, and therefore this rejects strict cospeciation. This result is in agreement with data reported by Powel et al. (42), who demonstrated that Bombus group5 Snodgrassella (i.e., S. communis) was detected in several Bombus species, and with the experimental demonstration of host jumps in Bombus species (2). The close phylogenetic relationship between Snodgrassella species associated with bumblebees and Snodgrassella species associated with the Western honeybee additionally supports the hypothesis of horizontal transfer of symbionts between bee clades. The origin of extant bumblebees is estimated to the Miocene, as based on fossil records and molecular phylogeny (49,50). The origin of this clade is associated with a global cooling of the Palearctic region (51). Ancestors of bumblebees were likely sharing host plants with the Western honeybee ancestor clade, which is also associated with a temperate climate, and not with the Apis cerana honeybee clade, which is associated with a subtropical to tropical warmer climate. Overall, the similarity of microbiota among corbiculate bees seems to be related to both phylogeny and sharing of common habitats/climates. Specific gene analysis. Snodgrassella has been coevolving with other bacteria in the gut of honeybees and bumblebees for 80 million years (9, 10, 21). Interestingly the age of the common ancestor of the clade of the hosts (i.e., corbiculate bees, which include both  Bombus group3 Apis group2 Apis -Bombus groups  (68) and Mantis (53). Numbers indicated in the pie charts correspond to absolute numbers of OGs identified in the respective Snodgrassella subgroups. Specific genes were computed for entire groups before dereplication.
Phylogenomic and Functional Analyses of Snodgrassella mSystems bumblebees and honeybees) is also estimated round ;85 million years (52). Different species and genotypes have evolved and may have developed different functions in their hosts. We investigated the presence of group-specific genes (i.e., genes that are exclusively present in one group [or species] and that are shared by all members of this group) and determined their functionalities. The rationale was that a conserved gene probably confers a benefit to members of that group. We detected 107 specific genes: 19 were specific for Apis group1, 12 for Apis group2 (i.e., S. alvi), 10 for Bombus group1, 12 for Bombus group3, 24 for Bombus group4 (S. gandavensis), and 1 for Bombus group5 (S. communis) (Fig. 3). No specific genes were found for Bombus group2. In a next step, genes shared by multiple groups or species were examined. Only two group combinations presented specific genes: Apis group1 and Apis group2 shared 4 genes that were absent in all others, and all Snodgrassella spp. together shared 25 genes (Fig. 3).
Of these 107 genes, 37 did not correspond to proteins known in COG or Mantis (53). The latter is the most recent and comprehensive tool for protein annotation and contains Pfam, eggnog, NPFM, TIGRfams, and Kofam family data for the three domains of life (53). These 37 proteins were relatively short (median, 53 amino acids [aa]; interquartile range [IQR], 46 aa) compared to the rest of the 107 genes (median, 197 aa; IQR, 202.25), with only six proteins above 100 aa ( Table 2). Small proteins in bacteria can have roles in transport and signal transduction, can act as chaperones, can be involved in stress responses or virulence (54,55), and can also be used as bacteriocins (56). Unknown genes are a recurrent issue in metagenomic studies, where they may represent up to 40% of the genes (57). Besides these genes without a match, 24 genes encoded proteins with unknown function but with hits in COG or Mantis, while six genes had a general predicted function only and could not be affiliated with a metabolic pathway. Together, 67 of 107 specific genes (62.6%) and their putative benefits remained unknown ( Table 2 and Fig. 3).
Eleven of the 40 identified specific genes represented defense mechanisms, which are important for gut colonization in bees (14) ( Table 2 and Fig. 3). The two most conserved defense mechanisms (4 genes each) corresponded to drug exporter and Rhs proteins ( Table 2). These two defense mechanisms generate similar benefits and can provide competitive advantages. Steele and Moran (34) recently demonstrated that Rhs proteins represent different toxins, which are injected in neighboring cells through a type VI secretion system (T6SS) (2,(32)(33)(34). The four Rhs proteins found in the present study were specific genes of Bombus group3, a gut symbiont thus far detected only in B. appositus. The presence of specific toxins in these three strains might explain the specificity for their host by T6SS mediated-competition. The last three defense mechanisms included proteins for glycosylation of phage DNA (Bombus group1), sensing of invasive viruses, and antibiotic production (Apis group1) ( Table 2). Apis group1 possessed a specific protein closely related to the 2-oxo-3-(phosphooxy)propyl 3-oxoalkanoate synthase, which triggers the production of the antibiotic virginiamycin in Streptomyces virginiae (58).
Among the remaining identified specific genes, 9 genes belonged to the metabolic category amino acid transport and metabolism, 4 genes belonged to inorganic ion transport and metabolism, and 4 genes belonged to carbohydrate transport and metabolism (Fig. 3). Because host gut environments are scarce in iron and amino acids, de novo biosynthetic pathways and iron importers have been reported to be essential for gut host colonization (59). Specific genes linked to carbohydrate metabolism might provide a benefit linked to a carbohydrate-rich diet (1), even if such genes are more commonly found in Gilliamella than Snodgrassella genomes (30). The 12 remaining identified specific genes represented different metabolic pathways with unclear function in, or benefit to, the host ( Table 2).
Horizontal gene transfer. Horizontal gene transfer is a process that plays a major role in bacterial evolution. Because of the spatial proximity of Snodgrassella and other gut bacteria during 80 million years of coevolution, HGT events may have impacted the specific gene complement. We investigated the potential presence of such events by using MetaCHIP (60). We included all bee gut symbiont genomes available in RefSeq (561 genomes: i.e., 75 of Snodgrassella plus 486 of representatives of other bee gut symbionts) to maximize the odds of HGT detection and detected only 57 events of HGT in which Snodgrassella spp. were a donor or recipient (Fig. S5 and Table S3). This is less than the 87 events reported by Kwong et al. (2).  MetaCHIP is designed to infer HGT from genomes assembled from microbial communities (60) and detects potential HGT events using a BLAST best-hit approach and then validates these hits by performing a duplication-transfer-loss (DTL) reconciliation between a species tree and an individual gene tree. The DTL filter used by MetaCHIP is more stringent than the BLAST E value threshold of 1e250 used by Kwong et al. (2), which may explain this difference. None of the 107 specific genes described above was present among the 57 HGT-affected genes detected by MetaCHIP using BLASTP and a filter on 98% identity (Table S3). We investigated the absence of HGT in the group of 107 specific genes further by enriching the comparison with orthologous sequences taken from the 561 gut symbiont genomes used above and with orthologous sequences from the 247 genomes unrelated to the bee gut ecosystem, thereby also covering bacterial and archaeal diversity (Fig. 4). This allowed us to analyze the taxonomic diversity of the specific genes and revealed that 68 genes were unique to Snodgrassella, while 39 genes were found in other bacteria too (27 genes in multiple other bacteria and 12 genes in only one other bacterium) (Fig. 4). The evolutionary history of the 68 genes unique to Snodgrassella is explained more easily by duplication in Snodgrassella than by acquisition by HGT or gene loss in all other bacteria. In contrast, the 27 specific genes found in other bee gut symbionts or in other bacteria unrelated to the bee gut microbiome suggest that gene loss in other Neisseriaceae was the most plausible evolutionary path of the genes. We computed individual gene trees for these 27 OGs (available at https://github.com/Lcornet/SNOD) and compared them manually to species trees (Fig. S6).
For only 8 genes out of the 27 (indicated in Table 2), Snodgrassella sequences clustered with those of other bee gut symbionts, which may suggest HGT events. For 12 genes (out of 107), no convincing evidence of HGT was detected because these genes were only present in two species, making the comparison to the species tree impossible. Ten of the latter genes were detected in Gilliamella and Snodgrassella only, and two were detected in Snodgrassella and Frischella only. These genes included three out of four specific Rhs proteins of Bombus group3, which were shared with Gilliamella genomes. Conclusion. In the present study, we used highly conserved Neisseriaceae core genes to perform multiple phylogenomic analyses. We demonstrated monophyly of Snodgrassella strains isolated from bumblebees and paraphyly of strains isolated from honeybees. We also demonstrated that Snodgrassella strains from Asian honeybees are an early diverging group. Combined with ANI analyses, our data further indicated that this genus comprises at least seven species. Below we describe and formally name two new Snodgrassella species from bumblebees: i.e., S. gandavensis sp. nov. for Bombus group4 (with LMG 30236 as the type strain) and S. communis sp. nov. for Bombus group5 (with LMG 28360 as the type strain). We detected 107 specific genes among these seven species. For the large majority of these 107 genes, there was no evidence for HGT. Functional analyses revealed the importance of small proteins, defense mechanisms, amino acid transport and metabolism, inorganic ion transport and metabolism, and carbohydrate transport and metabolism among these specific genes.
Cells are nonmotile, Gram stain-negative rods about 1.2 mm long and 0.8 mm wide that occur singly or in pairs. Optimal growth is on rich agar medium at 37°C in a CO 2enriched atmosphere. S. gandavensis grows under anaerobic conditions when supplemented with 10 mM KNO 3 , but not under aerobic conditions. The bacterium is positive for catalase activity and nitrate reduction and negative for oxidase activity.
The type strain is LMG 30236 (=CECT 30450), which was isolated in 2015 from the gut of Bombus pascuorum sampled in 2015 in Wetteren, Belgium. The whole-genome sequence of LMG 30236 T has a size of 2.51 Mbp. The DNA G1C content is 43.76 mol%. The whole-genome sequence is publicly available under accession no. GCA_914768025.1. The 16S rRNA gene sequence is publicly available under accession no. OU943324.
Additional information is provided in Table 3 and the supplemental material. Description of Snodgrassella communis sp. nov. Snodgrassella communis (com.mu'nis. L. fem. adj. communis, common, because of its wide host range).
Cells are nonmotile, Gram stain-negative rods about 1.2 mm long and 0.8 mm wide that occur singly or in pairs. Optimal growth is on rich agar medium at 37°C in a CO 2 -enriched atmosphere. S. communis grows under anaerobic conditions when supplemented with 10 mM KNO 3 . Growth under aerobic conditions is strain dependent. The organism is positive for catalase activity and nitrate reduction and negative for oxidase activity.
The type strain is LMG 28360 (=CECT 30451), which was isolated in 2013 from the gut of Bombus terrestris sampled in 2013 in Ghent, Belgium. The whole-genome sequence of LMG 28360 T has a size of 2.31 Mbp. The DNA G1C content is 43.26 mol%. The whole-genome sequence is publicly available under accession no. GCA_914068745.1. The 16S rRNA gene sequence is publicly available under accession no. OU943323.
Additional information is provided in Table 3 and the supplemental material.

MATERIALS AND METHODS
All custom scripts specifically developed for this study are available at https://github.com/Lcornet/SNOD.  Phylogenomic and Functional Analyses of Snodgrassella mSystems used to identify genomes with contamination levels below 5% and completeness above 95%, which yielded 66 RefSeq genomes. In addition, we determined whole-genome sequences of nine bumblebee isolates from an earlier study (15). Together, the genomes of 48 isolates originated from honeybee gut samples (from 4 species) and 27 from bumblebee gut samples (from 14 species). Assembly quality metrics for all 75 genomes were obtained with QUAST v5.0.2 (64). The isolates, along with their geographic origin, accession numbers, and CheckM and QUAST parameters, are listed in Table 1. DNA extraction. Genomic DNA was isolated using a Maxwell 16 tissue DNA purification kit (catalog no. AS1030) and a Maxwell 16 instrument (catalog no. AS2000). The integrity and purity of the DNA were evaluated on a 1.0% (wt/vol) agarose gel and by spectrophotometric measurements at 234, 260, and 280 nm. A Quantus fluorometer and a QuantiFluor ONE double-stranded DNA (dsDNA) system (Promega Corporation, Madison, WI, USA) were used to estimate the DNA concentration.
Library construction and genome sequencing. Library preparation and whole-genome sequencing were performed by the Oxford Genomics Centre (University of Oxford, United Kingdom). Library preparation was performed using an adapted protocol of the NEB prep kit. Paired-end sequence reads (PE150) were generated using an Illumina NovaSeq 6000 platform (Illumina, Inc., USA).
Core gene analysis. ToRQuEMaDA (66), 10 June 2021 version, was used to select genomes representing the diversity of other Neisseriaceae (i.e., bacteria not belonging to the genus Snodgrassella). We selected 35 Neisseriaceae genomes using the following options of tqmd_cluster.pl: dist-threshold of 0.90, kmer-size of 12 and max-round of 10 and CheckM v1.1.3 activated (see Table S1 in the supplemental material). The 110 genomes were then imported into the pangenomic workflow of anvi'o v7.1 (67). The genomes were first loaded individually into anvi'o using the anvi-gen-contigs-database script with default options, and NCBI COGs (68) were associated with each genome database using the anvi-run-ncbi-cogs script with default options. The consolidated database was constructed using the anvi-gen-genomes-storage script with default options and anvi-pan-genome script with the following options: min-occurrence of 2 and mcl-inflation of 10. The protein sequences of all genomes, along with their COG annotations, were extracted from the anvi'o database using the anvi-get-sequences-for-gene-clusters script with default options and the anvi-summarize script with default options, respectively. Functional and geometric indices of anvi'o were computed using the anvi-computegene-cluster-homogeneity script with default options. Orthologous groups (OGs) were reconstructed from anvi'o output files using the custom script anvio_pan-to-OGs.py with default options. Finally, core genes were selected using the custom script anvio_OGs-filtration.py with the following options: pfilter set to yes, fraction set to 1, unwanted orgs limit set to 0, cfilter set to yes, maxcopy set to 1, hfilter set to 1, maxfunctional index set to 0.8 and maxgeometricindex set to 0.8. These settings resulted in the selection of 254 core genes out of 11,185 OGs, present in 100% of the 110 Neisseriaceae genomes.
(ii) Snodgrassella phylogeny. Populibacter corticis (GCF_001590725.1) was selected as an outgroup for the Snodgrassella phylogenomic analysis. The concatenated sequences of 75 Snodgrassella and P. corticis genomes were extracted from the 254-core-gene supermatrix of Neisseriaceae to produce a supermatrix of 76 organisms by 85,654 unambiguously aligned amino acid positions (0.04% missing character states), from which a phylogenomic tree was inferred with RAxML as described above. A leave-one-out analysis was then performed by randomly deleting %20% of genes present in our data set. To this end, the corresponding alignments were also reduced to 76 organisms and used to construct 100 data sets of about 70,000 conserved positions by randomly combining alignment files using the script jack-ali-dir.pl from Bio-MUST-Core (available at https:// metacpan.org/dist/Bio-MUST-Core). The 100 supermatrices were assembled using SCaFoS v1.30k (70) with default settings. Trees were inferred using RAxML v8.1.17 (71) using the fast experimental tree search method and the PROTGAMMALGF model. A consensus tree was built from the set of 100 trees using the program consense v3.695 (from the PHYLIP package [72], but modified to handle long sequence names), with default settings. In order to test the possible influence of a long branch attraction artifact, the outgroup was eliminated from the supermatrix, and a new phylogenomic analysis was performed with the same protocol as described above but on a matrix of 75 organisms by 85,654 unambiguously aligned amino acid positions (0.03% missing character states). We checked the effect of taxon sampling reduction by deleting 40 Snodgrassella strains from our supermatrix, while conserving their diversity and the outgroup, using dRep v2.2.3 (43) with default settings. A phylogenetic analysis was inferred with the same protocol described above but using a matrix of 36 organisms by 85,654 unambiguously aligned amino acid positions (0.22% missing character states). Subsequently, nucleotide-based phylogenomic trees were inferred from the 254 core genes. Protein sequence alignments were back-translated by capturing and aligning the corresponding DNA sequences with the program leel (available at https://metacpan.org/dist/Bio-MUST-Apps-FortyTwo). A supermatrix of 76 organisms by 264,981 aligned nucleotides (1.59% of missing character states) was generated using SCaFoS v1.30k (70). A large phylogenomic analysis was inferred using RAxML v8.1.17 (71) with 100 bootstrap replicates under the GTRGAMMA model and using two different partitions for codon positions 1 and 2 together (here, "1&2") and position 3. As for the protein trees, leave-one-out analyses were then performed by random selection of genes to construct 100 data sets of about 100,000 aligned nucleotides. The consensus trees were produced with the same protocol described above, but using the GTRGAMMA model of RAxML v8.1.17 (71). Two leave-one-out analyses were generated: one using only codon positions 1&2 and one using two different partitions for codon positions 1&2 and 3.
Gut bacterial phylogeny. A total of 486 genomes representing the main phylotypes of bacteria found in the gut of Apis spp. and Bombus spp. (excluding Snodgrassella) were downloaded from RefSeq, prior to filtering the genomes with CheckM v1.1.3 (63) as described above. Also, 247 additional genomes covering a broad diversity of bacteria and archaea were selected with ToRQuEMaDA (66) using the following options: dist-threshold of 0.90, kmer-size of 12 and max-round of 10, and CheckM activated (Table S2). To perform a large phylogenomic analysis, we used core genes extracted by CheckM. The CheckM taxon set option was first run to produce a lineage set file. Then the CheckM analyze option was run, using the bacterial domain set while providing 843 genomes as input (i.e., 75 Snodgrassella genomes, 35 Neisseriaceae genomes, and the 733 genomes mentioned above). The CheckM qa option was then used to produce the marker files. The custom script Checkm-to-OGs.py allowed us to reconstruct OGs from CheckM marker files, selecting only unicopy OGs, thereby producing 76 OGs. The OGs were aligned using MAFFT v7.453 (73), run with the anysymbol, auto and reorder parameters. Conserved sites were selected using BMGE v1.12 (69) with moderately severe settings (entropy cut-off = 0.5, gap cut-off = 0.2). A matrix of 843 organisms by 12,930 unambiguously aligned amino-acid positions (6.72% missing character states) was produced using SCaFoS v1.30k (70), with default settings. A phylogenomic tree was produced with the same protocol as above for large phylogenomics.
Specific gene analysis. A distinct anvi'o pangenomic data set using 75 genomes of Snodgrassella and P. corticis (GCF_001590725.1) was constructed using the same protocol described above. The custom script anvio_OGs-filtration.py was used to determine which genes occurred specifically in a given group of organisms. This specific gene analysis was performed on a set of 4,685 orthologous groups. We used the same options as for the core genes, with the exception that the number of representatives of a group was set to 0.6 (60%). The inflation number was set to 10 in anvi'o (67) with the value of 60% recovered orthologous groups, even if the inflation parameter was too strict for some of them. All members of a group had to comprise an OG before it was considered specific. An orthologous enrichment was then performed to compensate for the potential lack of sensitivity in orthology inference. In addition, to enrich the OGs with potentially absent members of the group due to the 60% threshold, the enrichment could also add any Snodgrassella from other groups to control that the added sequences were indeed specific and not the result of a false orthology inference. We performed the orthologous enrichment with Forty-Two v0210570 (75, 76) (available at https:// metacpan.org/dist/Bio-MUST-Apps-FortyTwo). We used the 843 genomes that included the 75 Snodgrassella genomes to perform the enrichment. The custom script Confirm-OGs.py was used to validate the specific OGs based on the criteria that all Snodgrassella of the considered group, but no Snodgrassella from other groups, must be present in the OGs after enrichment. We performed this approach on all groups defined in the present study, as well as all pairwise combinations of them, including one with all groups together (i.e., all Snodgrassella strains). All nodes of the tree have also been tested to assess the presence of specific genes shared between groups defined in this study. A total of 107 individual gene phylogenies were computed by aligning the sequences using MAFFT v7.453 (73) with the same options described above, selecting the conserved sites with BMGE v1.12 (69) with the same options described above, and inferring the trees with RAxML v8.1.17 (71) with 100 bootstrap replicates under the PROTGAMMALGF model.
Metabolic analyses. The functions of the group-specific OGs were first determined using NCBI COGs (68) if the information was available as a product of the anvi'o workflow. COG pathways and COG functions were used to label the metabolic functions. When COG analyses yielded no results, Mantis (53) was used with default settings, and the information from the consensus annotation file was used to label function. OGs without any hit in COG or Mantis analyses were considered unknown genes, while OGs having hits with proteins of unknown function were classified as "Unknown function." Horizontal gene transfer. GTDBtk (77) was used to infer the GTDB taxonomy (47) of the 561 phylotype genomes (75 Snodgrassella and 486 representatives of the other bacterial bee gut phylotypes [described above]). MetaCHIP v1.10.15 (60) with default settings was then used to infer HGT. The group-specific genes were then searched in the MetaCHIP results using BLASTp v2.2.28 (78) with a threshold of 98% of identity on 95% of query length. Twenty-seven out of 107 single-gene phylogenies, containing Snodgrassella and at least one additional bacterial genus, generated during the specific gene analysis were manually inspected for HGT.
Biochemical characterization. Biochemical characteristics were determined for the bumblebee isolates LMG 28360 T , R-53528, R-54236, LMG 30236 T , and R-53680, and for the type strain of S. alvi, NCIMB 14803.
Growth was tested on nutrient agar (Oxoid), tryptic soy agar (TSA) (Oxoid), brain heart infusion (BHI) agar (BD Difco), Columbia agar (Oxoid) supplemented with 5% sheep blood, and All Culture (AC) agar (2% tryptose, 0.3% beef extract, 0.3% yeast extract, 0.3% malt extract, 0.5% dextrose, 0.02% ascorbic acid, 1.5% agar [all wt/vol]) after 2, 3, and 4 days of incubation at 37°C in a CO 2 -enriched atmosphere (6% CO 2 , 15% O 2 ) provided by CO 2 Gen Compact sachets (Oxoid). Hemolysis of sheep blood was checked. Growth was also tested at 37°C in ambient atmosphere (0.04% CO 2 , 20.95% O 2 ) on AC agar and in an anaerobic atmosphere (9 to 13% CO 2 , ,1% O 2 ) provided by AnaeroGen sachets (Oxoid) on AC agar and on AC agar supplemented with 10 mM KNO 3 . For the tests mentioned below, cultures were incubated in a CO 2 -enriched atmosphere. The temperature growth range was tested after 2 days of incubation on AC agar at 4, 15, 20, 28, 37, 40, and 45°C. Cell and colony morphology, motility, oxidase and catalase activities and Gram stain reaction were assessed on cultures grown for 2 days on AC agar at 37°C. Motility was determined by examining wet mounts in broth by phase-contrast microscopy. Oxidase and catalase activities and Gram staining were tested using conventional procedures (79). The effect of NaCl on growth was investigated in AC broth supplemented with different concentrations of NaCl (0 to 10% with 1% intervals [wt/vol]) after 3 days of incubation. The pH range for growth was evaluated after 3 days of incubation in AC broth buffered at pH 4.0 to 9.0 at intervals of 1 pH unit using the following buffer systems: acetate buffer (pH 4.0 to 5.0), phosphate buffer (pH 6.0 to 8.0), and Tris-HCl (pH 9.0). Nitrate reduction and denitrification and urease, indole and H 2 S production were tested using standard microbiological procedures (79) after 72 h of incubation.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, PDF file, 0.7 MB.