The Mouse Intestinal Bacterial Collection (miBC) provides host-specific insight into cultured diversity and functional potential of the gut microbiota

Intestinal bacteria influence mammalian physiology, but many types of bacteria are still uncharacterized. Moreover, reference strains of mouse gut bacteria are not easily available, although mouse models are extensively used in medical research. These are major limitations for the investigation of intestinal microbiomes and their interactions with diet and host. It is thus important to study in detail the diversity and functions of gut microbiota members, including those colonizing the mouse intestine. To address these issues, we aimed at establishing the Mouse Intestinal Bacterial Collection (miBC), a public repository of bacterial strains and associated genomes from the mouse gut, and studied host-specificity of colonization and sequence-based relevance of the resource. The collection includes several strains representing novel species, genera and even one family. Genomic analyses showed that certain species are specific to the mouse intestine and that a minimal consortium of 18 strains covered 50–75% of the known functional potential of metagenomes. The present work will sustain future research on microbiota–host interactions in health and disease, as it will facilitate targeted colonization and molecular studies. The resource is available at www.dsmz.de/miBC.

Molecular techniques have revolutionized the investigation of microbial ecosystems, but, as a consequence, culture-based approaches have been neglected. Interpretation of the gigantic amount of data obtained by high-throughput sequencing requires key isolates for precise taxonomic and functional classification. Large-scale cultivation of bacteria has primarily been focused on the human gut 1,2. Yet, mouse models allow functional insights to be gained into host-microbe interactions while testing the causal role of microbiota in health and disease. However, the mouse gut microbiota has been poorly characterized, and bacterial isolates with reference genomes from mice are scarce. Improving the availability of representative isolates from the mouse gut microbiome is essential for future studies using gnotobiotic mice, particularly because the first reference gene catalogue of the mouse gut microbiota shows very limited overlap with human gut microbial gene diversity 3. Furthermore, the host species from which bacteria originate influences colonization processes and the effects on host physiology [4][5][6]. Hence, concerted actions to archive and describe bacterial isolates in a host-specific fashion together with appropriate genomic databases are urgently needed 7,8. Pioneering work in the 1960s focused on the isolation of gut bacteria from the mouse intestine 9, but classification was limited to phenotypic features, and the majority of strains obtained in these and subsequent studies have been lost over the years. Very few species remain available (for example, members of the so-called Altered Schaedler Flora, ASF), and only under restricted conditions.
In the present work, a comprehensive collection of mouse gut bacteria was established and made available to the scientific community, including corresponding draft genomes. Important questions in the field were also addressed: 'Is species diversity host-specific?' 'To what extent do cultured bacteria cover the ecosystem diversity as assessed through molecular methods?'

Results
A comprehensive collection of bacteria from the mouse intestine. The primary aim of the work was to establish the Mouse Intestinal Bacterial Collection (miBC) and make it available to the scientific community.
A total of approximately 1,500 pure cultures were isolated from the intestine of mice, from which strains were selected on the basis of colony and cell morphology as well as 16S rRNA gene sequence to cover as much diversity as possible at the species level. In an effort to unify information scattered throughout the literature, we also added eight of our own previously published mouse-derived bacterial species [10][11][12][13][14][15][16][17] and four species published by others [18][19][20][21] .
The diversity of miBC members based on taxonomic lineage is presented in Fig. 1. The collection contains 100 strains representing 76 different species from 26 families belonging to the phyla Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria and Verrucomicrobia (Supplementary Table 1). It is dominated by Firmicutes (74% of strains), reflecting the dominance of this phylum in the mouse intestine. All isolates were deposited at the German Collection of Microorganisms and Cell Cultures (DSMZ) to ensure long-term storage and availability. A specific list linked to the metadatabase BacDive 22 was created to allow rapid query of the collection by users (www.dsmz.de/miBC). Hence, miBC is a unique tool that can serve as a reference for cultivated fractions of the mouse gut microbiota.
Culturing effort brings novel bacterial diversity and functions to light. The second purpose of the project was to isolate and describe new members of the mouse gut microbiota.
Based on 16S rRNA gene sequencing, 15 strains showed <97% sequence identity to any species described so far, and draft genomes were generated. Whole proteome-based phylogenomic analysis showed that the isolates were distant from known relatives with available genomic information (Supplementary Fig. 1). Genome-derived 16S rRNA gene sequence identities and in silico DNA:DNA hybridization (DDH) estimates (see Methods) were below species-level thresholds (98.1% 16S rRNA gene sequence identity or DDH value of 70%, respectively) (Supplementary Fig. 2, Supplementary Tables 2 and 3) 23. This confirmed that the isolates represented novel bacterial taxa, including four species, ten genera and one family. Proposed names are indicated in quotation marks throughout the manuscript and detailed taxonomic descriptions are provided below. Microscopic pictures are shown in Supplementary Fig. 3, and enzymatic profiles in Supplementary Table 4. Detailed characterization revealed that 'Flintibacter butyricus' can grow on the amino acids glutamine and glutamate (but not leucine, asparagine, lysine, arginine and aspartic acid), producing up to 7 mM butyrate from 20 mM substrate. Moreover, miBC provides the first cultivable member of the 'Muribaculaceae' (family S24-7), which is known to be a dominant bacterial group in the mouse gut 6. To confirm the latter observation and assess the occurrence of all novel bacteria within miBC, draft genomes were compared to metagenomic species (MGS) 24 from a comprehensive shotgun sequencing study of the mouse gut metagenome 3.
A BLAST search revealed that 5 of the 15 isolates had many strong matches with the mouse gene catalogue (Supplementary Fig. 4 and Supplementary Table 5). The other ten isolates had <200 gene hits (Supplementary Table 5), suggesting that they either represent subdominant populations in this cohort of mice, or are not detectable in the mouse gut using the most advanced molecular methods. The five novel isolates well represented in the mouse catalogue showed a high gene coverage (85-99% of genes at 95% sequence identity), allowing assessment of their prevalence in the faeces of 184 mice (Supplementary Fig. 4). Two species were found in the majority of mice: 'Flintibacter butyricus' (98% prevalence) and 'Enterorhabdus muris' (76%). Only 'Flintibacter butyricus' was present in all eight mouse strains from all six animal facilities (five commercial suppliers), suggesting a widespread colonization of the mouse gut by this amino-acid-degrading butyrate producer.
In summary, the large-scale culture-based work carried out to create miBC demonstrates the relevance of isolating and characterizing bacteria, as it allows the description of novel species that can be dominant and/or functionally important.
miBC captures the majority of currently cultured bacteria in the mouse gut. The third aim of the work was to test the relevance of miBC by assessing the coverage of high-throughput 16S rRNA gene amplicon inventories. Therefore, we used caecal samples from 93 mice housed in various facilities in Europe and America (Supplementary Tables 6 and 7).
Phylogenetic profiles of dominant bacterial populations clustered according to animal facilities (Fig. 2a), characterized by significant differences in taxa richness, in the relative abundance of Bacteroidetes and Firmicutes (Fig. 2b) and by the presence of indicator species (Supplementary Table 8). This emphasizes the rationale of working with a multi-centre data set to test the representativeness of the collection. As we isolated the first cultured member of family S24-7 and assessed its host-specificity (next section), we investigated the distribution of reads classified as S24-7 members. They were detected in 86 of 93 mice (Supplementary Fig. 5), indicating a widespread distribution, in contrast to the MGS data presented above (Supplementary Fig. 4 and Supplementary Table 5).
Local BLAST analysis of the 772 detected operational taxonomic units (OTUs) against miBC revealed 46 hits at the species level (≥97% identity), and 105 at the genus level (≥95%), representing 23.5% of all sequence reads (Fig. 2c). Despite the relatively stringent quality-check and filtering used for generating the final 16S rRNA gene short reads data set and corresponding OTU table, these percentages may underestimate the cultured fractions due to the presence of spurious OTUs in the high-throughput sequence data. Looking at the most common and dominant molecular species (referred to as core OTUs from here on), the 42 OTUs detected in ≥50% of the mice at a mean relative abundance of ≥0.5% represented 56.8% of the total reads. Nine of these OTUs matched miBC strains at the genus level (Fig. 2d), accounting for 30% of the core reads. When searching for additional cultured bacteria (from environments other than the mouse gut) that represented any OTUs with no match to miBC, 21 and 48 were identified at the species and genus levels, respectively, accounting for only 7.5% extra sequence abundance (Fig. 2c). Of the 33 core OTUs with no match to miBC, six matched other cultured bacteria (one at the species and five at the genus level; Fig. 2d), accounting for 5.7% of the core reads.
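The core-OTU criterion used above (detection in ≥50% of the mice at a mean relative abundance of ≥0.5%) can be illustrated with a minimal Python sketch. The function name, the input layout and the choice to average over all mice, including those in which an OTU was not detected, are assumptions made for illustration and are not taken from the analysis pipeline.

```python
from typing import Dict, List


def core_otus(rel_abund: Dict[str, List[float]],
              min_prevalence: float = 0.5,
              min_mean_abund: float = 0.5) -> List[str]:
    """Return OTUs meeting both the prevalence and mean-abundance cut-offs.

    rel_abund maps an OTU identifier to its relative abundance
    (percentage of total reads) in each mouse. Assumption: the mean is
    taken over all mice, counting non-detection as 0.
    """
    selected = []
    for otu, values in rel_abund.items():
        if not values:
            continue
        # Fraction of mice in which the OTU was detected at all.
        prevalence = sum(v > 0 for v in values) / len(values)
        mean_abund = sum(values) / len(values)
        if prevalence >= min_prevalence and mean_abund >= min_mean_abund:
            selected.append(otu)
    return selected


if __name__ == "__main__":
    # Toy table with four mice; the values are invented for the example.
    table = {
        "OTU_1": [2.0, 0.0, 1.0, 0.0],   # prevalent and abundant
        "OTU_2": [0.0, 0.0, 0.0, 4.0],   # abundant but in one mouse only
        "OTU_3": [0.2, 0.3, 0.1, 0.2],   # prevalent but low abundance
    }
    print(core_otus(table))
```

In this toy table only the first OTU satisfies both conditions, which mirrors how a core set is necessarily much smaller than the full OTU table.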
These data indicate that the created bacterial collection already covers important dominant lineages, although the majority of mouse gut bacteria remain to be cultured.
Gut bacterial species diversity is host-specific. Published work shows that the origin of bacteria influences colonization processes and host physiology 4,5 , but there are only a few reports on the host specificity of gut bacterial taxa in human versus mouse 3,25 . The fourth aim of the project was to investigate the presence of miBC species in different ecosystems, especially gut environments, based on the entire pool of samples in the Sequence Read Archive (SRA) using IMNGS (www.imngs.org). Raw output data are provided in Supplementary Table 9.
We first looked at the coverage of main SRA sample categories (environment, n = 17,884; host gut, n = 21,755; other host habitats, n = 15,414) based on the entire diversity in miBC. The percentage of data sets where all species in the collection represented >1% total reads was highest in host gut-derived samples (Fig. 3a), supporting the common sense that bacterial occurrence depends on major environmental features in different ecosystems. The resolution of analysis was then refined to mouse and human gut samples. Almost one-third of the 6,001 mouse samples (28.6%) were characterized by a cumulative abundance coverage >50% (that is, the sum of sequences matching that of miBC species at >97% identity was more than 50% total reads per sample). This was nearly twice as much as for human samples (16.6% of 11,705) (Fig. 3b). These human samples seem to contain relatively high proportions of short 16S rRNA gene amplicons similar to 16S rRNA genes of miBC strains (at 97% identity) and probably do not correspond to representative ecosystems from the human gut, but rather to specific diet- or health-related conditions where a limited number of taxa dominate. Altogether, these data speak in favour of better coverage of 16S rRNA gene data sets from the mouse than human intestine using strains that were isolated from mice to query the database.
Figure 2 (partial caption). Individual values appear as grey dots. c, Coverage of the 772 molecular species by members of the mouse collection in terms of OTU number (histograms) and percentage of reads (pie chart). d, Phylogenetic tree of core OTUs and their match to miBC members. F, animal facility; Gen., genus level (95% sequence identity); OTU, operational taxonomic unit; Sp., species level (97% sequence identity).
Next, to determine whether certain species may be host-specific, we categorized the 76 miBC species according to their prevalence in human and mouse gut samples (for details see Methods). We identified 32 shared species (present in both human and mouse) and 16 that were characteristic for the mouse intestine (Fig. 3c). Interestingly, three previously published mouse-derived species (Enterorhabdus spp., Acetatifactor muris) 10,13,17 and five novel taxa isolated in the present study ('Acutalibacter muris', 'Enterorhabdus muris', 'Turicimonas muris', 'Muribaculum intestinale', 'Flintibacter butyricus') were indeed found to be enriched in the mouse intestine using IMNGS. The remaining 28 species were either rare or characterized by low prevalence. Among the 16 mouse-enriched species, 9 were also present when considering a relative abundance of >1% total reads, indicating that these species are dominant in the mouse gut microbiota. These findings suggest that colonization by certain bacterial species, or at least their occurrence in dominant communities as detected by sequencing, is host-specific. A selection of 18 shared or mouse-enriched species for establishment of a minimal bacteriome (MIBAC-1) to model the mouse gut ecosystem is shown with asterisks in Fig. 3c. This bacterial consortium was further investigated using genomic approaches.
Genomic novelty and representativeness of cultured mouse gut bacteria. To evaluate the relevance of miBC and the minimal consortium of 18 species (MIBAC-1) at the metagenomic level, draft genomes of the 76 miBC species were compared to published shotgun metagenomes from the mouse (n = 15) and human (n = 21) gut (Supplementary Table 10) and to two metagenomes generated in-house (caecum and faeces of one specific-pathogen free (SPF) wild-type mouse). First, a local database of miBC genomes and the metagenomes in a presence/absence binary code (1/0) of protein families (PFAM) was created. Deterministic incremental selection of best-fitted genomes showed that 12 to 15 species sufficed to cover ∼80% of known functional diversity in metagenomes (Fig. 4a). It also showed that cumulative information in miBC genomes plateaued at approximately 90% coverage and that coverage was higher for mouse than for human metagenomes (P = 1.66e-10, Kolmogorov-Smirnov test).
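One plausible reading of a deterministic incremental selection on a presence/absence PFAM table is a greedy procedure that, at each step, adds the genome contributing the most not-yet-covered PFAM of the metagenome. The Python sketch below illustrates that idea only; the function name, the set-based data layout and the alphabetical tie-breaking rule are assumptions and should not be taken as the procedure actually used.

```python
from typing import Dict, List, Set, Tuple


def greedy_selection(genomes: Dict[str, Set[str]],
                     target: Set[str],
                     max_genomes: int) -> List[Tuple[str, float]]:
    """Greedily pick genomes that add the most uncovered target PFAM.

    genomes maps a genome name to the set of PFAM present in it;
    target is the set of PFAM present in the metagenome.
    Returns (genome, cumulative fraction of target covered) per step.
    """
    if not target:
        return []
    covered: Set[str] = set()
    remaining = dict(genomes)
    steps: List[Tuple[str, float]] = []
    while remaining and len(steps) < max_genomes:
        # Iterating over sorted names makes ties resolve alphabetically,
        # so the outcome is deterministic (an assumption of this sketch).
        best = max(sorted(remaining),
                   key=lambda g: len((remaining[g] & target) - covered))
        gain = (remaining[best] & target) - covered
        if not gain:
            break  # no remaining genome improves coverage
        covered |= gain
        del remaining[best]
        steps.append((best, len(covered) / len(target)))
    return steps


if __name__ == "__main__":
    # Invented toy data: three genomes and a five-family metagenome.
    toy_genomes = {
        "A": {"PF1", "PF2", "PF3"},
        "B": {"PF3", "PF4"},
        "C": {"PF1"},
    }
    toy_target = {"PF1", "PF2", "PF3", "PF4", "PF5"}
    for name, frac in greedy_selection(toy_genomes, toy_target, 3):
        print(name, round(frac, 2))
```

Plotting the cumulative fraction against the number of genomes gives a saturation curve of the kind described for Fig. 4a; a family absent from every genome (PF5 in the toy data) sets the plateau below 100%.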
Next, we investigated coverage of the mouse metagenomes (n = 17) by comparing miBC and the minimal bacteriome MIBAC-1 with consortia from the literature, that is, the ASF 26 and the Simplified Human Intestinal Microbiota (SIHUMI) 27. Coverage analysis included not only the PFAM approach, but also protein sequence coverage by BLAST analysis (that is, independent of functional annotation) for determination of both homology (≥80% length coverage, E < 10^−5) and close sequence match (≥80% both coverage and sequence similarity). As expected, both PFAM and sequence coverages were highest for the entire collection, and coverage decreased with the number of species included in minimal consortia (18 in MIBAC-1 versus 8 in each of ASF and SIHUMI) (Fig. 4b). To test the relevance of MIBAC-1 (obtained by data-driven selection), we analysed 10 selections of 18 strains picked randomly from miBC (MIBAC-R1 to 10) (Supplementary Table 11). The designed minimal bacteriome MIBAC-1 was characterized by higher coverage than the mean of random sets for both sequence homologues and close matches but not for PFAM (Fig. 4c). This strengthens the rationale for having selected these strains and emphasizes the known shallow resolution of analysis when looking at broad functional categories and the more detailed specificity when looking, for instance, at genes 3.
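The comparison of a designed consortium against random draws can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names, the fixed random seed and the strain labels are hypothetical, and the coverage values passed to the comparison would come from a separate PFAM or BLAST analysis that is not reproduced here.

```python
import random
from typing import List, Sequence


def random_consortia(strains: Sequence[str],
                     size: int = 18,
                     n_sets: int = 10,
                     seed: int = 0) -> List[List[str]]:
    """Draw n_sets random consortia of `size` distinct strains each."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    pool = list(strains)
    return [rng.sample(pool, size) for _ in range(n_sets)]


def exceeds_random_mean(designed_coverage: float,
                        random_coverages: Sequence[float]) -> bool:
    """True if the designed consortium beats the mean of the random sets."""
    mean_random = sum(random_coverages) / len(random_coverages)
    return designed_coverage > mean_random


if __name__ == "__main__":
    # Placeholder labels standing in for the strains of the collection.
    collection = [f"strain_{i}" for i in range(100)]
    for i, consortium in enumerate(random_consortia(collection), start=1):
        print(f"random set {i}: {len(consortium)} strains")
```

Sampling without replacement within each set keeps every random consortium at exactly 18 distinct strains, which makes its coverage directly comparable to that of an 18-member designed consortium.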
Finally, PFAM diversity was investigated by heatmap and dendrogram analysis. The mouse collection and the metagenomes formed a cluster that was distant from minimal bacteriomes (Supplementary Fig. 6). Moreover, although ∼10% of PFAM-derived information captured by shotgun sequencing was not contained in miBC (Fig. 4a), 437 PFAM detected in the collection and minimal bacteriomes (representing 2.7% of the total 16,231 PFAM in all data sets) were absent from metagenomes, 129 of which were unique to miBC (Fig. 4c). These PFAM (Supplementary Table 12) are thus not captured by metagenomics, probably because they originate from taxa that are subdominant or lost during sample preparation.

Discussion
The extensive use of mouse models is in stark contrast to the lack of reference bacterial strains from the mouse intestine. In the present work, we have created a publicly available collection of bacteria from the mouse gut and provide important insight into host-specific bacterial diversity and functions.
Besides delivering an array of mouse gut bacteria to the research community, miBC contains representative strains of novel taxa, which represent 15% of the collection (∼1.0% of original isolates tested). Thus, a substantial number of novel bacteria can still be discovered even via classical bacteriological methods, some of these bacteria being apparently of particular functional relevance. For example, mice are colonized by a clade of bacteria from the order Bacteroidales, originally referred to as MIB (mouse intestinal bacteria) 28 and nowadays misleadingly classified as family S24-7 or Porphyromonadaceae spp. depending on the database. The present work is the first to report the isolation of a cultured member of this important family in the mouse gut ('Muribaculum intestinale'), opening avenues for functional studies. The collection also includes one strain of each species Intestinimonas butyriciproducens and 'Flintibacter butyricus' able to produce butyrate not only from sugars but also amino acids. A human isolate of I. butyriciproducens does so via a metabolic pathway so far unknown in gut environments 29 , highlighting the value of isolating bacteria to unravel functional novelty.
Investigating the mechanisms that underlie the specificity of bacterial colonization in the gut is important to understand the population dynamics associated with resilience, resistance to invaders, or survival of exogenous strains for therapeutic purposes. Occurrence analysis of the 76 miBC species following a large-scale 16S rRNA gene amplicon approach revealed that 20% were enriched in mice, including members of all major phyla. These results add precision to recent surveys on taxonomic comparison between human and mouse metagenomes, proposing that lactobacilli and several genera within the Firmicutes (including Coprobacillus and Turicibacter spp. within Erysipelotrichaceae, Anaerotruncus within Ruminococcaceae, Marvinbryantia within Lachnospiraceae, and Pseudoflavonifractor within Clostridiales) are characteristic of mice 3,25 .
We found that members of the proposed novel family 'Muribaculaceae' are specific to the mouse gut, even though their presence in humans has been reported by others using quantitative polymerase chain reaction (qPCR) 30. This discrepancy may be linked to primer specificity, and additional work is needed to assess the occurrence of these bacteria in other mammalian species, including pigs 31. Recent data comparing human and mouse microbiota reported mouse-specific colonization by this family 6. Via colonization of germ-free mice with microbial communities from mammals, insects and environmental samples, the authors of ref. 6 demonstrated that colonization success (maintained taxa richness and similarity to input samples) was highest for gut samples, indicating ecosystem-selective pressure on microbial communities 6. Competitive experiments with human versus mouse microbiota revealed transient colonization by human indicator species, but the gut ecosystem was dominated by mouse species after 14 days (>99% of the community), especially members of family S24-7 (proposed to be renamed 'Muribaculaceae'). These observations are in accordance with our findings supporting the notion of host-specific gut colonization.
In contrast to common statements in the literature on the vast majority of gut bacteria not being cultured, work by Goodman and colleagues 32 reported that approximately 56% of 16S rRNA gene sequences detected in human faecal samples belong to readily cultured species. A recent review also reports that the majority of most dominant species detected by molecular analysis have cultured representatives 33 . Analysis of high-throughput 16S rRNA gene amplicons from a mouse cohort covering different facilities revealed a set of 42 core bacterial OTUs in the mouse gut, which was covered up to 21% (OTU diversity; 9 of 42) and 30% (number of reads) by miBC at the genus level. This is the best available to date, but still lower than the reference values for humans 32 .
A very recent study reported that the majority of human faecal bacteria can actually be cultured 1 . This highlights the need for further work to isolate and characterize novel mouse gut bacteria, the 33 core OTUs without match to miBC representing 'most wanted taxa'. Moreover, these data support the existence of substantial differences in gut bacterial diversity and composition between mouse facilities, which can have dramatic consequences on host phenotypes 34 . This emphasizes the need for better characterization and standardization of complex gut microbial communities based on the knowledge and bacterial strains gained in the present study and by others since pioneering work in the 1960s 35,36 .
The representation of cultured bacteria within miBC was also assessed at the metagenomic level. The entire collection covered 60-90% of the functional potential in mouse faecal metagenomes. This high coverage reflects the limitation of the analysis to known functions or homologous protein sequences. Narrowing measurements to the 18-species consortium MIBAC-1, which can be used as a proxy for native mouse gut ecosystems, revealed coverages of 55-75% (PFAM and sequence homologues) and 20% (close sequence matches). Minimal consortia of bacteria already exist in the literature, but strains were selected based on educated guesses or for specific purposes and either originated from the human intestinal tract or are not easily available 26,27,37 . In contrast, the minimal bacteriome presented here contains strains that originate from the mouse intestine, is publicly available, was selected on the basis of comprehensive sequence-based approaches, and was overall characterized by a higher coverage of mouse faecal metagenomes. Hence, it represents important progress towards the standardization of mouse models, as emphasized by others 35,38 . However, additional work is required for establishment of the consortium in vivo. Demonstrating stable colonization, which can be influenced by many ecological factors, is indeed very important and challenging, but is beyond the scope of the present study.
In conclusion, the significance of our work to the field and the broader community is manifold: (1) it provides the basis for genetic studies that will eventually improve the resolution of meta-omics analyses of the mouse gut microbiome; (2) unrestricted access to miBC strains is a major advance for the research community, especially for functional studies on cause-and-effect relationships and for standardizing experiments among laboratories; and (3) the collection includes mouse-enriched taxa and strains with specific metabolic functions such as butyrate production, allowing ecological studies and assessment of the impact of specific strains on host physiology and disease onset. We acknowledge the fact that miBC is not an exhaustive collection; important bacterial taxa are still missing (for example, segmented filamentous bacteria or species of the phyla TM7 and Deferribacteres) and other microorganisms such as fungi and archaea, as well as viruses, are also important to investigate 39,40 . Moreover, key metabolic functions such as the conversion of bile acids require more detailed investigation. Also, strain diversity is most probably very important in gut environments 41 , but the collection offers resolution at the strain level for only a few species (especially lactobacilli), as the aim was to cover as much diversity as possible at the species level. Nevertheless, recent metagenomic work 41 also emphasizes the necessity to obtain reference genomes and thus the benefit of isolating and describing bacteria, as done extensively in the present work. Hence, knowledge of the mouse gut bacterial diversity made available via miBC can be viewed as a foundation that now requires effort by the entire community of gut microbiome researchers for further development.

Methods
Mouse samples for cultivation. The use of mice was approved by the local authorities in charge (animal welfare authorization 32-568, Freising District Office; Regierungspräsidium Freiburg T05-28). Laboratory mice were housed in conventional or specific pathogen-free facilities at the WZW School of Life Sciences (TU München), the Institute for Medical Microbiology and Hygiene (Universitätsklinikum Freiburg) or the Rodent Center (ETH Zurich). Samples from mice captured in the wild were obtained as described previously 42. Starting materials for bacteria isolation included fresh faeces collected from living mice, mucosal samples, as well as small intestinal, caecal or colonic content collected from mice that had been euthanized by CO2 inhalation or neck dislocation. The origins of bacteria (those eventually included in the collection) in terms of mouse genotype and gut location are provided in Supplementary Table 1. The working area and dissection set were cleaned and mice were copiously sprayed with 80% (vol/vol) ethanol before dissection in order to avoid contamination. In some instances, mice were dissected inside the anaerobic workstation to prevent any contact of gut samples with oxygen.
Culture media. All quantities are per litre of medium.
A II: Brain-heart-infusion (BHI), 18.5 g; yeast extract, 5 g; trypticase soy broth, 15 g; K2HPO4, 2.5 g; hemin, 10 µg; glucose, 0.5 g; palladium chloride, 0.33 g; agar, 15 g. After autoclaving, add Na2CO3, 42 mg; cysteine, 50 mg; menadione, 5 µg; fetal calf serum (complement-inactivated), 3% (vol/vol). Adapted from Aranki and
Bacterial strains isolation. Gut samples were re-suspended (1:10 wt/vol) in reduced buffered solutions or broth media (see composition above in the section 'Culture media'). Mucosal samples were prepared as described previously 10. Tenfold dilution series of gut suspensions were plated on agar media (see composition above) and bacteria were allowed to grow for 2-30 days under aerobic conditions, or in an anaerobic chamber containing a mixture of hydrogen, nitrogen and carbon dioxide (5:10:85) or hydrogen and nitrogen only (10:90). Single colonies were streaked at least three times onto fresh agar plates before transfer into broth media. Bacterial smears formed after growth of low dilutions were also re-streaked to obtain single colonies of low-abundant bacteria. Culture purity was ensured by observing colony morphology as well as cell morphology by light microscopy after Gram-staining. For identification and phylogenetic analysis of isolates, DNA was extracted from pure cultures and 16S rRNA genes were amplified and sequenced as described previously 15. Electropherogram quality was checked and contigs were built using BioEdit 45. Sequences were identified using EzTaxon 46. The value of 97% 16S rRNA gene sequence identity, which is a widely used and generally accepted though rather conservative threshold, was chosen to delineate novel taxa in a consistent manner across bacterial phyla 23. Further characterization of the strains is detailed in the next section.
Routine media used for subculturing strains included reduced Wilkins Chalgren anaerobic (WCA) broth (Oxoid) supplemented with cysteine (0.05% wt/vol) and DTT (0.02%) as reducing agents. Cryo-stocks were prepared from freshly grown cultures by mixing bacterial suspensions 1:1 with filter-sterilized glycerol in culture medium (40% vol/vol) before freezing at −80°C. Live and cryo-cultures of isolates were shipped to the German Collection of Microorganisms and Cell Cultures (DSMZ), where stocks for long-term storage were prepared and detailed taxonomic characterization was carried out. Strain information (including culture conditions) is available online at www.dsmz.de/miBC.
Strain characterization. Analyses included cell morphology by microscopy, enzymatic testing, cellular fatty acids, diaminopimelic acid, automated ribotyping and mass spectrometry analysis using a MALDI-biotyper (Bruker). Matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) sample preparation was performed according to protocol 3 as described by Schumann and Maier 47. Ribotyping was carried out using the automated Riboprinter microbial characterization system (Dupont, Qualicon). Sample preparation and analysis were performed according to the manufacturer's instructions and EcoRI restriction enzyme was used to generate the DNA fragments. Protocol 1, published by Schumann 48, was applied to screen Gram-positive bacterial cells for the presence of diaminopimelic acid. Two loops of wet biomass were added to 200 µl of 4 N HCl in 2 ml glass ampoules, which were heat-sealed and kept for 16 h at 100°C. After cooling, hydrolysates were pressed through a charcoal bed using a pipette bulb and dried in a gentle air stream at 35°C until all traces of acid were eliminated. Hydrolysates were dissolved in 100 µl distilled water and a volume of 2 µl was spotted on the baseline of cellulose thin-layer chromatography (TLC) plates (Merck, P/N 1.05577) using a glass capillary.
For separation of the diamino acid, a solvent system was used according to ref. 49 (32 ml methanol, 4 ml pyridine, 7 ml distilled water and 1 ml 12 N HCl; all from Sigma-Aldrich). TLC plates were sprayed with ninhydrin reagent after development. Spots became visible after heating (100°C, 5 min). Cellular fatty acids were determined from cells grown on blood agar plates or collected from pre-reduced liquid media. Fatty acids were analysed as methyl ester derivatives obtained from 10 mg dry cell material by saponification, methylation and extraction using the modifications by Kuykendall et al. 50 of the method by Miller 51. The fatty acid methyl esters were separated and identified using the Microbial Identification System (MIDI) Sherlock version 6.1 (TSBA40 database). Enzymatic testing was done according to the manufacturer's instructions (Biomérieux). Cell morphology was examined using phase-contrast microscopy (Zeiss Axioscope A.1, ×100 Plan-Neofluar oil-immersion objective, Ph3; Zeiss Axiocam MRc; Axiovision software). Slides were coated with a layer of highly purified agar (2%).
High-throughput 16S ribosomal RNA gene analysis. For analysis of mouse gut microbiota based on 16S rRNA gene amplicon sequencing with ensuing determination of core mouse gut bacteria, caecal samples were obtained from a total of 93 mice housed in eight different facilities in Europe and America (Supplementary Table 6). The aim was to obtain a data set that was not specific to one facility, and thus was as representative as possible of bacterial communities in the distal mouse intestine.
Mice were killed and caeca were immediately removed, placed in cryo-vials and snap-frozen in liquid nitrogen. Samples were stored at −80°C until shipped on dry ice to the University of Nebraska at Lincoln and stored again at −80°C until further processed. Once all samples were collected, they were thawed under anaerobic conditions and diluted 1:10 in sterile phosphate-buffered saline with 10% glycerol. Aliquots (200 µl) of each sample were used for DNA extraction following procedures described elsewhere 52 . 16S rRNA gene tag sequencing (MiSeq, Illumina) of V5-V6 regions was performed at the University of Minnesota Genomics Center using primers 784F (5′-RGGATTAGATACCC) and 1064R (5′-CGACRRCCATGCANCACCT) 53 .
Raw sequence reads were processed using IMNGS (www.imngs.org) with a pipeline developed in-house based on UPARSE 54 . Details about the analysis have already been described elsewhere 55 . OTUs clustered at 97% sequence identity and occurring at a relative abundance of ≥0.1% total reads in at least one sample were analysed further. The analysis delivered 14,906,121 quality- and chimera-checked reads (160,281 ± 60,686 per sample) that clustered in 772 OTUs (344 ± 45 OTUs per sample) (Supplementary Table 7 and Supplementary Sequence File).
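The abundance cut-off described above can be illustrated with a minimal sketch. It assumes that relative abundance is computed per sample against the total reads of that sample before any OTU is removed; the function name `filter_otus` and the toy counts are hypothetical and are not part of the IMNGS pipeline.

```python
def filter_otus(counts, min_rel_abundance=0.001):
    """Keep OTUs reaching >= 0.1% relative abundance in at least one sample.

    counts: dict mapping OTU id -> list of read counts, one per sample
            (all lists share the same sample order).
    """
    n_samples = len(next(iter(counts.values())))
    # Total reads per sample, computed before filtering (assumption).
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    kept = {}
    for otu, row in counts.items():
        if any(t > 0 and r / t >= min_rel_abundance for r, t in zip(row, totals)):
            kept[otu] = row
    return kept


# Toy table with two samples (hypothetical numbers).
toy_counts = {
    "OTU_1": [900, 9990],
    "OTU_2": [100, 5],
    "OTU_3": [0, 5],
}
kept_otus = filter_otus(toy_counts)
```

In this toy table, OTU_3 never exceeds 0.05% of the reads in any sample and is therefore dropped, whereas OTU_2 is kept because it reaches 10% in the first sample.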
For phylogenetic analysis, the evolutionary history was inferred by using the maximum likelihood method based on the 'General Time Reversible model' and using 200 bootstrap replications in MEGA6 (ref. 56). The tree with the highest log-likelihood (representing the most probable tree topology) was selected. The percentage of trees in which the associated taxa clustered together was shown next to the branches (the tree was condensed to values ≥50%). All positions with <70% site coverage were eliminated. That is, fewer than 30% alignment gaps, missing data and ambiguous bases were allowed at any position. There were a total of 261 positions in the final data set.
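The site-coverage rule can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the MEGA6 code: a position counts as covered in a sequence only if it holds an unambiguous base (A, C, G, T or U), and the function name `filter_sites` and the toy alignment are hypothetical.

```python
UNAMBIGUOUS = set("ACGTU")


def filter_sites(alignment, min_coverage=0.70):
    """Drop alignment columns with less than min_coverage site coverage.

    alignment: list of equal-length aligned sequences (strings).
    Gaps, missing data and ambiguous bases all count as uncovered.
    """
    n_seq = len(alignment)
    keep = []
    for col in range(len(alignment[0])):
        covered = sum(1 for seq in alignment if seq[col].upper() in UNAMBIGUOUS)
        if covered / n_seq >= min_coverage:
            keep.append(col)
    return ["".join(seq[c] for c in keep) for seq in alignment]


# Toy alignment of four sequences and six columns (hypothetical).
toy_alignment = ["ACGT-A", "AC-T-A", "A-NT-A", "ACGTTA"]
filtered = filter_sites(toy_alignment)
```

Columns 3 and 5 of the toy alignment are covered in only 50% and 25% of the sequences, respectively, so they are removed; column 2 (75% coverage) is retained.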
Integrated 16S rRNA gene amplicon studies. To evaluate the prevalence and abundance of miBC members in gut microbiota, other host-derived ecosystems and environmental samples, we used an integrated metagenomic platform developed in-house and used in previous work (www.imngs.org) 16,57. All 16S rRNA gene amplicon studies available in the SRA 58 were extracted and organized in sample-specific databases. The IMNGS build 1510 used in the present study contained a total of 55,073 samples, including 6,001 and 11,705 from the mouse and human gut, respectively. The corresponding databases were searched using UBLAST 59 based on nearly full-length 16S rRNA gene sequences of the 76 miBC species as queries to ensure good coverage across all studies, independent of 16S rRNA gene regions. Results were filtered on length (>200 nt), coverage (>70% read length) and similarity (>97 or 99%).
To determine whether certain species were host-specific, we looked at the number of SRA-derived samples positive for each species included in the collection, that is, those mouse or human samples in which sequences matched the corresponding 16S rRNA gene sequence at the level of 97% similarity. Of note, each OTU with a miBC match collected via IMNGS was subsequently assigned to one single closest sequence in the miBC reference set in order to avoid redundancy issues in the case of genera with closely related species (for example, for lactobacilli, enterococci, staphylococci). Prevalence had to be at least 5% in mouse gut samples for a species to be considered for discriminative analysis between humans and mice. The parameters used to categorize species were as follows: (1) rare: the species had a prevalence of <1% of mouse samples; (2) low prevalence: <5% of mouse samples; (3) shared: the percentage of positive samples from human gut was at least as high as that from mice; (4) shared and dominant: the definition in category (3) also held true when considering only SRA-derived OTUs occurring at a relative sequence abundance of >1% total reads (dominant OTUs); (5) mouse-enriched: the percentage of positive gut samples from mice was at least twice that from human subjects; (6) mouse-enriched and dominant: the given species was also enriched in mice when considering only dominant SRA-derived OTUs (as defined in category (4)).
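The six categories can be expressed as a small decision function. This is a sketch under assumptions that the text does not state explicitly: the categories are evaluated in the order listed, a "dominant" label additionally requires a non-zero prevalence among dominant OTUs, and species that are more prevalent in mice but less than twofold fall into a hypothetical "unclassified" label. The function name `categorize` is likewise hypothetical.

```python
def categorize(mouse_prev, human_prev, mouse_prev_dom, human_prev_dom):
    """Assign a host-specificity category to one species.

    mouse_prev, human_prev: % of gut samples positive at 97% similarity.
    mouse_prev_dom, human_prev_dom: same, counting only dominant OTUs
    (relative abundance > 1% of total reads).
    """
    if mouse_prev < 1:
        return "rare"
    if mouse_prev < 5:
        return "low prevalence"
    if human_prev >= mouse_prev:
        # Assumption: the dominant label needs at least one dominant hit.
        if human_prev_dom > 0 and human_prev_dom >= mouse_prev_dom:
            return "shared and dominant"
        return "shared"
    if mouse_prev >= 2 * human_prev:
        if mouse_prev_dom > 0 and mouse_prev_dom >= 2 * human_prev_dom:
            return "mouse-enriched and dominant"
        return "mouse-enriched"
    # Not covered by the six categories in the text (hypothetical label).
    return "unclassified"


# Hypothetical prevalence values for illustration only.
example_labels = [
    categorize(0.5, 10, 0, 0),
    categorize(3, 1, 0, 0),
    categorize(20, 30, 2, 5),
    categorize(40, 10, 8, 1),
]
```

Evaluating the rare and low-prevalence rules first mirrors the requirement that a species be present in at least 5% of mouse gut samples before the human versus mouse comparison is made.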
For the interpretation of results, it is important to remember that the analysis described above is based on short 16S rRNA gene amplicons, which can affect similarity-based results when compared to full-length gene analysis. Moreover, although it is reasonable to assume that the large-scale analysis (including thousands of samples from many different studies) guarantees a certain degree of representativeness, sample descriptions in SRA can be imprecise and the analysis certainly includes outlier samples for a given ecosystem (for example, from hosts with peculiar pathophysiological conditions).
Genome sequencing and processing. A total of 53 draft genomes were obtained in the present study (EBI project accession no. PRJEB10572). The 53 strains were selected following three criteria: (1) novelty; (2) relevance according to the IMNGS and MGS analysis (mouse-enriched, prevalent or dominant); and (3) need to fill diversity gaps (species with no genome yet available). The miBC collection also includes 12 strains for which novel genomes have already been deposited and will be published in a separate study (SRA accessions are provided in Supplementary Table 1). Genomes from 11 additional species in the collection were retrieved from NCBI (Supplementary Table 1). The corresponding raw sequence files were processed as for the genome sequences generated in-house.
Genomic DNA was obtained from pure cultures by precipitation after mechanical lysis following a protocol published previously 60 . DNA libraries were prepared using the TruSeq DNA PCR-Free Sample Preparation Kit (Illumina). The protocol was optimized (DNA shearing and fragment size selection) to improve assembly quality 61 . Libraries were sequenced using the Illumina MiSeq system according to the manufacturer's instructions.
Reads were assembled using SPAdes v3.6.1 with activated BayesHammer tool for error correction and MismatchCorrector module for post-assembly mismatch and indel corrections 62 .
Genome-based analyses. 16S rRNA gene sequences were extracted from both the mouse isolate genomes and reference genomes using RNAmmer (ref. 65). All pairwise similarities among these sequences were calculated from exact pairwise sequence alignments using recommended settings 23 .
For digital DNA:DNA hybridization (dDDH), the Genome-to-Genome Distance Calculator 2.0 (GGDC), a web service freely available at http://ggdc.dsmz.de, provided a genome sequence-based delineation of (sub-)species by reporting dDDH estimates as well as their confidence intervals 66 . This approach was shown to have several advantages over alternative species delineation approaches 66,67 , without mimicking the pitfalls of conventional DDH.
A whole-genome phylogeny (based on the proteome data) was inferred using the latest version of the Genome-BLAST Distance Phylogeny (GBDP) method 66,68 . Here, pairwise proteome comparisons (including pseudo-bootstrap replicates) were performed under the greedy-with-trimming algorithm and further recommended settings 69 . The final tree was inferred using FastME v2.07 with TBR postprocessing 70 . GBDP settings were trimming algorithm, formula d5 and an e-value threshold of 10e-8. The tree was rooted at the mid-point 71 and numbers above branches are greedy-with-trimming pseudo-bootstrap support values from 100 replicates. Only support above 60% is shown.
The difference in genomic G+C content was used to delineate species. When computed from genome sequences, the G+C content differences vary no more than 1% within species 67 .
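As a minimal sketch, the G+C difference between two genomes can be computed directly from their contigs. The assumptions here are that ambiguous bases are excluded from the denominator and that all contigs of a draft genome are pooled; the function names and toy sequences are hypothetical.

```python
def gc_content(seq):
    """Return G+C content in percent, ignoring ambiguous bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at)


def gc_difference(contigs_a, contigs_b):
    """Absolute G+C difference (percentage points) between two genomes."""
    return abs(gc_content("".join(contigs_a)) - gc_content("".join(contigs_b)))


def exceeds_within_species_range(contigs_a, contigs_b, threshold=1.0):
    """True if the difference is larger than the 1% within-species range."""
    return gc_difference(contigs_a, contigs_b) > threshold


# Toy contigs (hypothetical, far shorter than real genomes).
genome_a = ["GGCC", "ATAT"]      # 50% G+C
genome_b = ["GCGCGC", "ATNN"]    # 75% G+C, the two N are ignored
toy_difference = gc_difference(genome_a, genome_b)
```

A difference above the threshold supports, but does not by itself establish, assignment to distinct species; it is one line of evidence alongside dDDH and phylogenomic placement.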
Similarity search against the mouse gene catalogue. The mouse gene catalogue 3 was searched against each of the 15 genome databases separately, using BLASTn (ref. 72). The gene match criteria were percent identity ≥95% over 100 bp or more and e-value ≤10 × 10⁻⁵ (very similar results were obtained using an 80% coverage threshold with ≥95% identity). The number of gene matches assigned to metagenomic species (MGS), as defined previously 3,24 , was counted (see Supplementary Table 5 for details). The occurrences of the five MGS through the mouse cohort sampled in ref. 3, with 85% of the genes matching the novel isolates, were used to generate Supplementary Fig. 1b.
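The match criteria and the per-MGS tally can be sketched as below. The hit tuples are a simplified stand-in for BLASTn tabular output, the e-value default follows a literal reading of the threshold given in the text, and the gene-to-MGS mapping, gene identifiers and function name are all hypothetical.

```python
from collections import Counter


def count_mgs_matches(hits, gene_to_mgs, min_identity=95.0,
                      min_length=100, max_evalue=10e-5):
    """Count catalogue genes with a qualifying hit, grouped by MGS.

    hits: iterable of (gene_id, percent_identity, alignment_length, evalue).
    gene_to_mgs: dict mapping catalogue gene id -> MGS label.
    max_evalue: literal reading of the threshold in the text (assumption).
    """
    matched = set()
    for gene, identity, length, evalue in hits:
        if identity >= min_identity and length >= min_length and evalue <= max_evalue:
            matched.add(gene)  # each gene is counted once, however many hits
    return Counter(gene_to_mgs[g] for g in matched if g in gene_to_mgs)


# Toy hits and mapping (hypothetical values).
toy_hits = [
    ("g1", 99.0, 250, 1e-50),
    ("g1", 96.0, 120, 1e-20),   # second hit for the same gene
    ("g2", 94.0, 300, 1e-60),   # identity too low
    ("g3", 97.5, 80, 1e-10),    # alignment too short
    ("g4", 95.0, 100, 1e-30),   # exactly at both thresholds
    ("g5", 98.0, 500, 1e-80),
]
toy_mapping = {"g1": "MGS_A", "g2": "MGS_A", "g4": "MGS_A", "g5": "MGS_B"}
mgs_counts = count_mgs_matches(toy_hits, toy_mapping)
```

Collecting matched genes in a set before counting avoids inflating the tally when one catalogue gene aligns to several contigs of the same genome.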
Metagenome analysis. Metagenomic DNA was extracted from mouse intestinal contents as described above for pure cultures 60 . Libraries of 300 bp insert sizes were prepared from 500 ng fragmented DNA (Covaris S220 AFA System; duty factor 10%, peak incident power 175 W, 200 cycles per burst, 40 s) for each sample using the NEBNext Ultra DNA Library Prep Kit for Illumina (E7370S, New England Biolabs) according to the manufacturer's instructions. Samples were barcoded and sequenced on a TruSeq Rapid PE flowcell (PE-402-4001; Illumina) using an Illumina HiSeq 2500 sequencer with TruSeq Rapid SBS chemistry (FC-402-4001) and the 2 × 100 bp paired-end read module. Real-time analysis (RTA) software (1.17.20) was used for image analysis and base calling. Sequence files (.fastq) were generated with the CASAVA BCL2FASTQ Conversion Software (1.8.3). Paired-end sequencing reads were assembled into metagenomics contigs using Megahit 73 . Contigs with a length of >500 nt were classified with Kraken 74 and those classified as bacteria were kept for further analysis.
Three complementary approaches were followed to determine metagenomic coverage by cultivated bacteria. First, a local database was created of all miBC genomes and the metagenomes in a presence/absence binary code (1/0) of protein families (PFAM). Deterministic incremental selection of best fitted genomes was performed to assess coverage of the metagenomes by all miBC species (n = 76), or by species included in the minimal bacteriome MIBAC-1 (n = 18; this study), SIHUMI (n = 8) 27 and ASF (n = 8) 26 . Second, ORFs in metagenomic contigs were selected and initial annotation was carried out with Prokka 64 . The predicted metaproteome of miBC and the three minimal bacteriomes were used for creation of protein BLAST databases. Sequence coverage of metagenomes was calculated based on a BLASTp search. ORFs that significantly (expected value E < 10⁻⁵) aligned with a database entry protein at minimum length coverage ≥80% were used for the calculation of functional coverage (that is, at the level of homologous proteins). Third, from the ORFs retained in step (2), only those with ≥80% sequence similarity were used for calculation of close match sequence coverage. Of note, all newly generated or database-derived metagenomes used in the present analysis were completely independent of the samples used for strain isolation.
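One plausible reading of the deterministic incremental selection in the first approach is a greedy procedure that repeatedly adds the genome contributing the most PFAMs not yet covered. The sketch below implements that reading only; the exact selection rule used in the study is not specified here, and the function name, tie-breaking rule and toy PFAM sets are hypothetical.

```python
def greedy_coverage(genomes, metagenome, max_genomes=None):
    """Greedily select genomes to cover the PFAMs present in a metagenome.

    genomes: dict mapping genome name -> set of PFAM ids (presence = 1).
    metagenome: set of PFAM ids present in the metagenome.
    Returns the selection order and the cumulative fraction covered.
    """
    remaining = set(metagenome)
    candidates = dict(genomes)
    chosen, curve = [], []
    while candidates and remaining:
        if max_genomes is not None and len(chosen) >= max_genomes:
            break
        # sorted() makes ties deterministic: alphabetically first name wins.
        best = max(sorted(candidates), key=lambda n: len(candidates[n] & remaining))
        gain = candidates[best] & remaining
        if not gain:
            break  # no candidate adds further coverage
        remaining -= gain
        chosen.append(best)
        del candidates[best]
        curve.append(1 - len(remaining) / len(metagenome))
    return chosen, curve


# Toy presence sets (hypothetical PFAM ids).
toy_metagenome = {"PF1", "PF2", "PF3", "PF4", "PF5", "PF6", "PF7", "PF8"}
toy_genomes = {
    "A": {"PF1", "PF2", "PF3", "PF4"},
    "B": {"PF3", "PF4", "PF5"},
    "C": {"PF5", "PF6", "PF9"},
    "D": {"PF1"},
}
selection, coverage_curve = greedy_coverage(toy_genomes, toy_metagenome)
```

In the toy data, genome A is selected first (four PFAMs), then C (two new PFAMs), after which no remaining genome adds coverage. Greedy selection of this kind gives a reproducible ordering but is not guaranteed to find the smallest possible consortium.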
Description of novel bacteria. The descriptions are based on genome sequence analysis, enzymatic testing and chemotaxonomy (cellular fatty acids and detection of meso-diaminopimelic acid). Genome-based analyses included whole proteome-based phylogenomic GBDP analysis, dDDH, 16S rRNA gene sequence analysis and differences in G+C content of DNA. First, the relative subtree height (RSH) of any given putative novel bacterium within the phylogenomic tree relative to the RSH of related species allowed reliable conclusions on the taxonomic properties of novel strains. Second, a dDDH value of <70% indicated affiliation of an isolate to a novel species. Third, a 16S rRNA gene sequence identity of ≤94.5% was considered strong evidence for different genera and ≤86.5% for distinct families 75 . Fourth, because within-species differences in the genome-based G+C content of DNA are almost exclusively <1%, larger differences strongly supported the status of distinct species. Finally, percentage of conserved proteins (POCP) analysis was done using the IMG software tool Genus definition 76,77 .
Description of Acutalibacter gen. nov. Acutalibacter (A.cu.ta.li.bac'ter. L. adj. acutalis tapered, pointed; N.L. n. bacter rod; N.L. masc. n. Acutalibacter a rod-shaped bacterium with tapered ends, pertaining to the cell morphology of the type strain of the type species).
The closest phylogenomic neighbour is Clostridium sporosphaeroides. Both genomes cluster distantly from Clostridium leptum. They do not represent members within the Clostridium sensu stricto cluster. The dDDH is 23% and POCP is 41%, indicating the status of a separate genus, for which the name Acutalibacter is proposed. The dDDH value between the genome of strain KB18 T and Ruminococcus bromii L2-63 is 25.6%. The G+C content of genomic DNA of the type strain is 54.6 mol%. The type species is Acutalibacter muris.
Description of Bacteroides caecimuris sp. nov. Bacteroides caecimuris (cae.ci. mu'ris. L. n. caecum caecum; L. gen. n. muris of a mouse; N. L. gen. n. caecimuris from the caecum of a mouse). The closest phylogenomic neighbour is Bacteroides acidifaciens and the dDDH value between both genomes is 40.2%.
Description of Blautia caecimuris sp. nov. Blautia caecimuris (cae.ci.mu'ris. L. n. caecum caecum; L. gen. n. muris of a mouse; N. L. gen. n. caecimuris from the caecum of a mouse). The closest phylogenomic neighbour is Blautia wexlerae and the dDDH value between both genomes is 25.6%. The G+C content difference between strain SJ18 T and Blautia wexlerae is 1.6%.
Nine alpha-amylases, which can be assigned to glycoside hydrolase family GH13 and catalyse the hydrolysis of glycosidic linkages, are detectable within the genome.  1.2, Prokka_03476). The glucokinase is also involved in degradation of trehalose. Trehalose uptake is carried out by a trehalose-specific phosphotransferase system (PTS; Prokka_03240). The strain also possesses trehalose import proteins (Prokka_02649, 00179), transport system permease (Prokka_0062, 0067) and the operon repressor TreR (03239).
Description of Cuneatibacter gen. nov. Cuneatibacter (Cu.ne.a.ti.bac'ter. L. adj. cuneatus wedge-shaped; L. masc. n. bacter rod; N.L. masc. n. Cuneatibacter a rod-shaped bacterium with wedge-shaped ends). The nearest phylogenomic neighbours are Clostridium clostridioforme and Clostridium symbiosum. It is placed into the Lachnospiraceae cluster, apart from the type species of the genus Clostridium, Clostridium butyricum. POCP between C. butyricum and C. clostridioforme or C. symbiosum is 39.9% and 39.7%, respectively. These low values confirm that both are not members of the Clostridium sensu stricto cluster. POCP between the genomes of C. clostridioforme and C. symbiosum is 36.5% and dDDH is 22.2%. The dDDH values between strain BARN-424-CC-10 T and C. clostridioforme or C. symbiosum are 21.6% and 22.3%, respectively. The difference in G+C content of DNA is 1.4% for both comparisons. All these values confirm the separate genus status of the isolate. The G+C content of genomic DNA of the type strain is 49.1 mol%. The type species is Cuneatibacter caecimuris.
Description of Extibacter gen. nov. Extibacter (Ex.ti.bac'ter. L. neut. n. extum bowels or entrails of an animal; N.L. masc. n. bacter rod; N.L. masc. n. Extibacter a rod isolated from the intestine). The nearest phylogenetic neighbour is Clostridium hylemonae. POCP between the genome of C. hylemonae and Clostridium butyricum, the type species of the genus Clostridium, is only 14.3%, indicating the separate genus status of C. hylemonae. The dDDH value between both Clostridium genomes is 28.4% and that between C. hylemonae and strain 40cc-B5824-ARE T is 23.8%. The G+C content of genomic DNA of the type strain is 47.9 mol%. The type species is Extibacter muris.
Description of Extibacter muris sp. nov. Extibacter muris (mu'ris, L. gen. n. muris of a mouse). Cells stain Gram-positive, and are strictly anaerobic rods up to 3 µm in