A 16S rRNA Gene and Draft Genome Database for the Murine Oral Bacterial Community

Mouse model studies are frequently used in oral microbiome research, particularly to investigate diseases such as periodontitis and caries, as well as other related systemic diseases. Here, we report the development of a curated reference database for characterizing the oral microbial community in laboratory mice and some wild mice.

have all demonstrated the mouse bacterial community to have a simple and relatively stable composition, with a major proportion of cultivable components, particularly in the specific-pathogen-free (SPF) laboratory mouse strains.
Despite this inherent simplicity of the mouse oral microbiome, it has significant relevance as a model to investigate and understand the mechanisms of human oral diseases such as periodontitis (7,9,10). A primary reason for this has been the parallels observed in the initiation and development of disease in experimental studies, specifically the development of a dysbiotic microbiome (characterized by increased total microbial loads) and soft tissue destruction with gingival inflammation (2,11,12), which are often comparable to those seen in humans (13,14). Also, since the microbial genera observed are often similar to the predominant ones seen in humans (15), these animal models are useful for understanding host-microbiota interactions and homeostasis mechanisms in health and disease.
However, the lack of adequate information in the public domain about mouse oral bacterial isolates from various sources, as well as poorly curated 16S rRNA gene sequences in the public databases, may lead to non- or misidentification of the organisms, which could thereby affect the outcome of such studies. A host-specific curated reference database for murine oral microbial populations in the public domain would enable researchers to accurately, reliably, and consistently identify the bacterial communities in experimental samples. Such databases provide an improved and more accurate characterization of bacterial communities and allow easy comparison of work from different laboratories.
Similar databases have already been developed to characterize the oral microbiome in humans (15,16) and other mammalian host species such as cats and dogs (17,18). More recently, researchers have characterized and generated a database for the mouse intestinal bacterial community (19,20), as well as chronicled the collection of genes in the mouse gut metagenome (21).
Here, we report a curated and well-characterized database of the oral bacterial population in mice, with representative genome sequences, which should greatly benefit researchers as a reference for oral microbiome studies in health and disease using laboratory mouse models.

RESULTS
Assignment of mouse oral taxa. To date, 325 16S rRNA gene sequences from murine oral bacterial isolates have been analyzed and found to constitute 103 mouse oral taxa (MOT) (Table 1 and Fig. 1). Twelve of the assigned MOTs (11.65%) represent novel, previously unidentified species that need further characterization in order to be assigned a formal species name. Representative 16S rRNA gene sequences for these novel taxa have been submitted to the National Center for Biotechnology Information (NCBI) database and are available under accession numbers MN095260 to MN095271. The 16S rRNA gene sequences for all the isolates analyzed in this study are available to download under accession numbers MW175535 to MW175859.
Diversity of the murine oral bacterial community. The mouse oral taxa are distributed across four bacterial phyla (Fig. 1): Firmicutes (54 taxa), Proteobacteria (27 taxa), Actinobacteria (14 taxa), and Bacteroidetes (8 taxa). The Firmicutes phylum, having the greatest number of taxa, is represented by species of the Streptococcus, Staphylococcus, Lactobacillus, Gemella, Enterococcus, Aerococcus, Jeotgalicoccus, Granulicatella, Facklamia, Dubosiella, and Ileibacterium genera. Proteobacteria comprise members of the Enterobacter, Klebsiella, Shigella, Escherichia, Pasteurella, Providencia, Proteus, Actinobacillus, Acinetobacter, Neisseria, Rodentibacter, Rhodanobacter, and Devosia genera. The Bacteroidetes phylum comprises the Bacteroides, Parabacteroides, Porphyromonas, Helicobacter, Muribaculum, and Dysgonomonas genera, while Actinobacteria are represented by the Micrococcus, Sanguibacter, Microbacterium, Rothia, Corynebacterium, Actinomyces, Cutibacterium, and Turicella genera. At the species level, Streptococcus danieliae MOT10 was the most frequently observed among the samples tested. The periodontal pathogen Porphyromonas gingivalis, albeit never isolated in the laboratory from the murine swab samples, was identified in the 16S rRNA gene amplicon sequencing data. The species has been observed in the wild mouse species and occasionally in experimental SPF laboratory mice, but never in healthy, untreated SPF mice.
Laboratory versus wild mouse oral bacterial community. The SPF laboratory mice included in this study represented diverse background strains (C57BL/6J, C3H/Orl, CD-1, BALB/c) and were obtained from a range of sources (commercially purchased, in-house colonies, conventionalized from germfree mice, and genetic knockouts). In addition, samples were collected from the formerly wild wood mouse Apodemus sylvaticus and the wild house mouse Mus musculus domesticus.
Beta-diversity analyses of the oral microbial populations of the sampled mice (Fig. 2) showed distinct separation in the clustering of all three groups of mice. However, a number of shared species have also been observed between them, including Lactobacillus murinus (MOT93) and Streptococcus danieliae (MOT10) (Table 1).
Culture versus 16S rRNA gene community profiling. Custom reference data sets for community profiling analysis were constructed using the current version of the database and used for data analysis. Analysis of the Illumina MiSeq amplicon sequencing data (the V1-V2 region of the 16S rRNA gene) in SPF mouse samples using the custom reference data set revealed that 98% of the total number of reads could be assigned to species level. Further comparative analysis of culturing data with this amplicon sequencing data set revealed a significant consensus in the SPF mouse samples, with all of the cultured species being represented at comparable levels in the next-generation sequencing (NGS) data (Fig. 3). Among the wild Mus musculus domesticus samples, 95% of the reads were identified to the species level using the custom reference data set, even though very little consensus was observed with the culturing data in terms of abundance of individual species (Fig. 4). This could be explained by the fact that the species that show significantly higher abundance in the NGS data set (Muribacter muris, Neisseria species 1, and Porphyromonas gingivalis) are predominantly slow growers and therefore could have escaped detection in the 48-h culturing protocol followed in this study. In the samples from the formerly wild wood mouse Apodemus sylvaticus, the proportion of reads positively identified using the custom data set fell to 83%, even though a higher consensus between culturing and NGS was observed compared to the wild Mus musculus domesticus, despite the increased diversity (Fig. 5).
Current versions of the reference data sets have been constructed for both the mothur and DADA2 pipelines and are available to download at https://figshare.com/s/2470f05ab77cdf40b2f8.
Mouse oral bacterial genomes. Draft genomes of 55 murine oral bacterial isolates have been generated, sourced from all the mouse groups included in this study (Table 2). Representative genome sequences have been uploaded to the NCBI bacterial genome database and are publicly available at NCBI BioProject PRJNA671681. After assembly, the genomes had a mean contig number of 135 (±134) and GC% ratios ranging from 28 to 73%.
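Summary statistics of this kind (contig count per assembly and GC content) can be recomputed directly from the assembled FASTA files. The following is a minimal sketch under stated assumptions, not the authors' workflow: the function names and the toy sequences are hypothetical, and the input is assumed to be plain FASTA text per isolate.

```python
from statistics import mean, stdev


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {contig_name: sequence}."""
    contigs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line.upper())
    return {k: "".join(v) for k, v in contigs.items()}


def gc_percent(contigs):
    """GC content (%) over all contigs, ignoring ambiguous bases such as N."""
    seq = "".join(contigs.values())
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def summarize(assemblies):
    """assemblies: {isolate: fasta_text}. Returns contig-count mean/SD and GC range."""
    parsed = [read_fasta(text) for text in assemblies.values()]
    counts = [len(p) for p in parsed]
    gcs = [gc_percent(p) for p in parsed]
    return {
        "mean_contigs": mean(counts),
        "sd_contigs": stdev(counts) if len(counts) > 1 else 0.0,
        "gc_min": min(gcs),
        "gc_max": max(gcs),
    }


# Toy input for illustration only; these are not sequences from the study.
toy = {
    "isolate_A": ">c1\nGGCC\n>c2\nATAT\n",
    "isolate_B": ">c1\nGCGCGCAT\n",
}
stats = summarize(toy)
```

With real assemblies, each value in the input dictionary would be the text of one isolate's contig file.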
Predominant bacterial genera in laboratory mice. The following three genera were found to be of particular interest due to their presence in a wide range of samples tested.
(i) Streptococcus. S. danieliae MOT10 was the most commonly detected species in all SPF laboratory mouse groups from multiple sources and mouse strains (Fig. 6A). This species was also isolated from the wild Mus musculus domesticus. Streptococcus species 5 MOT45 was isolated from the wood mouse A. sylvaticus (Fig. 5) and was found to be very closely related to, but distinct from, S. danieliae (16S rRNA gene sequence similarity, 98.4%), forming a murine S. danieliae cluster.
Some isolates of S. acidominimus MOT11 were observed in a group of SPF mice belonging to the BALB/c strain, whereas a few novel streptococcal species were also isolated in single occurrences (Streptococcus species 1, 2, and 3; MOTs 12 to 14).
(ii) Lactobacillus. With only two exceptions, all of the lactobacilli in mice were identified as L. murinus MOT93 (Fig. 6B). The two exceptions belonged to the L. taiwanensis-gasseri-johnsonii cluster at >99% identity and could not be distinguished based on the 16S rRNA gene sequence analysis. In silico DNA-DNA hybridization gene sequence analysis identified them as L. johnsonii MOT51 at 72.1% genome identity.
(iii) Gemella. All Gemella isolates from laboratory mice were found to belong to a novel, as yet unnamed, species, Gemella species 2 MOT43 (Fig. 6C). The nearest phylogenetic neighbor is the canine oral species, Gemella palaticanis at 97.4% 16S rRNA gene sequence identity. Another single, novel Gemella isolate was observed in A. sylvaticus wood mice and designated Gemella species 1 MOT33.

DISCUSSION
We report here the details of a curated murine database to represent the diversity of the oral microbiome in laboratory and wild mouse populations.
The low diversity of the mouse oral bacterial community, especially in SPF laboratory mice, has particularly stood out in this characterization. The presence of such a small and specifically targeted population makes accurate species-level identification especially significant for improving the outcomes of studies. In addition, the collection of draft genomes of the representative MOTs from the database is also of great benefit for microbial population studies in mice. This repository of genomes should enable the generation of quality reference data sets for metagenomic and metatranscriptomic analyses of mouse-based oral studies.
FIG 3 Relative abundances in the SPF laboratory mice (n = 13). We compared oral microbial population analyses in three background strains of SPF mice by laboratory culture (red) and Illumina MiSeq 16S rRNA gene amplicon sequencing (green) amplifying the V1-V2 region of the 16S rRNA gene. The amplicon sequencing data were analyzed using DADA2 v1.8 against a custom-made taxonomy reference data set from this database. The top 10 most relatively abundant taxa identified after normalization of counts are listed.
A comparative analysis of the culture and 16S rRNA gene community profiling data has further confirmed the low diversity of the lab mouse oral microbiome (see Fig. S1 in the supplemental material), which is dominated by four species-level taxa. We have reported here only a representative subset of the samples we have sequenced in order to demonstrate the development of the database. However, over the course of multiple experimental studies, we have shown the consistency in the results of these population analyses (2,7). The diversity of oral species-level taxa, in comparison, was found to be higher in the wild mouse species. Such variations in the microbial diversity and load within host species have been reported in the gut microbiome of other organisms and have largely been attributed to the influence of diet and environment (22,23).
Another observation of this study has been the variation between different strains of laboratory mice; we have previously also observed this between batches of the same strain of mice obtained from commercial sources (unpublished data), which has also been reported in some older studies (24). Recently, Abusleme et al. also reported variations in the oral microbial communities of C57BL/6 mice purchased from two major commercial animal suppliers and the increased stability of a particular type of population when the mice were cohoused (8). This highlights the importance of having a well-curated reference database, as well as internal controls, to ensure accuracy in results in microbiome studies.
Among the three predominant host-specific MOTs identified, S. danieliae MOT10 is of particular interest due to its dominance of the oral bacterial community in the SPF laboratory mice. A recently proposed species of the Streptococcus genus, S. danieliae MOT10 was originally described based on a murine cecal isolate for which the authors suggested an oral/upper respiratory tract origin (25). There have been few other reports of the organism so far (26-28), all of them of murine origin. The organism has also been reported to be one of the key drivers in the establishment of the oral microbial community in laboratory SPF mice after the eruption of teeth (8). Gemella species 2 has been isolated from oral samples of multiple strains of laboratory mice and needs further characterization to be taxonomically identified as an official bacterial species. Of these three species, L. murinus is the only species that was part of the Altered Schaedler Flora, the community of microbes that was originally used to colonize gnotobiotic mice to develop SPF laboratory mice as we know them today (29,30). Similar examples of host specificity have also been reported in other mouse microbiomes, including the colonization of segmented filamentous bacteria (31) and Muribaculaceae (19) in the mouse gut microbiome. This specificity also points strongly to host-microbe coevolution and fitness characteristics (32).
In addition, there has been an increasing interest in recent times in the relevance of wild rodent models in research (33,34), particularly for understanding natural progressions of certain diseases in relation to the microbiome. A study involving microbial transfer of wild mice into laboratory mice has demonstrated the role of the natural or wild microbiota in disease resistance and protective immune mechanisms (35). Further, a more recent study by the same group also showed that laboratory mice bred from such wild mice (referred to as wildlings) exhibited the elevated microbial diversity of the parent wild mice accompanied by a stability to perturbations such as antibiotics and diet, implying that a natural or wild immunity could be more comparable to that seen in the diversity of human illnesses (36). Hence, we considered it pertinent to include the bacterial taxa from some of these wild mouse samples (formerly wild wood mice [Apodemus sylvaticus] and wild Mus musculus domesticus mice) in the database, in addition to cultured oral isolates which could be used for in vivo and in vitro experimentation in the future. We fully appreciate that at present we report this from a limited number of sources and therefore represent only a fraction of the actual diversity observed in the wild, but we plan to further expand this resource with the inclusion of more diverse sources based on availability. It is also pertinent here to point out the presence of certain taxa in this murine oral microbial population potentially of intestinal origin, which could be attributed to the coprophagic behavior in murine populations. Compared to the mouse intestinal bacterial collection database, we note some taxa shared with our database, particularly species belonging to the order Lactobacillales (19). Equally, oral bacteria might also be found in the gut, particularly lactobacilli because of their aciduric nature. However, despite this overlap, the oral microbiome in mice remains distinct, far simpler, and less diverse than the murine gut microbiome (see Fig. S2 and S3), as we have demonstrated by amplicon sequencing and characterization of a limited subset of murine fecal samples.
FIG 5 Relative abundances in the wood mouse Apodemus sylvaticus (n = 16). We compared oral microbial population analyses in the formerly wild wood mouse Apodemus sylvaticus, sampled from two sources (sets 1 and 2), by laboratory culture (red) and Illumina MiSeq 16S rRNA gene amplicon sequencing (green) amplifying the V1-V2 region of the 16S rRNA gene. The amplicon sequencing data were analyzed using DADA2 v1.8 against a custom-made taxonomy reference data set from this database. The top 10 most relatively abundant taxa identified after normalization of counts are listed.
It is necessary to stress that this study primarily describes the development of a framework for this murine oral microbial database, which by its nature remains a work in progress. Work is under way with collaborators at the Forsyth Institute, Cambridge, MA, to make the final version of this database publicly available as the Mouse Oral Microbiome Database (MOMD), on the lines of the Human Oral Microbiome Database (HOMD) (16). This will enable the addition of sequences from other researchers in the field; these will be curated and made publicly available. Meanwhile, the cultured isolates analyzed for this database can be made available to interested researchers on request by contacting the authors. Further work is also being undertaken using sequence analysis and cloning to characterize the uncultured as well as the identified but unnamed novel species, including their taxonomic nomenclature, for the further expansion of the database.
We hope that this resource will enable researchers across the world to access and develop suitable reference data sets for both culture-based and culture-independent studies of the murine oral microbiome.

MATERIALS AND METHODS
Mouse sampling and ethics. All animal experiments were conducted in accredited facilities in accordance with the UK Animals (Scientific Procedures) Act 1986 (Home Office license number 7006844). Fieldwork for the wild Mus musculus domesticus work was approved by the Animal Welfare Ethical Review Body (AWERB) at the Department of Zoology, University of Oxford.
Conventional SPF C3H/Orl and BALB/c mice were maintained in individually ventilated cages (IVCs) at the animal care facilities of Queen Mary University of London. Conventional SPF C57BL/6J mice were maintained in IVCs at the animal care facilities of King's College London. SPF C57BL/6J and CD-1 mice were also purchased commercially from Charles River Laboratories UK. In all, 191 SPF mice have been sampled over the course of 6 years for various experimental studies, and isolates obtained from the swabbing of these mice have been used for the analysis and development of this database.
One set (set 1) of formerly wild, now laboratory-bred, A. sylvaticus wood mice sampled in 2014 (n = 20) was originally established from wild wood mice that were live trapped from the UK woodlands and are now housed at the University of Edinburgh. Wild wood mice sampled in January 2017 (n = 39) were housed at the facilities at Fera Science (Sand Hutton, York), and the colony began with wild-caught wood mice from the grounds of the Institute; these animals had been laboratory-bred for 20 years and are referred to as set 2.
Wild house mice (Mus musculus domesticus [n = 21]) were sampled on the island of Skokholm in Wales, UK. These mice were captured temporarily in traps, sampled, and then released back into the wild.
Bacterial culturing. The murine oral cavity was swabbed for 30 s, using sterile fine-tip rayon swabs (VWR International), while the animal was held in a scruff. The swab was then placed in a tube containing 100 µl of reduced John's transport medium (see Text S1 in the supplemental material). Swabs from wild mice were stored at -20°C before being transported on ice to the laboratory. Serial dilutions of the suspension were spread onto blood agar plates containing 5% defibrinated horse blood (TCS Biosciences, UK) and incubated for aerobic and anaerobic (80% N2, 10% H2, and 10% CO2) growth at 37°C for 48 h in a Don Whitley anaerobic chamber. The CFUs of predominant cultivable bacteria on each plate were counted. On average, four to six different colony types could be identified on each blood agar plate. Every different colony morphology type observed was isolated and purified by restreaking twice on fresh blood agar plates. Once the purity was established, the cultured isolate was cryopreserved for storage in Microbank bead tubes (Prolab Diagnostics) in duplicate. Briefly, a loopful of pure culture was aseptically transferred into the manufacturer's cryopreservative liquid containing beads, and the tube was inverted four to five times for emulsification and allowed to stand for 2 min. Any excess liquid was aseptically removed, and the bead tube was then frozen at -70°C.
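Plate counts from serial dilutions are conventionally converted back to a concentration in the original suspension and, per colony type, to a relative abundance. The sketch below illustrates that arithmetic only; the plated volume, dilution, and colony counts are hypothetical values, since the text does not state them, and the function names are illustrative.

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Back-calculate CFU/ml of the undiluted suspension from one plate count.

    dilution_factor is the fold dilution of the plated suspension
    (for example, 1000 for a 10^-3 dilution).
    """
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return colonies * dilution_factor / plated_volume_ml


def relative_abundance(counts_by_type):
    """Convert per-colony-type counts on a plate into percentages of the total."""
    total = sum(counts_by_type.values())
    if total == 0:
        raise ValueError("no colonies counted")
    return {name: 100.0 * n / total for name, n in counts_by_type.items()}


# Hypothetical plate: 42 colonies on a 10^-3 dilution, 0.1 ml spread (assumed volume).
concentration = cfu_per_ml(42, 1000, 0.1)

# Hypothetical counts for two colony morphology types on the same plate.
abundance = relative_abundance({"type_a": 30, "type_b": 10})
```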
16S rRNA gene amplification and sequencing. Genomic DNA for each isolated bacterial strain was extracted by using a GenElute bacterial DNA kit (Sigma-Aldrich), following the Gram-positive protocol according to manufacturer's instructions, and used as a template for PCR. The 16S rRNA gene in the bacterial strains was amplified using modified versions of the universal 27FYM and 1492R 16S rRNA gene primers with built-in redundancies (see Text S2), using Phusion Green Hot Start II High Fidelity PCR Master Mix (Thermo Fisher Scientific). PCR conditions were as follows: initial denaturation at 98°C for 30 s; followed by 25 cycles of 98°C for 10 s, 47°C for 45 s, and 72°C for 30 s; followed in turn by a final extension at 72°C for 10 min. The amplified products were purified using Macherey-Nagel NucleoSpin gel and PCR Clean-Up (Fisher Scientific), followed by Sanger sequencing using the universal M13 primers M13 uni(-21) and M13 rev(-29) (Eurofins Genomics). For certain samples, internal primers for the 16S rRNA gene (342R, 357F, 519R, 907R, 926F, 1100R, 1114F, and 1392R) were also used for sequencing to improve the sequence coverage accuracy. All primer sequences have been provided in Text S2.
Allocation of mouse oral taxa. The forward and reverse sequences for each sample were assembled using the CAP3 assembly tool (37). Sequences were identified by BLAST interrogation of the NCBI nucleotide database. A sequence identity threshold of 98.5% was used for assignation to species, which is consistent with current recommendations (38) and the value used previously for the related human and canine oral databases (17,18). Sequences were aligned by means of the CLUSTALW algorithm in Bioedit (39), and maximum-likelihood phylogenetic trees for each genus were constructed using MEGA7 (40) with 100 bootstrap replicates. Each species-level taxon was assigned an MOT number. For taxa that were not identified as validly proposed species at 98.5% identity, a novel species-level designation was assigned as "Genusname_species1," and a new MOT number was allocated. For isolates that could not be definitively distinguished by 16S rRNA gene sequence analysis, each of the possible matching species was assigned a unique MOT ID in order to ensure maximum diversity capture. Assignment of taxa was also carried out using 16S rRNA gene amplicons selected from the MiSeq sequencing data that were not assigned an ID based on the existing reference database and then following the same BLAST protocol as described above. Visualization and annotation of the final phylogenetic tree of the identified 103 MOTs was performed using the web interface of iTOL v4 (41).
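The allocation rule described above can be sketched as a small decision procedure: a best BLAST hit at or above 98.5% identity takes the name of the matched species, and anything below receives a provisional "Genusname_speciesN" label. This is an illustrative simplification, not the authors' curation code. It assumes that the threshold is inclusive, that every sub-threshold isolate is a distinct novel species (real curation would first cluster such isolates phylogenetically), and that MOT numbers are handed out sequentially, which does not reproduce the published MOT numbering. The identities of 99.8 and 99.5 are made-up values.

```python
SPECIES_THRESHOLD = 98.5  # percent identity for species-level assignment (from the text)


def assign_label(genus, best_hit_species, identity, novel_counter):
    """Return a species-level label for one isolate's best BLAST hit.

    novel_counter maps genus -> number of provisional species named so far
    and is updated in place.
    """
    if identity >= SPECIES_THRESHOLD:  # inclusive boundary is an assumption
        return best_hit_species
    novel_counter[genus] = novel_counter.get(genus, 0) + 1
    return f"{genus}_species{novel_counter[genus]}"


def allocate_mots(labels, start=1):
    """Give each distinct label a sequential, purely illustrative MOT identifier."""
    mots = {}
    for label in labels:
        if label not in mots:
            mots[label] = f"MOT{start + len(mots)}"
    return mots


# (genus, best-hit species, percent identity); 97.4 and 98.4 echo values in the text.
hits = [
    ("Streptococcus", "Streptococcus danieliae", 99.8),
    ("Gemella", "Gemella palaticanis", 97.4),
    ("Lactobacillus", "Lactobacillus murinus", 99.5),
    ("Streptococcus", "Streptococcus danieliae", 98.4),
]
counter = {}
labels = [assign_label(g, s, i, counter) for g, s, i in hits]
mots = allocate_mots(labels)
```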
MiSeq 16S rRNA gene sequencing library preparation and DNA sequence analysis. Whole genomic DNA was extracted from the above swabs using the DNeasy PowerSoil kit (Qiagen) according to the manufacturer's instructions. Cell lysis was performed in the PowerBead tubes provided with the kit by bead beating on a vortex at maximum speed for 20 min. PCRs were performed with Phusion Green Hot Start II High Fidelity PCR Master Mix (Thermo Scientific) targeting the V1-V2 variable regions of the 16S rRNA gene using fusion primers 27F-YM (AGAGTTTGATYMTGGCTCA) and 338R-R (TGCTGCCTCCCGTAGRAG) combined with MiSeq adaptors and barcodes to achieve a double indexing system. The PCR conditions were as follows: initial denaturation for 5 min at 95°C; followed by 25 cycles of 95°C for 45 s, 53°C for 45 s, and 72°C for 45 s; followed in turn by a final extension of 72°C for 5 min. The amplified PCR products were cleaned and normalized in equimolar amounts by using a SequalPrep normalization plate kit (Thermo Fisher Scientific). Extraction kit controls and PCR negative controls were included in the amplification plates, as well as sequencing pools. Pooled amplicons were sequenced at the Barts and the London Genome Centre using an Illumina MiSeq 2 × 250 flow cell for paired-end sequencing. The generated reads were quality checked, filtered, trimmed, denoised, dereplicated, and assembled into amplicon sequence variants (ASVs) using the DADA2 v1.8 pipeline (42). The assembled ASVs were then assigned taxonomy at the genus and species level using a custom-formatted reference database constructed using the taxa included in this database. Compared to the mean of total reads in the murine swab samples, the negative controls generated a very low percentage of reads (<0.1% in the PCR control and <1% in the kit control). Once this was confirmed, the negative samples were eliminated from further analyses.
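The two fusion primers carry IUPAC ambiguity codes (Y, M, and R), so each primer is in practice a small pool of oligonucleotides. The following sketch expands the gene-specific portions quoted above into their explicit variants; the helper name `expand_primer` is hypothetical, and the adaptor and barcode sequences are not included because they are not given in the text.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


forward = expand_primer("AGAGTTTGATYMTGGCTCA")  # 27F-YM, gene-specific part
reverse = expand_primer("TGCTGCCTCCCGTAGRAG")   # 338R-R, gene-specific part

for seq in forward + reverse:
    print(seq)
```

The forward primer expands to four variants (Y and M each contribute two bases) and the reverse primer to two.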
The generated ASV counts were normalized for sequencing depth by using the median of ratios method in the DESeq2 (43) package in R, followed by beta-diversity and relative abundance analyses of the microbial population. Graphical analysis and plots were created using the R packages phyloseq (44) and ggplot2 (45). The raw sequencing reads have been uploaded to the NCBI SRA database (accession no. PRJNA642845).
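The median of ratios method can be illustrated outside R. The sketch below is a plain-Python rendering of the general idea as it is usually described for DESeq2, not a call into that package and not the authors' script: each ASV's geometric mean across samples serves as a pseudo-reference, each sample's size factor is the median ratio of its counts to that reference, and counts are divided by the size factor. ASVs with a zero in any sample are excluded from the reference, and the toy count matrix is invented.

```python
from math import exp, log
from statistics import median


def size_factors(counts):
    """counts: list of rows (one per ASV), each a list of per-sample counts."""
    n_samples = len(counts[0])
    # Only ASVs observed in every sample contribute to the pseudo-reference.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every ASV has a zero count in at least one sample")
    log_geo_means = [sum(log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy matrix: 4 ASVs x 2 samples; sample 2 was sequenced twice as deeply.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts)
```

In this toy case the second size factor is twice the first, so the first three ASVs end up with equal normalized counts in both samples. Sparse amplicon tables in which no ASV is present in every sample would need a different handling of zeros, which this sketch does not attempt.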
Bacterial genome sequencing. A selection of bacterial isolates from the database were cultured and purity checked, and genomic DNA was extracted using the GenElute Bacterial Genomic DNA kit (Sigma-Aldrich). These isolates are representatives of the cultured microbial population observed in various murine studies and include multiple candidates of the more commonly observed Streptococcus, Lactobacillus, and Staphylococcus species-level taxa. Genomic DNA libraries were prepared by MicrobesNG UK using the Nextera XT Library Prep kit (Illumina, San Diego, CA) and sequenced on the Illumina HiSeq using a 250-bp paired-end protocol. Reads were adapter trimmed using Trimmomatic 0.30 with a sliding window quality cutoff of Q15 (46). De novo assembly was performed on samples using SPAdes version 3.7 (47), and contigs were annotated using Prokka 1.11 (48). Further genome analysis and annotation was also performed using the RAST server (https://rast.nmpdr.org/) (49). Phylogenetic distance and relatedness of certain isolates were determined using the genome-to-genome distance calculator, in the form of an in silico DNA-DNA hybridization (50).
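The idea behind a sliding window quality cutoff can be shown with a simplified sketch: scan the read from the 5' end and cut it where the mean Phred quality inside the window first drops below the threshold. This is not Trimmomatic's implementation, which handles further details. The window width of 4 is an assumption, because the text states only the Q15 cutoff, and the read and quality values are invented.

```python
def sliding_window_trim(seq, quals, window=4, min_avg=15):
    """Trim a read at the first window whose mean quality is below min_avg.

    seq   : nucleotide string
    quals : list of integer Phred scores, one per base
    Returns the retained (sequence, qualities) prefix.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(0, len(seq) - window + 1):
        window_quals = quals[start:start + window]
        if sum(window_quals) / window < min_avg:
            # Keep everything before the first failing window.
            return seq[:start], quals[:start]
    return seq, quals


# Invented read whose quality collapses over the last four bases.
read = "ACGTACGTAC"
phred = [30, 30, 30, 30, 30, 30, 10, 10, 10, 10]
trimmed_seq, trimmed_quals = sliding_window_trim(read, phred)
```

Here the window starting at the seventh base is the first with a mean below 15, so the first six bases are kept.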
Data availability. The 16S rRNA gene V1-V2 region amplicon sequencing data from this study are available from the NCBI SRA database under accession no. PRJNA642845. Draft genome sequences of representative murine oral bacterial isolates are available to download from NCBI BioProject no. PRJNA671681. 16S rRNA gene sequences of the novel, unnamed bacterial isolates from this study are available under NCBI accession numbers MN095260 to MN095271. The 16S rRNA gene sequences for all the isolates analyzed in this study are available under NCBI accession numbers MW175535 to MW175859. Custom taxonomy reference data sets of the database for NGS analysis using mothur and DADA2 pipelines are available to download from https://figshare.com/s/2470f05ab77cdf40b2f8.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, DOCX file, 0.01 MB.

ACKNOWLEDGMENTS
This study was funded by the UKRI Medical Research Council (award MR/P012175/2). W.G.W. is supported by NIH-NIDCR (grant R37 DE016937). The wild Mus musculus work was funded by a NERC fellowship NEL011867/1 to S.C.L.K. The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank D. Randall and F. Filomena for providing access to their murine bacterial isolates for inclusion of data in this study. We thank Melanie Clerc at the University of Edinburgh and the staff at Fera for assistance in collecting the wood mouse samples.
S.J., W.G.W., and M.A.C. contributed to conception, design, data acquisition, and analysis and interpretation and drafted and critically revised the manuscript. J.A.-O. contributed to data acquisition, analysis, and interpretation and drafted and critically revised the manuscript. A.H., E.H., R.S., S.C.L.K., and A.B.P. contributed to data acquisition and critically reviewed the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work.
We declare there are no competing interests.