Supragingival Plaque Microbiome Ecology and Functional Potential in the Context of Health and Disease

Oral health has substantial economic importance, with over $100 billion spent on dental care in the United States annually. The microbiome plays a critical role in oral health, yet remains poorly classified. To address the question of how microbial diversity and function in the oral cavities of children relate to caries diagnosis, we surveyed the supragingival plaque biofilm microbiome in 44 juvenile twin pairs. Using shotgun sequencing, we constructed a genome encyclopedia describing the core supragingival plaque microbiome. This unveiled several new previously uncharacterized but ubiquitous microbial lineages in the oral microbiome. Caries is a microbial community metabolic disorder that cannot be described by a single etiology, and our results provide the information needed for next-generation diagnostic tools and therapeutics for caries.

coassembly using metaSPAdes and subsequent quality control with QUAST (28). The contigs were annotated for phylogenetic and functional content and then binned by k-mer utilization and across sample representation into genome bins. Thirty-four of these passed quality control and were graduated to metagenome assembled genomes (MAGs) ( Table 1; see Table S5 in the supplemental material; see also Fig. 5). MAGs are sets of assembled contigs determined to likely originate from the same organism based on basic nucleotide usage (e.g., k-mer) or coverage across different sample sets; there are established methods for determining if these are pure (29) or complete (30). These are likely to not capture genomic "island regions" that are specific to individual samples but can be considered a consensus core genome for a specific species (31). Contigs annotated as Streptococcus interfered with the k-mer binning but were also easily identified; therefore, we removed these contigs to facilitate the binning of other genomes. The Streptococcus population was examined using the pangenomic analysis package MIDAS (5). Reads from each subject were mapped back to the genome bins to create an assembly-linked matrix of relative abundance, phylogenetically characterized genome origin, genome bin functional contents, variance component estimation, and the phenotypic state of the host (caries or not).
Core supragingival microbiome community, phenotype-specific taxonomic enrichment, and cooccurrence. We observed a genome-resolved microbiome dominated by Streptococcus and Neisseria across the cohort with relatively little variability ( Fig. 1). Many of the taxa previously associated with supragingival biofilms were found (4), but we report the first binned genomes for Gracilibacteria, which have only been previously observed in the oral cavity using 16S rRNA screens. Three genomes for the TM7 lineage (32)(33)(34) were also recovered with substantial differences in genome architecture. Each of the 88 subjects contains the same genomes; these genomes comprise a core supragingival plaque microbiome that has relatively minor but statistically significant changes in proportional abundance. Sequences and genome bins for Archaea, Eukaryota (other than the host), or viruses were not observed. The core supragingival plaque microbiome defined here certainly includes far more than the 34 genome bins and 119 Streptococcus strains recovered here, though they are by nature numerically rare members of the community. These rare organisms could be later captured using deeper sequencing and a combination of reference-based and assembly-centric binning in an iterative manner.
The presence of caries is associated with statistically significant enrichments in the relative abundance of specific taxa when comparing the caries-negative and cariespositive cohorts (Fig. 1B). Namely, the abundance profiles of Streptococcus, Catonella morbi, and Granulicatella elegans were enriched in caries-positive subjects, while Tannerella forsythia, Gracilibacteria, Capnocytophaga gingivalis, Bacteroides sp. oral taxon 274, and Campylobacter gracilis were more abundant in the healthy cohort ( Fig. 1A; Table S2). Community composition profiles were different according to caries progression; Alloprevotella rava, Porphyromonas gingivalis, Neisseria oralis, and Bacteroides oral taxon 274 were significantly enriched in subjects with carious enamel only (Fig. 1A). The Streptococcus community across all microbiomes contained 119 strain genomes detected at high resolution (Fig. 2). Within this mixed Streptococcus community, 86% is composed of S. sanguinis, S. mitis, S. oralis, and an unnamed Streptococcus sp. As with the bulk community, we found statistically significant strain abundance variations associated with the presence of caries in S. parasanguinis, S. australis, S. salivarius, S. intermedius, S. constellatus, and S. vestibularis; this enrichment was specific to rare strains that make up less than 5% of the Streptococcus ( Fig. 2A and B; Table S2). During the global progression of carious lesions from enamel to dentin, we observed a modest statistical enrichment (P Ͻ 0.05) in S. parasanguinis and S. cristatus (Table S2). However, these are general trends considering phenotype subsets for a single time point per individual; this data set does not support a longitudinal analysis, which will be the topic of future studies.
To investigate the role of community alpha diversity in caries, we compared Shannon entropy measures for the healthy and diseased cohorts separately in both the  Fig. 2D. An examination of a genome cooccurrence network revealed a distinct topology illustrated in Fig. 3. For each genome, we also computed PageRank centrality, a variant of eigenvector centrality (35), which allows us to measure the influence of bacterial nodes within our cooccurrence networks. A select few MAGs form a cooccurrence cluster ( Fig. 3; cluster 3, 4; n ϭ 6) containing the 6 most highly connected taxa, 3 of which are statistically enriched in the diseased cohort while 2 are enriched in the healthy cohort. This microbial clique contains the highest-ranking PageRank centrality MAGs in the network and can be cleanly divided into a caries-affiliated subset and a subset enriched in bacteria associated with healthy phenotypes. The Streptococcus strain-level cooccurrence network topology varied greatly between caries-positive and Supragingival Plaque Microbiome Ecology ® -negative cohorts (Fig. 2C). Specifically, we observed that healthy subjects harbored a Streptococcus community influenced by a few key strains with high PageRank measures. In contrast to the healthy cohort, caries-affected individuals harbored a much more dispersed strain-level Streptococcus cooccurrence network.
Metabolic tradeoffs between closely related species. The k-mer-based genome binning unveiled several closely related pairs or triplets of genomes (Fig. 5A). These include Actinomycetes (group I), Alloprevotella (group II), "Candidatus Saccharimonas" TM7 (group III), Gracilibacteria (group IV), and Prevotella (group V) lineages (Table 1). This allowed us to examine the extent to which metabolic potential varies for ubiquitous microbial taxa in the human supragingival microbiome, two of which are from recently identified bacterial phyla (i.e., TM7 and Gracilibacteria). As shown in Fig. 5B, substantial variations in metabolic potential distinguish the different MAGs with emphasis on the Alloprevotella group. Similarly, the MAGs also display different abun-    Fig. 5A) contain one representative that has a high genetic component and another that is environmentally acquired, differing in the potential for nitrate reduction, biosynthesis of sulfur-containing amino acids, PRPP, and sugar nucleotides. Alloprevotella MAG II.B contains complete metabolic pathways for phosphoribosyl pyrophosphate (PRPP) biosynthesis from ribose-5-phosphate (M00005) and cysteine biosynthesis from serine (M00021); these metabolic modules are completely absent in the Alloprevotella MAG Three MAGs for the recently discovered candidate phylum TM7 (32-34) were recovered, with each of them showing high heritability estimates. The recent cultivation of TM7 from the oral microbiome revealed a parasitic nature with Actinobacteria as hosts (34). With our recovered TM7 genomes, we observe very little cooccurrence patterns between TM7 MAG III.C and any other taxa. Curiously, TM7 MAGs III.A and III.B have an inverse relationship, in terms of cooccurrence profiles (rho ϭ Ϫ0.34), suggesting different hosts or functional niches. TM7 MAG III.B is positively correlated with taxa associated with the caries-positive subjects-Streptococcus, C. morbi, and G. elegans-while MAG III.A is negatively associated with these taxa. TM7 MAG III.A contains complete pathways for dTDP-L-rhamnose biosynthesis (M00793), F-type ATPase (M00157), putative polar amino acid transport system (M00236), energy-coupling factor transport system (M00582), and SenX3-RegX3 (phosphate starvation response) two-component regulatory system (M00443) that are entirely absent in TM7 MAG III.B. TM7 MAGs III.A and III.B provide unique functional capabilities to the entire community, containing components for F420 biosynthesis (M00378) and SasA-RpaAB (circadian timing-mediating) twocomponent regulatory system (M00467), respectively.
Two MAGs with phylogenetic affinity to Actinomyces have been recovered in our analysis. Actinomyces MAG I.A exclusively has all of the components for the rhamnose transport system (M00220) while Actinomyces MAG I.B exclusively has components for the manganese transport system (M00316) compared to the rest of the community. Both Actinomyces MAG I.A and MAG I.B are the only genomes to contain components for the DevS-DevR (redox response) two-component regulatory system (M00482).
Two MAGs for Gracilibacteria were recovered. Mapping of recent human oral WGS samples from the extended Human Microbiome Project (37) to these genomes show they are also found in North American subjects (Table S4). Metabolic reconstructions describe the capability for anoxic fermentation of glucose to acetate, as well as auxotrophies for vitamin B 12 and a preponderance of type II and IV secretion systems. Both Gracilibacteria cooccurrence profiles show negative relationships with Veillonella sp. oral taxon 780 while showing positive relationships with Capnocytophaga gingivalis, suggesting competitive and commensal relationships, respectively. Gracilibacteria MAG IV.B is the only organism in our recovered oral community that has components for the BasS-BasR (antimicrobial peptide resistance) two-component regulatory system (M00451) and the cationic antimicrobial peptide (CAMP) resistance arnBCADTEF operon (M00721) while MAG IV.A exclusively contains components for the NrsS-NrsR (nickel tolerance) two-component regulatory system (M00464). Gracilibacteria MAG IV.A contains complete pathways for phosphate acetyltransferase-acetate kinase (M00579), the conversion of acetyl-CoA into acetate, while being completely absent in MAG IV.B. Gracilibacteria MAG IV.B contains complete components of the ABC-2 type transport system (M00254), which is completely absent in MAG IV.A.

DISCUSSION
Community-scale profiles for caries. In this study, we took a hybrid approach of utilizing read mapping for taxa with excellent genome representation (e.g., Streptococcus) while generating new reference genomes de novo for less well represented organisms. These genome bins also allow for improved databases for future read mapping analyses and indeed identified taxa present in previous oral microbiome studies (37) that were overlooked. Our results suggest that caries status can be accurately associated with and, potentially, diagnosed by profiling specific taxa beyond Streptococcus mutans. We observed a loss of community diversity within the diseased subjects (Fig. 1C), a finding previously reported in 16S rRNA amplicon studies (21), with an increase in Streptococcus strain diversity (Fig. 2D). The observed differences in the overall community structure, as seen from our cohort study, suggest that profiling the abundance of multiple taxa may present an opportunity for caries diagnosis and preventative methods. This finding is also functionally relevant as many organisms, other than Streptococcus, statistically enriched within the caries-positive subjects are also both anaerobic glucose fermenters and acidogenic, a trend extending to strainlevel Streptococcus populations as well. In the healthy cohort, the Streptococcus community exhibits less variation in terms of abundance and cooccurrence at the strain level. This is not the case in the diseased cohort, in which we observe higher Shannon entropy (Fig. 2). In an independent analysis, unsupervised clustering of diseased and healthy cohorts showed that ecological states in healthy subjects consistently have a Streptococcus community abundance that is either lower than or comparable to the rest of the healthy cohort (Յ0.27), with the exception of cluster 7 (0.44), while being much more unpredictable in the diseased cohort (see Fig. S3 in the supplemental material). This phenomenon may be the result of grouping carious lesions of various degree into a single classification. However, this finding also supports the fact that caries is a dynamic and progressive disease whose rate of progression can change dramatically over short periods of time. These results suggest that caries onset cannot be described by a single bacterium but should be described as the perturbation of an entire ecosystem, consistent with hypotheses sourcing the disease from metabolic and community-driven origins.
The dynamics of the Streptococcus community, at both the genus and strain levels, explain the observation of low connectivity in the diseased cooccurrence network, supporting our hypothesis that caries onset is a community perturbation. The highly connected cluster in the microbial community cooccurrence network (Fig. 2C) can be interpreted as influential organisms that drive the dynamics of other taxa in the community. The aforementioned high-PageRank microbial clique can be subdivided into a subset enriched in diseased subjects (Fig. 3, cluster 3) and a subset enriched in healthy individuals (Fig. 3, cluster 4). It is possible that disease state is influenced by a balance between these microbes, though empirical in situ studies are required to move past speculation.
Caries is a community-scale metabolic disorder. Our metabolic and community composition analyses challenge a single-organism etiology for caries and coincide with previously published ecological perspectives (19,23,25,26,38,39). These authors suggested that the changes in functional activities of the microbiome were the major cause of caries while simultaneously suggesting regime shifts (e.g., see reference 40) in community composition. They posited that increases in sugar catabolism potential are the key markers of caries progression; here we observed enrichments in nearly a dozen phosphotransferase sugar uptake systems. A novel insight was the enrichment for diverse sugar uptake pathways instead of just those associated with glucose, notably, glucose, galactitol, lactose, maltose, alpha-glucoside, cellobiose, and N-acetylgalactosamine phosphotransferase uptake pathways were all enriched. Takahashi and Nyvad (7) proposed an "extended ecological caries hypothesis," suggesting that community metabolism regulates adaptation and selection; once acid production starts to proceed, it provides evolutionary selection for adaptations to variable pH. Analogous to the expansion of the S. mutans-centric paradigm to the entire community, the structural potential of the community for sugar catabolism diversifies greatly in caries-positive states. This suggests that a healthy phenotype has self-stabilizing functional potential; numerous sugar compounds are not easily taken up by the community, preventing the associated pH decrease. In contrast, the progression to a caries state increases the diversity of sugar compounds available to the community catabolic network, thereby facilitating a pH decrease from an increased proportion of dietary input. This likely leads to increased pH fluctuations in both frequency, magnitude, and, potentially, duration. This is entirely consistent with the hypothesis that caries dysbiosis results from a community-scale metabolic shift and that progression has a feed-forward evolutionary adaptation pressure. At this point, it is difficult to identify the originating environmental pressure that might positively select for catabolism of diverse sugars, though host diet is a possible explanation.
The increase in sugar uptake potential is unsurprising but also increases confidence in the other trends that have not been described before. Within caries-positive microbiomes, we observed a notable enrichment of numerous two-component histidinekinase response-regulator pairs, which are responsible for transcriptional changes in response to environmental stimuli. It is likely that this represents a community adaptation to increased variations in biofilm pH due to increased sugar catabolism, as well as a more dynamic community-scale microbial interaction network. Pathways for antibiotic resistance were also enriched in the communities of caries-positive subjects. To our knowledge this has not been previously reported. However, this could be an artifact of database bias, considering that carious lesions are associated with enrichment of Streptococcus, which has an abundance of well-characterized antibiotic resistance gene systems. An alternative explanation leverages the observation that carious states are characterized by inherently chaotic community profiles. Specifically, healthy caries-free states are associated with a relatively stable Streptococcus community, whereas the caries-positive Streptococcus communities are far more dynamic and diverse (Fig. 2D). It is possible that increased pH temporal gradients associated with caries select for antibiotic resistance due to increased antagonistic microbial interactions. The possibility that antibiotics included in toothpaste influence the prevalence of antibiotic resistance pathways could not be tested here.
The most surprising and difficult-to-explain enrichment in community functional potential associated with caries is the gain of metal transport pathways involved in the efflux of Zn or Mn, Ni, and Co. This is not likely due to a host response, as this normally involves host acquisition of Fe, Mn, and Zn, along with Cu efflux (41)(42)(43). We propose that the degradation of enamel, and subsequently dentin, results in the release of locally toxic levels of trace metals for the biofilm constituents, thereby providing an environmental selection pressure for the detoxification of these elements. This parallels geomicrobiology surveys where microbial activity degrades physical substrates like rocks and inorganic minerals (44). It also follows that other governing principles based on geomicrobiology systems may apply to the human supragingival microbiomes. Functional plasticity within poorly described core supragingival microbiome lineages. We often ascribe functional potential to species based on type strains. To some extent, this is sensible as function and phylogeny are interlinked, and this has resulted in computational tools that associate phylogenetic loci, such as 16S rRNA, with functional potential (45). However, the genome plasticity makes it difficult to extrapolate function with confidence in some cases. For example, it has been shown that some strains of S. mutans can be less acidogenic than other streptococcal species (46), which could affect the organism's pathogenicity in the context of cariogenesis. This phenomenon may serve to explain why we did not witness an enrichment of S. mutans in the diseased microbiota, an observation noticed in previously published studies (47)(48)(49). Our genome-centric approach revealed rather dramatic differences in functional potential between MAGs in the core supragingival microbiome. These include changes in vitamin and amino acid auxotrophy, environmental sensing, and nitrate reduction. Closely related genomes also exhibited differences in patterns of abundance and cooccurrence across the 88 subjects. The differences in functional potential and genome autecology suggest these taxonomically similar microbes have different metabolic roles in the supragingival microenvironment as mentioned in previous microbiome studies from the gut consortia (50) and in vitro biofilm cultures (51). Many of the newly described core supragingival microbiome MAGs greatly increase our genomic knowledge about previously poorly described lineages. In particular, to date only two Alloprevotella genomes have been published, only one TM7 oral microbial genome has been described, and this is the first description of oral Gracilibacteria. In all cases, multiple genomes with different autecology and metabolic capabilities were recovered.
Alloprevotella. The type strain, Alloprevotella rava, historically referred to as Bacteroides melaninogenicus, is an anaerobic fermenter producing low levels of acetic acid and high levels of succinic acid as fermentation end products while being weakly to moderately saccharolytic (52). Only two reference genomes are available for Alloprevotella, including A. rava and Alloprevotella sp. oral taxon 302 (37). As mentioned above, our analysis recovered two MAGs of A. rava, one of which has a high genetic component (MAG II.A; A ϭ 0.64) and the other of which is environmentally acquired (MAG II.B), according to the ACE model. The environmentally acquired Alloprevotella MAG II.B has reduced potential for cysteine biosynthesis, specifically the conversion of methionine to cysteine (M00609), and appears to be a methionine auxotroph; the potential for methionine biosynthesis from aspartate (M00017) and the methionine salvage pathway (M00034) are both higher in MAG II.A (Fig. 5B). Furthermore, Alloprevotella MAG II.A is completely lacking all the components for cysteine biosynthesis from serine, but can synthesize cysteine from methionine. The amino acid auxotrophies can be satisfied by the amino acids found in saliva, but the divergence in cysteine biosynthesis pathways is striking. A similar scenario exists for cationic antimicrobial peptide (CAMP) resistance: Alloprevotella MAG II.A contains more envelope protein folding and degradation factors (e.g., DegP and DsbA) while phosphoethanolamine transferase EptB is found in the environmentally acquired MAG II.B. Interestingly, the heritable strain contains more components for two-component regulatory systems (M00504, M00481, and M00459), transport systems (M00256 and M00188), and nisin resistance (M00754) than environmentally acquired MAG II.B. A final key metabolic acquisition novel to Alloprevotella MAG II.B relative to both cultivated and uncultivated Alloprevotella is the presence of components necessary for dissimilatory nitrate reduction. The oral production of nitrite from dietary or saliva-derived nitrate is the first step in the enterosalivary nitrate circulation (16), and this is the first implication of Alloprevotella in this cycle. The environmentally acquired Alloprevotella MAG II.B has a larger genome with a slightly higher CheckM-calculated contamination than MAG I.A, suggesting that there may be several Alloprevotella rava strains that are acquired by the environment with similar k-mer content. Interestingly, Alloprevotella MAG II.B is also enriched in enamel caries compared to caries that have progressed to the dentin layer, indicating that the environmentally acquired strains may play a role in the onset of carious lesions. Longitudinal metagenomic studies would be necessary to determine if the environmentally acquired Alloprevotella rava displaces the heritable strain with age and diet.
TM7. TM7 has been found in a variety of environments using cultivationindependent methods such as 16S rRNA sequencing. The first described genomes were assembled from wastewater reactor (53) and groundwater aquifer metagenomes, while the first cultivated strain was derived from an oral microbiome (34). A comparison of those three genomes revealed highly conserved genomic content and, generally, low percent GC (34). The recovery of three TM7 MAGs (named oral_TM7_JCVI III.A, III.B, and III.C) greatly adds to our knowledge about this unique lineage. Even the most coarsegrained comparative genomics analysis, that is, average GC content, revealed that MAGs III.A, III.B, and III.C are quite divergent (Fig. 6). TM7 MAG III.B is most similar to the cultivated strains at 35% GC, while MAG III.A is 46% GC and MAG III.C is 55% GC. TM7 MAG III.A is almost certainly a pangenome of multiple TM7 strains of relatively intermediate GC content based on the bimodal GC profile and the CheckM contamination of 159.28, though they were clearly separated in 5-mer space (Fig. 5). TM7 MAG III.B has an inverse cooccurrence pattern with MAG III.A (rho ϭ Ϫ0.34) and no cooccurrence relation with MAG III.C (rho ϭ 0.03) in this study (Fig. 6A), strongly suggesting they have different ecological roles. Altogether, these TM7 genomes likely each represent a new family within the TM7 phylum.
The cultivated TM7 is an obligate epibiont parasite of oral Actinomyces (34). However, we did not observe strong cooccurrence patterns between the TM7 MAGs and any of the Actinobacteria, though it is not known if these relationships should have a strong cooccurrence or a lagged predator-prey cycle. The TM7 MAG III.B genome contains a near-complete pentose phosphate pathway, which indicates that this MAG has gained at least one form of energy-generating sugar metabolism relative to the other TM7. The TM7 MAG III.A genome uniquely contains the pathway for the synthesis of cofactor F420, though none of the components for methanogenesis. Instead, it is likely a cofactor in a flavin oxidoreductase of unknown function (54) and possibly involved in nitrosative (55) or oxidative stress (56). Given the likely production of nitrite by Alloprevotella and other members of the supragingival biofilm via dissimilatory nitrate reduction, the ability to detoxify nitrite has a protective role for both TM7 MAG III.A and its putative host.
Gracilibacteria. The presence of Gracilibacteria in the human oral microbiome was first reported in 2014 through 16S rRNA screening, with two 16S clades from human oral or skin samples (33). The first genome for Gracilibacteria was a single amplified genome from a deep-sea hydrothermal environment (57), and here we describe two Gracilibacteria MAGs that, analogous to TM7, are differentiated by fundamental genomic characteristics. Gracilibacteria MAG IV.A has a percent GC of around 37% while MAG IV.B averages 25% GC, making it one of the lowest-GC genomes reported (Fig. 6B). Both Gracilibacteria MAGs use the alternative opal stop codon. The Gracilibacteria MAGs detected here are most phylogenetically similar to those recovered from the East Central Atlantic hydrothermal chimney (57,58). Both of our recovered Gracilibacteria MAGs appear to be anaerobic glucose fermenters producing acetate as a product, while containing numerous putative auxotrophies for vitamins and amino acids (see Table S5 in the supplemental material). Each genome bin contains at least two secretion systems (type II and IV) and CRISPR/CAS9 systems. Based on the small genomes, low GC content, and limited metabolic potential, we propose that these organisms are likely intracellular, or epibiont, parasites of other bacteria analogous to TM7. The two oral Gracilibacteria differ moderately with regard to functional potential and genome autecology. Gracilibacteria MAG IV.B contains an enrichment in functional potential for cationic antimicrobial peptide (CAMP) resistance (M00721, M00722, and M00728) and twocomponent pathways suggesting more metabolic interactions than MAG IV.A. Gracilibacteria MAG IV.B also uniquely contains an ABC-2-type polysaccharide or drug efflux system. Gracilibacteria MAG IV.A is also significantly enriched in healthy individuals, suggesting that this MAG may be beneficial for maintaining stable microbial solution states with regard to caries status.
Gracilibacteria MAGs IV.A and IV.B strongly cooccur with one another, suggesting a commensal or synergistic relationship. Gracilibacteria MAG IV.A has negative cooccurrence with Veillonella sp. oral taxon 780 and Actinomyces MAG I.B (rho ϭ Ϫ0.549 and Ϫ0.497, respectively) with a positive cooccurrence with Capnocytophaga gingivalis and Prevotella MAG V.B (rho ϭ 0.467 and 0.422, respectively). Gracilibacteria MAG IV.B has negative cooccurrence with Prevotella oulorum and Veillonella sp. oral taxon 780 (rho ϭ Ϫ0.589 and Ϫ0.516, respectively) with a positive cooccurrence with Cardiobacterium hominis and Capnocytophaga gingivalis (rho ϭ 0.486 and 0.462, respectively). Veillonella plays a role in the anaerobic fermentation of lactate to propionate and acetate by the methylmalonyl-CoA pathway. As mentioned above, Capnocytophaga gingivalis is found along the Corynebacterium base of the biofilm but in high density within the annulus due to a high demand for carbon dioxide (59). This may suggest that Gracilibacteria resides in the annulus of the biofilm, moderately interacting with Corynebacterium in regions where carbon dioxide is accessible. The enrichment in cooccurrence of Gracili-bacteria MAG IV.B with Neisseria oralis compared to MAG IV.A (rho ϭ 0.314 and 0.122, respectively) may suggest that MAG IV.B colonizes the biofilm periphery in microaerophilic environments.
Conclusions. Collectively, we have provided a holistic overview of the juvenile supragingival plaque microbiome. Many of the same genomes were found across all 88 subjects, describing a core microbiome. With the presence of caries, the abundance of constituents of the core microbiome changes to a variety of ecological states where the community networks are perturbed. The phenotypes of health and carious lesions should be characterized not only by the abundance of taxa but also by the functional potential of the community. Sugar-fermenting bacteria are always enriched in carious states, as are the abundance and diversity of sugar uptake pathways. This strongly supports the idea that caries phenotype is a community metabolism disorder. Carious lesions are also accompanied with an enrichment in environmental sensing and antibiotic resistance, suggesting that acid production provides a selection pressure. Finally, we provide genomic information about the metabolic diversity of several organisms represented by poorly described microbial lineages.

MATERIALS AND METHODS
Study design. Our objective was to compare the metagenomic signatures, in dental plaque, of children with and without dental caries. Our hypotheses were that (i) there would be a measurable difference in species and diversity between the two groups and (ii) some species present in plaque samples associated with caries will have been previously implicated in cariogenesis. Dental plaque samples were collected from participants of the University of Adelaide Craniofacial Biology Research Group (CBRG) and the Murdoch Children's Research Institute (MCRI)'s Peri/Postnatal Epigenetic Twins Study (PETS) (60). The PETS (n ϭ 193) and CBRG (n ϭ 292) cohorts were composed of twins 5 to 11 years old. Twins from the state of Victoria, Australia (PETS), were recruited during the gestation period. All contactable twins from the PETS cohort will be eligible for participation. Inclusions were those twins whose parent consented to this particular wave of the study and who were recruited into the study during the gestation period. Ethics approval was attained from the University of Adelaide (Adelaide, Australia) Human Research Ethics Committee, The Royal Children's Hospital Melbourne (Parkville, Australia) Human Research Ethics Committee, and the J. Craig Venter Institute Institutional Review Board.
Dental plaque samples were obtained at the commencement of a dental examination. The participants had not brushed their teeth the night preceding the plaque collection and on the day of collection. Additional data were collected from three separate questionnaires completed by the parents during the period from consent to prior to the dental examination being undertaken. The combined questionnaires consisted of 132 questions regarding oral health, dietary patterns and general health and development. Fifteen of these were extracted for use by JCVI for this analysis.
The entire dentition of each participant was assessed using the International Caries Detection and Assessment System (ICDAS II) (61). The ICDAS II is used to assess and define dental caries at the initial and early enamel lesion stages through to dentinal and final stages of the disease. Examiners were experienced clinicians who had undergone rigorous calibration and were routinely recalibrated across measurement sites to minimize error. Caries experience in each participant was initially reduced to a whole-mouth score, and three classifications were utilized: no evidence of current or previous caries experience, evidence of current caries affecting the enamel layer only on one or more tooth surfaces, and evidence of previous or current caries experience that has progressed through the enamel layer to involve the dentin on one or more tooth surfaces (including restorations or tooth extractions due to caries). For the purpose of this analysis, we classified disease phenotypes in twins as presence of caries in enamel or dentin.
The number of pairs selected for metagenomic sequencing was constrained by budget; thus, a subset of the broader clinical cohort was subsampled. Twin pairs were selected for sequencing manually by examining ordination plots from the broader 16S rRNA gene sequencing study (27) and then selecting (i) twins of the same phenotype that were closely related and (ii) twins discordant for caries that were divergent in ordination space.
Sample collection, DNA extraction, library prep, and sequencing. Plaque sample collection and DNA extraction were conducted as specified in reference 27. Libraries were prepared using the NEBNext Illumina DNA library preparation kit according to manufacturer's specifications (New England Biolabs, Ipswich, MA). Metagenomic libraries were sequenced using the Illumina NextSeq 500 High-Output kit for 300 cycles following standard manufacturer's specifications (Illumina Inc., La Jolla, CA). Subjects without caries in enamel or dentin are referred to as healthy while subjects with caries in either enamel or dentin are referred to as diseased unless otherwise noted.
Metagenomic coassembly. Kneaddata (62) was used for quality trimming on the raw sequencing reads, as well as screening out host-associated reads. Assembly was performed using SPAdes genome assembler v3.9.0 (metaSPAdes mode) (63) with the memory limit set to 1,024 GB. Reads were pooled, and initial assembly attempts resulted in exceeding memory limits. Therefore, each library was subsampled randomly to 25% of the total for the final coassembly. Quality-trimmed reads from all samples were mapped back to the final metagenomics coassembly to generate abundance profiles across all 88 subjects.
Genome binning. The t-Distributed Stochastic Neighborhood Embedding algorithm (64,65) applied to center-log-transformed k-mer profiles (k ϭ 5) implemented in VizBin (66) raises a memory error when computing metagenomics assemblies of this scale. To address this issue, we developed a semisupervised iterative linear algebra technique to extract metagenome assembled genomes (MAGs) from the deluge of contigs in the de novo assembly. The pipeline is as follows: (step 1) use a conservative contig size threshold of 2,500 nucleotides for the initial binning; (step 2) eigendecomposition to calculate a representative vector in the direction of greatest variance (i.e., the 1st principal component [PC1]) of (2a) the coverage profiles and (2b) the k-mer profiles for each visually identified bin (66) of larger contigs; (step 3) subset the remaining contigs between the bandwidth of 300 and 2,500 nucleotides while computing the Pearson correlation between each bin's PC1 and each contig in the smaller-contig-length subset; (step 4) extract the contigs (n ϭ 500 contigs per bin) with the highest Pearson correlation to each bin's PC1 from steps 2a and 2b; (step 5) merge the results of step 4 with the coarse bins from step 1 and recalculate the embeddings to generate finer-scale bins; and (step 6) iterate step 5 until convergence to yield the finalized draft genome bin (MAG). The quality of each of the finalized draft genome bins, or MAGs, was assessed for completeness, contamination, and strain-level heterogeneity using CheckM (v1.07) (29). Streptococcal annotated contigs were set aside during the binning process due to their promiscuous k-mer usage between strains.
Annotation. Annotation methods were as described in reference 67 with slight modifications. Open reading frames (ORFs) were called with FragGeneScan (v1.16), except the Gracilibacteria bins, which were called with Prodigal (v2.6.3), using the Candidate Division SR1 and Gracilibacteria genetic code (trans_ table ϭ 25) (68,69). For ORF annotations, domains were characterized using a custom compilation database with HMMER (v3.1b2) and functionality was further assessed via best-hit BLAST results and manual curation (70,71). Contig and draft genome bin taxonomic assignments were determined by calculating a running sum of percent amino acid identities of the ORFs for each bin grouped by their respective taxonomy identifier from PhyloDB (https://github.com/allenlab/PhyloDB). PhyloDB is a comprehensive database of existing NCBI genomes, JGI-only single amplified genomes, and the MMETSP (72). The maximum weighted taxon at the species level was used to classify each MAG. The phylogenetic tree traversal was conducted using ete3's NCBITaxa object, implemented in Python, to extract labeled hierarchies from taxonomic identifiers (73).
Microbial community composition. Metagenomic reads were mapped to contigs using CLC with a minimum spatial coverage of 50% and a minimum percent identity of 85%. The resulting count tables were transformed by adapting the transcript per million (TPM) (74) calculation for use on contigs to incorporate length, coverage, and relative abundance into the measure, an essential normalization for metagenome assemblies due to the inherently wide distribution of contig lengths. Summations of TPM values grouped by bin assignment were used as abundance values for draft genome composition and downstream analysis unless otherwise noted.
Strain-resolved Streptococcus abundances were calculated using the Metagenomic Intra-Species Diversity Analysis System (MIDAS) (5) with the subsampled subject-specific reads. MIDAS subprograms "genes" and "species" were run with default settings of 75% alignment coverage, 94% percent identity, and mean quality greater than 20. Streptococcus strains were parsed from the subject-specific count matrices to build a master count matrix containing all Streptococcus strains with respect to each individual twin.
Statistical analyses. Pairwise log 2 fold change (logFC) profiles were calculated between groups by combinatorically computing the logFC between each nonredundant pair of the comparison groups and taking the mean of the distributions. All P values were calculated using the Mann-Whitney U test unless specifically noted otherwise. The statistical significance values were set at an inclusive threshold of 0.05 for determining enrichment in the following contexts: (context 1) MAGs via microbial abundance between diseased and healthy phenotypes, (context 2) functional module via phylogenomically binned functional potential (PBFP) profiles (see below) between diseased and healthy phenotypes, and (context 3) functional modules via PBFP profiles that were statistically significant between the groups identified in context 1. Spearman's correlation coefficients are denoted as rho unless otherwise noted.
Cooccurrence network analysis. Pairwise Spearman's correlations were used to robustly measure monotonic relationships between microbial abundance profiles, calculated for the MAGs and Streptococcus strains separately. MAG abundance profiles were standardized by TPM normalization, log transformation, and z-score normalization for each subject. In the Streptococcus strain-level cooccurrence analysis, strains were dropped from the calculation if they were not present in at least 10% of the samples. To account for domain errors during log transformation, due to zero values in the Streptococcus strain-level counts matrix, a pseudocount of 1eϪ4 was added to the entire dataframe. The WGCNA R package was used to calculate the adjacency and topological overlap measures (TOM) of the weighted cooccurrence network (75,76). The TOM similarity matrix was used to construct the fully connected undirected NetworkX graph structure (77). The dendrogram visualizations were constructed from the dissimilarity representation of the TOM matrix (i.e., 1 Ϫ TOM Similarity ) using ward linkage in SciPy (v1.0). PageRank centrality values (35) were computed on the fully connected weighted networks using the implementation provided in NetworkX. PageRank centrality is a variant of eigenvector centrality which allows us to measure the influence of bacterial nodes within our cooccurrence networks. We implemented PageRank centrality because it can be applied to fully connected weighted networks while many other, more common measures (e.g., Katz centrality) are not applicable in this setting. All analyses were conducted in Python (v3.6.4), and figures were generated using Matplotlib (v2.0.2) unless otherwise noted (78).
Variance component estimation. Assessing the additive genetic and environmental factors driving MAG abundance as determined by the ACE model (79), controlling for sex, age, and health phenotype, was described previously in reference 27. The ACE model assumes that the variability of a given attribute is explained by additive (A) genetic effects, the shared/common (C) environment, and nonshared/unique environmental (E) factors. To these ends, the MAG abundance data were standardized by the following procedure: (i) calculating proportions of each bin (i.e., relative abundance of summed TPM values per bin); (ii) log transformation of normalized microbial abundances; and (iii) z-score normalization for each draft genome bin. The ACE model for variant component estimation was implemented using the mets package in R (80).
Functional potential profiling. To determine the functional components of MAG, we translated ORFs for each bin to build a putative proteome. The University of Kyoto's Metabolic And Physiological potential Evaluator (MAPLE v2.3.0) (81,82) was used to compute the Module Completion Ratios (MCRs) representing KEGG pathways, complexes, functions, and specific signatures. The MCR is calculated using a Boolean algebra-like equation previously described in reference 82. In order to identify latent interactions between these KEGG modules, MAGs, and specific subjects, we developed a metric that incorporates genome coverage, proportion, and MCRs referred to from this point forward as Phylogenomically Binned Functional Potential (PBFP). With this measure, we were able to investigate differences in functional modules within the context of subject metadata (e.g., caries status). Subject-specific PBFP profiles were computed by summing matrix D (n ϭ subjects, m ϭ contigs) across the contig axis with respect to draft genome bin assignment to produce matrix X (n ϭ subjects, p ϭ MAGs). Matrix multiplication was computed for the MCRs in matrix C (p ϭ MAGs, q ϭ modules) and the abundance measures in X to yield a transformed matrix A (n ϭ subjects, q ϭ modules) with subject-specific PBFP profiles for subsequent analysis.
Data availability. New methods are available at https://github.com/jolespin/supragingival_plaque _microbiome. All reads and assemblies are available in BioProject PRJNA383868. The overall coassembly is available through biosample SAMN10133834. The genome bines are available through biosample accession numbers SAMN10134551 to SAMN10134584. The individual reads for each library are available through accession numbers SRR6865436 to SRR6865523.

ACKNOWLEDGMENTS
We thank all twins and their families; Grant Townsend and Nicky Kilpatrick for their dental expertise; Tina Vaiano, Jane Loke, Anna Czajko, Blessy Manil, Chrissie Robinson, Mihiri Silva, and Supriya Raj for their expertise and assistance with collection of data and samples; and Anna Edlund for constructive comments on an early draft of the manuscript.