Heterogeneous lineage-specific arginine deiminase expression within dental microbiome species

ABSTRACT Arginine catabolism by the bacterial arginine deiminase system (ADS) has anticariogenic properties through the production of ammonia, which modulates the pH of the oral environment. Given the potential protective capacity of the ADS pathway, the exploitation of ADS-competent oral microbes through pre- or probiotic applications is a promising therapeutic target to prevent tooth decay. To date, most investigations of the ADS in the oral cavity and its relation to caries have focused on indirect measures of activity or on specific bacterial groups, yet the pervasiveness and rate of expression of the ADS operon in diverse mixed microbial communities in oral health and disease remain an open question. Here, we use a multivariate approach, combining ultra-deep metatranscriptomic sequencing with paired metataxonomic and in vitro citrulline quantification to characterize the microbial community and ADS operon expression in healthy and late-stage cavitated teeth. While ADS activity is higher in healthy teeth, we identify multiple bacterial lineages with upregulated ADS activity on cavitated teeth that are distinct from those found on healthy teeth using both reference-based mapping and de novo assembly methods. Our dual metataxonomic and metatranscriptomic approach demonstrates the importance of species abundance for gene expression data interpretation and that patterns of differential expression can be skewed by low-abundance groups. Finally, we identify several potential candidate probiotic bacterial lineages within species that may be useful therapeutic targets for the prevention of tooth decay and propose that the development of a strain-specific, mixed-microbial probiotic may be a beneficial approach given the heterogeneity of taxa identified here across health groups. IMPORTANCE Tooth decay is the most common preventable chronic disease, affecting more than two billion people globally. The development of caries on teeth is primarily a consequence of acid production by cariogenic bacteria that inhabit the plaque microbiome. Other bacterial strains in the oral cavity may suppress or prevent tooth decay by producing ammonia as a byproduct of the arginine deiminase metabolic pathway, increasing the pH of the plaque biofilm. While the benefits of arginine metabolism on oral health have been extensively documented in specific bacterial groups, the prevalence and consistency of arginine deiminase system (ADS) activity among oral bacteria in a community context remain an open question. In the current study, we use a multi-omics approach to document the pervasiveness of the expression of the ADS operon in both health and disease to better understand the conditions in which ADS activity may prevent tooth decay.

eukaryotes (1).Through a series of enzymatic modifications, arginine ingested from food, endogenously produced by the host, or synthesized by resident bacteria is broken down by the arginine deiminase system (ADS) to produce citrulline, ornithine, carbon dioxide, ammonia, and ATP (2)(3)(4).Along with acting as an important source of energy, carbon, and nitrogen production, the ADS pathway is protective against low pH environments through the production of ammonia (2,5,6).Furthermore, research has demonstrated that ADS activity affects the assembly and coaggregation of bacteria in the plaque biofilm matrix, prevents the colonization of the cariogenic bacterium Streptococcus mutans, and raises the pH of the oral environment, all of which may inhibit caries development (7)(8)(9)(10)(11)(12).Moreover, the growth of bacteria in the presence of arginine in vitro or administration of arginine orally in vivo increases both ADS activity and the pH of the oral environment, and numerous clinical trials have documented a decrease in caries incidence when an arginine-supplemented toothpaste was used (6,(13)(14)(15)(16).Given the capacity of ADS activity to buffer the effects of highly acidogenic microbes, ADS-competent bacteria that inhabit oral biofilms (i.e., dental plaque) have been extensively studied in the context of oral health and the prevention of tooth decay (6,(17)(18)(19).Of particular interest are the mechanisms of ADS pathway regulation in streptococci, many of which are important early colonizers of the oral cavity (20)(21)(22)(23).Arginine and ADS-competent oral symbionts may therefore be candidate pre-and probiotic treatments, respectively, yet little is known about ADS activity among members of the oral cavity in a mixed microbial community context.
In the current study, we used a multivariate metatranscriptomic, metataxonomic, and in vitro assay approach to characterize the composition and function of the supragingival plaque microbial community in health and late-stage caries development.We found that while some Streptococcus sp. have higher expression of the ADS operon in health (i.e., Streptococcus oralis and Streptococcus sanguinis), others are more highly expressed in disease (e.g., Streptococcus parasanguinis).Importantly, the upregulation of genes involved in ADS activity for diseased teeth communities is limited to those with a low abundance of Streptococcus mutans, which may indicate an antagonistic relationship between these groups.Moreover, we find diverse non-Streptococcus bacteria with higher ADS operon expression in health when compared to disease.For instance, two poorly characterized members of the genus Leptotrichia have relatively high ADS expression in health and may represent an untapped source of genetically diverse potential probiotic strains.In addition, our results highlight the importance of community composition and within-species, lineage, or strain-level resolution in the interpretation of metatranscrip tomic data from a mixed microbial community.We found that community composition is significantly correlated with observed differences in ADS gene expression profiles and that, within species, only some lineages (i) possess a full ADS operon, (ii) have relatively high community abundance, and (iii) have high ADS expression, all of which have important implications for the development of a probiotic.Moreover, comparing our reference database-mapping approach to that generated using a de novo assembly approach documents substantial genetic diversity within lineages that are not well represented in publicly available databases (e.g., Human Oral Microbiome Database, HOMD).Finally, we calculated coverage across the ADS operon in a subset of species and found that post-transcriptional regulatory activity may affect the efficacy of some strains to modulate the pH of the oral environment, a pattern which is substantiated by de novo assembly of full-length ADS transcripts in our data set.Assessment of the relative contributions of transcriptional versus post-transcriptional control would require extensive strain-level analyses.Finally, while the aim of this study is to characterize the microbial community and ADS activity in health and disease, the vast majority of observed functional variation between healthy teeth and those with carious lesions is driven by diverse and discordant gene expression among communities associated with severe tooth decay.The results of this study highlight the functional and taxonomic volatility in late-stage caries development, identify alternate microbial candidates for probiotic development in the context of the ADS pathway, and highlight the importance of accounting for community composition when interpreting differential gene expres sion in mixed microbial contexts.

Increased microbiome diversity and citrulline production in health
We collected 60 supragingival plaque samples from 54 adults using a standard protocol (see Materials and Methods).Multiple teeth were sampled from a subset of individuals (n = 3) to document intraindividual functional variation.Each plaque sample was classified as either originating from a caries-free tooth (PF) or a tooth with an active, dentin cavitation or caries lesion (PD).In total, 27 plaque samples were collected from teeth with an active dentin cavity (PD) and 33 from a caries-free tooth (PF).Study participants included 35 females and 25 males ranging in age from 18 to 65 years old (average 31 ± 11 years).
First, to characterize the diversity of the oral microbiome and account for the differences in community composition driving gene expression, we used a high-reso lution metataxonomic approach targeting a fragment of the bacterial rpoC gene to profile the community composition of each sample.We chose this approach as (i) it is more cost-effective than metagenomic sequencing, (ii) the rpoC gene is a single-copy core gene in bacteria and therefore should be a better representation of community membership as compared to other commonly used marker genes (e.g., 16S rRNA hypervariable regions), and finally (iii) the rpoC gene has high phylogenetic resolution at the species and strain level (24) (see Materials and Methods), which allows us to directly compare community composition to gene abundance data using a referencebased mapping approach.These data were compared to citrulline assay and metatran scriptomic results generated from the same samples to clarify the role of community composition and membership in ADS gene expression in health and disease using both biochemical and RNA sequencing techniques.
We found that the oral microbiome of healthy teeth (PF) is more diverse than those with caries (PD) as measured by Shannon diversity (Wilcoxon rank sum test: P = 0.003; Fig. 1a), and we found that citrulline production is significantly higher in PF as compared to PD samples (Wilcoxon rank sum test: P = 0.0004).Variation in citrulline production is significantly (albeit weakly) correlated with beta-diversity when controlling for tooth type (PD vs PF) (PERMANOVA: R 2 = 0.03; P = 0.03) (Fig. 1b).Importantly, however, not all PF samples have high citrulline production nor do all PD samples have low production.For example, a single PD sample (UF68PD) had unexpectedly high citrulline production (Fig. 1b), which may be the consequence of significantly higher ADS activity by Streptococcus oralis as detected by metatranscriptomic sequencing (see below) (Fig. 1c; Table S1).Specifically, although the proportion of S. oralis assigned to this sample is small in our paired metataxonomic data set (<1% of the total community), transcripts from S. oralis make up 29.56% of all ADS transcripts originating from UF68PD.The median proportion of transcripts assigned to this species in PF is 12.44% (±13.17%) while in PD it is 1.80% (±15.91%)making UF68PD more similar to other high-citrullineproducing PF samples in terms of the proportion of transcripts assigned to this species.This is in stark contrast to four low-citrulline-producing PF samples (UF33PF, UF36PF, UF40PFR, and UF48PF), where the proportion of transcripts assigned to S. oralis was less than 1% each.In addition to S. oralis, sample UF68PD also had a high number of ADS gene transcripts assigned to Leptotrichia sp.oral taxon (HMT) 215, Kingella oralis, and Actinomyces naeslundii (Table S2), which were also found to have higher ADS expression in PF as compared to PD.

Unequal expression of individual arcABC genes
While arginine catabolism is a common metabolic pathway found across diverse bacterial groups, it is dispensable and the operon is absent or truncated in many oral bacteria, even among closely related phylogenetic groups (Table S3) (25).Moreover, the gene content and organization of the ADS operon vary within and between ADScompetent bacteria (26)(27)(28)(29)(30).As such, we first identified ADS-competent oral taxa in the Human Oral Microbiome Database (31) through homologous gene clustering of three highly conserved arc genes that produce enzymes necessary for ADS activity (i) arcA (arginine deiminase EC: 3.5.3.6), (ii) arcB (ornithine carbamoyltransferase EC: 2.1.3.3), and (iii) arcC (carbamate kinase EC: 2.7.2.2).Only those bacteria with all three genes in synteny were included in differential gene expression analyses in our reference-based mapping approach (see Materials and Methods).Using this manually curated collection of ADS operons as a reference, we next compared the relative abundance of transcripts   abundance volcano plot of metatranscriptome data.Colored points represent genes that were both statistically significant and passed log fold change abundance thresholds (see text).Labeled points represent all taxa upregulated for PF and six select taxa upregulated for PD (see text) that (i) had the highest adjusted P-value for either PD (pink) or PF (blue) and (ii) had all three arc genes marked as differentially abundant.One exception shown here is the S. sanguinis genes arcC1 and arcB, which, despite having a significantly higher expression of the arcA gene, did not pass thresholds.HOMD strain identifiers along with species identifications are shown in boxes.(d) Proportion of reads mapping to each of the three major arc genes in the ADS operon after manual curation for genes that were present in synteny within a contig in both PF and PD samples.(e) The proportion of RNA seq reads that mapped to arcABC genes among different genera.(f ) Proportion of reads that mapped to arcABC across different Streptococcus species (strains collapsed to species).
assigned to individual arc genes across PF and PD samples.As only those bacteria with arcABC organized in a syntenous operon were considered, we expected the proportion of transcripts that mapped to each gene within an operon to be approximately equiva lent to one another.Interestingly, for both PF and PD samples, while the proportion of transcripts assigned to arcA and arcB were similar, the proportion of transcripts mapping to arcC tended to be lower.This is especially true of transcripts mapping to the arc genes in PD samples, where the proportion for arcA is nearly three times higher than arcC (Fig. 1d).This may be due to methodological biases (e.g., incomplete or scaffold-level genome assemblies used as reference) but also may be indicative of differences in operon usage or structure in ADS-competent bacteria that are potentially associated with health status.For example, specific genes within an operon may be regulated through mRNA degrada tion depending on the transcriptional needs of the cell (32)(33)(34), internal transcription promoter or termination sites, or through other forms of gene expression regulation (e.g., transcriptional attenuation through secondary DNA structures that prevent transcription of the full operon) (35).Notably, for this analysis, we only considered reference operons that were (i) complete (i.e., all three genes present) and (ii) present on the same contig (to reduce the impact of incomplete genomes), which suggests that these patterns are biological and not methodological in nature.
Of those transcripts that mapped to one of the three arc genes analyzed here, most were assigned to Streptococcus, but we also detected ADS arc gene expression for other genera including Leptotrichia, Kingella, Lactobacillus, Actinomyces, Treponema, and several low-frequency groups (Table S2; Fig. 1e).While both PD and PF samples had a high frequency of transcripts assigned to Streptococcus parasanguinis (PF: 27.88%, PD: 39.68%) and Streptococcus anginosus (PF: 16.11%, PD: 20.97%), PF samples had a comparatively higher proportion of transcripts assigned to S. sanguinis (PF: 20.94%, PD: 5.28%) and S. oralis (PF: 13.15%, PD: 9.84%) as compared to PD (Fig. 1f).To better understand how structure and regulation might impact the production and survival of transcripts, we mapped our quality-filtered metatranscriptomic data to the full ADS operon of six strains of interest (Streptococcus sanguinis SEQF2018, S. parasanguinis SEQF1919, Streptococcus gordonii SEQF1066, S. oralis SEQF1998, Leptotrichia oral taxon 212, and Leptotrichia oral taxon 215) and generated coverage plots (Fig. 2).We chose these specific taxa as they demonstrate the range of coverage patterns and intergenic region size variation we observed in the data set across different ADS-competent species.Operon coverage for S. oralis and both lineages of Leptotrichia exhibits a wave-like pattern, with inconsistent coverage levels across the full operon, while coverage for S. sanguinis, S. parasanguinis, and S. gordonii is wave-like in some regions but relatively consistent in others, particu larly across arcB and to a lesser extent, arcA.Interestingly, we found that some of the most precipitous drops in coverage correspond to intergenic regions in the operon, the structure of which varies across the selected taxa presented here.For example, while S. sanguinis, S. parasanguinis, S. oralis, and S. gordonii have an intergenic region between arcA and arcB spanning approximately 51-85 base pairs long, both Leptotrichia lineages have an intergenic region between arcA and arcB spanning approximately 142-146 base pairs.Additionally, all Streptococcus sp.included in this analysis have an intergenic region ranging from between 86 (S. parasanguinis) and 313 base pairs (S. oralis) separating arcB and arcC.In contrast, both Leptotrichia sp. have only two or three base pairs separating the two genes.Importantly, the two most commonly observed species, S. sanguinis and S. parasanguinis, have markedly lower arcC coverage as compared to arcA and arcB, which may indicate differential regulation of this gene post-transcription.It is possible that drops in coverage for these two species correspond to hotspots or promoters in the operon for transcript degradation and intergenic regions may play a functional role in post-transcriptional regulation of the operon.Notably, the heterogeneity in operon structure and mRNA abundance observed in this study is consistent with the knowledge that ADS expression is known to be complex and responsive to a variety of environmen tal inputs [e.g., reference (36)].A full table of arc operon coordinates and lengths of Streptococcus sp. can be found in Fig. 2 and the two Leptotrichia sp.analyzed here can be found in Table S4.

Strain-level differential expression of the ADS operon
Given the heterogeneity of ADS operon structure and presence or absence among closely related oral bacteria, we performed differential expression analyses at both the species level, defined as the total of all transcripts that aligned to a named oral species in the HOMD database, and at the strain level, defined as those transcripts that aligned to a particular reference genome within the HOMD database or to a unique reference sequence generated using de novo assembly (see below).We take this tiered approach as previous research has demonstrated that ADS activity can vary significantly among closely related oral bacteria (28), which has implications for the development of probiotic treatments.Importantly, however, the HOMD database-based strain perspective has limitations.Specifically, a strain within the database used as a reference may not be present within a sample.Consequently, when a sequence read aligns to a specific genome within the database, we interpret this as the expression (RNA) or presence (DNA) of a strain from the same lineage as the strain in the database, and this is understood when we refer to strains throughout the manuscript.

Species-level differential expression of the ADS operon
Next, as some species in our reference database have multiple representative strains (predominantly Streptococcus) and some strains were more highly expressed than others in the above analyses, we collapsed our results by species designation to compare to strain-level results (Table S5).While most differential expression results remained significant between groups after collapsing at the species level (n = 16), 11 were no longer identified as having higher or lower expression in either group (Tables S5 and S6), reflecting strain-level gene expression heterogeneity.
To compare our species and strain-level results further, we next built a maximum likelihood phylogeny using a concatenation of the ADS genes (arcA, arcB, and arcC) for all Streptococcus species detected in our metatranscriptomic data and identified strains within species that had higher or lower expression or abundance among health groups.For example, while there are a total of 16 representative genomes for Streptococcus oralis in HOMD (as of database version 9.15a) and we assigned transcripts to four of them (Fig. 3), only a single S. oralis strain had all three genes significantly upregulated in health (SEQF1998) and collapsing at the species level produced an insignificant adjusted (FDR) P value (0.8) (Tables S6 and S7).Similarly, while there are eight representative strains in HOMD for S. intermedius and all eight had mapped transcripts in our data set, only one had higher ADS gene expression in PD samples (SEQF1706) and the collapsed data were insignificant (FDR = 0.07).Importantly, unlike other Streptococcus groups where the average log fold change and base mean are relatively identical (indicating that most transcripts assigned to those lineages mapped equally well to other members of the clade), all S. oralis identified in the current data have a high average base mean, but only SEQF1998 had a high average log fold expression (Fig. 3).

ADS transcript abundance and species abundance are correlated
Because we anticipate community structure to be an important confounding variable when comparing gene expression activity between PF and PD samples (Fig. 1b), we next calculated the correlation between log fold change of the abundance of ADS-competent species as measured by rpoC amplicon sequencing to the log fold change of gene expression for the ADS operon as measured by our reference-based mapping approach (Table S8).By inferring the taxonomic origin of individual transcripts and amplicon sequences with the same reference database, we can then directly compare patterns of community abundance and gene expression levels for specific bacterial species.In this way, we can better understand how community membership and abundance affect our interpretation of gene expression data.Log fold change for ADS expression was calculated at the species level by first summing all read counts per gene for each HOMD species and then averaged over the three arc genes to account for the differences in individual gene expression levels.We found that fold change of ADS expression and species abundance was highly correlated (R 2 = 0.62, P = 0.001), and in most cases, species with high differential expression levels are also found at higher abundance in their respective health category.For example, while transcripts assigned to the ADS pathway in Streptococcus sanguinis are more highly abundant among healthy samples, the relative abundance of amplicon sequence variants (ASVs) assigned to S. sanguinis is also higher in health, which suggests that the high abundance of transcripts assigned to the ADS pathway is linked to a higher abundance of this species in health as compared to disease.Only 48.15% of PD samples had at least one rpoC amplicon assigned to S. sanguinis, while the species was observed among all PF samples.Moreover, among those samples with detected levels of S. sanguinis, it was more highly abundant in PF as compared to PD (PF: 3.7% ± 4.5% relative abundance, PD: 0.5% ± 1.0% relative abundance).Similarly, while transcripts assigned to the ADS pathway in S. parasanguinis are more highly abundant in PD, the relative abundance of S. parasanguinis is also higher in PD as compared to PF when the amplicon data is considered (average 1.1% ± 2.3% of the total microbial community as compared to 0.4% ± 0.9% of the total community among PF) (Fig. 4a through c).This is despite the fact that there was an approximately equivalent proportion of PF and PD samples with rpoC reads assigned to S. parasanguinis (52.85% and 60.61%, respectively).Interestingly, we found a subset of ADS-competent bacteria that did not follow a simple pattern of higher community abundance coupled with higher ADS transcript expression.For example, Bulleidia extructa, S. constellatus, and S. intermedius are more highly abundant in PF as compared to PD samples but have higher ADS expression in PD.Conversely, Cryptobacterium curtum, Parvimonas micra, and S. anginosus have higher ADS expression in PD with very little difference in amplicon abundance between PD and PF samples.While these taxa have high ADS activity in disease that diverges substantially from amplicon sequence variant abundances, they are also relatively rare across individuals in this data set (Fig. 4a) and therefore may be less robust to different oral environments.

De novo assembly of transcripts provides higher ADS strain resolution
As described above, an ongoing challenge in microbial metatranscriptomics is that, lacking other evidence (e.g., metataxonomics or whole genome sequencing), the community of microorganisms across samples is largely unknown.Moreover, the composition of publicly available genomic databases tends to be heavily skewed toward model or pathogenic organisms.For example, of the 555 genomic sequences deposited in HOMD v9.15a for Streptococcus, 45% are the cariogenic pathogen S. mutans.Con versely, only 16 Leptotrichia genomes are available for reference within the database.As such, reference-based metatranscriptomic analysis underestimates true strain and species-level diversity in gene expression.Given the limitations of a reference-based mapping approach to characterize the full functional profile of a metatranscriptomic data set of unknown community composition, we next performed an additional de novo assembly of our data set to document the diversity of ADS transcripts across all taxo nomic groups.
On average, we generated 364,811 (±159,936) assembled transcripts per sample (Table S9), of which 1,723 were annotated with at least two of the arcABC genes and 568 had all three genes annotated in synteny.Most transcripts were truncated, which supports our previous interpretation of post-transcriptional regulation of the operon using our reference-based mapping approach.Of those ADS transcripts with all three genes present, 132 had one or more genes upregulated in disease (PD) consisting mostly of transcripts assigned to Streptococcus sp.(35%) followed by Lactobacillus ultunensis (11%) and Limosilactobacillus fermentum (8%).Comparatively, 252 unique transcripts were upregulated in health (PF), 32% of which were assigned to Leptotrichia sp.oral taxon 212 followed by Streptococcus sp.(27%) and Kingella oralis (18%) (Table S10).The results of this analysis complement our reference-based approach and document the high within-strain diversity of ADS operon structure and activity in health and disease.For example, we found relatively consistent and high expression of ADS genes among the majority of transcripts assigned to Leptotrichia (Fig. 5; Table S11), while expression   S12).The results of this analysis demonstrate the diversity of the ADS operon among oral taxa, particularly in understudied groups like Leptotrichia.

Streptococcus mutans antagonism
Next, as ADS activity and ADS-competent bacteria have previously been shown to reduce the ability of Streptococcus mutans to proliferate within the plaque biofilm, we compared the proportions of ADS-competent bacteria and S. mutans in both PD and PF samples.We found that while PD samples as a whole had a higher prevalence across samples (77.78%) and overall community proportion (12.43% ± 18.47%) of S. mutans as compared to PF (prevalence: 46.88%, average proportion: 2.40% ± 8.44%), 55.56% had low (<5%) S. mutans abundance (Fig. 7a) and S. mutans abundance was not correlated with citrulline production (Fig. S1).Next, we performed selbal analysis (37) to identify groups of taxa that are most associated with citrulline production and found that the balance in the ratio between Veillonella parvula and Corynebacterium matruchotii have the highest association with the total amount of citrulline detected in our samples (Fig. 7b; R 2 = 0.32).vincentii.We found no significant correlation between the relative abundance of these taxa and S. mutans in our ASV data set.Upregulation of the ADS pathway is dispropor tionately skewed by PD samples with low S. mutans prevalence, and only a single gene assigned to Streptococcus infantis (arcB) is upregulated in PD samples with high S. mutans (Fig. S2a; Table S13).Overall, samples with high S. mutans abundance also have a lower relative abundance of S. oralis and S. sanguinis with the exception of two PF samples (Fig. S3).Interestingly, both PF samples (UF17PF and UF55PF) with a relatively high abundance of S. mutans also have a large proportion of total ADS transcripts assigned to S. sanguinis.For example, 48% of rpoC amplicons derived from sample UF55PF were assigned to S. mutans, while 14% were assigned to S. sanguinis, yet 52% of all ADS transcripts originating from this sample were assigned to S. sanguinis.

Stress response and virulence genes are upregulated in disease
Finally, while this study focused on gene expression for a single metabolic pathway, arginine deiminase, we found 210,295 genes that had significantly different expressions between PD and PF samples, 63,268 (30%) of which are unknown hypothetical proteins.Much of the functional diversity at the community level is driven by PD samples, which exhibit low functional and community cohesion as compared to the PF samples (Fig. 8a).Acidipropionibacterium acidifaciens, Fig. 8b)].Genes highly expressed in PD samples as compared to PF include a variety of stress response or virulence factors including those involved in biofilm formation, acid tolerance, the production of antimicrobial pepti des, and response to environmental stressors.For example, pflB (pyruvate format-lyaseencoding) genes are key factors in the colonization of Salmonella in the gut facilitated by host cell apoptosis (38), clpC encodes an ATPase involved in response to environ mental stress and proteolytic activity (39,40), dnaK protects against a wide variety of environmental stressors and promotes biofilm growth and lactic acid fermentation at high temperatures in lactic acid bacteria (41)(42)(43), adhE promotes the production of mutacin by S. mutans that has antagonistic activity against a wide range of Gram-posi tive bacteria including other members of Streptococcus (44,45), and genes encoding fimbriae (Fimbrial subunit type 1) are involved in biofilm development.Interestingly, the downregulation of genes encoding the major and minor fimbriae (fimA and mfa1) have previously been shown to be disrupted by ADS activity in the oral cavity (46).We also find multiple Streptococcus mutans genes upregulated in the disease, which are key factors in carbohydrate metabolism, including ccpA, codY, pacL, gtfA, gtfC, levD, and fruA (47,48).Transcripts assigned to these genes are derived from two strains of S. mutans: UA159 and NN2025, which are also the dominant S. mutans strains found in the rpoC data set (44% and 52% of all rpoC amplicons assigned to S. mutans, respectively).Like our ADS pathway results, the abundance of S. mutans appears to be driving most of the expression-level data (Fig. 8c), where 78% of PD samples have rpoC reads assigned to S. mutans as compared to 48% of PF samples.Interestingly, though, Propionibacterium acidifaciens is found in very low abundance across the data set, even for those samples with high P. acidifaciens gene expression.About 59.26% of PD and 0% of PF samples had rpoC amplicons assigned to this species with an average abundance of 1.25% (± 4.29%) in PD.Similarly, on average 1,088 transcripts (±1,870) were assigned to P. acidifaciens in PD samples as compared to 0.10 (±0.09) in PF.Assuming this is not the result of the disproportionate representation of S. mutans as compared to P. acidifaciens in our database (247 vs 2 representative genomes), these results may indicate that this species may be an important driver of tooth decay, even at low frequency in the oral cavity.Importantly, two strains assigned to both S. mutans and P. acidifaciens are represented in Fig. 8b and contribute relatively equally to the variance between PF and PD as genes found in these strains are too similar or identical to one another to distinguish between the genome of origin (ranging from 99.5% to 100% genetic identity).

DISCUSSION
Manipulation of ADS activity in the oral cavity either through the administration of prebiotics in the form of arginine toothpaste or the development of ADS-competent probiotic strains (6,10,11,13,14,16,19,49) are promising prophylactic treatments for the prevention of tooth decay, which remains a major global public health problem (50).The results of this study complement and build upon previous investigations of the significance of ADS activity in the oral cavity in preventing tooth decay by providing a multivariate analysis of differential gene expression of ADS-competent bacteria in the context of a mixed microbial ecosystem.
In agreement with previous studies (4,6,11), we found that while not completely lacking ADS activity, PD samples generally had lower citrulline production from arginine in vitro (Fig. 1b) and a lower proportion of ADS-competent bacteria compared to PF samples as measured by both amplicon frequency and transcript frequency (Fig. 5c and  1d).Interestingly, however, we detected 13 strains of bacteria with upregulation of the ADS operon in caries disease.Most strains identified as upregulated in PD make up a very small proportion of the total oral community (e.g., Limosilactobacillus fermentum and Lactobacillus ultunensis), which may indicate that to be effective in preventing tooth decay, the ADS operon must be expressed by a taxon or group of taxa that reach a certain abundance threshold.Alternatively, some taxa may have higher ADS activity in PD but not secrete sufficient levels of ammonia at an adequate rate into the surrounding biofilm matrix to negate environmental acidification, independent of community abundance.Previous research has demonstrated that oral bacteria must be able to generate alkali products above a minimum threshold to prevent acidification of the environment by sugar metabolism (51) and caries development in experimen tal animals (52).Not all ammonia produced by alkali-generating metabolic pathways is rapidly released into the extracellular environment.Consider, for example, that while Streptococcus mutans lacks the ADS operon, it catabolizes agmatine to produce ammonia, CO 2 , and ATP through the agmatine deiminase (AgDS) pathway, which closely resembles arginine catabolism through the ADS.AgDS contributes to acid tolerance and growth of S. mutans by ATP and intercellular ammonia production, the latter of which increases cytoplasmic pH, apparently without a substantial effect on the pH of the plaque biofilm (53).Moreover, as the ADS pathway is highly regulated, specific environmental conditions may be necessary to trigger higher ADS activity in some species that are not conducive to the prevention of tooth decay.For example, low pH increases the transcription of the ADS pathway in Streptococcus gordonii (36).It may be the case that the ADS pathway is upregulated at low pH in some taxa where cellular survival is possible but not the prevention of enamel demineralization.ADS activity can persist in some bacterial species (e.g., Streptococcus) at extremely low pH ( 5) and thus higher transcript expression of the ADS operon may continue even in highly acidic environments.As tooth demineralization is a consequence of not only environmental acidification but also the duration of a low pH environment in the plaque biofilm, potential preventative strains of bacteria should have higher ADS expression independ ent of environmental conditions excluding the availability of arginine itself (28).Finally, our ADS operon coverage analysis and de novo assembly of full ADS transcripts suggest that regions of the operon may undergo differential post-translational regulation, which may, in turn, lead to lower ammonia production in some strains or perhaps in cer tain environmental contexts (54).Further research on the regulatory mechanisms and metabolic behavior of the ADS-competent bacteria in PD samples detected here would clarify their efficacy in the prevention of tooth decay.
While we find ADS activity is upregulated in both health and disease, it is important to note that upregulation of the ADS pathway in PD samples was limited to those with low (<5%) Streptococcus mutans abundance, which may suggest that ADS activity prevents the proliferation of this cariogenic taxon in the oral cavity or, alternatively, that the presence of S. mutans inhibits ADS activity.Moreover, the oral environment across samples in both health and disease may be more or less conducive to ADS activity.For example, ADS activity is dependent on the availability and concentration of arginine as a secondary carbon source and can, in some taxa, be repressed by the oxygen tension in particular sites in the oral cavity (55,56).Additionally, coverage plots and de novo assembly of ADS transcripts generated in the current study suggest the presence of gene-specific regulation of the ADS operon among certain groups, particularly of the arcC gene in Streptococcus sp.This post-transcriptional regulation may result from competition for substrates necessary for other metabolic pathways in certain environmental conditions.For example, the anaerobic growth of Streptococcus thermophilus is strongly dependent on carbamoylphosphate synthetase activity (CPS), which synthesizes carbamoyl phosphate from glutamine, CO 2 , and two ATP molecules as a precursor for arginine and pyrimidine biosynthesis (57).The CPS synthetase domain is found in nearly all organisms (58), and its activity is strongly inhibited by arginine (3, 57, 59); thus, in anaerobic environments where arginine is depleted and CPS activity is high, arcC activity may be suppressed through selective mRNA degradation to prevent the catabolism of carbamoyl phosphate to ATP and NH 3 , instead favoring its use as a substrate for arginine and pyrimidine production.Importantly, in taxa lacking CPS, the arginine deiminase pathway can replace CPS as the primary provider of carbamoyl phosphate for pyrimidine biosynthesis (60).Finally, the higher abundance of ADS-com petent bacteria may lead to nutritional resource competition with S. mutans and other cariogenic bacteria that require arginine and arginine-containing peptides, all of which may contribute to antagonism between S. mutans and ADS-competent bacteria in both PF and PD.Given that oral health exists on a continuum and tooth decay is a progres sive disease, the relative contribution of the oral environment, microbial community structure, environmental secretion of alkaline products of metabolism, post-transcrip tional mRNA regulation, and the presence or absence of specific strains in promoting health or disease likely fluctuates over time.
Despite this variability, we identified lineages for Streptococcus oralis, S. sanguinis, Streptococcus sp.HMT 056, and Leptotrichia sp.HMT 212 and 215, which may contain effective probiotic strains.Moreover, we detected taxa that may have important roles in the structural development of plaque, the inclusion of which may be necessary for probiotic success.For example, previous research identified Corynebacterium matrucho tii as one of the taxa that increased with regular brushing with fluoride + arginine toothpaste while species of Veillonella decreased (15).In the current research, we find both C. matruchotii and Veillonella parvula to be associated with high or low citrulline production, respectively.In addition, C. matruchotii plays a significant role in biofilm formation, can induce the mineralization of plaque, and recently was shown to increase the growth rate of various Streptococcus sp. in vitro (61)(62)(63).It, therefore, may be a key structural taxon for the colonization and proliferation of ADS-competent bacteria in the oral cavity.
Importantly, the analysis of multiple Streptococcus genomes for each species highlighted that ADS upregulation is heterogeneous among lineages identified by our metatranscriptomics data set (e.g., S. oralis = 25%, S. sanguinis = 17%), that not all lineages annotated by HOMD possess the full ADS operon (e.g., S. oralis = 25%, S. sanguinis = 92%; Table S3), and that de novo assembly of the ADS operon can identify multiple distinct transcripts for the ADS operon assigned to a single species.Additionally, we found that relatively high proportions of ADS-competent bacteria do not equate to lower proportions of cariogenic bacteria in all cases.For example, two PF samples in our data set had both high frequency of S. mutans and S. sanguinis as measured by rpoC sequencing (Fig. S3).However, as these samples also had high ADS activity by S. sanguinis, one possible interpretation is that this activity has prevented the development of a cavity in these teeth, despite the presence of S. mutans.Alternatively, samples with a high relative frequency of S. sanguinis and S. mutans but low ADS activity for S. sanguinis (e.g., UF68PD) may reflect strain-level differences in ADS competency.Given that some lineages of S. sanguinis do not have a full ADS operon (Table S3), it may be that some strains are not capable of antagonizing S. mutans even at relatively high frequency.The importance of strain-level resolution in determining candidate probiotics to promote ADS activity in the oral cavity has previously been described among strains of closely related Streptococcus species (28).Our results confirm the importance of strain-level upregulation of ADS in the context of oral health for different Streptococcus species (e.g., S. oralis, S. sanguinis, and Streptococcus sp.HMT 056) and identify potential candidate species that are as of yet understudied for their role in the promotion of oral health.For example, we found two strains (lineages) of Leptotrichia (HMT 212 and HMT 215) that had high ADS activity in PF and comparatively high read abundance and prevalence across samples in the current data set.Moreover, de novo assembly of full ADS operons documents high levels of diversity within this group, which may contribute to health or disease.Conversely, our results support the suggestion that lineages of bacteria that are typically associated with health (e.g., non-mutans Streptococcus sp.) may be important contributors to the etiological development of caries (64,65).Along with potential probiotic uses, identification of specific bacterial lineages consistently associated with health and with known beneficial properties in the oral cavity may be used as additional markers of risk assessment beyond the presence of cariogenic taxa, though further investigation into the prevalence and functional activity of the taxa identified here is necessary to clarify their role in maintaining favorable pH in the oral cavity.
Analysis and interpretation of metatranscriptomic data sets generated from mixed microbial communities are complicated by the differences in community composition and membership (66,67), particularly when the microbial community found in comparative groups is highly dissimilar.Thus, accounting for the differences in com munity composition between groups is critical (68,69).Here, we compared log-fold changes in a parallel metataxonomic data set to investigate the impact of community composition and species abundance on our metatranscriptomics results.Our paired metatranscriptomic and metataxonomic results demonstrate that often higher gene expression is a consequence of differential community membership or abundance and not truly neutral differences in gene expression profiles.For example, all three Strepto coccus species found here to have higher ADS activity on healthy teeth typically make up a much larger proportion of PF communities, and so more transcripts may be a simple function of more cells contributing to the effect.Conversely, taxa found at low frequency may be functionally more active in some circumstances (e.g., Limosilactobacil lus fermentum, Lactobacillus ultunensis, and Propionibacterium acidifaciens).Given that, a good candidate probiotic will need not only to efficiently catabolize arginine to produce ammonia but also have the ability to effectively colonize the plaque biofilm, comparing genome and transcript frequency is an important consideration.Moreover, transcript abundance alone does not necessarily precisely correlate to actual in vivo biochemical activity, and a variety of confounding factors including strain-level translational efficiency and enzyme stability may impede our accurate interpretation of activity levels alone.For the purposes of this study, we focus on taxa with both high RNA transcript abundance and comparatively high ASV abundance as these taxa have (i) high ADS expression that is not directly linked to high relative abundance in a sample and (ii) are prevalent across multiple individuals (and thus may be more efficient colonizers of the plaque biofilm).
Finally, we find that while ASV diversity is lower in PD as compared to PF samples (Fig. 1a), functional diversity is much more dispersed as compared to healthy teeth, which tend to have a much more consistent functional profile (Fig. 6a).The main functional differences between PD and PF communities are largely driven by a variety of genes associated with virulence factors and carbohydrate metabolism in known cariogenic taxa, and these effects are largely individualistic in scope, which suggests the presence of high functional volatility during caries development.Interestingly, we detected two major taxa driving the majority of differentially abundant genes in PD vs PF, Streptococ cus mutans, and Propionibacterium acidifaciens, even though P. acidifaciens occurs at low frequency for both groups in our corresponding amplicon data set.While this may be the result of taxonomic biases inherent in amplicon data sets, it may also demonstrate the important functional activity by relatively low-frequency taxa.Moreover, as P. acidifaciens biofilm formation is inhibited by S. mutans growth (70), a better understanding of the functional interaction between these species may elucidate their impact on the total ecology.The volatility of the functional profile of PD teeth, coupled with lowered diversity, suggests that candidate probiotics should not be limited to a single species.
Results of the current study highlight potential candidates for probiotic panels, including oral bacteria where pH modulation through the ADS pathway has previously been identified (e.g., Streptococcus sp.) (2), as well as those that are less well characterized (e.g., Leptotrichia sp.).In conclusion, while our multivariate approach substantiates the role of the ADS pathway in health and disease, it highlights the importance of account ing for taxonomic shifts in the analysis of functional differences in mixed microbial ecosystems such as dental plaque, which will impact the efficacy of any caries treat ment, and suggests that probiotic development may benefit from further strain-level investigation of a wider assortment of species for ADS activity and pH modulation potential in mixed microbial communities.

Samples and study design
A total of 54 adult individuals were recruited for this study.Carious lesions were diagnosed using the International Caries Detection and Assessment System II (ICDAS-II) visual criteria (71) and the lesion activity (72) scoring system by a calibrated examiner (M.M.N.) as previously described (11,73).Sampling of site-specific plaque samples followed protocols previously published by this group (11).Briefly, study participants were instructed to refrain from oral hygiene procedures for 8 hours prior to plaque sampling, after which supragingival plaques were collected from individual teeth and classified as caries-free (PF) (ICDAS score = 0) or with an active dentin carious lesion (PD) (ICDAS score ≥ 4).PD samples were recovered from the internal surface of the lesion.Full metadata for each sample can be found in Table S14.

Citrulline assay
A total of 74 supragingival plaque samples were tested for their potential to generate citrulline from arginine via the arginine deiminase system using a protocol previously validated and published by our group (28).Briefly, plaque samples were dispersed using external sonication for two cycles of 30 seconds with cooling on ice during the interval.The cells were then harvested, washed, and resuspended in 10 mM Tris-Maleate buffer and further permeabilized with toluene-acetone (1:9) for the measurement of ADS activity.The total protein concentration of the plaque sample was also measured using a BCA protein estimation kit (Pierce, Waltham, MA, USA) with a known concentration of bovine serum albumin as a standard.The ADS activity levels in the plaques were normalized to protein content and represented as nanomoles of citrulline generated per milligram of protein.

DNA and RNA extraction
Plaque samples were stored in RNAlater (ThermoFisher Scientific, Invitrogen, Waltham, MA, USA) immediately after sampling.DNA and RNA were extracted simultaneously following the mirVana miRNA isolation kit for RNA protocol (Ambion, ThermoFisher Scientific, Waltham, MA, USA).We chose this isolation kit as it is designed to efficiently isolate small RNAs (<200 nt), as well as total RNA, which play a crucial role in regulating bacterial global gene expression and are routinely used for microbiome studies (74)(75)(76).Cells were washed and collected by centrifugation for 10 minutes at maximum speed.A quantity of 300 µL of mirVana kit lysis/binding buffer, 300 µL acid-phenol:chloroform, and 250 µL of 0.1 mm sterilized zirconia-silica beads (BioSpec Products, Bartlesville, OK, USA) were added to the samples.Samples were bead beaten twice for 30 seconds with a 15-second interval in between.Samples were centrifuged at maximum speed for 5 minutes at room temperature for clear separation of aqueous and organic phases.For RNA isolation, the aqueous phase was collected and processed according to the manufacturer's protocol.For DNA isolation, the organic phase from the mirVana mRNA isolation was stored at 4°C until use.An extraction buffer containing 0.1 M NaCl, 10 mM Tris-HCL (pH 8.0 ± 0.2), 1 mM EDTA, and 1% SDS was prepared.The final pH of the buffer was adjusted to pH 12 (±0.2) with 10 N NaOH.We next added 250 µL of the extraction buffer to the organic phase containing DNA followed by an incubation at 4°C.After centrifugation at 10,000 g for 20 minutes, the aqueous phase was recovered and immediately stored at −80°C for an hour after the addition of 1/15 vol of 7.5 M ammo nium acetate and 100% ethanol.The samples were then centrifuged for 20 minutes at 10,000 g, and the supernatant was discarded.The pellets were air dried and washed with ethanol before being resuspended in 17 µL of molecular-grade water.

RNA library preparation and sequencing
Total RNA samples were processed twice with MICROBEnrich and MICROBExpress bacterial mRNA enrichment kits (Ambion/Life Technologies, Grand Island, NY, USA) to remove eukaryotic mRNA and 16S/23S rRNAs, respectively.The purified mRNA was resuspended in 25 µL of nuclease-free water.The quality of enriched mRNA was analyzed using an Agilent Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).Next, we generated cDNA libraries from 100 ng of enriched mRNA using the NEBNext Ultra directional RNA library preparation kit and NEBNext dual index oligonucleotides for Illumina (New England BioLabs, Ipswich, MA, USA).To ensure proper enrichment of bacterial transcripts, we first sequenced the pooled library using a MiSeq Reagent Nano Kit v2 (300 cycles) at Clemson University.Deep sequencing of the library was next performed by the NextGen DNA sequencing core laboratory of the Interdisciplinary Center for Biotechnology Research at the University of Florida (Gainesville, FL, USA).Samples were sequenced on the NovaSeq 6000 platform on a single S4 flow cell using a paired-end 2 × 150 bp chemistry.

DNA library preparation and sequencing
Each plaque sample was amplified and built into Illumina sequencing libraries using a custom primer set targeting an approximately 478-bp fragment of the bac terial rpoC gene-rpoCF (5′-MAYGARAARMGNATGYTNCARGA-3′) and rpoCR (5′-GMCA TYTGRTCNCCRTCRAA-3′) (77).We adopted this approach as recent research has demonstrated the inability of other common metabarcoding approaches (e.g., 16S rRNA hypervariable regions) to resolve common oral groups (i.e., Streptococcus sp.) (78), and further research identified the rpoC gene as a promising alternative marker (24).We note, however, that like other amplicon-based approaches, rpoC gene fragment metabarcod ing is not free from bias [see reference (77)], and the primers published here have only been tested using oral samples.Each 25 µL PCR reaction consisted of 0.5 µL of the forward and reverse adapter, 10 µL of H 2 O, and 10 µL of Platinum Hot Start PCR Master Mix (2×) (Invitrogen, Carlsbad, CA, USA).PCR thermocycler conditions were as follows: 94°C for 3 minutes followed by 41 cycles of 94°C for 45 seconds, 39.5°C for 1 min, and 72°C for 1 min and 30 seconds.A final elongation step was performed at 72°C for 10 minutes.Amplification of PCR products for each sample was confirmed by both gel electrophoresis and Qubit Fluorometer (Invitrogen, Carlsbad, CA, USA).We processed PCR blanks (molecular-grade water) in parallel to all samples.Prepared libraries were sequenced on an Illumina MiSeq using 2 × 300 paired-end chemistry.

Metatranscriptomic data processing
On average, we generated 326,151,580 (SD ± 102,407,864) raw sequencing reads per sample.We first assessed the quality of the raw reads using FastQC v0.11.9 (79) before quality filtering, trimming adapters, and removing poly-G tails using Cutadapt v3.3 (80).On average, 302,238,342 (SD ± 9,238,698) reads passed our quality filtering thresholds.Next, we performed a preliminary filter on the trimmed reads by first mapping to the SILVA v138 non-redundant 99% SSU and LSU 18S rRNA and 16S rRNA databases (81) and then to the human reference genome (GRCh38) using Bowtie2 v2.4.5 (82).On average 0.02% (SD ± 0.06%) of our quality-filtered reads per sample were mapped to the SILVA database and 3.04% (SD ± 4.43%) to the human genome.We removed any reads that mapped to either the human reference genome or SILVA ribosomal databases using a custom Python script before performing downstream analyses.
For taxonomic and functional assignment, all filtered reads were mapped against the Human Oral Microbiome Database (v9.15) (31) using STAR v2.7.5c (83) with alignIntroMax set to 1 to disable spliced alignments and outFilterMismatchNoverLmax set to 0.05 to increase mapping specificity.On average, 198,385,752 (SD ± 68,279,777) reads mapped to our database per sample.Gene counts were calculated from each sample using the featureCounts function as implemented in the Subread v2.0.1 package (84,85).Given that our reference-based approach included a database of diverse microbial genomes with varying levels of genomic congruence, when running featureCounts, we allowed for multimapping genes (-M), only pairs with both reads aligned were counted (-B), and paired reads that mapped to separate chromosomes (or contigs in the case of incomplete genome reconstructions) were not counted for downstream analyses (-C).
To complement our reference-based approach, we performed an additional de novo assembly on a sample-by-sample basis using Trinity (v2.15.1) (86).On average, we generated 364,811 transcripts per sample (±159,936).We next merged the assembled transcripts and performed a full length dereplication using vsearch (v2.14.1) (87).We annotated the final dereplicated database using prokka (v1.14.5) (88).Finally, we mapped our filtered reads to the de novo reference using STAR (v2.7.5c) (83) in an identical manner as described above.We identified significantly up-or downregulated genes using DESeq2 (89).We built maximum likelihood trees of our de novo ADS transcripts with references from HOMD using mafft (v7.453) (90) and RAxML (v8.2.12) (91), which were visualized in iTOL (v5) (92).As many transcripts were truncated or otherwise incomplete, we built phylogenetic trees with only those transcripts that had annota tions for all three arc genes (arcABC) and trimmed the final alignments using trimAl (v1.4.rev15) with a gap threshold of 0.5, minimum position overlap in a column of 0.5, and minimum percentage of "good positions" a sequence must retain for inclusion of 50 (93).Assembly statistics can be found in Table S8.

ADS operon identification
The ADS operon was identified in the HOMD database by first selecting all genes annotated as those that produced one of the three main enzymes involved in the pathway.While gene content varies within and among groups, the enzymes encoded by three genes (i) arcA (arginine deiminase EC: 3.5.3.6), (ii) arcB (ornithine carbamoyltrans ferase EC: 2.1.3.3), and (iii) arcC (carbamate kinase EC: 2.7.2.2) are necessary for ADS activity and are thus the focus of this manuscript (1,36).Because gene annotations are rarely consistent across published data sets, we first performed homologous gene clustering using the HOMD database to verify arc gene identity using BLAST (102) and the Markov Clustering algorithm implemented in MCLblastline (103).In brief, we created a BLAST database of all protein sequences that had been annotated as one of the three ADS enzymes (i.e., by the EC number) from the HOMD database.We next mapped the full HOMD protein database against our reference ADS database using BLASTp (e value 1e-10), the output of which we used to generate homologous gene clusters using MCLblastline with an inflation parameter of 1.8, which previously had been identified as well suited for high-throughput data (104) and, in practice, can identify homologous gene clusters across diverse groups (105).Genes that did not cluster with any other arc gene in the database (e.g., singleton clusters) and annotated genes that fell into clusters that included a mixture of genes annotated as arcA, arcB, and arcC were removed from downstream analyses.Using results from this analysis, we further identified species with a full ADS operon as those wherein all three genes (arcA, arcB, and arcC) were (i) present and (ii) in synteny with one another in the reference genome.Synteny was manually identified using locus tag assignment.Species, where synteny of the arc genes was interrupted by a single gene (e.g., hypothetical protein), were included in downstream analyses to account for operon variability.A full list of species that carry the ADS operon from this analysis can be found in Table S2.

Differential gene expression and species abundance analyses
We calculated differential gene expression using DESeq2 v1.34.0 (89) in R v4. 1.2 (96).To minimize the effect of large fold changes between groups that were the result of low transcript count, we next ran lfcShrink on our DESeq2 results using the apeglm R package (106).Our threshold for differential expression between groups was an adjusted P-value (FDR) of less than or equal to 0.05 and a minimum fold change of four.Because we expect community composition and species presence to be meaningful confounding variables in comparing the estimates of gene expression across groups, we calculated the correlation between log fold change values in transcript abundance to rpoC gene abundance for a selection of ADS-competent oral taxa (Fig. 5c; Table S6).

Streptococcus ADS operon phylogeny
First, we concatenated the HOMD-annotated arcA, arcB, and arcC genes for each Streptococcus lineage in the HOMD database, which were validated by our homologous gene clustering into a single contiguous sequence using the concat function in SeqKit (v2.3.1).Concatenated sequences were then aligned using mafft (v7.505) (90) and built into a maximum likelihood phylogeny using RAxML (v8.2.12) (91).The phylogeny was midpoint rooted and visualized in FigTree (v1.4.4) (107).Collapsed log fold change and base mean were calculated by taking the average across all three arc genes for each Streptococcus lineage.Lineages that fell outside of an expected major species clade were checked using the BLAST web application (102) and the full NCBI NT database as a reference to ensure that it was annotated with the correct taxonomic assignment.

2 3 4 FIG 1
FIG 1 Significant differences in ADS activity are found when comparing PD and PF, which are predominantly driven by major differences in taxonomic composition.(a) Shannon diversity box and raincloud plots for PD and PF samples.(b) Beta-diversity colored by ADS expression levels as measured by our citrulline assay.Shapes represent the two groups.Samples with unexpectedly high or low citrulline production are labeled on the plot (see text).(c) Differential

FIG 2
FIG 2 ADS operon structure and coverage vary across strains.(a) Log 10 coverage per base of the three arc genes in Streptococcus sanguinis (SEQF2018).(b) Log 10 coverage per base of the three arc genes in Streptococcus parasanguinis (SEQF1919).(c) Log 10 coverage per base of the three arc genes in Streptococcus gordonii (SEQF1066).(d) Log 10 coverage per base of the three arc genes in Streptococcus mitis/oralis (SEQF1998).(e) Log 10 coverage per base of the three arc genes in Leptotrichia sp.oral taxon 212 (SEQF2748).(f ) Log 10 coverage per base of the three arc genes in Leptotrichia sp.oral taxon 215 (SEQF2444).Line colors in all coverage plots represent all reads mapped to health (blue) as compared to disease (red).Gray bars indicate intergenic regions.White internal labels on intergenic regions indicate the size of each region.

FIG 3
FIG 3 Differential expression and abundance of ADS operon vary by Streptococcus lineages.Maximum likelihood phylogeny for the Streptococcus sp.ADS operon (concatenated arcA, arcB, and arcC genes).Asterisk (*) indicates a lineage where all three arc genes were up-or downregulated as measured by DESeq2 analysis.Corresponding heatmap indicates the average log fold change and base mean per aligned lineage.
. of plaque samples where taxon was observed in rpoC dataset 0 4 log10(count) b) Log10 normalized rpoC read count per taxon among all plaque samples

FIG 4 FIG 5
FIG 4 Species level ADS operon expression is significantly correlated to community composition as measured by rpoC amplicon sequencing.(a) The proportion of samples in the full amplicon data set where species listed in panel c were detected.(b) Log 10 normalized read count of amplicon sequences for each species corresponding to panel c.(c) Comparison of log2 fold changes for ADS-competent bacteria in the metataxonomic and metatranscriptomic data set.R 2 and alpha significance values were calculated using a Pearson correlation test.

FIG 7
FIG 7 Evidence for Streptococcus mutans antagonism with ADS-competent bacteria.(a) Relative proportion of amplicon reads for ADS-competent bacteria identified in our metatranscriptomic data set as compared to Streptococcus mutans across all samples sequenced.(b) Balance of taxa most predictive of citrulline production.Scatter plot on the left illustrates the maximum number of taxa whose ratio is associated with the response variable.Plot on the right illustrates the two taxa whose ratio best predicts the maximum amount of variation in the response variable.

FIG 8
FIG 8 Community-level differential gene expression profiles are more individualistic in PD as compared to PF samples.(a) PCA plot showing the differences in functional profiles among samples using the top 500 varying genes between groups.Teeth sampled from the same individual are highlighted in the plot.(b) Hierarchically clustered heatmap of the top 25 genes (determined by highest variance) that differentiate between PD and PF samples.Genes marked by an asterisk are annotated as hypothetical proteins.Different lineages of S. mutans (SEQF1069 † , SEQF1382 # ) and P. acidifaciens (SEQF1851 + , SEQF2583 ○ ) are indicated by corresponding symbols.Identical genes detected in different lineages are indicated by brackets, and corresponding percentages indicate genetic identity between the two genes as determined by blastn alignment.(c) Relative abundance of Streptococcus mutans and Propionibacterium acidifaciens for each sample corresponding to heatmap dendrogram.
These differences are the result of a strong functional signal of cariogenic taxa [and particularly Streptococcus mutans and Propionibacterium acidifaciens (also known as