Genomic Stability and Genetic Defense Systems in Dolosigranulum pigrum, a Candidate Beneficial Bacterium from the Human Microbiome

ABSTRACT Dolosigranulum pigrum is positively associated with indicators of health in multiple epidemiological studies of human nasal microbiota. Knowledge of the basic biology of D. pigrum is a prerequisite for evaluating its potential for future therapeutic use; however, such data are very limited. To gain insight into D. pigrum’s chromosomal structure, pangenome, and genomic stability, we compared the genomes of 28 D. pigrum strains that were collected across 20 years. Phylogenomic analysis showed closely related strains circulating over this period, and closure of 19 genomes revealed highly conserved chromosomal synteny. Gene clusters involved in the mobilome and in defense against mobile genetic elements (MGEs) were enriched in the accessory genome versus the core genome. A systematic analysis for MGEs identified the first candidate D. pigrum prophage and insertion sequence. A systematic analysis for genetic elements that limit the spread of MGEs, including restriction modification (RM), CRISPR-Cas, and deity-named defense systems, revealed strain-level diversity in host defense systems that localized to specific genomic sites, including one RM system hot spot. Analysis of CRISPR spacers pointed to a wealth of MGEs against which D. pigrum defends itself. These results reveal a role for horizontal gene transfer and mobile genetic elements in strain diversification while highlighting that in D. pigrum this occurs within the context of a highly stable chromosomal organization protected by a variety of defense mechanisms.

IMPORTANCE Dolosigranulum pigrum is a candidate beneficial bacterium with potential for future therapeutic use. This is based on its positive associations with characteristics of health in multiple studies of human nasal microbiota across the span of human life. For example, high levels of D. pigrum nasal colonization in adults predict the absence of Staphylococcus aureus nasal colonization. Also, D. pigrum nasal colonization in young children is associated with healthy control groups in studies of middle ear infections. Our analysis of 28 genomes revealed a remarkable stability of D. pigrum strains colonizing people in the United States across a 20-year span. We subsequently identified factors that can influence this stability, including genomic stability, phage predators, the role of MGEs in strain-level variation, and defenses against MGEs. Finally, these D. pigrum strains also lacked predicted virulence factors. Overall, these findings add support for the potential of D. pigrum as a therapeutic bacterium.

A subset of D. pigrum isolates encode innate antibiotic resistance to kanamycin and/or erythromycin. Harmless members of the human microbiota can serve as reservoirs of antibiotic resistance. Therefore, we searched for antibiotic resistance genes in these 28 D. pigrum genomes and found that six of the isolates encode predicted antibiotic resistance genes. Antibiotic resistance was predicted by querying the genomes in parallel against the Comprehensive Antibiotic Resistance Database (CARD) using the Resistance Gene Identifier (RGI, version 3.1.0) API platform with default settings (6, 7). Results were considered significant if either a perfect or a strict match based on a protein homolog AMR detection model was detected. Four genomes had a 100% identity match based on the RGI (6, 7) to a kanamycin nucleotidyltransferase, ANT(4')-Ib, that is encoded in an integrated sequence homologous to pUB110 (CDC 4709-98, KPL3043, and KPL3065/KPL3086, the latter two having nearly identical genomes) (Fig. S2B). The latter three of these, along with another clade 4 isolate (KPL3050) and a clade 1 isolate (KPL3250), also encoded an rRNA adenine N-6-methyltransferase (ErmT). With a strict identity match in the RGI of 86.53%, the predicted ermT gene is located within the CRISPR array of a subtype II-A system (Fig. 8A-B), in a different location in the genome than the kanamycin nucleotidyltransferase (CS1 vs. pUB110 in Fig. S2A).
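The significance criterion described above (keep only perfect or strict matches from a protein homolog model) can be sketched as a simple filter. This is an illustration only, not the pipeline used in the study. It assumes RGI results exported as a tab-separated table with `Cut_Off`, `Model_type`, `Best_Hit_ARO`, and `Best_Identities` columns, as in recent RGI releases; the version 3.1.0 output may be laid out differently. The example rows are toy data: the first two mirror the hits named in the text, and the third is a hypothetical loose hit added to show the filter rejecting something.

```python
import csv
import io

# Cutoff categories treated as significant (assumption: RGI labels them this way).
ACCEPTED_CUTOFFS = {"Perfect", "Strict"}
HOMOLOG_MODEL = "protein homolog model"


def significant_hits(rgi_tsv_text):
    """Return rows that are Perfect/Strict hits from a protein homolog model."""
    reader = csv.DictReader(io.StringIO(rgi_tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if row["Cut_Off"] in ACCEPTED_CUTOFFS and row["Model_type"] == HOMOLOG_MODEL
    ]


# Toy table; the last row is hypothetical and exists only to be filtered out.
EXAMPLE_TSV = (
    "Cut_Off\tModel_type\tBest_Hit_ARO\tBest_Identities\n"
    "Perfect\tprotein homolog model\tANT(4')-Ib\t100\n"
    "Strict\tprotein homolog model\tErmT\t86.53\n"
    "Loose\tprotein homolog model\thypothetical_loose_hit\t35.0\n"
)

hits = significant_hits(EXAMPLE_TSV)
hit_names = [row["Best_Hit_ARO"] for row in hits]
print(hit_names)
```

In practice the same function could be applied to one table per genome, with the resulting hits collected by isolate name.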
Detailed description of D. pigrum restriction-modification (RM) systems. We identified Type I-IV RM systems across the 28 D. pigrum genomes located in three RMS insertion sites (Fig. 7). Type I RM systems typically consist of three separate genes encoding a restriction subunit (hsdR), a modification subunit (hsdM), and a recognition/specificity subunit (hsdS). These three form a multi-subunit complex that catalyzes both restriction and modification activities and generally targets bipartite DNA motifs comprising two half-sequences separated by a gap (8). Two D. pigrum isolates, KPL3246 and KPL3264, each contained a single Type I system. Because each methylome contained a characteristic bipartite motif associated with m6A modification (CRTAN7TCNNC and CTAN7TGC, respectively), both systems were assigned as active.
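To make the bipartite motif notation concrete, the following minimal sketch expands the run-length shorthand (N7 meaning seven unspecified bases) and translates IUPAC ambiguity codes into a regular expression that can be scanned against a sequence. The function names and the two test sequences are hypothetical and were constructed only for illustration; the sketch scans the forward strand only and says nothing about which base within the motif is methylated.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def expand_motif(motif):
    """Expand run-length shorthand, e.g. 'CTAN7TGC' -> 'CTANNNNNNNTGC'."""
    return re.sub(r"([A-Z])(\d+)", lambda m: m.group(1) * int(m.group(2)), motif)


def find_sites(motif, sequence):
    """Return 0-based start positions of motif matches on the forward strand."""
    core = "".join(IUPAC[base] for base in expand_motif(motif))
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer("(?=" + core + ")", sequence)]


# Toy sequences built by hand so that each contains exactly one site.
toy_seq_1 = "GG" + "CTA" + "ACGTACG" + "TGC" + "AA"
toy_seq_2 = "CGTA" + "AAAAAAA" + "TCGGC"

print(expand_motif("CRTAN7TCNNC"))
print(find_sites("CTAN7TGC", toy_seq_1))
print(find_sites("CRTAN7TCNNC", toy_seq_2))
```

Scanning the reverse strand as well would require searching for the reverse complement of each motif, since these bipartite motifs are not palindromic.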
Type II RM systems generally consist of independent restriction endonuclease (REase) and modification methyltransferase (MTase) proteins that do not form a complex, but instead recognize the same target motif and compete for activity. We identified 20 individual Type II systems across the 28 D. pigrum isolates and assigned the target motif to 15 of these based upon methylome analysis and/or REBASE homology to empirically characterized RM systems from other species (Fig. 7A, Table S4). A candidate motif was not detected for an m5C-associated RM system that co-occurred immediately downstream of the Gm5CNGC system in four isolates (KPL3264, KPL3911, KPL3084, and KPL3070). Additionally, we were unable to unambiguously assign motifs to two Type IIG RM systems that occurred in KPL3256 (REBASE assignments: Dpi3256ORF5220 and Dpi3256ORF1810), although well-informed guesses could be made (CCAGT and GACAG, respectively). Type IIG systems are defined by the presence of a single polypeptide including an REase and an MTase domain that share a target recognition domain. A single candidate modified motif for one of these systems was identified during SMRTseq (CCm6AGT).
Type III RM systems consist of two genes (mod and res) encoding protein subunits that function in either DNA recognition and modification (MTase) or restriction (REase). All Type III REases recognize asymmetrical DNA sequences, and the modified DNA bears methyl groups on only one strand of the DNA recognition motif. We found that three D. pigrum isolates (KPL3274, KPL3052, and KPL3070) harbored characteristic Type III systems and assigned these to specific hemi-methylated recognition motifs (CAACA, GTCAT, YACAG) detected during methylome analysis (Table S4). In such systems, the REase must interact with two copies of its nonpalindromic recognition sequence, and the sites must be in an inverse orientation within the substrate DNA molecule for cleavage, which occurs at a specific distance away from one of the recognition sequences.
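The asymmetry of the Type III motifs can be checked directly: a motif is palindromic only if it equals its own reverse complement. The short sketch below, which uses hypothetical helper names and standard IUPAC complement rules, confirms that none of the three Type III motifs reads the same on both strands, with GCNGC (the recognition sequence of the Type II system mentioned earlier, written without the methylation mark) included as a palindromic contrast.

```python
# Complements for IUPAC nucleotide codes (R = A/G pairs with Y = C/T, and so on).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(motif):
    """Return the reverse complement of a motif written in IUPAC codes."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))


def is_palindromic(motif):
    """A motif is palindromic if it reads identically on both strands."""
    return motif == reverse_complement(motif)


type_iii_motifs = ["CAACA", "GTCAT", "YACAG"]
for motif in type_iii_motifs + ["GCNGC"]:
    print(motif, reverse_complement(motif), is_palindromic(motif))
```

Because the Type III motifs are asymmetric, a site on one strand and a site on the other are distinguishable, which is what makes the notion of two sites in inverse orientation meaningful.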
Type IV restriction enzymes are technically not true RM systems since they comprise only a restriction enzyme and have no accompanying methylase. These restriction enzymes recognize and cut only modified DNA, including methylated, hydroxymethylated and glucosyl-hydroxymethylated bases. We identified a largely conserved single-gene Type IV system in 10 D. pigrum isolates (Fig. 7; Table S4). However, the targets of Type IV systems cannot be determined through SMRT sequencing and methylome analysis and, therefore, exact determination of its target recognition motif was not possible here.
Nevertheless, there are three well-characterized Type IV systems to date, each with a defined sequence preference and cleavage position (9). The D. pigrum Type IV system is most homologous (99% coverage, 42% identity) to SauUSI of Staphylococcus aureus, a modified-cytosine restriction system targeting S5mCNGS (where S is C or G), but this level of homology is insufficient to confirm the exact modified motif targeted by the D. pigrum system (10).

(19). Besides these two sets of closely related isolates, other less closely related isolates also shared more than three spacers. KPL3090 and KPL3052, which are further apart within clade 4, shared 7 spacers, including matches to Enterobacteria phage RB49 (AY343333; spacer 39) (20) and the JP555 plasmid pJFP55H from Clostridium perfringens JP55 strain (NZ_CP013043.1; spacer 47) (21). All spacer sequences with their genome matches can be found in Table S5B.
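Counting spacers shared between isolates amounts to a pairwise set intersection. The sketch below illustrates the idea under loud assumptions: the isolate names and spacer identifiers are hypothetical toy values, not data from Table S5B, and spacers are compared as exact identifiers, so it does not account for reverse-complement orientation or near-identical sequences.

```python
from itertools import combinations


def shared_spacer_counts(spacers_by_isolate):
    """Count spacers shared by every pair of isolates (exact matches only)."""
    counts = {}
    for a, b in combinations(sorted(spacers_by_isolate), 2):
        shared = set(spacers_by_isolate[a]) & set(spacers_by_isolate[b])
        counts[(a, b)] = len(shared)
    return counts


# Hypothetical toy input: isolate name -> spacer identifiers.
toy_spacers = {
    "isolate_A": ["s1", "s2", "s3", "s4", "s5"],
    "isolate_B": ["s2", "s3", "s4", "s5", "s9"],
    "isolate_C": ["s1", "s7"],
}

counts = shared_spacer_counts(toy_spacers)
# Pairs sharing more than three spacers, the threshold mentioned in the text.
pairs_over_three = [pair for pair, n in counts.items() if n > 3]
print(counts)
print(pairs_over_three)
```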