Comparative genomics reveals the correlations of stress response genes and bacteriophages in developing antibiotic resistance of Staphylococcus saprophyticus

ABSTRACT Staphylococcus saprophyticus is the leading Gram-positive cause of uncomplicated urinary tract infections. Recent reports of increasing antimicrobial resistance (AMR) in S. saprophyticus warrant investigation of its understudied resistance patterns. Here, we characterized a diverse collection of S. saprophyticus (n = 275) using comparative whole genome sequencing. We performed a phylogenetic analysis of core genes (1,646) to group our S. saprophyticus and investigated the distributions of antibiotic resistance genes (ARGs). S. saprophyticus isolates belonged to two previously characterized lineages, and 14.91% (41/275) demonstrated multidrug resistance. We compared antimicrobial susceptibility phenotypes of our S. saprophyticus with the presence of different ARGs and gene alleles. 29.8% (82/275) carried staphylococcal cassette chromosome mobile elements, among which 25.6% (21/82) were mecA+. Penicillin resistance was associated with the presence of mecA or blaZ. The mecA gene could serve as a marker to infer cefoxitin and oxacillin resistance of S. saprophyticus, but the absence of this gene is not predictive of susceptibility. Utilizing computational modeling, we found several genes were associated with cefoxitin and oxacillin resistance in mecA− isolates, some of which have predicted functions in stress response and cell wall synthesis. Furthermore, phenotype association analysis indicates ARGs against non-β-lactams reported in other staphylococci may serve as resistance determinants of S. saprophyticus. Lastly, we observed that two ARGs [erm and erm (44)v], carried by bacteriophages, were correlated with high phenotypic non-susceptibility against erythromycin (11/11 and 10/10) and clindamycin (11/11 and 10/10). The AMR-correlated genetic elements identified in this work can help to refine resistance prediction of S. saprophyticus during antibiotic treatment. IMPORTANCE Staphylococcus saprophyticus is the second most common bacteria associated with urinary tract infections (UTIs) in women. The antimicrobial treatment regimen for uncomplicated UTI is normally nitrofurantoin, trimethoprim-sulfamethoxazole (TMP-SMX), or a fluoroquinolone without routine susceptibility testing of S. saprophyticus recovered from urine specimens. However, TMP-SMX-resistant S. saprophyticus has been detected recently in UTI patients, as well as in our cohort. Herein, we investigated the understudied resistance patterns of this pathogenic species by linking genomic antibiotic resistance gene (ARG) content to susceptibility phenotypes. We describe ARG associations with known and novel SCCmec configurations as well as phage elements in S. saprophyticus, which may serve as intervention or diagnostic targets to limit resistance transmission. Our analyses yielded a comprehensive database of phenotypic data associated with the ARG sequence in clinical S. saprophyticus isolates, which will be crucial for resistance surveillance and prediction to enable precise diagnosis and effective treatment of S. saprophyticus UTIs.


IMPORTANCE
Staphylococcus saprophyticus is the second most common bacteria associated with urinary tract infections (UTIs) in women.The antimicrobial treatment regimen for uncomplicated UTI is normally nitrofurantoin, trimethoprim-sulfamethox azole (TMP-SMX), or a fluoroquinolone without routine susceptibility testing of S. saprophyticus recovered from urine specimens.However, TMP-SMX-resistant S. sapro phyticus has been detected recently in UTI patients, as well as in our cohort.Herein, we investigated the understudied resistance patterns of this pathogenic species by linking genomic antibiotic resistance gene (ARG) content to susceptibility phenotypes.We describe ARG associations with known and novel SCCmec configurations as well as phage elements in S. saprophyticus, which may serve as intervention or diagnostic targets to limit resistance transmission.Our analyses yielded a comprehensive data base of phenotypic data associated with the ARG sequence in clinical S. saprophyticus isolates, which will be crucial for resistance surveillance and prediction to enable precise diagnosis and effective treatment of S. saprophyticus UTIs.99% prevalence), and 6,517 were cloud genes (<15% prevalence).We used core gene alignments to build a maximum-likelihood phylogenetic tree (Fig. 1A; Table S1).The "water striders" shape (38) of our S. saprophyticus tree exhibited a long internal branch separating two very distinct sub-populations (Fig. S1A).This result was consistent with a prior report of S. saprophyticus, though their genomic phylogeny was built on whole-genome single nucleotide polymorphisms (SNPs) mapped to a single reference S. saprophyticus strain ATCC 15305, and they designated the two subpopulations as lineage G and S (30).To confirm that lineage definitions were not biased by sampling, as well as to assign the lineage group of our isolates, we generated a combined core gene alignment of all published S. saprophyticus genomes and the genomes from our work (Fig. S1B).We confirmed that these genomes were separated into two major groups, and all G and S isolates from Lawal et al. belonged to different groups.Thus, we proceeded with utilizing G and S as the lineage names in our study.Among our cohort, 76% (209/275) of S. saprophyticus isolates were from lineage G, which differed by between 16 and 4,429 core gene single nucleotide polymorphisms (cgSNPs) with a whole-genome average nucleotide identity (ANI) of 99.2116%-99.9971%.Our isolates from lineage S (n = 66) had cgSNPs of 0-5,182 with an ANI of 99.2776%-99.9997%(Fig. S1C).The cgSNP distances and whole genome ANI compared between G and S isolates were 8,720-10,967 and 98.5267%-99.1449%,respectively (Fig. S1C).Isolates from lineage G were mostly from North America (160/209), while lineage S had more isolates collected from South America (30/66; Fig. 1B).Intriguingly, the tree shapes of the two lineages are dissimilar, indicating potentially distinct evolutionary patterns.To confirm this assumption, we utilized rhierBAPS (39,40) to hierarchically cluster the core genes of S. saprophyticus.Four clusters were detected at level 1, among which three were from lineage G (cluster 1, n = 139; cluster 2, n = 53; cluster 4, n = 17), and all S isolates were characterized as one cluster (cluster 3; Fig. 1A).Furthermore, using principle coordinates analysis (PCoA) on the presence-absence matrix representing all accessary (i.e., non-core) genes, we found that different lineages or clusters mixed within the plot, indicating that S. saprophyticus accessory gene content does not recapitulate the core gene structure (Fig. S1D).
Next, we identified ARGs encoded by S. saprophyticus and compared their distribu tions between lineages or clusters.We detected 29 ARGs of 9 antimicrobial categories in our cohort.The antimicrobial categories were used to describe the acquired resistance profile in S. aureus (41).All S. saprophyticus isolates carried at least two ARGs, and one isolate (UA-007) had up to 12 ARGs (Fig. 1A).The numbers of ARGs of G isolates (range: [2][3][4][5][6][7][8][9][10][11][12] were significantly larger than those in S isolates (range: 2-10) determined by Wilcoxon rank-sum test (P-value is 0.0016), although both contained a median of four ARGs (Fig. 1C).When comparing among clusters, we only observed differences in ARG numbers carried by isolates from cluster 1 and 3 (P-value is 0.0012 by Wilcoxon rank-sum test; Fig. 1C).We determined antimicrobial resistance (AMR) phenotypes and the βlactamase activities of our S. saprophyticus isolates by disk diffusion and Cefinase assays, respectively.14.91% (41/275) S. saprophyticus isolates demonstrated multidrug resist ance (MDR), defined as the isolate was non-susceptible to at least one agent in more than three antimicrobial categories (41), including β-lactams, folate pathway inhibitor, lincosamides, macrolides, and tetracyclines.Specifically, 7/41 isolates were MDR due to their non-susceptibility against β-lactams, macrolides, and tetracyclines; 24/41 isolates were MDR due to their non-susceptibility against β-lactams, lincosamides, and macro lides; 8/41 isolates were MDR due to their non-susceptibility against β-lactams, folate pathway inhibitor, and macrolides; 1/41 isolate was MDR due to its non-susceptibility against β-lactams, folate pathway inhibitor, macrolides, and tetracyclines; and 1/41 isolate was MDR due to its non-susceptibility against β-lactams, folate pathway inhibitor, lincosamides, and macrolides.We observed no differences in MDR rates between lineages or clusters (χ 2 test, P-values are 0.74 and 0.54, respectively; Fig. 1D).Furthermore, the β-lactamase activity and the resistance rates (the number of resistant isolates to nonresistant isolates) against cefoxitin and oxacillin were significantly different between the two lineages (χ 2 test, P-value is 0.0005, 0.0435, and 0.0015; Fig. 1E); the β-lactamase activity and the resistance rates against clindamycin, doxycycline, TMP-SMX, cefoxitin, and oxacillin were significantly different between clusters (Fig. S1E).To visualize the AMR patterns of each individual isolate, we grouped genetic data with susceptibility pheno types in the core gene phylogenetic tree (Fig. 1F; Table S2).In sum, our globally diverse collection of human pathogenic S. saprophyticus comprises two major lineages that exhibit distinct AMR burdens.Both genotypical and phenotypical data indicate the development of MDR in our cohort.

Non-typeable SCC elements were identified in S. saprophyticus linked with AMR
Staphylococcal cassette chromosome mec (SCCmec) is a genetic mobile element that conveys the central determinant of the broad-spectrum β-lactam resistance encoded by the mecA gene (42).Additionally, SCCmec element often carries site-specific recombina ses designated as cassette chromosome recombinases (ccr) (43,44).To date, 11 SCCmec types have been characterized for S. aureus based on the compositions of their mec and ccr genes (45), and SCC elements that do not carry mec gene have also been observed (46).Given the roles of SCC elements in transmitting methicillin resistance, we assessed the distribution of SCC elements in S. saprophyticus.Among our isolates, 29.8% (82/275) carried ccr genes, and the incidence of mecA gene was 7.6% (21/275; Fig. 2A).There was no difference in SCCmec prevalence between lineages (Fig. S2A).Within the SCC + isolates, rlmH was located at the 5'-end of the SCC element (Fig. S2B and C), suggesting the insertion site of SCC element was overlapping with rlmH, a similar organization to the ones in other staphylococcal species (47).Gene rlmH, encoding rRNA large subunit methyltransferase H, was detected in all (275/275) S. saprophyticus by reciprocal BLAST, signifying their role to serve as recipients of SCC transferring.Additionally, SCC elements in S. saprophyticus contained other characteristic genes which have been reported in methicillin-resistant S. aureus (MRSA) (48), such as capsule gene cluster (cap), copper resistance (cop), cadmium resistance (cad), or the arsenic resistance operon (ars; Fig. S2B  and C).We attempted to determine SCC types (45) in S. saprophyticus based on the gene structure of ccr and mec complexes (Fig. 2B and C).We found that 3/21 of the mec complexes found in our cohort belonged to class B [composed of mecA, a truncated mecR1 resulting from the insertion sequence IS1272 upstream of mecA, and IS431 downstream of mecA (45)] and 16/21 belonged to class A [contains mecA, the complete mecR1 and mecI regulatory genes upstream of mecA, and IS431 downstream of mecA (45)].IS431 sequences were detected in most SCCmec-positive isolates but on separated contigs with mec or ccr genes.Except for that, two isolates (1809848 and CHLA-009) showed the coexistence of mec genes and IS256 in their genomes (Fig. S2B).On the other hand, the compositions of ccr complexes of S. saprophyticus were more diverse and novel compared to those in MRSA.Specifically, 8/21 mecA + SCCmec and 2/61 mecA − SCC elements were identified as carrying two ccr gene complexes.The most common ccr combination was ccrA1/ccrB3 (n = 48), and others included ccrA1/ccrB1 (n = 4), ccrA1/ ccrB2 (n = 1), ccrA1/ccrB4 (n = 1), ccrA2/ccrB3 (n = 1), ccrA3/ccrB3 (n = 3), and ccrA4/ccrB4 (n = 4).Among these, only ccrA1/ccrB1, ccrA3/ccrB3, and ccrA4/ccrB4 were reported in MRSA.In addition, gene ccrC1 was detected in 28 isolates.Two isolates (1612274 and sapwu-046) carried ccrA1 and a new allele of ccrB united ccrB1 (1-875nt) and ccrB3 (940-1,626 nt; Fig. S2B and C).In summary, novel SCC elements in S. saprophyticus, such as the ones carrying new ccr compositions, are non-typeable according to the current SCCmec classification from S. aureus (45).This highlights key differences among staphylococcal species and motivates further studies of the transmission of SCC elements.

S. saprophyticus mutants demonstrate variable resistance patterns to β-lactam antibiotics
Bacteria have developed various mechanisms to combat β-lactam antibiotics.One of the major resistance mechanisms relies on the production of β-lactamase enzymes which hydrolyze the β-lactam ring, thereby inactivating the drug (49).All our S. saprophyticus isolates carried a bla gene (class A β-lactamase, 873 bp), and six of them also encoded blaZ (penicillin-hydrolyzing class A β-lactamase, 846 bp; Fig. 3A).Interestingly, 83.3% (5/6) of blaZ + isolates were from lineage S. Furthermore, penicillin resistance was higher for the isolates carrying mecA or blaZ genes (χ 2 test, both P-values are 0.0005; Fig. 3A  and B).Inference of penicillin resistance using mecA or blaZ as the markers showed good performances with an accuracy (true susceptible and true resistant) of 96.73% (266/275; Fig. S3A).The prediction errors (major error and very major error are 7/265 and 2/265, respectively) were from the isolates only carrying mecA but not blaZ, although no relationship was found between mecA variants and penicillin phenotypes (Fig. S3B).
Next, we generated a hierarchical tree based on the amino acid sequence identities of bla (Fig. 3A) and tested the associations of different mutations with β-lactamase activities and β-lactam resistance.We defined different bla alleles if they carried a single amino acid substitution; alleles were numbered and analyzed if they were present in at least 10 isolates, otherwise were labeled as "Others"; T183, T109, and T54 represented truncated bla genes based on their putative peptide length.The distribution of bla alleles was highly correlated with S. saprophyticus lineages and clusters (χ 2 test, both P-values are 0.0005; Fig. 3A).Isolates with allele 6 and the truncated bla (T54) did not show β-lactamase activity.The alignment of bla alleles highlights the unique mutations in allele 6, A120T and T215D, located in the catalytic domain  referring to the features of β-lactamase (UniProt-Q49V79_STAS1) protein of S. saprophyticus type strain ATCC 15305 (E-value is 8.01e-186).The Delta Delta G (DDG) values of these amino acid substitutions were −0.48 to −0.61 and −1.30 to −1.03 at normal urine pH [5.5-7.54 (50)], predicted by I-Mutant (51), suggesting their potential of decreasing protein stability and influencing β-lactamase productions (Fig. 3D; Fig. S3C).We also observed that 10 out of 21 mecA genes in our cohort were identified in isolates carrying T54 bla (n = 38), which were clustered together in the core gene phylogeny (Fig. S3D), indicating a different evolutionary history of these isolates.Furthermore, given the function of mecA against cefoxitin and oxacillin (Fig. 2D), we compared the resistances of bla alleles in mecA − isolates.The resistance rate against cefoxitin or oxacillin varied by the presence of different bla alleles (Fig. 3E and F), suggesting their roles in resisting these two β-lactam agents in S. saprophyticus.However, we could not only rely on bla alleles or β-lactamase production to predict susceptibility phenotypes of cefoxitin or oxacillin (Fig. S3E), and more genomic determinants needed to be discovered to explain and predict the AMR.

Putative antibiotic resistance determinants were detected in S. saprophyticus against β-lactams
To further address the knowledge gap in genomic determinants of cefoxitin and oxacillin resistance, particularly among mecA − S. saprophyticus (n = 254), we utilized computa tional models to identify genomic correlates of susceptibility phenotypes.All accessory genes present in at least 10 isolates, and 131 unique amino acid substitutions (referred to as "gene alleles" later) across 36 core genes served as candidates for the correlation analysis (Table S3).Analogs of these 36 genes are reported to be essential for β-lac tam resistance in other staphylococci, especially S. aureus (52,53), and are involved in encoding PBPs, cell envelope synthesis, stress responses, nucleotide metabolism, or metal homeostasis (Table S4).105 genes and 15 gene alleles were significantly correlated with cefoxitin resistance (Table S5) detected by MaAsLin2 (54) with a q-value threshold for significance as 0.25.The top features anticorrelated or correlated with cefoxitin, respectively, were gene group_1086 (6/198  Subsequently, we evaluated the ability to infer susceptibility phenotypes from genotype using a Random Forest Classifier (RFC), trained on the presence or absence of all accessory genes (Fig. 4A), gene alleles (Fig. 4B), or the genes with significant correlations tested above (Fig. 4C).The model with associated genes and gene alleles showed the best prediction performance for both cefoxitin and oxacillin (Fig. 4D): for cefoxitin resistance prediction, its area under the receiver operating characteristic (ROC) curve (AUC; 0.7657 ± 0.1151) was significantly higher than AUCs from the models with all accessory genes (0.6886 ± 0.1245) and gene alleles (0.6049 ± 0.1188) by Wilcoxon rank-sum test (P-value is 1.80e-05 and 2.22e-16, respectively); for oxacillin resistance prediction, its AUC (0.7846 ± 0.0511) was significantly higher than the AUC from the model with all accessory genes (0.7291 ± 0.0458) by Wilcoxon rank-sum test (P-value is 1.70e-11) but similar with the AUC from the model using gene alleles (0.7729 ± 0.0461, P-value is 0.3600).

Erythromycin/clindamycin ARGs are possibly transferred by phage elements
Bacteriophages can act as ARG carriers in various environments (35,36,(70)(71)(72).Phageencoded ARGs are considered a substantial dissemination threat due to their prolonged persistence, fast replication rate, and potential board host range.Therefore, we assessed the prevalence of phage signatures in our cohort and analyzed their association with phenotypic non-susceptibility among S. saprophyticus.By analyzing WGS data, we identified 520 prophage sequences in 91.3% (251/275) isolates (Fig. 5E; Table S7).The average phage number, as well as the proportion of phage-containing isolates, was similar for S. saprophyticus from different lineages or clusters (Fig. S5B).After assigning phage taxonomy, we found that 210 of the phages belonged to the Siphoviridae family, and 17 were from the Myoviridae.The distributions of these two types of phages were different among lineages: 84.8% (178/210) of siphoviruses and 94.1% (16/17) myoviruses were detected in lineage G, whereas 15.2% (32/210) of siphoviruses and 5.9% (1/17) myoviruses were from lineage S (Fig. 5E).We grouped phage sequences with a 95% ANI cutoff and defined each such group as a phage "population." Isolates from different lineages contained distinctive phage populations (Fig. 5F).This implies either a strainlevel specificity of phage infections in S. saprophyticus or differential phage environments for G and S isolates.

DISCUSSION
Here, we present a genomic comparison of 275 human pathogenic S. saprophyticus isolates, collected from multicenter healthcare networks.Building from our global phylogenomic characterization of S. saprophyticus lineages, we focused on profiling the S. saprophyticus antibiotic resistome, including analysis and prediction of genotypephenotype associations.To test the cefoxitin and oxacillin resistance of our S. sapro phyticus, we used the disk diffusion method following the procedural guidelines of Staphylococcus epidermidis outlined by the CLSI (M100 31st, 2021) (13), given that CLSI is still in the process of defining an optimal surrogate method for S. saprophyticus.
When comparing susceptibility phenotypes between lineages, G isolates displayed lower phenotypic resistance against oxacillin (146/209 compared to 60/66 among S isolates, P-value is 0.0015 by χ 2 test) but higher resistance against cefoxitin (34/209 compared to 4/66 among S isolates, P-value is 0.0435 by χ 2 test; Fig. 1E).In some Staphylococcus species, β-lactam agents, such as oxacillin and cefoxitin, are used as surrogate markers to predict mecA-mediated methicillin resistance (74)(75)(76).Accordingly, the discrepant burdens of oxacillin and cefoxitin resistance in our cohort prompted us to analyze the occurrence of mecA and SCCmec elements that transfer mecA.Approximately 7.6% (21/275) of our isolates encode mecA, which is similar to a previous report from Japan (77).However, the incidence of mecA gene in S. saprophyticus can be varied across cohorts differing in clinical significance, size, and area (15,78,79).We observed that the presence of mecA is predictive of oxacillin and cefoxitin resistance for S. saprophyticus, but its absence is not predictive of susceptibility.Indeed, 90 83) and found they were all non-typeable according to the current SCCmec schemes of S. aureus (45) due to the absence of amplification products for hitherto known ccr genes.Indeed, we also observed several mecA − SCC elements and novel configurations of the ccr gene complex in our cohort.One of these non-typable ccr complexes, ccrA1/ccrB3, was previously reported in S. saprophyticus (83) but not other configurations such as ccrA1/ccrB2, ccrA1/ccrB4, and ccrA2/ccrB3 (Fig. 2B and C).In contrast, most mec complexes found in S. saprophyticus were conserved with those in MRSA, implying a similar evolutionary origin.Additionally, two S. saprophyticus isolates contained mec genes adjacent to the mobilization element IS256.This composition has been described in other CoNS, such as S. epidermidis, Staphylococcus haemolyticus, Staphylococcus hominis, Staphylococcus sciuri, and Staphylococcus cohnii (84,85) but not in MRSA.IS256 is widespread in the genomes of multiresistant enterococci and staphylococci (86)(87)(88).In S. epidermidis, IS256 has been recognized as a marker of hospital-acquired MDR and biofilm-forming strains causing opportunistic infections in immunocompromised patients (89)(90)(91)(92).In these two S. saprophyticus strains, we detected the biofilm operon (ica) next to the IS265 SCCmec that was inverted in the genome compared to other SCCmec (Fig. S2B), indicating a unique evolutionary history.Altogether, our findings highlight the diversity of SCC elements among S. saprophyticus and motivate further studies on their classification, transmission, and clinical relevance.Since the mecA gene cannot explain all β-lactam resistances in S. saprophyticus, we used computational models to identify other potential genomic correlates of the resistance phenotypes, which would be beneficial for future resistance prediction.The results show that cefoxitin and oxacillin-correlated gene features are different, suggesting potentially distinctive resistance mechanisms against these two drugs in S. saprophyticus.Among mecA − isolates, gene group_1086, mgrA, and pdhD had positive correlations, and certain gene alleles of prkC, gdpP, and murF had negative correlations with cefoxitin resistance (Table S5).For oxacillin, gene alleles of pbpH, pyrB, ahpF, glmSU, and mrcA (Table S6) exhibited negative correlations to the resistance phenotype.We observed that several of these genes were involved in cell envelope synthesis and stress response and had been reported to correlate with AMR in other bacterial species.Gene group_1086 is annotated as an alcohol dehydrogenase (ADH) that is widely present among bacteria (93) and mitigates alcohol toxicity (94).A recent study noted that E. coli adhE was able to bind with ampicillin and exhibited higher expression levels under ampicillin stress and low intracellular alcohol conditions (95).Next, mgrA, a global regulator, has been found to affect several efflux pumps in S. aureus, such as norA and norB for MDR, and tet38 for tetracycline resistance (96,97).Gene pdhD encodes a membrane protein, dihydrolipoyl dehydrogenase (DLD), which belongs to the oxidoreductase family and is essential for energy metabolism (98).In Vibrio parahaemo lyticus, DLD levels were upregulated in antimicrobial peptide-resistant clones at both translational and transcriptional levels (99).GdpP is a phosphodiesterase that catalyzes the hydrolysis of intracellular secondary messenger c-di-AMP.S. aureus gdpP deletion mutants have been shown to elevate resistance to β-lactams and other cell wall-target ing antimicrobials (100,101).It also has been shown that mutations of pyrB, encoding aspartate carbamoyltransferase, can alter the susceptibility of P. aeruginosa to β-lactams (102).Moreover, five types of PBP were detected in our cohort, including PBP2a (mecA), PBP 1A (mrcA), PBP 2B (pbpB), PBP H (pbpH), and PBP 1A/1B (ponA).Two alleles of mrcA and three alleles of phpH have been shown to influence oxacillin susceptibilities in S. saprophyticus (Table S6).Lastly, in general, single-pass transmembrane proteins with extracellular PBP and serine/threonine kinase-associated (PASTA) domains are important to the cell wall stress response (103,104).For example, the PASTA kinases of S. aureus are essential for β-lactam resistance.However, resistance patterns vary amongst strains, and the mechanism is still understudied (105)(106)(107).Genes in the mur operon are probably the substrates of S. aureus PASTA kinases (108), and murF is essential for the optimal expression of methicillin resistance (109).In S. saprophyticus, the serine/threonine-pro tein kinase PrkC is a potential element of the PASTA system.One prkC variant and one murF variant were correlated with cefoxitin resistance (Tabls S5).In addition, ahpF and glmU are also identified as potential substrates of PASTA kinases (110,111).Alkyl hydroperoxide reductase AhpF, protecting cells against reactive oxygen species (112), is important for the tolerance of E. coli cells against antibiotics causing DNA damage (113).The GlmSMU pathway responds to produce the Uridine diphosphate-N-acetylglucosa mine, an essential peptidoglycan and cell wall teichoic acid precursor (110).We identified alleles of ahpF and glmSU of S. saprophyticus that were anti-correlated with oxacillin resistance (Table S6).In summary, the diverse functions of the above genes indicate the complexity of putative mechanisms of resistance against cefoxitin and oxacillin, which might involve numerous catalytic and metabolic pathways in S. saprophyticus.One hypothesis is that genes affecting cell wall construction and stress response can increase bacterial tolerance to survive antibiotic assault.Better resistance prediction performance may be achieved by refining clinical AST breakpoints for S. saprophyticus and by including gene expression data from the different observed mutants.
Finally, we characterized phage signatures in S. saprophyticus, motivated by the understanding that phages could promote host genetic diversity and niche adaptation by horizontal gene transfer (114)(115)(116)(117)(118)(119).In our cohort, the majority of S. saprophyticus isolates (91.2%) contained at least one phage element.This high phage genomic prevalence may indicate that they contribute to S. saprophyticus fitness in certain environments (37).Earlier work has suggested that phage-carrying ARGs are enriched in the genomes of antibiotic-treated communities (37,120,121).Accordingly, we annota ted ARGs within phage backbones and evaluated their correlation with phenotypic non-susceptibility since prior reports have suggested they may be non-functional (122).We found that 20.6% (107/520) of S. saprophyticus phage sequences contained ARGs, belonging to fosfomycin (fosB and fosD), macrolide/lincosamide [erm and erm (44)v], or trimethoprim (dfrG) resistance classes.The gene erm (44), having around 84% ANI with erm in S. saprophyticus, was previously reported in a prophage of Staphylococcus xylosus and exhibited resistance to erythromycin, together with inducible resistance to clindamycin (66).In contrast, gene erm (44)v was originally described in an S. saprophyticus isolate and conferred resistance to macrolides and lincosamides (67).We observed that phage-encoding erm and erm (44)v in our S. saprophyticus showed high non-susceptibility against erythromycin/clindamycin.Although neither lincosamides nor macrolides antibiotics were used clinically in the treatment of UTI due to the limited excretion in the urine, we tried to detect their clinical relevance for the S. saprophyticus isolates outside the urinary tract.While S. saprophyticus is primarily an uropathogen, our cohort included 24.5% (67/275) of non-urinary S. saprophyticus isolates; specifically, they were recovered from blood culture (n = 46), wounds/tissues/bone (n = 15), the respira tory system (n = 3), and sterile body fluid (n = 3).Surprisingly, the genomic dissimilarity between strains did not correlate with the body site of isolation (Fig. S1F).This may imply that S. saprophyticus can transmit and survive in diverse physiological conditions.We then compared the distribution of macrolide/lincosamide phenotypic non-suscepti bility and ARGs between isolates from blood, urine, and wound/tissue/bone, and we observed similar distribution in these body sites, besides msr(A)/mph(C) gene associated with erythromycin resistance (Fig. S4B and C).Due to the unbalanced sampling, we assumed that we did not have enough statistical power to detect differences across the distribution based on body sites.A focused study of S. saprophyticus transmission will require a more balanced sample set in terms of body site of isolation.Further compara tive genomics studies should also consider fecal samples from UTI patients, given prior reports of correlations between uropathogen bladder colonization and gastrointestinal colonization (123,124).
Our study has limitations.Our AST method, disk diffusion, does not generate an minimal inhibitory concentration (MIC) value.However, disk diffusion is a reproducible and standardized CLSI standard method that is widely used in clinical testing and has been well investigated in the setting of Staphylococcus spp.relevant to the genotypephenotype correlation.However, this method relied on manual measurement of the zone of clearance.In addition, only one brand of disk for each of the antibiotics and one brand of Mueller-Hinton agar were evaluated in our study.It is possible that variations of the cation concentration between manufacturers may influence AST results (125).
In summary, we performed a comparative phylogenomic and resistome analysis of a globally diverse collection of 275 human pathogenic S. saprophyticus isolates.We compared phenotypic antibiotic susceptibility with potential resistance determinants inferred from current ARG databases and staphylococcal literature.We found that a few documented ARGs [e.g., tet(K), dfrCG, erm, erm (44)v, erm(C), abc-f, msr(A), and msr(A)/mph(C)] from other staphylococci are associated with phenotypic resistance to doxycycline, TMP-SMX, erythromycin, or clindamycin in S. saprophyticus detected in our cohort.In contrast, the genetic antecedents of β-lactam resistance in S. saprophyticus are more complicated.Penicillin susceptibility is correlated with mecA or blaZ.For oxacillin and cefoxitin, the presence of mecA is indicative of a resistance phenotype, but the absence of this gene is not predictive of susceptibility to β-lactam antibiotics using current CLSI interpretive criteria.We also identified several genes involved in stress response and cell wall synthesis to be correlated with resistance to these two drugs.Finally, we describe ARG associations with known and novel SCCmec configurations as well as phage elements in S. saprophyticus, which may serve as intervention or diagnostic targets to limit resistance transmission.S1).Isolates were recovered from human urine specimens (n = 208), blood cultures (n = 46), wounds/tissues/bone (n = 15), the respiratory system (n = 3), and sterile body fluid (n = 3).Their purity was evaluated by streaking on blood agar plates (BAPs, Hardy Diagnostics).Microbial identification was confirmed as S. saprophyticus using matrix-assisted laser desorption/ionization time-offlight mass spectrometry with the VITEK MS system (bioMérieux).

Resistance characterization and Cefinase assay
Susceptibility testing was performed for TMP-SMX, doxycycline, erythromycin, clinda mycin, penicillin, and cefoxitin using Hardy Kirby-Bauer Disks (Hardy Diagnostics) and oxacillin using BD BBL disks (Becton, Dickinson and Company).Methods followed the procedural guidelines outlined by the CLSI (documents M02 and M100) (13,126,127).Isolates were grown from the frozen stock onto BAPs and subcultured on BAPs and then 3-5 colonies of pure growth were suspended in 0.85% sterile saline at 0.5 McFarland standard.The suspension was used to inoculate a lawn on Mueller-Hinton Agar (MHA, Hardy Diagnostics).After 16-18 hours (24 hours for cefoxitin) incubation, the zone of clearance around the disks was manually measured with a metric ruler.S. aureus ATCC 25923 was used as a quality control strain.Detection of β-lactamase production was assessed by nitrocefin-based Cefinase disk test without induced (Hardy Diagnostics).

ARG and SCC element identification
ARGs were identified by AMRFinder (v3.8.4) (73) using results from Prokka as inputs, including assembled genomes (.fna), predicted genes (.faa), and master annotations (.gff).A presence-absence matrix of all ARGs was generated using MATLAB, with associated metadata displayed as color strips to represent isolate lineage, cluster, and corresponding resistance phenotypes in Fig. 1F.Phage-carrying ARGs were also tested by AMRFinder with 75.0%identity and 50.0%coverage as the threshold for the purpose of identifying the functional ARGs.Gene alignment was performed using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) on extracted sequences at nucleotide or amino acid level.The hierarchical tree based on gene sequence alignment was generated using Jalview (https://www.jalview.org)and visualized with iTOL.All genes of SCC elements were identified and annotated with the online tool SCCmecFinder (45) using Roary pan-genome sequence as the input.Hierarchical clustering of mec and ccr gene contents was performed by the pheatmap function (R pheatmap packages) (142) and labeled with isolate lineage, cluster, and resistance phenotypes by color strips.The alignment of SCC elements was presented by Easyfig (143).

Prediction accuracy
To evaluate the prediction accuracy links between genotype and phenotype, we applied specific rules related to the presence of ARGs and antibiotic resistance.We assumed that all the ARGs found in the strains were expressed.All S. saprophyticus isolates carrying mecA were predicted to be resistant to β-lactams, including cefoxitin, oxacillin, and penicillin (Fig. 2D).Isolates carrying mecA or blaZ were also used to predict penicillin resistance (Fig. S3A).A very major error was defined as inferring susceptibility from genomic data, while the strain was resistant to AST.A major error was defined as inferring resistance from genomic data, while the strain was susceptible by AST.True resistant and true susceptible indicated that the prediction was identical to the AST result.

Determining phenotype-associated genes and RFC modeling
A presence-absence matrix was built for all accessory genes and 131 unique amino acid substitutions across 36 core genes related to β-lactams reported in other staph ylococci (Table S3).Then, this matrix was analyzed by MaAsLin2 (54) to determine the features that were correlated with cefoxitin and oxacillin resistance using the following options: min_prevalence, 0.039 (i.e., present in ≥10 isolates); analysis_method, "LM"; normalization, "CLR"; transform, "None"; others were using the default.RefSeq and UniProtKB accession numbers of the top 8 correlates and the 36 core genes were detected by UniProtBLAST (https://www.uniprot.org/blast)using sequences in pan_genome_reference.fa from Roary (Tables S4 to 6) to check for their annotations and functions.To evaluate the prediction performance from genomic data to resistance phenotype, we conducted a custom machine-learning process employing random forest analysis using the randomForest function (R randomForest package) (144) with default parameters and the following adjustments: ntree = 5,000, proximity = FALSE, importance = TRUE, and mtry = 3. Genes or gene alleles with a prevalence >3.9% (i.e., ≥10 isolates) were included in the analysis.The model was run over 100 iterations of the 75/25 training/testing data set splits.The model performance was measured through the AUC estimator with the prediction and performance functions (R ROCR package) (145).The mean AUC value was reported with 95% confidence intervals.The ROC plot was generated using the predict and roc functions (R pROC package) (146).

Phage sequence identification and validation
Assemblies from Unicycler were piped through Cenote-Taker 2 to identify putative phage contigs (147) with end features as direct terminal repeats indicating circularity and inverted linear repeats (ITRs) or no features for linear sequences.The linear viral contigs were then binned by VAMB (148) due to the highly fragmented assemblies from short reads, resulting in 1,200 clusters.Contigs in each cluster were concatenated and filtered by length and completeness to remove false positives.Specifically, the length limits were 1,000 nt for the detection of circularity, 4,000 nt for ITRs, and 5,000 nt for other linear sequences.The completeness was computed as a ratio between the length of our phage sequence and the length of matched reference genomes by CheckV (149), and the threshold was set to 10.0%.Phage contigs passed these two filters were then run through VIBRANT with "virome" flag to further remove obvious non-viral sequences (150).As a result, 520 putative viral sequences were identified (Table S7).

Phage taxonomy and population
Protein sequences created by CheckV were used as input for vConTACT2 with "DIA MOND" and database "ProkaryoticViralRefSeq207-Merged" to assign taxonomy (151).For the "unsigned" ones from vConTACT2, we used the tentative taxonomy from Cenote-Taker 2 inferred using BLASTP against a custom database containing Refseq virus and plasmid sequences from GenBank (147).The final viral taxonomy was determined at the family level and used for further analysis (Table S7).Based on MIUViG recommen ded parameters (152), phages were grouped into populations if they shared ≥95% nucleotide identity across ≥85% of the genome using BLASTN and a CheckV supporting code, anicalc.py(https://bitbucket.org/berkeleylab/checkv/src/master/).The result was visualized using Cytoscape (https://cytoscape.org).

FIG 1
FIG 1 Relatedness and antibiotic resistance profile of S. saprophyticus recovered globally from human infections and colonization during 2012-2021.(A) Phylogenetic tree demonstrating the core gene alignment of 275 S. saprophyticus isolates.Each node represents an isolate.Two lineages are indicated with color ranges covering the complete clade branches.Hierarchical clusters based on rhierBAPS are indicated by color strips in the internal ring.The continent (Continued on next page)

FIG 1 (
FIG 1 (Continued) of origin and year of collection, as well as the number of ARGs carried by each isolate, are labeled by color strips.The scale bar represents the average number of nucleotide substitutions.(B) Distribution of S. saprophyticus isolates in terms of their continents in the two lineages.(C) Box plot comparing the number of ARG between isolates from different lineages (left) and clusters (right).(D) Comparing multidrug resistance (MDR) rates between isolates from different lineages (left) and clusters (right).(E) Comparing percentages of isolates resistant to different antibiotics or expressing β-lactamase activity between lineages.Susceptibility phenotypes: resistant, orange; intermediate, yellow; susceptible, light blue.β-lactamase activity: positive, dark blue; negative, light gray.In D and E, the χ 2 test is used with a significance threshold of 0.05.* P < 0.05, **P < 0.01, and ***P < 0.001.(F) Annotating the core gene phylogenetic tree with antibiotic resistance phenotypes, linking with the presence of various ARGs.Genotypical and phenotypical data are grouped by antimicrobial class.Lineages, clusters, resistance phenotypes, and ARG content are indicated by color strips.

FIG 2
FIG 2 Diversity of SCC elements in S. saprophyticus and their resistances against β-lactams.(A) Distribution of SCC elements in G and S lineages.(B) Hierarchical tree of 21 isolates with SCCmec structured by the composition of mec and ccr genes.(C) Hierarchical tree of 61 isolates with SCC elements but not carrying mecA, structured by the composition of ccr genes.Lineages, clusters, and resistance phenotypes are indicated by color strips in B and C. (D) Prediction accuracy of genotype to phenotype inference for the 21 mecA + strains of the study.It is used mecA gene as the marker to predict the resistance against three β-lactam antibiotic drugs.

FIG 3
FIG 3 Associations between gene alleles encoding β-lactamase identified in S. saprophyticus and their AMR against β-lactam antibiotics.(A) Hierarchical tree of bla sequences from all S. saprophyticus isolates, according to amino acid sequence identity.Alleles 1-8 indicate bla gene alleles with at least one amino acid substitution that each is present in at least 10 isolates, otherwise are labeled as "others." Alleles T183, T109, and T54 indicate truncated bla based on their putative peptide length.S. saprophyticus lineages, clusters, resistance phenotypes, and β-lactamase activity are indicated by color strips.The presence of mecA and blaZ genes is symbolized by the orange star and blue circle, respectively, at the tip of the branch.Tree branch length is ignored.(B) Percentage of isolates resistant to penicillin out of the total number of S. saprophyticus carrying mecA and blaZ, compared to isolates with none of them.(C) Percentage of isolates exhibiting β-lactamase activity, detected by Cefinase assay, out of the total number of S. saprophyticus carrying specific bla gene alleles.(D) Hierarchical tree of bla gene alleles 1-8 with the alignment of amino acid sequences.β-lactamase activity is indicated by color strips.The black box highlights the unique mutations in allele 6 with the potential to influence β-lactamase activity.The scale bar represents the average number of amino acid substitutions.See Fig. S3C for the whole-length alignment.(E and F) Left panel: percentage of isolates resistant to cefoxitin and oxacillin out of the total number of mecA − S. saprophyticus carrying specific bla gene alleles.Right panel: associated P-value based on the χ 2 test compared resistant rates between different alleles.The P-value is color-coded, and red indicates significant differences.The χ 2 test is used with a significance threshold of 0.05 in B-C and E-F.*** P < 0.001.

FIG 4
FIG 4 Prediction performances of genotype to phenotype inference for mecA − S. saprophyticus strains against cefoxitin and oxacillin, tested by RFC.In A-D, the top panel is for cefoxitin, and the bottom panel is for oxacillin.(A) ROC curves evaluate the ability to predict the resistance phenotype based on the presence and absence of all accessary genes, presenting in at least 10 isolates.(B) ROC curves evaluate the ability to predict the resistance phenotype based on the presence and absence of all gene alleles (present in ≥10 isolates) that have been shown related to methicillin resistance in other staphylococci.(C) ROC curves evaluate the ability to predict the resistance phenotype based on the presence and absence of significant AMR-related genes and alleles identified by MaAsLin2.In A-C, the red line represents the mean ROC curves from 100 RPC tests (light gray lines), and the AUC is exhibited for each model.(D) Box plots of AUC values from 100 RPC tests of different models in A-C.NS.P ≥ 0.05 and ***P < 0.001 as determined by Wilcoxon rank-sum test.

FIG 5
FIG 5 Genes and phage signatures correlated with non-β-lactam resistances in S. saprophyticus.(A-D) Box plots of zone diameters from disk diffusion tests for doxycycline, TMP-SMX, erythromycin, and clindamycin susceptibility, respectively.The phenotypes are represented by colors: red for resistant, yellow for intermediate, and white for susceptible.Each gray dot denotes an individual isolate carrying a specific ARG.In D, isolates that were tested for ICR by D-test were represented as red or dots for positive or negative results.Few isolates containing different ARGs correlated with the same antibiotics are ignored.NA indicates S. saprophyticus with no related ARGs.* P < 0.05, **P < 0.01, and ***P < 0.001 as determined by Wilcoxon rank-sum test.(E) Annotating phage signatures and phage-carrying ARGs on the core gene phylogenetic tree of S. saprophyticus.The scale bar represents the average number of nucleotide substitutions.The innermost ring represents bacterial lineages by color strips.The blue bar graph shows the phage number detected in each S. saprophyticus isolate.The distribution of phages from Siphoviridae or Myoviridae family, as well as the presence/absence of ARGs carried by phages, is visualized as filled and empty symbols.The inset shows the distribution of phage taxonomy in different lineages to their bacterial host belongs.(F) Phage populations containing at least 10 phage sequences, defined by MIUVIG-recommended parameters (95% ANI and 85% alignment fraction).Each node represents a phage sequence, and an edge indicates a similarity between its nodes.The phage populations are labeled with their bacterial host lineages (top) and the number of phage-carrying ARGs (bottom).(G) Percentage of isolates non-susceptible to clindamycin [ICR for erm and constitutive resistance for erm(44)v] and erythromycin out of the total number of S. saprophyticus carrying erm and erm(44)v within their phage elements.Distributions of these two genes against lineages are shown at the top.

TABLE 1
Genes associated with non-β-lactam resistance of S. saprophyticus in the present work a The number of isolates carrying the ARG.bThe ratio of the number of isolates showed phenotypic non-susceptibility to the number of isolates carrying the ARG.c mph(C) occurred only in combination with msr(A) in the present study.d Major Facilitator Superfamily.
This may indicate that the SCCmec mobilization rate or the types of SCCmec elements may be different among different staphylococci.A previous study used PCR to characterize the SCCmec composition of eight mecA + S. saprophyticus isolates ( (80)(81)(82)) and 44.7% (17/38) of S. saprophyticus isolates resistant to oxacillin and cefoxitin, respectively, did not carry mec genes.Interestingly, whereas most clinical S. epidermidis strains carry mecA(80)(81)(82), the prevalence of mecA in S. saprophyticus is much lower.