Genetic markers associated with host status and clonal expansion of Group B Streptococcus in the Netherlands

Objectives: Certain Group B Streptococcus (GBS) genotypes are associated with invasive disease in neonates. We conducted a comparative genomic analysis of GBS isolates from neonatal disease and maternal carriage in the Netherlands to determine the distribution of genetic markers between the two host groups.
Methods: Whole genome sequencing was used to characterise 685 neonatal invasive isolates (2006–2021) and 733 maternal carriage isolates (2017–2021) collected in the Netherlands.
Results: Clonal complex (CC) 17 and serotype III were significantly more common in disease, while carriage isolates were associated with serotypes II, IV, and V as well as CC1. The previously reported CC17-A1 sub-lineage was dominant among disease isolates and significantly less common in carriage. The phiStag1 phage, previously associated with expansion of invasive CC17 isolates in the Netherlands, was more common among disease isolates than carriage isolates overall; however, it was equally distributed between CC17 isolates from carriage and disease. The prevalence of antimicrobial resistance genes was lower overall in disease than in carriage isolates, but increased significantly over time, driven by a rise in the prevalence of the multidrug resistance element ICESag37 among disease isolates.
Conclusion: There is a stable association between certain GBS genotypes and invasive disease, which suggests opportunities for developing more precise disease prevention strategies based on targeted GBS screening. In contrast, GBS mobile genetic elements appear less likely to be correlated with carriage or disease, and are instead associated with clonal expansion events across the GBS population.


Introduction
Streptococcus agalactiae (Group B Streptococcus, GBS) is a common coloniser of the vaginal and gastrointestinal tracts of healthy adults. Carriage of GBS during pregnancy is a risk factor for invasive disease in the newborn, and GBS is a leading cause of invasive infection in neonates worldwide (Gonçalves et al., 2022). Beta-lactams are the first choice both for intrapartum antibiotic prophylaxis (IAP) during labour and for treatment of GBS disease. While most GBS isolates remain susceptible to beta-lactams (Kobayashi et al., 2021), the prevalence of resistance to second-line antibiotics such as erythromycin and clindamycin has been increasing (Slotved and Hoffmann, 2020; Kekic et al., 2021; Sabroske et al., 2023).
Group B Streptococcus isolates are often grouped by their capsular polysaccharide (CPS), with 10 serotypes described to date: Ia, Ib, and II–IX (Berti et al., 2014). The CPS is a major GBS virulence factor, and a number of multivalent GBS vaccines targeting CPS are currently under development (Absalon et al., 2022). GBS isolates are also characterised using multi-locus sequence typing (MLST), which has revealed that five GBS clonal complexes (CCs) are associated with colonisation and disease in humans: CC1, CC10, CC17, CC19, and CC23 (Björnsdóttir et al., 2016; Khan et al., 2022). Some GBS lineages are associated with specific CPS serotypes; for instance, CC17 isolates predominantly express serotype III (Teatero et al., 2016). Associations between GBS molecular markers and different host groups have also been observed: CC17-serotype III dominates among neonatal invasive GBS disease (Teatero et al., 2016; Bianchi-Jassir et al., 2020; Jamrozy et al., 2020), while CC1 is often associated with disease in the adult population (Flores et al., 2015).
We have previously reported that CC17 prevalence increased among GBS isolates from neonatal disease in the Netherlands, which was associated with expansion of particular CC17 clonal groups and with acquisition of a novel phage, phiStag1 (Jamrozy et al., 2020). It has been unclear whether the increasing prevalence of these CC17 clones occurred only among disease-associated GBS isolates or reflected a broader expansion across the GBS population. To address this, we used whole genome sequencing (WGS) to analyse and contrast the population structures of GBS isolates from maternal carriage and neonatal disease collected in the Netherlands. Furthermore, to better understand the genetic variability between isolates from the two at-risk populations, we compared the distribution of key GBS molecular markers, such as serotype, CC, and antimicrobial resistance (AMR) genes, as well as the intra-lineage population structure within the major CCs.

GBS isolates
The collection consisted of 685 neonatal (<90 days old) invasive GBS isolates collected between 2006 and 2021, and 733 maternal carriage GBS isolates collected between 2017 and 2021 in the Netherlands. Isolates from neonatal disease were derived from nationwide surveillance of bacterial meningitis and infant bacteraemia conducted by the Netherlands Reference Laboratory for Bacterial Meningitis (NRLMB). Disease isolates collected between 2006 and 2016 were described previously (Jamrozy et al., 2020). Infections were classified as early onset disease (EOD) at age 0–6 days and as late onset disease (LOD) at age 7–89 days. Maternal carriage isolates were collected from pregnant women in hospitals in Amsterdam, The Hague, Utrecht, Hengelo, and Arnhem for the Netherlands observational study on GBS disease, bacterial virulence and protective serology (NOGBS). Isolates were cultured from the vagina (n = 528) or urine (n = 205) according to local hospital protocols.
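The onset classification above is a simple age rule. As an illustration only, the sketch below encodes it in Python, assuming age is recorded in whole days since birth; the function name `classify_onset` is hypothetical and not part of the study's pipeline.

```python
def classify_onset(age_days: int) -> str:
    """Classify a neonatal GBS infection by age at onset in whole days.

    EOD: 0-6 days; LOD: 7-89 days. Ages outside 0-89 days fall outside
    the neonatal (<90 days) inclusion window of this collection.
    """
    if age_days < 0 or age_days >= 90:
        raise ValueError(f"age {age_days} d is outside the 0-89 day window")
    return "EOD" if age_days <= 6 else "LOD"


if __name__ == "__main__":
    for age in (0, 6, 7, 89):
        print(age, classify_onset(age))
```

The boundaries are inclusive on both ends (day 6 is still EOD, day 89 is still LOD), matching the ranges stated in the text.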

Whole-genome sequencing and post processing
Genomic DNA was extracted using either the Wizard® Genomic DNA Purification Kit or the Maxwell® RSC Cultured Cells DNA Kit (AS1620) from Promega. Tagged DNA libraries were created using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina. Whole-genome sequencing was performed on the Illumina NovaSeq 6000 platform with 150 bp paired-end reads. Sequence reads were assembled using SPAdes v3.10.0 (Bankevich et al., 2012). Annotated assemblies were produced as described previously (Page et al., 2016).

Whole-genome sequence data analysis
The sequence data were assessed using the GBS QC pipeline v1.0.3. Sequences that passed QC were analysed using the GBS typer pipeline v1.0.10 to determine sequence type (ST), serotype, and AMR gene carriage. Novel MLST alleles and ST profiles were  et al., 2017) and a single locus variant for group definition. To determine the presence of the phiStag1 (Jamrozy et al., 2020) and ICESag37 elements, sequence reads were mapped to reference sequences (phiStag1: GenBank accession PP091924; ICESag37: accession no. CP019978, 629058–702486) with SRST2 v0.2 using default parameters (Inouye et al., 2014). Phylogenetic analyses were performed as detailed in Supplementary Methods. CC17 isolates from the Netherlands were supplemented with publicly available CC17 genomes to reconstruct a global, time-calibrated phylogeny, as detailed in Supplementary Methods.
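The CC-definition step above is only partly recoverable from this text, beyond the mention of a single locus variant (SLV) criterion. The Python sketch below shows one generic way such grouping can work, assuming CCs are single-linkage clusters of STs joined whenever two seven-locus profiles differ at exactly one locus. This is an assumption: founder-based approaches such as eBURST can give different groupings, and the allele values used here are hypothetical.

```python
from itertools import combinations

# Allelic profile over the seven GBS MLST loci
# (adhP, pheS, atr, glnA, sdhA, glcK, tkt).
Profile = tuple[int, ...]


def is_single_locus_variant(a: Profile, b: Profile) -> bool:
    """True if two allelic profiles differ at exactly one locus."""
    return sum(x != y for x, y in zip(a, b)) == 1


def group_by_slv(profiles: dict[int, Profile]) -> list[set[int]]:
    """Single-linkage grouping of STs connected by SLV edges (union-find)."""
    parent = {st: st for st in profiles}

    def find(st: int) -> int:
        while parent[st] != st:
            parent[st] = parent[parent[st]]  # path halving keeps trees shallow
            st = parent[st]
        return st

    for s1, s2 in combinations(profiles, 2):
        if is_single_locus_variant(profiles[s1], profiles[s2]):
            parent[find(s1)] = find(s2)

    groups: dict[int, set[int]] = {}
    for st in profiles:
        groups.setdefault(find(st), set()).add(st)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    # Hypothetical STs and allele numbers, for illustration only
    demo = {
        1: (1, 1, 1, 1, 1, 1, 1),
        2: (1, 1, 1, 1, 1, 1, 2),
        3: (1, 1, 1, 1, 1, 2, 2),
        10: (5, 5, 5, 5, 5, 5, 5),
        11: (5, 5, 5, 5, 5, 5, 6),
        20: (9, 9, 9, 9, 9, 9, 9),
    }
    print(group_by_slv(demo))
```

With single linkage, hypothetical ST3 joins ST1's group through ST2 even though ST1 and ST3 are double-locus variants. This chaining is the main way such a simple rule can differ from founder-based CC definitions.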

Statistical analysis
Fisher's exact test was used to determine significant associations between host status and GBS genotypes; P < 0.001 was considered statistically significant.
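Each genotype comparison reduces to a 2×2 table (genotype present/absent × disease/carriage). Library implementations exist, for example `scipy.stats.fisher_exact`. As an illustration only, the dependency-free sketch below computes the two-sided P-value: it sums the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one. The counts in the usage example are hypothetical, not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    For example, rows: genotype present / absent; columns: disease / carriage.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        # given the fixed row and column totals
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so floating-point ties count as "as extreme"
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    p = fisher_exact_two_sided(8, 2, 1, 5)
    print(f"P = {p:.4f}; significant at P < 0.001: {p < 0.001}")
```

The strict threshold (P < 0.001) used in this study is a conservative choice given the many genotypes tested. The test itself is unchanged by the threshold; only the decision rule is.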

Serotype, ST, and CC distribution among GBS from carriage and disease
The dataset consisted of 733 maternal carriage and 685 neonatal disease isolates. The majority of neonatal isolates were from EOD (62%), with the remainder from LOD (38%; Supplementary Table 1).
We analysed associations between CCs and serotypes (Figure 2 and Supplementary Figure 2), which showed that CC17 and CC23 each carried a single dominant serotype (III and Ia, respectively), while the other main CCs showed higher serotype diversity (Figure 2). Most serotypes were associated with multiple CCs, except for VI and VII, which were identified only in CC1, and serotype IX, which was found only in CC130 isolates (Supplementary Figure 2).
We wished to compare the distribution of genotypes between isolates from carriage and disease. However, since our dataset was not fully temporally matched, we needed to account for possible sampling bias due to previously reported temporal changes in the prevalence of certain GBS lineages among isolates from neonatal invasive disease in the Netherlands (Jamrozy et al., 2020). To account for a likely continuing temporal trend in the frequency of GBS genotypes, we evaluated the differences between carriage and disease isolates in both the full dataset and a subset consisting only of isolates collected during the overlapping collection years (2018–2021). As such, the latter included only the most recently collected disease isolates.
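The full versus time-matched comparison amounts to computing each genotype's prevalence per host group twice: once on all isolates and once on isolates from the overlapping years only. The Python sketch below illustrates this under stated assumptions. The `Isolate` record and its fields are hypothetical, the 2018–2021 window is taken from the text, and the records in the usage block are made-up examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Isolate:
    # Hypothetical minimal record; real metadata would hold more fields
    host: str  # "carriage" or "disease"
    year: int  # collection year
    cc: str    # clonal complex, e.g. "CC17"


def time_matched(isolates, first_year=2018, last_year=2021):
    """Keep only isolates collected within the overlapping years (inclusive)."""
    return [i for i in isolates if first_year <= i.year <= last_year]


def prevalence(isolates, host, cc) -> Optional[float]:
    """Fraction of `host` isolates belonging to `cc`; None if the group is empty."""
    group = [i for i in isolates if i.host == host]
    if not group:
        return None
    return sum(i.cc == cc for i in group) / len(group)


if __name__ == "__main__":
    # Illustrative records only, not data from this study
    records = [
        Isolate("disease", 2008, "CC17"),
        Isolate("disease", 2019, "CC17"),
        Isolate("disease", 2020, "CC1"),
        Isolate("carriage", 2019, "CC1"),
        Isolate("carriage", 2021, "CC17"),
    ]
    for label, data in (("full", records), ("time-matched", time_matched(records))):
        print(label, {h: prevalence(data, h, "CC17") for h in ("disease", "carriage")})
```

The time-matched subset drops older disease isolates, so the disease denominator shrinks. This is one reason associations can lose statistical significance in that subset even when the direction of the difference is unchanged.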
Across the full dataset, serotype III was significantly more common in disease, while serotypes Ib, II, IV, and V were more prevalent in carriage isolates (P < 0.001; Table 1). In the time-matched dataset, serotypes II, IV, and V remained more common in carriage, although this was not statistically significant, while serotype III was still significantly associated with disease isolates. Among all isolates, ST17 was significantly more common in disease, while ST1, ST28, ST291, and ST569 were significantly associated with carriage isolates (P < 0.001; Table 1). In the time-matched dataset, these carriage-associated STs were still more prevalent among carriage isolates, but not significantly so. In contrast, ST17 remained significantly more common among disease isolates. In line with these associations, CC17 was significantly associated with disease and CC1 with carriage isolates (P < 0.001) in both the full and time-matched datasets. Additionally, CC8 isolates were more common in carriage, although this was statistically significant only in the full dataset. We also observed that CC19 was significantly (P < 0.001) more common in carriage, but only within the time-matched dataset, owing to a substantial drop in its prevalence among the most recent disease isolates. Regarding CC-serotype associations, CC24-serotype V and CC17-serotype IV isolates were found exclusively among carriage isolates, except for a single CC24-serotype V disease isolate (Table 1).

FIGURE 2 Serotype distribution by host status and CC. EOD, early onset disease; LOD, late onset disease.
We also compared the distribution of genotypes between maternal carriage isolates collected from vagina and urine and observed no variation in prevalence of serotypes and CCs between the two isolation sources (Supplementary Figure 3).

Phylogenetic structure and host status associations within GBS CC
Intra-lineage population structure was analysed by clustering each of the five major GBS CCs into phylogenetic clades (Figure 3). We previously reported a clonal expansion of specific CC17 clades (CC17-A1 and CC17-A2) among GBS isolates from neonatal disease in the Netherlands (Jamrozy et al., 2020) and wished to compare their distribution between carriage and disease isolates, alongside a broader comparison of the GBS population between the two host groups. The CC17 and CC23 phylogenies each revealed a single dominant clade (CC17-A and CC23-A, respectively), while the phylogenies of the other CCs were more diverse, with 4 to 6 distinct clades each. To identify the CC17 clades associated with the previously reported expansion, the dominant CC17 clade, CC17-A, was further partitioned into three sub-clades: CC17-A1, CC17-A2, and CC17-A0. For each clade, we calculated its prevalence across all carriage and all disease isolates to identify dominant clusters within each host group and to compare their distribution (Figure 4).
To better understand the differences in clade distribution between carriage and disease isolates, we also compared the prevalence of these clades within the corresponding CC (Supplementary Figure 4). This revealed that for CC1, CC8, and CC23, the distribution of clades was similar between carriage and disease isolates; for instance, CC1-A, CC8-C, and CC23-A were the dominant clades of CC1, CC8, and CC23, respectively, in both carriage and disease. In contrast, for CC17 and CC19, the differences in clade distribution reflected differences in population structure between carriage and disease. CC17-A1 was the dominant CC17 clade among disease isolates, while CC17 isolates from carriage were equally distributed between CC17-A1 and CC17-A2. The dominant CC19 clade among carriage isolates was CC19-B, while most CC19 disease isolates belonged to CC19-A.
Previous analysis of CC17 isolates from neonatal invasive disease in the Netherlands also revealed acquisition of a novel phage, phiStag1 (GenBank accession PP091924), which correlated with the clonal expansion of clade CC17-A1 (Jamrozy et al., 2020). In the current dataset, the phiStag1 phage was found in 26% of all isolates and was significantly more common in disease (32%) than in carriage (21%) isolates (Table 1). The phage was found predominantly in CC17 isolates, where it was mostly associated with CC17-A1 and CC17-A2 (Supplementary Figure 5). Despite being more common in disease isolates overall, the phage was equally distributed among CC17 isolates from carriage and disease (Supplementary Figure 5). The phiStag1 phage was also detected in other dominant CCs: CC19 (10%), CC23 (28%), CC1 (5%), and CC8 (19%), where it was largely equally distributed between carriage and disease (Supplementary Figure 5).

GBS resistome
Tetracycline resistance genes (tetM, tetO, and tetL) were the most prevalent AMR determinants, observed in 86% of all GBS isolates. They were equally represented in disease and carriage isolates (Supplementary …).

Table 1 note: The total number of carriage/disease isolates with the corresponding genotype is shown in brackets. The prevalence of genotypes is shown for the full and time-matched (2018–2021) datasets.
Overall, 6% of all GBS isolates carried aminoglycoside resistance genes, with similar prevalence in carriage (7%) and disease (6%) isolates in the full dataset (Supplementary Table 2).
The global CC17 isolates clustered into the three previously observed clades: CC17-A, CC17-B, and CC17-C (Figure 5). The largest share of CC17-A isolates belonged to clade CC17-A1 (45%). The ICESag37 element was identified in 10% of non-Dutch CC17 genomes and only in isolates belonging to CC17-A, predominantly in CC17-A2 (63%) but also in CC17-A1 (12%) (Figure 5 and Supplementary Figure 7). The ICESag37-positive CC17 isolates were globally distributed and clustered into three distinct sub-clades, indicating multiple independent acquisition events followed by clonal expansion (Figure 5 and Supplementary Figure 7). All ICESag37-positive sub-clades were estimated to have emerged in the 1990s. In this dataset, the first ICESag37-positive CC17 isolates were collected in 2010 in Canada and China, with the first isolation in the Netherlands in 2011 (Supplementary Figure 8). Regardless of country of origin, the majority of global CC17-A2 isolates collected between 2010 and 2021 were ICESag37-positive (Supplementary Figure 8). ICESag37 sequences from all globally distributed CC17 isolates shared high nucleotide identity (93%–100%, median 99.8%; Supplementary Figure 9).
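The identity range and median reported above summarise all pairwise comparisons between ICESag37 sequences. The method used is not described in this section, so the Python sketch below illustrates one common convention under stated assumptions. It assumes the sequences are already aligned to equal length, and it excludes gapped columns from the denominator; other conventions (for example, counting gaps as mismatches) give different values. The toy sequences in the usage block are hypothetical.

```python
from itertools import combinations
from statistics import median


def percent_identity(s1: str, s2: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are excluded from the
    denominator (one of several common conventions).
    """
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to equal length")
    cols = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != "-" and b != "-"]
    if not cols:
        raise ValueError("no ungapped aligned columns")
    return 100.0 * sum(a == b for a, b in cols) / len(cols)


def identity_summary(seqs):
    """Return (min, max, median) of all pairwise percent identities."""
    values = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return min(values), max(values), median(values)


if __name__ == "__main__":
    # Toy aligned sequences, for illustration only
    aligned = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]
    print(identity_summary(aligned))
```

Because the median is taken over all pairs, a few divergent copies pull the minimum down (here, 93%) while leaving the median close to 100%, which matches the pattern reported above.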

Discussion
Intrapartum antibiotic prophylaxis currently represents the main strategy for preventing early onset GBS disease. This strategy assumes an equal risk of neonatal invasive disease from any colonising GBS isolate identified. However, our findings and previous research show that some GBS genotypes carry a higher risk of neonatal disease. More studies are needed to investigate the pathophysiological mechanisms that drive these differences in invasive potential and to evaluate the added value of GBS genotyping for more precisely targeted GBS prevention. In line with previous reports, we found that CC17-serotype III strains were significantly more common in disease (Kekic et al., 2021), while serotypes II, IV, and V and CC1 were associated with maternal carriage. We also identified differences in the prevalence of some lineage-serotype combinations between the two host groups, including CC24-serotype V and CC17-serotype IV, which were associated with carriage. This suggests that the association between CC17 and neonatal disease is serotype III-dependent. Although other serotypes have emerged within this GBS lineage, they appear less likely to cause neonatal infection, as no CC17-serotype IV isolates were observed among disease isolates in our collection. In contrast, serotype III remained associated with neonatal disease even after exclusion of all CC17 isolates (P < 0.001).
Our previous work showed expansion of specific CC17 sub-clades, CC17-A1 and CC17-A2, among isolates from neonatal invasive disease in the Netherlands, which correlated with a rise in disease incidence in the country. A matched collection of maternal carriage isolates from the Netherlands was not previously available, which hindered further investigation of the epidemiology of these clones in the wider GBS population. In this work, we addressed this gap and compared the prevalence of different clades from the major CCs, including CC17, between carriage and disease isolates. Overall, CC17-A1 was the most prevalent sub-lineage among all disease isolates, suggesting an increased capacity to cause disease. However, although CC17-A1 was considerably less common among all carriage isolates, the CC17 population from carriage was dominated by CC17-A1 and CC17-A2 isolates. This suggests that the previously reported rise in the frequency of these clusters among GBS from neonatal disease likely reflected their expansion in the CC17 carriage population, resulting in spillover into the invasive GBS population.
We reported previously, and confirm in this work, that the expanding CC17 sub-clades CC17-A1 and CC17-A2 are associated with certain mobile genetic elements (MGEs) that might contribute to their prevalence. One is a novel phage, previously termed phiStag1, which emerged suddenly in the CC17 population around the mid-1990s (Jamrozy et al., 2020). A recent study showed that the phage belongs to a novel group of phages designated streptococcal mobilisable prophages (SMphages) (Huang et al., 2023). The phage carries a putative virulence gene, termed Alp-P1, which was shown to promote adhesion to and invasion of bovine and human cells. These findings further indicate that phiStag1 might provide a selective advantage to its host and thus promote clonal expansion of CC17-A1 and CC17-A2. In our dataset, phiStag1 was more common overall among disease isolates; however, among CC17 isolates, the phage was equally distributed between carriage and disease. This overall difference was therefore likely driven by the phage's association with CC17 and the dominance of this lineage among disease isolates. Further work is needed to understand phiStag1's role in GBS disease; it remains unclear whether the phage contributes to maternal colonisation, transmission to the infant, or neonatal invasive disease.
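The argument that phiStag1's overall excess in disease reflects lineage composition rather than a disease-specific effect can be checked with simple arithmetic. Overall prevalence is a weighted mixture of within-lineage prevalences. The sketch below uses purely hypothetical values, not estimates from this study. The phage is set to be equally common within CC17 in both host groups and equally rare outside CC17. Only the share of CC17 differs, yet the overall prevalences diverge.

```python
def overall_prevalence(frac_cc17: float, prev_in_cc17: float, prev_outside: float) -> float:
    """Overall marker prevalence as a mixture of within-stratum prevalences."""
    return frac_cc17 * prev_in_cc17 + (1 - frac_cc17) * prev_outside


# Hypothetical values for illustration only (not estimates from this study):
# identical within-stratum prevalence in both host groups; only the CC17 share differs.
disease = overall_prevalence(frac_cc17=0.5, prev_in_cc17=0.6, prev_outside=0.1)
carriage = overall_prevalence(frac_cc17=0.2, prev_in_cc17=0.6, prev_outside=0.1)
print(f"disease {disease:.0%}, carriage {carriage:.0%}")
```

This is a confounding pattern: an unstratified comparison attributes to the phage a difference that is fully explained by the higher CC17 share among disease isolates. Stratifying by CC, as done for Supplementary Figure 5, removes it.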
We also observed a high prevalence of the ICESag37 element among CC17 isolates. This MGE confers resistance to erythromycin, tetracycline, and aminoglycosides (Zhou et al., 2017). It was first identified in strain Sag37, which belongs to ST12. In our dataset, ICESag37 was most common in CC17 (15%), followed by CC8 (4%), which includes ST12. Carriage of a multidrug resistance (MDR) ICESag element corresponding to ICESag37 has been reported previously in CC17 (Campisi et al., 2016). Our analysis of a global CC17 phylogeny confirmed that ICESag37-positive CC17 isolates are widely distributed, having been found in Asia, Europe, and North America. We also observed that, within CC17, carriage of this MGE is associated mostly with sub-clade CC17-A2. Within the Dutch GBS collection, CC17-A2 accounted for 87% of all isolates carrying ICESag37. As such, ICESag37-positive CC17-A2 isolates resistant to both macrolides and aminoglycosides might pose a clinical threat due to reduced options for first- and second-line antimicrobial treatment of GBS infections.
Limitations of our study include temporal sampling bias: disease and carriage isolates were collected over different time periods, with only a 4-year overlap between the two collections (2018–2021). To account for this, we conducted parallel analyses of the full and time-matched datasets. While some genotypes showed statistically significant associations in both datasets, for many the differences between carriage and disease isolates were no longer statistically significant in the time-matched dataset, likely in part because of the much smaller disease sample size in the latter. The analysis also showed that the prevalence of AMR genes was higher in the most recently collected disease isolates, which was associated with an increase in the frequency of isolates carrying the ICESag37 element. Finally, the maternal carriage isolates were recovered from vagina and urine, with the latter potentially associated with asymptomatic bacteriuria. However, we observed no differences in genotype distribution between isolates from these sources, suggesting that GBS isolates from urine originate from the rectovaginal site and represent the same GBS population.
Here we report that the previously observed clonal expansion of the CC17-A1 and CC17-A2 clades, as well as the emergence of the phiStag1 phage among GBS isolates from neonatal invasive disease in the Netherlands, likely reflect changes in the maternal carriage population. Overall, our findings reinforce the importance of comparing GBS isolates from healthy individuals and patients to identify pathogen genotypes that might be associated with an increased capacity to cause disease. Together, such comparisons will provide pathogenicity markers that can be targeted in disease prevention strategies, as well as molecular markers for surveillance of high-risk clones that disseminate efficiently across the GBS population irrespective of host status.

Ethics statement
any identifiable patient information, the study did not involve direct interaction with human subjects. Furthermore, the research adhered to all relevant ethical guidelines and regulations governing the use of anonymized genomic data for scientific research purposes. Therefore, ethical approval was not deemed necessary for this type of retrospective, anonymized data analysis study. The studies were conducted in accordance with local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants' legal guardians/next of kin, in accordance with national legislation and institutional requirements, because, following guidance from the Netherlands Reference Laboratory for Bacterial Meningitis, patient data used in this study were obtained from an anonymized collection database. As a result, individual patients are not identifiable or traceable, in line with the requirements of Dutch law.

Funding
This research was supported by a Clinical Research Grant of the Amsterdam Institute for Infection and Immunity and by Stichting Steun Emma Kinderziekenhuis and ItsME Foundation to MB and an  to DB.
UK and DJ were supported by the Bill and Melinda Gates Foundation (grant INV-010426). SB was supported by the Wellcome Trust (grant 220540/Z/20/A).

FIGURE 1
FIGURE 1 Serotype and CC distribution among the GBS isolates by host status. (A) Proportion of all isolates representing each serotype, stratified by host status. (B) Relative serotype distribution in each host status group. (C) Proportion of all isolates representing each CC, stratified by host status. (D) Relative CC distribution in each host status group. (A,C) The total number of isolates for each serotype and CC, respectively, is displayed above the bars.

FIGURE 3
FIGURE 3 Phylogenetic trees of the five major CCs. The branches of each tree are coloured by CC-specific cluster ID. Each tip is annotated with (from the innermost circle): host status, serotype, carriage of MLSB resistance genes, phiStag1, and ICESag37 (where applicable). Phylogenetic trees of (A) CC17, (B) CC23, (C) CC1, (D) CC8, and (E) CC19.

FIGURE 5
FIGURE 5 Core genome time-calibrated maximum likelihood phylogeny of global CC17 GBS isolates. The tree consists of external (n = 650) and Dutch (n = 229) CC17 GBS genomes. Each tip is annotated with CC17 clade ID and country of isolation ("Other": countries represented by fewer than 10 isolates). Branches of clusters carrying ICESag37 are coloured green.

TABLE 1
Prevalence of genotypes found to be differentially distributed between GBS from carriage and disease.