Genomic and functional determinants of host spectrum in Group B Streptococcus

Group B Streptococcus (GBS) is a major human and animal pathogen that threatens public health and food security. Spill-over and spill-back between host species are possible because GBS can adapt to and amplify in new niches, but the evolutionary and functional mechanisms underpinning these phenomena are poorly understood. Based on analysis of 1,254 curated genomes from all major GBS host species and six continents, we found that the global GBS population comprises host-generalist, host-adapted and host-restricted sublineages, which are found across host groups, preferentially within one host group, or exclusively within one host group, respectively, and which show distinct levels of recombination. Strikingly, the association of GBS genomes with the three major host groups (humans, cattle, fish) is driven by a single accessory gene cluster per host, regardless of sublineage or breadth of host spectrum. Moreover, these gene clusters are shared with other streptococcal species occupying the same niche and are functionally relevant for host tropism. Our findings demonstrate (1) the heterogeneity of genome plasticity within a bacterial species of public health importance, enabling the identification of high-risk clones; (2) the contribution of inter-species gene transmission to the evolution of GBS; and (3) the importance of considering the role of animal hosts, and the accessory gene pool associated with their microbiota, in the evolution of multi-host bacterial pathogens. Collectively, these phenomena may explain the adaptation and clonal expansion of GBS in animal reservoirs and the risk of spill-over and spill-back between animals and humans.

New genome sequence data were generated for n=91 isolates at two institutions. The University of Glasgow had n=44 isolates (originating from Vietnam) sequenced through MicrobesNG (Birmingham, UK). Isolates were plated on sheep blood agar (E&O Laboratories, Bonnybridge, UK) and grown overnight at 37 °C to confirm viability and purity. One colony of each isolate was inoculated into Todd-Hewitt broth (Oxoid, Thermo Fisher Scientific, Waltham, Massachusetts, US) and incubated aerobically at 37 °C overnight. DNA was extracted with the GenElute Bacterial Genomic DNA Kit (Sigma-Aldrich, St. Louis, Missouri, US) according to the manufacturer's instructions. Libraries were prepared with the Nextera XT DNA Sample Preparation Kit and the MiSeq Reagent Kit V2 (Illumina Inc., San Diego, California, US), and DNA was sequenced on an Illumina HiSeq platform. Similarly, Illumina sequencing of n=47 isolates (originating from Vietnam and Thailand) was performed by the Genome Institute of Singapore (GIS) with its Efficient Rapid Microbial Sequencing (GERMS) platform (https://www.a-star.edu.sg/gis/our-science/genome-architecture-and-design/germs).

A.2. Genome selection
The initial dataset comprised 2,437 GBS genomes, including newly generated data (n=91, see section A.1) and data from public repositories (NCBI and ENA; literature was reviewed in March 2020). When data were available as raw reads (n=2,047 genomes), genomes were assembled with either SPAdes v3.14.0 [1] (n=1,025) or velvet v1.2.10 [2] (n=1,022), the latter being the standard in the Sanger Institute assembly pipeline. We subsequently: i) curated metadata from the literature; ii) performed multi-locus sequence typing (MLST) with SRST2 v0.2.0 [3] on raw reads and with MLST.py on assembled genomes. Single locus variants (SLV) of existing sequence types (ST) were detected (n=14; n=2 ST103, n=2 ST314, n=5 ST552, n=3 ST61, n=1 ST1409, n=1 ST415); of note, for n=75 genomes from Brazil (n=15 from bovine, n=60 from fish) the ST could not be determined because no glcK allele could be assigned; iii) determined the capsular type with blastn v2.12.0+ [4] following the method developed by Metcalf and colleagues [5], and confirmed dubious cases with the method of Sheppard and colleagues [6]; iv) removed duplicate clonal isolates with the drop_duplicates() method of Pandas v1.1.3 [7], based on the following metadata: country (down to the province/state or city level for countries such as Canada and the USA), host species (including specific fish species, e.g., tuna, tilapia), origin/clinical manifestation (e.g., carriage vs invasive), year of isolation, and farm/unit of origin (applicable to most bovine and fish isolates, but only to a few human isolates, from herdspersons sampled together with their livestock). A total of 1,158 genomes were eliminated at this step, including all genomes from the Kawasaki paper [8] (n=39), which were excluded because of inconsistencies between the isolate ID/metadata and the associated genome assemblies for some genomes (e.g., mismatching capsular type and ST).
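The de-duplication step keeps one representative per unique combination of metadata fields. The sketch below is a plain-Python equivalent of calling pandas `DataFrame.drop_duplicates(subset=...)` with its default `keep="first"`; the field names in `DEDUP_KEYS` are hypothetical stand-ins for the metadata listed above, not the authors' actual column names.

```python
# Hypothetical column names standing in for the metadata fields listed above.
DEDUP_KEYS = ("country", "region", "host_species", "origin", "year", "farm")


def drop_duplicate_isolates(records, keys=DEDUP_KEYS):
    """Return the first record seen for each unique combination of `keys`.

    Missing fields (None) compare equal to each other, similar to how
    pandas treats missing values when flagging duplicates.
    """
    seen = set()
    kept = []
    for rec in records:
        # The tuple of metadata values acts as the "clonal signature".
        signature = tuple(rec.get(k) for k in keys)
        if signature not in seen:
            seen.add(signature)
            kept.append(rec)
    return kept
```

Isolates that differ in any one field (for example, year of isolation) are kept as separate entries, which is why the choice of fields determines how aggressive the de-duplication is.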
We then assessed the assembly quality of the remaining 1,279 genomes with QUAST v5.0.2 [9]. Reference ranges for GC content (%) and total genome length (bp) were calculated as the mean (35.38% and 2,074,536 bp, respectively) ± twice the standard deviation (2SD), giving reference ranges of 34.63-36.13% for GC content and 1,627,943-2,521,130 bp for total length. Genomes outside at least one of these ranges (n=25) were omitted from further analyses.
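The mean ± 2SD filter can be sketched as below. It assumes GC content and total length have already been parsed from the QUAST reports into a dictionary, and it uses the sample standard deviation (`statistics.stdev`); the text does not state whether the sample or population SD was used.

```python
import statistics


def reference_range(values, k=2.0):
    """Mean +/- k standard deviations (sample SD assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def qc_filter(assemblies, k=2.0):
    """Split assemblies into (passed, failed) name lists.

    `assemblies` maps a genome name to (gc_percent, total_length_bp),
    e.g. values parsed from a QUAST report.
    """
    gc_lo, gc_hi = reference_range([gc for gc, _ in assemblies.values()], k)
    len_lo, len_hi = reference_range([ln for _, ln in assemblies.values()], k)
    passed, failed = [], []
    for name, (gc, length) in assemblies.items():
        # A genome fails if it lies outside at least one of the two ranges.
        if gc_lo <= gc <= gc_hi and len_lo <= length <= len_hi:
            passed.append(name)
        else:
            failed.append(name)
    return passed, failed
```

As a check against the reported values, 35.38 ± 0.75 gives 34.63-36.13, so 2SD for GC content was about 0.75 percentage points.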

A.3. Core genome analyses
To reconstruct a core genome phylogeny, the snippy-multi script from the snippy suite v4.4.5 (https://github.com/tseemann/snippy) was used for alignment. To define the best tree topology, a first analysis included an outgroup of reference genomes from the closely related species Streptococcus pyogenes (NCTC8224, NCTC12059, NCTC12069, NCTC13737, NCTC13744). The snippy-clean_full_aln script was then run on the core.full.aln file, and gubbins v3.2.0 [10] was run on its output to remove recombination. IQ-TREE v2.0.6 [11] was then used to infer the phylogeny from the filtered polymorphic sites file produced by gubbins, with a GTR model and 1,000 bootstrap replicates. The tree was visualised in Microreact [12]. A clustering algorithm, fastbaps v1.0.8 [13], was run on the same filtered polymorphic sites file with the optimise.symmetric parameter to define genomic clusters. These clusters were then renamed after the 7-gene MLST nomenclature of the most represented ST within each population, preceded by the acronym SL for sublineage. For example, population 1 was mostly represented by ST17 isolates and was therefore named SL17 (Fig C.1). Clonal groups (CG) were identified manually on the phylogeny within each SL.
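The renaming rule (label each fastbaps cluster after its most frequent ST) can be sketched as follows. The input dictionaries, the handling of isolates without an assigned ST, and the fallback label are assumptions for illustration; ties between equally frequent STs are broken by first occurrence, which is an arbitrary choice here.

```python
from collections import Counter


def name_sublineages(cluster_of, st_of):
    """Name each cluster 'SL<ST>' after its most represented ST.

    cluster_of: isolate -> cluster id; st_of: isolate -> ST number, or None
    when no ST could be assigned (e.g. new or incomplete profiles).
    """
    members = {}
    for isolate, cluster in cluster_of.items():
        members.setdefault(cluster, []).append(isolate)
    names = {}
    for cluster, isolates in members.items():
        # Count only isolates with a defined ST.
        sts = Counter(st_of.get(i) for i in isolates if st_of.get(i) is not None)
        if sts:
            # most_common orders equal counts by first occurrence.
            top_st, _ = sts.most_common(1)[0]
            names[cluster] = f"SL{top_st}"
        else:
            names[cluster] = f"cluster{cluster}"  # hypothetical fallback label
    return names
```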
To classify SL and CG as host-restricted, host-adapted or host-generalist, the prevalence of each host within each SL/CG was calculated (Fig 2B). The prevalence of the dominant host within each SL/CG was then plotted as a histogram to identify cut-offs. Host-restricted SL/CG are those that occur primarily in one host species or host group (e.g., SL552 in fish, SL61 and SL91 in bovines, SL22 in humans, SL609 and SL612 in camels), although rare isolations from other hosts may be reported in the literature (e.g., an incidental report of SL61 in humans [14]). Host-adapted SL/CG were defined as having more than 80% but less than 98% of isolates originating from the same host species or host group, and host-generalists as having no more than 80% of isolates originating from a single host or host group (Fig 1). Homologous recombination was visualised with Phandango [15] from the gubbins-generated recombination-prediction file. To test the association between host-specialisation level (generalist, adapted, restricted) and the ratio of recombination to mutation (r/m), a generalised linear model was fitted to the gubbins output in RStudio v2022.07.01 with R v4.2.0 (2022-04-22), after mapping internal nodes and leaves to their corresponding CG. The total number of CDS per genome was assessed with the panaroo-qc script.
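The cut-offs above can be expressed as a small classification function. The host-adapted (>80% and <98%) and host-generalist (≤80%) bounds come from the text; treating a dominant-host share of ≥98% as host-restricted is an inference from those bounds rather than a stated rule.

```python
from collections import Counter


def host_specialisation(hosts, adapted_min=0.80, restricted_min=0.98):
    """Classify an SL/CG from the host of origin of each of its isolates.

    Returns (dominant host, dominant-host share, specialisation level).
    restricted_min=0.98 is inferred from the host-adapted upper bound.
    """
    counts = Counter(hosts)
    dominant, n_dominant = counts.most_common(1)[0]
    share = n_dominant / len(hosts)
    if share >= restricted_min:
        level = "host-restricted"
    elif share > adapted_min:
        level = "host-adapted"
    else:  # no more than 80% of isolates from a single host group
        level = "host-generalist"
    return dominant, share, level
```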

A.4. Accessory genome analyses
To calculate pairwise distances in accessory gene content between isolates (using the Jaccard similarity index), a gene presence/absence matrix was generated with panaroo v1.2.0 from gff files annotated with Prokka v1.14.5 [16]. The matrix was processed with GraPPLE (Graphical Processing for Pangenome-Linked Exploration; downloaded on 27 April 2021; https://github.com/JDHarlingLee/GraPPLE), and the resulting file was visualised with Graphia v2.2 [17] (k-NN using edge weight, k=18, descending rank order). Pilus island genes (Ap1 and Ap2 for each of PI-1, PI-2a and PI-2b), which encode important surface proteins, were detected with tblastn. This was done because an association between pilus variant and host of origin has been reported in GBS [18], with PI-2a being the most common variant in human isolates and PI-2b the most frequent among bovine isolates, suggesting that pili may play a role in host adaptation.
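The sketch below shows the Jaccard distance between accessory gene sets (1 minus the Jaccard similarity) and a generic k-nearest-neighbour sparsification of the resulting network. It is an illustration of the idea only: GraPPLE and Graphia's own k-NN transform may differ in details such as tie handling, and keeping the k smallest distances here is equivalent to keeping the k highest similarity edge weights.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of accessory genes."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def pairwise_accessory_distances(presence):
    """presence: isolate -> set of accessory genes present."""
    return {
        (x, y): jaccard_distance(presence[x], presence[y])
        for x, y in combinations(sorted(presence), 2)
    }


def knn_edges(distances, k):
    """Keep, for every node, its k closest neighbours (undirected edges)."""
    neighbours = {}
    for (x, y), d in distances.items():
        neighbours.setdefault(x, []).append((d, y))
        neighbours.setdefault(y, []).append((d, x))
    edges = set()
    for node, candidates in neighbours.items():
        for _, other in sorted(candidates)[:k]:
            edges.add(tuple(sorted((node, other))))
    return edges
```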
A.5. Genome-Wide Association Studies (pyseer and scoary)
For the pangenome-based method, scoary v1.6.16 [19] was run on the panaroo-generated gene presence/absence matrix to detect gene enrichment in the three host groups (human n=671, bovine n=342, fish n=180). This was done with the --no_pairwise flag, as described on the scoary GitHub page (https://github.com/AdmiralenOla/Scoary). Sensitivity, specificity, p-values and the direction (positive or negative) of the association of gene variants with host groups were assessed. For the unitig-based method, the pyseer suite v1.3.3 [20] was used to find associations between unitigs and host groups with a mixed effects model. Unitig-counter v1.0.5 was used to count unitigs in the 1,254 genome assemblies of the deduplicated dataset. The phylogeny_distance.py script was used to create a distance matrix from the core genome phylogeny. A binary (0 vs 1) phenotype file was created for each of the three host groups, and pyseer was run with a linear mixed model for each group (e.g., pyseer --lmm --phenotypes host.pheno --kmers unitigs_clean.txt.gz --similarity phylogeny_K.tsv --output-patterns host_unitigs_patterns.txt --cpu 24). The count_patterns.py script was then used to determine a significance threshold from the number of unique unitig patterns (patterns: 260,474; threshold: 1.92 × 10⁻⁷). Significant unitigs were then extracted, mapped to reference genomes (NZ_CP019979 for human, CP008813 for bovine, CP003919 for fish) and visualised with Phandango [15]. Manhattan plots were built with seaborn v0.11.2 [21]. Unitigs were then annotated with annotate_hits_pyseer using host-specific references (complete genome assemblies available from NCBI): NZ_CP019979 for the human phenotype (an ST19 serotype III isolate), CP008813 for the bovine phenotype (an ST103 serotype Ia isolate; one of only two publicly available complete GBS genomes from bovines) and CP003919 for fish (an ST552 serotype Ib isolate). The human and bovine results were plotted with Python (Figs C.2A and C.2B). Pyseer also allows unitig annotation from a list of reference and draft genomes, which was done for the three phenotypes [draft], JN BR GBS104 [draft]; fish: CP003919, CP007482, CP019802, NC_018646, STIR-CD-25); this did not change the results obtained by annotation with a single reference.
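Two small steps from this workflow are sketched below: building a one-vs-rest binary phenotype table for a host group, and the Bonferroni-style threshold (alpha divided by the number of unique patterns) that count_patterns.py reports. The tab-separated layout with a header row follows the format shown in the pyseer tutorial as we understand it; the helper names are hypothetical.

```python
import csv


def binary_phenotype_rows(host_of, host_group):
    """One-vs-rest phenotype: 1 if the sample's host group matches, else 0."""
    rows = [["samples", host_group]]
    for sample in sorted(host_of):
        rows.append([sample, 1 if host_of[sample] == host_group else 0])
    return rows


def write_pheno(rows, path):
    """Write rows as a tab-separated phenotype file with a header line."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)


def bonferroni_threshold(n_patterns, alpha=0.05):
    """Per-pattern significance threshold: alpha / number of unique patterns."""
    return alpha / n_patterns
```

With 260,474 unique patterns and alpha = 0.05, the threshold is 0.05 / 260,474 ≈ 1.92 × 10⁻⁷, matching the value reported above. Counting unique patterns rather than unitigs avoids over-correcting for unitigs that share identical presence/absence profiles.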

A.6. In vivo assessment of the role of Locus 3 in fish infection
A.6.1. Bacterial strains and plasmids
Strains and growth media are shown in Table C.1. The temperature-sensitive Escherichia coli shuttle vector pG+host9 [22] was used for plasmid-mediated allele exchange mutagenesis in GBS to generate Locus 3 knock-out mutants (∆Locus3) of ST7 and ST283 strains. Genomic DNA and plasmid DNA were extracted and purified using a GenElute bacterial genomic DNA extraction kit (Sigma-Aldrich) and a QIAprep Spin Miniprep kit (Qiagen), respectively. DNA was quantified using a NanoDrop One UV-Vis spectrophotometer (Thermo Fisher Scientific). All kits were used according to the manufacturers' guidelines.
A.6.2. Construction of the pG+host9 mutagenesis plasmid
Genomic sequences flanking the mutagenesis target regions were amplified from GBS ST7 gDNA using the Phusion PCR kit (New England Biolabs, NEB). Oligonucleotide primers for amplifying the 5' and 3' flanking sequences are described in Table C.2. PCR products were visualised by 1% (w/v) TAE agarose gel electrophoresis against a 1 kb DNA ladder. Flanking PCR products were purified using a QIAquick PCR purification kit (Qiagen) and digested with the appropriate restriction endonuclease (BamHI or EcoRI). Digested 5' and 3' flanking PCR products were ligated together using T4 DNA ligase to create mutant alleles. Mutant allele ligation products were amplified by Phusion PCR (NEB) using mutagenesis target primers F1 and R4, then purified with the QIAquick PCR purification kit (Qiagen). The mutagenesis plasmid, pG+host9, was digested with SmaI restriction endonuclease and purified with the QIAquick PCR purification kit. Purified mutant alleles were ligated into linearised pG+host9 using T4 DNA ligase at an insert:vector molar ratio of 10:1 and incubated overnight (16 °C). Electrocompetent E. coli TG1-dev cells were prepared [23] and transformed with 4 µL of overnight ligation mixture by electroporation, using a 0.2 mm electroporation cuvette and a Gene Pulser XCell electroporation system (Bio-Rad) with parameters set at 2,000 V, 25 µF and 200 Ω. Transformed cells were immediately suspended in 400 µL of SOC medium and incubated (37 °C, 1 hour). Ten-fold dilutions of transformed cells were plated onto Luria-Bertani (LB) agar supplemented with 400 µg/mL erythromycin and incubated (37 °C, 2 days). E. coli TG1-dev transformants were screened for the presence of the mutant allele (mutagenesis primers F1-R4) by colony PCR using GoTaq Green master mix. Plasmids were extracted from positive transformants, and the mutant allele region was verified by sequencing with mutagenesis primers F1 and R4.
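The 10:1 insert:vector molar ratio translates into masses through the standard relation mass_insert = mass_vector × (length_insert / length_vector) × ratio. The sketch below applies it; the vector mass and both lengths in the example are hypothetical values, not the sizes of pG+host9 or the mutant alleles used here.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=10.0):
    """Mass of insert (ng) giving the requested insert:vector molar ratio.

    Equal masses of DNA contain moles in inverse proportion to length,
    so the insert mass is scaled by the length ratio and the molar ratio.
    """
    return vector_ng * (insert_bp / vector_bp) * molar_ratio
```

For example, with a hypothetical 50 ng of a 4,000 bp linearised vector and a 2,000 bp insert, a 10:1 ratio calls for 50 × 0.5 × 10 = 250 ng of insert.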

A.6.3. Transformation of GBS and temperature-mediated allele exchange mutagenesis
Electrocompetent GBS cells were prepared [24] and transformed with 1 µg of pG+host9 construct by electroporation, using a 0.1 mm electroporation cuvette and the Gene Pulser XCell electroporation system with parameters set at 2,400 V, 25 µF and 100 Ω. Transformed cells were immediately suspended in 400 µL of SOC medium and incubated at 28 °C.

Nile tilapia (Oreochromis niloticus) fingerlings were acclimatised for 1 week before challenge in 450 L freshwater tanks at 30 °C. In preparation for challenge, fish were transferred into 100 L experimental tanks (n=4 replicates per GBS strain, n=2 negative control replicates) under the same environmental conditions. Ten fish from each experimental tank were challenged by intraperitoneal (IP) injection with 0.1 mL of PBS (negative control) or 0.1 mL of GBS at 10⁵ CFU/mL in PBS. IP-challenged tilapia were cohabited with 40 healthy tilapia per tank to provide a contact challenge. Mortality of IP-challenged and contact-challenged fish was monitored for 21 days. Moribund fish were euthanised and included in mortality counts. Kidney and brain samples were taken from up to 10 fish per tank (n=5 IP-challenged, n=5 contact-exposed) and used to check for the presence of GBS; recovered cultures were stored in BHI with 30% (v/v) glycerol at -80 °C. Locus 3 gene-specific PCR assays were performed to confirm the identity of strains isolated from challenged fish.
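The IP dose follows directly from the injected volume: 0.1 mL of a 10⁵ CFU/mL suspension delivers 10⁴ CFU per fish. The sketch below shows that arithmetic together with the usual viable-count back-calculation from a plated dilution (100 µL plated, as in the serial dilutions described for mutagenesis); the colony count in the example is hypothetical.

```python
def cfu_per_ml(colonies, dilution_exponent, plated_ml=0.1):
    """Viable count of the undiluted suspension.

    colonies: count on the plate of the 10^-dilution_exponent dilution;
    plated_ml: volume spread on the plate (0.1 mL = 100 uL).
    """
    return colonies / plated_ml * 10 ** dilution_exponent


def dose_per_fish(inoculum_cfu_per_ml, injected_ml=0.1):
    """CFU delivered by one intraperitoneal injection."""
    return inoculum_cfu_per_ml * injected_ml
```

For example, a hypothetical 50 colonies on the 10⁻³ plate corresponds to 50 / 0.1 × 10³ = 5 × 10⁵ CFU/mL, and `dose_per_fish(1e5)` gives 1 × 10⁴ CFU.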

A.6.5. Statistical analysis
Kaplan-Meier survival analysis [25] was conducted to compare the survival of tilapia challenged with knock-out mutants and with their isogenic WT strains. Survival curves of tilapia challenged with WT vs knock-out mutants were compared with a log-rank test [26] performed in Microsoft Excel.
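The analysis was run in Excel; the following is an equivalent Python sketch of the Kaplan-Meier estimator and the two-group log-rank test, not the authors' spreadsheet. It assumes each fish is recorded as (day of death or censoring, event flag), with survivors censored at day 21. The p-value uses the chi-square distribution with 1 degree of freedom, whose upper tail equals erfc(sqrt(x / 2)).

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as (time, survival) pairs at each death time.

    times: day of death or censoring; events: 1 = died, 0 = censored.
    """
    at_risk = len(times)
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censorings at each time
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        d_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        n = n_a + n_b
        # Observed minus expected deaths in group A at this time point.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```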

B.2. Accessory genome analyses
Consistent with previous studies [18], in our dataset we found a higher prevalence of PI-2a in human genomes (79.3%) than in bovine (46.5%) and fish (30.6%) genomes. However, although PI-2b was more frequent in bovine (53.2%) than in human (20.9%) genomes, its highest prevalence was among fish genomes (70.6%). The distribution of pili across the phylogeny (section B.4) shows that these mobile elements are more strongly associated with SL/CG than with host of origin.

B.3.1. Scoary
For the human phenotype, alongside positively associated genes with high sensitivity and specificity (part of the scpB transposon; SE 91.1-92.6%, SP 57.3-72.2%), some high-scoring genes were negatively associated, in particular genes of the Lac.2 operon lacEFG (SE 5.3%, SP 50.9%). Ten genes were identified as human-specific by scoary, i.e., found only in GBS genomes from humans, although not in all of them. These included a TetR transcriptional regulator and a multidrug transporter (see S2
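Sensitivity and specificity here describe how well presence of a gene predicts the host phenotype: sensitivity is the share of trait-positive genomes carrying the gene, and specificity the share of trait-negative genomes lacking it. The sketch below computes both from a 2×2 table, under our reading of how scoary reports these values; the input dictionaries are hypothetical.

```python
def sensitivity_specificity(gene_present, trait):
    """Sensitivity and specificity of gene presence for a binary trait.

    gene_present: genome -> bool (missing genomes count as gene absent);
    trait: genome -> bool (e.g. True if isolated from humans).
    """
    tp = fn = tn = fp = 0
    for genome, positive in trait.items():
        has_gene = gene_present.get(genome, False)
        if positive and has_gene:
            tp += 1
        elif positive:
            fn += 1
        elif has_gene:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```

A negatively associated gene such as lacEFG for the human phenotype shows up as very low sensitivity, because few human genomes carry it.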

B.3.2. Pyseer
The toxin-antitoxin system pezAT, identified by scoary as significantly associated with the bovine phenotype, was also detected by pyseer, as were several genes belonging to an ICE (see section B.3.1) and a prophage. These phage genes matched those of two incomplete prophages (which are highly similar to each other) previously reported as bovine-associated by Richards and colleagues [29]: the prophage in reference genome CP008813 had 96.36% identity and 68% query coverage with phage A, and 93.70% identity and 67% query coverage with phage B, from Richards and colleagues [29] (see section B.3.3). For the fish phenotype, pyseer results were unsatisfactory. Although population structure should have been controlled for (after several iterative trials, the minor allele frequency, or MAF, filter was set to a minimum of 0.04 and a maximum of 0.96), no evident peaks were present in Manhattan plots for any of the complete genome sequences from fish in this dataset (n=39).
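The allele frequency filter keeps only unitigs present in between 4% and 96% of genomes. In pyseer this corresponds, as far as we know, to the `--min-af` and `--max-af` options; the plain-Python sketch below only illustrates the filtering logic and treats both bounds as inclusive, which is an assumption.

```python
def frequency_filter(patterns, n_samples, min_af=0.04, max_af=0.96):
    """Keep unitigs whose carriage frequency lies within [min_af, max_af].

    patterns: unitig -> set of samples carrying it.
    """
    kept = {}
    for unitig, samples in patterns.items():
        af = len(samples) / n_samples
        if min_af <= af <= max_af:
            kept[unitig] = samples
    return kept
```

Very rare and near-universal unitigs carry little information for association and are the ones most prone to spurious signals, which is why they are removed before testing.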

B.3.3. BLAST
To confirm the association of pezAT, the ICE and the prophage with the bovine phenotype, tblastn searches (thresholds of 90% identity and 80% query coverage) were performed with the amino acid sequences of pezAT, of two ICE genes (LPXTG cell wall anchor protein and N-6-DNA methylase) and of two prophage genes (phage tail spike protein and phage tail tape measure protein). All three elements were confirmed as more prevalent among bovine GBS genomes, although not as common as Lac.
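Applying the identity and coverage thresholds and computing per-host prevalence can be sketched as follows. The sketch assumes tabular BLAST+ output produced with `-outfmt "6 qseqid sseqid pident qcovs"` and that the subject ID identifies the genome; with multi-contig assemblies, contig IDs would first need mapping to genomes.

```python
def filter_tblastn_hits(lines, min_pident=90.0, min_qcov=80.0):
    """Return (query, genome) pairs passing identity and coverage cut-offs.

    Expects tab-separated lines with columns: qseqid sseqid pident qcovs.
    """
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        qseqid, sseqid, pident, qcovs = line.rstrip("\n").split("\t")[:4]
        if float(pident) >= min_pident and float(qcovs) >= min_qcov:
            hits.add((qseqid, sseqid))
    return hits


def prevalence_by_host(hits, gene, host_of):
    """Fraction of genomes in each host group carrying `gene`."""
    carriers = {genome for query, genome in hits if query == gene}
    totals, positives = {}, {}
    for genome, host in host_of.items():
        totals[host] = totals.get(host, 0) + 1
        positives[host] = positives.get(host, 0) + (genome in carriers)
    return {host: positives[host] / totals[host] for host in totals}
```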

Figure C.1. Alluvial diagram showing the inheritance of the 7-gene multi-locus sequence typing (MLST) nomenclature by sublineages (SL) and clonal groups (CG). Some new sequence types (ST), and STs that could not be defined because of a missing locus, are indicated as NF (not found).

Figure C.2. Annotated unitigs mapped to two reference genomes. A) Human-associated unitigs annotated to reference genome NZ_CP019979; B) Bovine-associated unitigs annotated to reference genome CP008813. Genes in common with the scoary results are labelled. For the human-associated unitigs, all labelled genes are part of the scpB transposon. For the bovine-associated unitigs, lacABCDEFGR are all part of the Lac.2 operon, whereas pezAT are associated with an integrative conjugative element.

Figure C.3. Heat map showing Locus 3 gene identities in a dataset of 212 Streptococcus iniae genomes (NCBI, June 2024). Genes were detected with tblastn with a minimum query coverage threshold of 90%; the percentage identity displayed refers to the amino acid sequence of each gene compared with the Group B Streptococcus reference. Gene numbers correspond to those in Fig 5 of the manuscript.

Figure C.4. Maximum-likelihood phylogenetic trees of SL17 (A) and SL19 (B). Capsular type (CPS) and host of origin are indicated. Presence/absence of host-associated accessory genes (scpB, Lac.2, Locus 3) and pilus island genes is shown. In SL19, clonal groups (CG) are also indicated; this was not done for SL17, as it comprises a single CG (CG17). SL17 and SL19 are human specialists: they completely lack Locus 3 and mostly lack Lac.2, whereas the human-associated gene scpB is highly conserved in both SL. Pilus islands PI-1 and PI-2b are almost invariably present in SL17, whilst this combination is rare in other sublineages.

Figure C.5. Maximum-likelihood phylogenetic trees of SL91 and SL61 (A) and of SL103 and SL314 (B). Sublineage (SL), clonal group (CG), capsular type (CPS) and host of origin are indicated. Presence/absence of host-associated accessory genes (scpB, Lac.2, Locus 3) and pilus island genes is shown. SL91 and SL61 are bovine specialists: they completely lack Locus 3 and almost completely lack scpB, whereas the bovine-associated Lac.2 gene cluster is highly conserved across both SL. Pilus island PI-2b is found in both lineages, and PI-1 has been acquired in one geographically restricted (Italy) branch of SL91. Clustering of the host-generalist lineages SL103 and SL314 was less clear-cut than for the other lineages (one genome phylogenetically close to SL103 was assigned to SL314 and vice versa). Although these lineages are commonly found in cattle and humans, scpB is largely absent, whereas Lac.2 and Locus 3 are often present. Pilus island PI-2b is highly conserved across both SL.

Figure C.6. Maximum-likelihood phylogenetic trees of SL552 (A) and SL1 (B). Clonal group (CG), capsular type (CPS) and host of origin are indicated. Presence/absence of host-associated accessory genes (scpB, Lac.2, Locus 3) and pilus island genes is shown. SL552 is a fish specialist (found in fish and other cold-blooded species such as frogs). It completely lacks scpB and Lac.2, whereas Locus 3 is omnipresent. Pilus island PI-2b is found in all genomes of this SL, albeit incomplete in CG261. SL1 is a host generalist (human, bovine, other hosts). scpB is highly prevalent in this lineage, whereas Lac.2 is variably present and mostly associated with bovine isolates. Locus 3 is mostly absent. A tendency towards human specialisation can be observed within CG459 (CPS IV), in which a sub-clade is uniquely associated with the human host; all genomes in this sub-clade carry scpB and lack both Lac.2 and Locus 3. The predominant pilus island genes in CG1 are PI-1 and PI-2a, whereas CG459 and CG817 mostly carry PI-1 and elements of PI-2a. Across SL1, especially in CG817, some isolates carry PI-2b genes.
Transformed cells were plated neat onto TSA + 2 µg/mL erythromycin and incubated (28 °C, 2 days). GBS transformants were screened for the presence of the mutant allele (mutagenesis primers F1-R4) by colony PCR using GoTaq Green master mix. Plasmids were extracted from positive transformants, digested with BamHI restriction endonuclease and visualised on a 1% TAE agarose gel against a 1 kb DNA ladder to verify the mutagenesis plasmid constructs. Transformants containing mutagenesis plasmid constructs were cultured overnight at 28 °C (permissive temperature) in 5 mL of Tryptic Soy Broth (TSB) + 2 µg/mL erythromycin. Overnight cultures were diluted in 20 mL TSB + 2 µg/mL erythromycin, grown at 28 °C to an OD600 of 0.25, then transferred to 37 °C (non-permissive temperature) and incubated overnight. Ten-fold serial dilutions were prepared, and 100 µL of the 10⁻¹ to 10⁻⁴ dilutions was plated onto TSA + 2 µg/mL erythromycin and incubated (37 °C, 2 days). Ten GBS colonies were collected onto a single sterile inoculation loop and used to inoculate 20 mL of TSB + 2 µg/mL erythromycin, which was incubated (37 °C, overnight). Overnight cultures were diluted 1:50 into TSB without antibiotic and incubated (28 °C, until the end of the day), then diluted 1:50 into TSB without antibiotic and incubated again (28 °C, overnight). This procedure was repeated once before ten-fold serial dilutions were prepared and 100 µL of the 10⁻⁴ to 10⁻⁷ dilutions was plated onto TSA without antibiotic and incubated (37 °C, overnight). For each mutant strain, 200 colonies were replica plated onto TSA + 2 µg/mL erythromycin and onto TSA without antibiotic and incubated (37 °C, overnight). Replica GBS strains that were susceptible to erythromycin were selected from the TSA plate without antibiotic and incubated in TSB (37 °C, 1 hour). Genomic DNA was extracted from overnight cultures. Mutagenesis of the target Locus 3 regions was confirmed first by PCR amplification of gDNA using mutagenesis primers F1 and R4, and second by DNA sequencing across the gDNA target region with mutagenesis primers F1 and R4. Specific Locus 3 genes (galactokinase; α-galactosidase; ABC sugar permease substrate-binding protein; PTS system galactitol-specific IIC component, gene number 14) were screened by PCR for presence or absence in mutant strains using the primers shown in Table C.2.

A.6.4. Tilapia challenge studies
Challenge studies were conducted by a contract research organisation specialised in aquatic animal health (Ictyodev, France). Studies were performed to Good Laboratory Practice standards, approved by an internal Ethics Committee and the French Ministry of Research, and audited by an independent Quality Assurance partner. Challenge studies were conducted with wild type (WT) GBS ARG/BAC/2014-107 (ST7) and ARG/BAC/2016-6 (ST283) and their ∆Locus3 knock-out mutants.
humans: 27.0% and 27.1%; and in fish: 10.6% for both (in fish, ICE genes were found only in ST7 genomes). Prophage in bovines: 69.0% and 69.9% (phage tail spike protein and phage tail tape measure protein, respectively); in humans: 37.9% and 38.5%; and in fish: 10.6% for both (in fish, prophage genes were found only in ST7 genomes, the same ones that also carried ICE genes).

Fig C.4A), is a human-adapted lineage with rare spill-over events into both cattle and fish. The most prevalent capsular type (CPS) in this lineage is III, with a few CPS IV genomes. The strong human association of this lineage is reflected in the presence/absence pattern of the three host-associated accessory gene clusters: scpB+, Lac.2- (with two exceptions) and Locus 3-. Pilus island genes associated with this lineage are PI-1 and PI-2b. Recombination is almost absent from this SL (Fig 2A).

Fig C.4B), is a human-adapted lineage with spill-over events, primarily into cattle. Two main capsular types are detected in this lineage: CPS II is associated with CG28 and CPS III with CG19. CG19 includes six serotypes, with a relatively high prevalence of CPS V in separate branches, indicative of multiple independent capsular switching events. The human association of this lineage is reflected in the presence/absence pattern of the three host-associated accessory gene clusters: scpB+, Lac.2- (with a few exceptions) and Locus 3-. Pilus island genes associated with this lineage are PI-1 and PI-2a. Recombination is present in SL19, some of which is shared with SL183 and SL1, but lineage-specific recombination blocks can also be observed (Fig 2A).

Capsular switching is observed in both SL, with one branch showing acquisition of CPS Ia. All isolates on this branch originate from Colombia. The bovine specificity of these lineages is reflected in the presence/absence pattern of the three host-associated accessory gene clusters: scpB- (with one exception), Lac.2+ and Locus 3-. Pilus island genes associated with these lineages are primarily PI-2b, with four genomes of SL91 also carrying PI-1, a combination otherwise mostly seen in SL17. All SL91 isolates with this PI profile originate from Italy. Recombination is present in SL61 and SL91, but it does not involve large recombination events and is mostly lineage-specific (Fig 2A).

Most genomes carry CPS Ia, except for a sub-clade of CPS III, which is mostly associated with human isolates. The lack of a strong host association is reflected in the mixed presence/absence pattern of Lac.2. Surprisingly, scpB is mostly absent despite the isolation of SL103 and SL314 from humans, whereas Locus 3 is commonly present even though these lineages are not represented among available fish GBS genomes. Pilus island genes associated with these lineages are primarily PI-2b. Recombination is almost absent from these two SL (Fig 2A).

of GBS have. Genomes in SL552 all carry CPS Ib (in one case the in silico CPS screening gave two best matches: Ib and III), and the fish specificity of this lineage is reflected in the presence/absence pattern of the three host-associated accessory gene clusters: scpB-, Lac.2- and Locus 3+. Pilus island genes associated with this lineage belong to PI-2b, although PI-2b AP2 is absent from CG261. Recombination is absent from the entire sublineage (Fig 2A).

Table C.1. Bacterial strains and culture conditions used to generate Group B Streptococcus mutants.
Table C.2. Oligonucleotide primers used for mutagenesis of Group B Streptococcus.
a c Survival analysis (SA) p-value.