Systematic review of Group B Streptococcal capsular types, sequence types and surface proteins as potential vaccine candidates

Highlights
• Most comprehensive review of Group B Streptococcal serotypes through 2018.
• First systematic review of Group B Streptococcal strain type and protein data.
• Theoretically, candidate vaccines may protect against 93-99% of disease-causing strains.
• More studies on GBS strains in low- and middle-income countries are needed.


Introduction
Deaths among children under five years of age have fallen from an estimated 12.6 million in 1990 to 5.3 million in 2017 [1]. This decline is in part attributed to high coverage of childhood immunisations [2], although infectious causes still account for at least a quarter of under-five child deaths. Almost half of child deaths worldwide occur in the neonatal period [2], and approximately two thirds within the first three months of life, so more attention is required to meet the Sustainable Development Goals (SDG) by 2030 [3]. Hence there is growing urgency in addressing the confluence of neonatal deaths and infections, with maternal immunisation as one strategy [4].
Intrapartum antibiotic prophylaxis (IAP) is used to reduce early-onset GBS disease (EOGBS), and in the United States of America (USA), implementation since 1990 has been associated with a > 80% reduction in EOGBS incidence [15]. However, most low- and lower-middle-income countries (LMIC) do not have a specific IAP policy [16], and scale-up is likely to be challenging [17]. Additionally, IAP has no effect on late-onset GBS disease (LOGBS) [13], and would not be expected to prevent GBS-associated stillbirth or preterm birth, because IAP is administered after the onset of labor and/or rupture of membranes. In contrast, an effective GBS vaccine could prevent invasive GBS disease across all at-risk population groups, including mother, fetus, infant, and the elderly or immunocompromised [11].
Several GBS vaccine candidates are in development, including multivalent GBS bacterial capsular polysaccharide (CPS)-CRM197 conjugate vaccines [18], CPS-protein conjugate vaccines [19], and multivalent adjuvanted protein vaccines (NCT03807245). Placental transfer of anti-CPS specific GBS antibodies from the mother to the fetus reduces the risk of invasive GBS disease, with evidence of protection against both EOGBS and LOGBS [20]. Multivalent CPS-protein conjugate vaccines induce an increased CPS-specific IgG response [21]. Since capsular-type switching is possible [22], there is also interest in developing GBS protein-based vaccines, and protein-based vaccines are undergoing preclinical studies [23] or Phase I/II clinical trials (NCT03807245). GBS isolates are classified by their CPS into ten serotypes: Ia, Ib, II-IX. Strains are assigned to a sequence type (ST) through multilocus sequence typing (MLST), based on allelic variation of seven housekeeping genes [24], and similar allelic profiles are further grouped into clonal complexes (CC). Molecular techniques, including MLST and whole-genome sequencing (WGS), have enabled better characterisation of GBS, and highlighted that different capsular serotypes are present within the same ST. The latter is especially important for capsular polysaccharide-based vaccines, as certain sequence types, such as ST17, have been more strongly associated with human GBS disease. A strong association between ST17 and severe neonatal and young infant disease has been demonstrated [25][26][27]. Five major clonal complexes in humans (CC1, CC10, CC17, CC19, and CC23) are associated with colonisation and invasiveness of GBS [28][29][30]. GBS strains can also be classified on the basis of surface proteins, such as Alp family proteins, serine-rich repeat proteins, C5a peptidase, and pilus islands [31]. Proteins such as HvgA, Rib and pilus island proteins have also been associated with invasiveness of GBS strains [32][33][34][35].
Development of GBS vaccines should be informed by evidence from all over the world, considering all the relevant at-risk populations, and by data not just on serotype distribution, but also on strain types and conserved protein targets. This study aims to inform GBS vaccine design, based on systematic reviews and meta-analyses regarding GBS serotypes, considering geographical variation and time trends. We update literature searches [36] regarding serotype data for (1) maternal colonisation, (2) maternal invasive disease and (3) infant invasive disease. Additionally, we expand the previously covered at-risk populations to include (4) stillbirths and (5) disease in adults over 60 years old. We also expanded the remit of the searches from serotypes alone to cover sequence types and specific surface protein genes.

Case definitions
Definitions for maternal GBS colonisation, invasive GBS disease, EOGBS, LOGBS, and maternal invasive GBS disease have been detailed previously [5,9,36-38] (supplementary Table S1). For stillbirth, we used the World Health Organisation definition for international comparison and reporting (birth of a fetus with no signs of life at or after 28 weeks' gestation or weighing ≥ 1000 g), and the International Classification of Disease definition (birth of a fetus with no signs of life at 22 weeks or more gestation or weighing > 500 g) [39]. GBS invasive disease in older adults was defined as invasive disease occurring after 60 years of age.

Search strategy and inclusion criteria
We performed systematic literature searches in Medline, Embase, Scopus, the World Health Organisation Library Information System (WHOLIS), and Literature in Health Sciences in Latin America and the Caribbean (LILACS). Searches were limited to humans with no language restrictions. All searches ran to March 2019. For maternal colonisation and infant GBS disease, the literature searches covered 2017 onwards, updating previously conducted literature reviews [5,37]. All searches included "Group B Streptococcus" or "Streptococcus agalactiae", combined with "serotype" or with "sequencing" or "MLST". Specific search terms per database are in supplementary Table S2. Medical subject heading (MeSH) terms were used where possible. Snowballing identified additional studies. We included studies that reported serotypes, sequence types and protein expression, either from observational studies or pooled laboratory samples, and that presented proportions among cases. Inclusion and exclusion criteria are detailed in supplementary Table S3.
Database searches, screening for duplicates and titles for eligibility, and selection of abstracts were performed by FBJ for serotype data updates and by KT, CCA, KLD and EJ for MLST/virulence factors. Assessment of full-length articles and data extraction were done by two independent investigators: FBJ and PP for serotype data, and KT, KLD and EJ for MLST/virulence factors. If there was a discrepancy between two reviewers, a third investigator made the final decision.

Data abstraction
Data were extracted into standardised Excel forms including: year of study, country, study site, study design, definitions used (for invasive GBS disease or EOGBS/LOGBS or stillbirth), age of patients from whom samples were taken, site of isolation (e.g. whether vaginal, rectal or both; for invasive disease, whether the isolate was from blood or CSF cultures), and serotyping methods. United Nations SDG region classification was used for world regions [40]. For infant GBS disease studies, we abstracted serotype data from cases with sepsis or bacteraemia and meningitis, when available. For MLST data we additionally abstracted infant colonisation and all adult invasive disease data (18 years and older). Reported GBS genotypic data were summarised based on the most common ST and/or CC, the number of isolates that were ST-17, detected virulence genes, whether WGS was done, the serotypes related to each CC, and the presence of proteins of interest and pilus islands. Data collected from previous reviews/meta-analyses were input as previously reported.

Analyses
Data were imported to STATA version 14 software (StataCorp 2014, Texas) for meta-analyses. We used random-effects meta-analyses to estimate the proportion of each serotype, with the number of isolates serotyped as the denominator, using the DerSimonian and Laird method for pooled proportion estimates with 95% confidence intervals [41]. Pooled estimates for each serotype were then transformed to percentages (pooled percentages) and adjusted (scaled up or down) so that the total across all serotypes was 100% (adjusted percentages).
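As an illustration of the pooling and rescaling steps described above, the following is a minimal Python sketch, not the STATA routine used in this review. It applies the DerSimonian and Laird estimator to untransformed proportions. The function names, the 0.5 continuity correction for studies reporting 0% or 100%, and the normal-approximation confidence interval are assumptions made for this example only.

```python
import math

def dl_pooled_proportion(events, totals):
    """DerSimonian-Laird random-effects pooled proportion with a 95% CI.

    events[i] = isolates of one serotype in study i
    totals[i] = isolates serotyped in study i (the denominator)
    """
    props, variances = [], []
    for x, n in zip(events, totals):
        # Assumption: 0.5 continuity correction when a study reports 0% or 100%,
        # so that the within-study variance is never zero.
        if x == 0 or x == n:
            x, n = x + 0.5, n + 1.0
        p = x / n
        props.append(p)
        variances.append(p * (1 - p) / n)

    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1 / v for v in variances]
    p_fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, props))

    # Between-study variance (tau^2), truncated at zero
    k = len(props)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled estimate and normal-approximation CI
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, max(0.0, pooled - 1.96 * se), min(1.0, pooled + 1.96 * se)

def adjust_to_100(pooled_percentages):
    """Rescale pooled percentages so that all serotypes sum to 100%."""
    total = sum(pooled_percentages.values())
    return {k: 100 * v / total for k, v in pooled_percentages.items()}
```

For example, `adjust_to_100({"Ia": 30, "III": 50, "V": 40})` rescales pooled percentages that sum to 120% so that they sum to 100%; these input values are hypothetical and are not results from this review.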
For each at-risk population group, we calculated proportions of six major clonal complexes (CC1, CC10, CC17, CC19, CC23 and CC12), of six surface proteins (alp1, alp2, alp3, alp4, alpha C and rib), and of the three pilus island alleles (PI-1, PI-2a, PI-2b). Within reported CCs, surface proteins and pilus islands, we analysed the distribution of the ten serotypes. We undertook a meta-analysis of the ST17 proportion reported for each at-risk population group.
Regarding time trend analysis, studies were classified according to the median year in which the samples were taken, into four time periods: pre-2001, 2001 to 2006, 2007 to 2012, and 2013 to 2018. Sensitivity analyses were done to assess significant changes in serotype distribution when excluding studies that only tested/reported < 5 serotypes.
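The time-period classification can be sketched as follows. This is an illustrative Python helper only; the function name is hypothetical, and it assumes the median sampling year has already been reduced to a whole calendar year, which the text does not specify.

```python
def time_period(median_year):
    """Assign a study to one of the four time periods by median sampling year.

    Assumption: median_year is a whole calendar year (e.g. 2009).
    """
    if median_year < 2001:
        return "pre-2001"
    if median_year <= 2006:
        return "2001-2006"
    if median_year <= 2012:
        return "2007-2012"
    if median_year <= 2018:
        return "2013-2018"
    raise ValueError("median year falls outside the periods covered by the review")
```

Under this sketch, a study sampling from 2005 to 2009 with a median year of 2007 would be assigned to the 2007-2012 period.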
Regional variation was observed in data availability by at-risk population groups. Maternal colonisation data were available in almost all regions (except Oceania and central Asia) (Supplementary Fig. S1). There were no other infant invasive disease studies reporting serotypes from southern Asia, which accounts for almost half the world's births (Supplementary Fig. S2). Studies reporting early and late onset infant sepsis and meningitis serotypes were only from eastern Asia (n = 3) and Europe (n = 3). Isolates causing maternal invasive disease were reported only from northern America and one study from China (Supplementary Fig. S3). The four studies with data for GBS-associated stillbirth were from Kenya, Mozambique, South Africa and Canada (Supplementary Fig. S4). For older adults, all data were from Europe and northern America, except for two studies from south-eastern Asia (both from Malaysia), and one study from Latin America (Argentina) (Supplementary Fig. S5).
Forty percent of maternal colonisation strains had the Rib protein gene, of which 66% were serotype III. Alpha C was also reported in 28% of strains and alp1/epsilon in 26% of strains, of which 59% were serotype Ia (Supplementary Fig. S10). 87% of strains had at least one of the alp1, alp2/3, alpha C or Rib protein targets. The combination of PI-1 and PI-2a predominated among maternal colonisation strains with 38%, followed by PI-2a only in 32%. The biggest proportions of serotypes with pilus island protein genes were serotype Ia at 69% (PI-2a only) and serotype III at 85% of strains (PI-1 and PI-2b) (Supplementary Fig. S11).
Newborn/Infant colonisation (n = 159 isolates): Similar to maternal colonisation, CC19 was the commonest clonal complex identified with 39%, followed by CC23 with 23%, but with less CC17 than in maternal colonisation (only 5%). 60% of CC19 strains expressed the serotype III CPS and 52% of CC23 strains expressed Ia CPS (Supplementary Fig. S9). The pooled proportion of ST17 for infant colonisation isolates was 4% (95%CI: 0-11). Unlike in maternal colonisation, the most common surface protein genes were alp1/epsilon and alp2/alp3 with 27% each, followed by Rib with 23%, and alpha C (18%). 90% of strains with the Rib protein gene and 92% of strains with alp2/alp3 belonged to serotype III (Supplementary Fig. S10). 97% of the strains had at least one of the alp1, alp2/3, alpha C or Rib protein targets. There was only one study (n = 35) of pilus island protein genes from infant colonisation, showing that 77% had PI-1 and PI-2a, followed by 14% of strains with only PI-2a.
MLST data for maternal disease were available from only one study (n = 29 isolates) [49], reporting the commonest clonal complex as CC23 (41%) followed by CC17 (24%), where 83% of CC23 were serotype Ia and 86% of CC17 were serotype III (Supplementary Fig. S13). This study did not specify the number of strains that were ST17 or report surface proteins or pilus islands data.
Rib was the surface protein gene most likely to be reported in infant invasive disease (54%), particularly in strains belonging to serotype III (95%) (Supplementary Fig. S16). Other protein genes in infant invasive disease strains were alp1/epsilon with 17%, alpha C with 16%, and alp2 and alp3 with 8% (61/754). 93% of the strains had at least one of the alp1, alp2/3, alpha C or Rib protein targets. The combination of PI-1 and PI-2b was common in infant invasive strains (46%), of which 97% were serotype III, followed by the combination of PI-1 and PI-2a with 32% (Supplementary Fig. S17). HvgA presence was analysed in six studies [22,25,58-60]; in all of them it was present in ST17 strains, and it was associated with infant disease.
5. Invasive GBS disease in older adults (n = 2525 isolates): Serotype V was the most common with 25% (95%CI: 20-31), followed by serotype Ia with 23% (95%CI: 19-27), and serotype III with 11% (95%CI: 9-14) (Fig. 3A). Fig. 3D shows differences between regions in the distribution of GBS serotypes in the elderly population, though Latin America and the Caribbean results are based on one study (n = 9) and south-eastern Asia on two (n = 16). Nonetheless, in south-eastern Asia, similar to maternal colonisation (the only other risk group that has data from this region), serotypes VI to IX have a higher presence than in other regions, representing 31% (95%CI: 9-57) of the isolates (4 serotype VI and 1 serotype VII). In Europe and northern America and in south-eastern Asia, invasive disease in the elderly was less often caused by serotype III than invasive disease in the infant, mother or stillborn infant.

Time trends for GBS serotypes
There has been an increase in the number of studies published on GBS serotypes in the last twenty years, from 17 studies in pre-2001 to 93 studies in 2013-2018, especially for maternal colonisation and infant GBS disease in the last period. Although the majority of studies and isolates were from countries in the UN classified developed region [40], over time the number of studies from all other regions has been increasing, to the point that in the last time period there were more published studies from countries in other regions than from the developed region.
Only seven studies, three for maternal colonisation and four for infant invasive disease, presented longitudinal data for serotype distribution changes over time [55,61-66]. In general, all studies described serotype variation over time, but usually among the most common serotypes. For example, in a South African study of infant invasive disease there were changes over a 10-year period, with serotypes Ia and III interchanging as the dominant serotype [64]. Fig. 4 shows the distribution of GBS serotypes over the past decades according to the study periods of data collection (samples taken), for maternal colonisation, EOGBS and LOGBS. For LOGBS, serotypes fluctuated between time periods with no specific trend, with serotype III always predominating.

Sensitivity analysis
There were no differences between the serotype proportions of the main analysis (n = 198 studies) and the sensitivity analysis that excluded studies sampling or reporting only five or fewer serotypes (n = 161 studies) (Supplementary Fig. S18).

Discussion
This paper provides the most comprehensive worldwide review of GBS circulating serotypes, and the first systematic review, to our knowledge, on MLST data and proposed vaccine candidate proteins, which is timely given likely investments in GBS vaccines. A GBS maternal vaccine, if effective, would reduce invasive disease after birth (the target of IAP) but also reduce a major burden of stillbirths, maternal bacteremias, and LOGBS, where IAP is not expected to be effective. Additionally, most of the current burden is in LMIC (notably Africa and South Asia), where IAP is unlikely to be feasible to scale equitably. Our review considers all the relevant at-risk populations: in addition to the previous review of GBS serotypes for colonised pregnant women, early and late onset infant sepsis and meningitis, and maternal invasive disease [5,37,38], we include stillbirths, which are often omitted despite an estimated 2.6 million occurring annually, most of which are preventable [67]. We also included adults > 60 years. Our main finding is that a hexavalent polysaccharide-protein conjugate vaccine (Ia, Ib, II, III, IV, V) has the potential to prevent up to 93% of worldwide maternal colonising isolates, 95% of maternal invasive GBS disease, 99% of GBS-associated stillbirth, and 99% of infant invasive GBS disease. Although evidence is still limited, a vaccine targeting maternal colonisation could provide additional protection against neonatal disease, and evaluation of this should be included in phase 2 studies. For maternal colonisation and maternal disease, the main serotypes across all regions were similar: Ia, III, and V. In EOGBS and LOGBS, serotype III dominated with 52% and 77%, respectively. An additional two serotypes (Ia, II) accounted for more than 15% of infant disease in the Americas, Europe, and East Asia, while serotypes Ia and V accounted for more than 20% of infant disease in Sub-Saharan Africa and Australia.
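The theoretical coverage of a multivalent vaccine quoted above is the sum of the adjusted percentages of the serotypes it contains. A minimal Python sketch of that arithmetic follows; the function name and the input percentages are hypothetical and illustrative only, and are not results from this review.

```python
# Serotypes in the hexavalent polysaccharide-protein conjugate vaccine
HEXAVALENT = ("Ia", "Ib", "II", "III", "IV", "V")

def vaccine_coverage(adjusted_percentages, vaccine_serotypes=HEXAVALENT):
    """Sum the adjusted percentages of the serotypes included in the vaccine."""
    return sum(adjusted_percentages.get(s, 0.0) for s in vaccine_serotypes)

# Hypothetical adjusted percentages (sum to 100%); NOT data from this review
example = {
    "Ia": 20.0, "Ib": 8.0, "II": 7.0, "III": 40.0,
    "IV": 5.0, "V": 15.0, "VI-IX": 3.0, "non-typeable": 2.0,
}
coverage = vaccine_coverage(example)  # percentage of isolates with a vaccine serotype
```

Because the adjusted percentages are scaled to total 100%, the result can be read directly as the share of isolates whose serotype is included in the vaccine.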
The elderly adult population was included for the first time, providing a novel picture of the circulating GBS serotypes causing infection, which may inform the potential benefit of including this at-risk group in the use of a GBS vaccine. The commonest serotypes were V and Ia, accounting for nearly half of the disease in this population, with less (14%) serotype III. Hence, a hexavalent vaccine could prevent up to 96% of invasive disease in the elderly. We note that data were mainly from northern America and China.
MLST data were reported in only a quarter of the studies reviewed and even fewer (<5%) included WGS data. ST17 was more common in infant invasive disease, and although most ST17 isolates are serotype III, invasiveness of ST17 is independent of the capsular serotype [56]. It is clear that ST17 strains must be covered by any GBS vaccine candidate. Thus, sequence types, CC, and virulence factors associated with disease add more targeted information than the capsular polysaccharide alone. Additionally, analysis of the genotype through WGS may reveal genetic recombination events, such as capsular switching, and mutations allowing GBS to become more virulent, which allows better observation of potential bacterial population changes during and after vaccine implementation. Serotype replacement and serotype switching following vaccination are known limitations of polysaccharide-protein conjugate vaccines. Evidence has suggested that GBS could undergo capsular switching through horizontal transfer of the capsular locus [22,58,68], notably within CC17, where serotype III strains switched to express serotype IV capsule. Lopes et al. also studied such an event in an ST1 strain switching from serotype V to Ib capsule [69]. Capsular switching in GBS may still be a rare occurrence, yet the introduction of capsular serotype-based vaccines could create greater selection pressure, leading to serotype replacement in disease as seen with childhood pneumococcal conjugate vaccine (PCV) immunisation [70], specifically an increase in non-vaccine serotypes.
Protein antigen-based vaccines against GBS could provide an alternative to the multivalent polysaccharide-protein conjugate vaccines. Based on our findings, a protein-alum adjuvant vaccine, which contains the alp family surface proteins (alp1/epsilon, alp2/3, alpha C and Rib), has the potential to prevent up to 87% of maternal colonisation, 99% of adult invasive disease and 93% of infant invasive disease (percentages of isolates with at least one of the four protein targets). In addition to the alp protein-based vaccines, the pilus proteins have been proposed as potential vaccine candidates, albeit with conflicting data on whether they have potential as vaccine candidates [31,79]. Our findings require confirmation from ongoing seroepidemiological studies.
A strength of our review is that it contains serotype data from many countries (n = 62), including published and unpublished data, with notably increasing data from China, South Africa, and Iran. Additionally, the large number of studies (n = 198) and isolates (n = 29,247) spanning 20 years allowed us to compare serotype distribution by developed and other regions, to try to mitigate potential bias due to the specific geographical mix of countries in each time period, whilst examining time trends.
Time trends, however, should be interpreted carefully, as it is uncertain whether the reported changes in proportions are due to one serotype becoming less/more common or whether the changes are relative to other serotype changes or capture, since differences in proportions are due to different denominators. Longitudinal studies in a single population have shown fluctuations, but mainly between dominant serotypes [64]. Changes in serotyping methods, such as an increase in the use of molecular methods (PCR and sequencing) with time, could also bias the trend analysis. PCR and sequencing use genetic targets to identify serotypes, hence GBS that either do not have a capsule or have a poorly expressed capsule can now be serotyped. Studies that used molecular methods had a lower percentage of non-typeable strains (supplementary Table S7). Methodological changes could explain the apparent reduction of non-typeable isolates over time, as PCR and sequencing use increased in the period 2013-2018 compared to previous periods. Additionally, time period classification was by years of data collection, but the literature search for maternal colonisation studies from developed countries and for infant disease studies was limited to publications after the year 2000, for reasons previously explained [5,37], which could introduce some bias for results pre-2001 for developed countries.
Another important limitation is the data gaps for some regions. Although we were able to increase the geographical representation compared to our last review [5,36,37], there is still a paucity of data, especially on invasive disease from southern Asia. However, data from maternal colonisation isolates from Asia can give some insight into circulating GBS serotypes, although these are not necessarily the same dominant serotypes as those in invasive infant disease [80]. There are important data gaps for some population groups, notably stillbirths, maternal invasive disease and adults > 60 years. Most data came from developed regions. For example, results for maternal invasive disease were mainly from the USA (4/6 studies), where the most common maternal colonising serotype is Ia. Protein expression data were limited to few studies and may therefore not be representative of the global situation.

Conclusion
GBS contributes a large burden of neonatal and infant disease, particularly in low- and middle-income countries (LMIC), yet GBS disease also has an under-recognised burden among pregnant/postnatal women and stillbirths, as well as the elderly. Access to IAP is low in LMIC, where maternal vaccination strategies may be a high-impact and more feasible alternative [11]. More studies on GBS strains are needed to fill the data gaps and inform vaccine developers, especially in LMIC and for neglected populations, such as stillbirths. MLST/WGS data help inform which ST/CC and proteins are causing disease and add value beyond considering the serotype alone. Regular systematic compilation of data on GBS cases, serotypes and sequence types is needed, ideally embedded in routine systems for perinatal outcomes. These data are important to guide vaccine development, but improved routine data monitoring post-vaccine licensure will be key to ensure progress for the poorest, who are currently most likely to be left uncounted.

Summary
Group B Streptococcus (GBS) maternal carriage affects an estimated 21 million women worldwide with several GBS vaccines in development. We summarise data regarding serotypes, sequence types and virulence markers to inform vaccine design and implementation, considering at-risk populations, geographical variations and time trends.

Funding
This work was funded solely by the Bill & Melinda Gates Foundation (United States of America), grant number OPP1180644, awarded to the London School of Hygiene and Tropical Medicine.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.