Genomic epidemiology of Streptococcus pneumoniae serotype 16F lineages

Due to the emergence of non-vaccine serotypes in vaccinated populations, Streptococcus pneumoniae remains a major global health challenge despite advances in vaccine development. Serotype 16F is among the predominant non-vaccine serotypes identified among vaccinated infants in South Africa (SA). To characterize lineages and antimicrobial resistance in 16F isolates obtained from South Africa and place the local findings in a global context, we analysed 10 923 S. pneumoniae carriage isolates obtained from infants recruited as part of a broader SA birth cohort. We inferred serotype, resistance profile for penicillin, chloramphenicol, cotrimoxazole, erythromycin and tetracycline, and global pneumococcal sequence clusters (GPSCs) from genomic data. To ensure global representation, we also included S. pneumoniae carriage and disease isolates from the Global Pneumococcal Sequencing (GPS) project database (n=19 607, collected from 49 countries across 5 continents, 1995–2018, accessed 17 March 2022). Nine per cent (934/10 923) of isolates obtained from infants in the Drakenstein community in SA and 2 % (419/19 607) of genomes in the GPS dataset were serotype 16F. Serotype 16F isolates were from 28 different lineages of S. pneumoniae, with GPSC33 and GPSC46 having the highest proportion of serotype 16F isolates at 26 % (346/1353) and 53 % (716/1353), respectively. Serotype 16F isolates were identified globally, but most isolates were collected from Africa. GPSC33 was associated with carriage [OR (95 % CI) 0.24 (0.09–0.66); P=0.003], while GPSC46 was associated with disease [OR (95 % CI) 19.9 (2.56–906.50); P=0.0004]. Ten per cent (37/346) and 15 % (53/346) of isolates within GPSC33 had genes associated with resistance to penicillin and cotrimoxazole, respectively, and 18 % (128/716) of isolates within GPSC46 had genes associated with resistance to cotrimoxazole. Resistant isolates formed genetic clusters, which may suggest emerging resistant lineages.
Serotype 16F lineages were common in southern Africa. Some of these lineages were associated with disease and resistance to penicillin and cotrimoxazole. We recommend continuous genomic surveillance to determine the long-term impact of serotype 16F lineages on vaccine efficacy and antimicrobial therapy globally. Investing in vaccine strategies that offer protection over a wide range of serotypes/lineages remains essential. This paper contains data hosted by Microreact.


INTRODUCTION
Childhood morbidity and mortality caused by Streptococcus pneumoniae remain a major global health challenge despite advances in antimicrobial treatment and vaccine development. In 2015, 294 000 HIV-uninfected infants and 23 300 HIV-infected infants were estimated to have died from pneumococcal disease globally [1]. Pneumococcal conjugate vaccines (PCV) are the current recommended formulation for children and they target different serotypes that cause most invasive pneumococcal disease (IPD) [2]. However, pneumococcal disease persists, in part due to the increase of disease caused by non-vaccine serotypes, a phenomenon called serotype replacement [3]. In serotype replacement, the reduction of vaccine serotypes by PCV and/or antimicrobials creates room for the expansion of non-vaccine serotypes [3,4]. These non-vaccine serotypes are also sometimes resistant to antimicrobial treatments, leading to an increased risk to public health [2].
The capsule of S. pneumoniae has been classified into 104 serotypes based on the reaction of a set of antisera against the capsular antigen [5]. While serotype classification helps to identify virulent strains and subsequently inform vaccine development, it provides little information about the genetic shifts in S. pneumoniae strains, given that the cps locus (which encodes the capsular polysaccharides) only accounts for 0.2 % of the genome [6]. Multilocus sequence typing (MLST), based on the genetic sequences of seven housekeeping genes, was previously the gold standard for characterizing bacterial isolates; however, recombination in some of these genes and limited resolution restrict its utility for inferring relationships between strains [7,8]. Whole-genome analysis offers the opportunity to classify strains that share an evolutionary history into lineages, allowing for inference of relationships between strains across a species [7]. Furthermore, housekeeping genes represent a very small proportion of the genome, and are not necessarily representative of the relationship across the rest of the genome due to recombination within these genes [7]. The Global Pneumococcal Sequencing (GPS) project, whose aim is to provide an international understanding of S. pneumoniae population structure and its impact on vaccine and treatment strategies, has therefore classified S. pneumoniae strains into >900 lineages [also known as global pneumococcal sequencing clusters (GPSCs)], and each of these lineages can contain a single or multiple serotypes [4,7,9].
In South Africa, where PCV13 is part of the expanded programme on immunization (EPI), serotype 16F has been reported to be among the predominant non-vaccine serotypes (NVTs) among fully vaccinated infants [10] and has been shown to contribute to invasive disease among children below the age of 3 years after the introduction of PCV13 [4]. Using whole-genome sequencing, we characterized the major S. pneumoniae lineages containing serotype 16F in isolates from children in the Western Cape of South Africa and assessed the public health relevance of serotype 16F lineages by describing their association with disease and antimicrobial susceptibility.

Study setting
We analysed 10 923 S. pneumoniae carriage isolates obtained from 1020 of 1143 infants enrolled from May 2012 to September 2015 as part of a population-based, longitudinal prospective birth cohort study [the Drakenstein Child Health Study (DCHS)] in the Western Cape Province in South Africa [11]. Both the parent study (401/2009) and this study (188/2017) received ethical approval from the Faculty of Health Sciences, Human Research Ethics Committee (HREC) of the University of Cape Town, South Africa. This study took place at two primary healthcare clinics located 2 km apart, i.e. TC Newman and Mbekweni. All infants received routine immunization including PCV13 as part of the national immunization programme, administered at 6 weeks, 14 weeks and 9 months of age through a 2+1 vaccine schedule. Nasopharyngeal (NP) swab collection was performed every 2 weeks for the first year of life as well as at 6, 12, 18 and 24 months, and whenever infants presented with pneumonia or lower respiratory tract infection (LRTI). Details on sample collection, transportation, culture and storage have been described previously [10]. For global context, we included S. pneumoniae

Impact Statement
This study shows that serotype 16F lineages are predominant in southern Africa and are associated with disease and antimicrobial resistance. Although serotype 16F is included in the upcoming vaccine formulations PCV21 and IVT-25, continuous surveillance to determine the long-term impact of serotype 16F lineages on vaccines and antimicrobial therapy remains essential.

Definition of carriage and disease and vaccine and non-vaccine serotypes
We define carriage isolates as those collected from healthy individuals and disease isolates as those from sputum or other specimens (i.e. blood, joint fluid, aspirates, etc.) from individuals with disease (including septicaemia, bacteraemia, pneumonia, cellulitis, meningitis, otitis media, bronchitis, osteomyelitis, septic arthritis, sinusitis, empyema, abscess, surgical site infection, sepsis, encephalitis, lower respiratory tract infection, conjunctivitis, peritonitis, pericarditis). In this paper, the vaccine serotypes are those included in PCV13, PCV15 and/or PCV20 (Table S1).

Sequencing and bioinformatics analysis
Single-colony picks of presumptive S. pneumoniae were inoculated onto Columbia blood agar base with 2 % agar, 5 % horse blood (BA) plates and incubated at 37 °C in 5 % CO2 overnight. DNA was extracted and quantified as described previously [10]. DNA was sequenced on an Illumina HiSeq platform at Wellcome Sanger Institute, generating ≥100 bp paired-end reads. The reads were assessed for quality, assembled, annotated and mapped as previously described [13]. We inferred serotype using seroBA v1.0.2 and resistance profiles for amoxicillin, cefoxitin, ceftazidime, penicillin, chloramphenicol, clindamycin, cotrimoxazole, doxycycline, erythromycin, levofloxacin, meropenem, rifampicin, tetracycline and vancomycin from the genomic data using the Centers for Disease Control and Prevention (CDC) antimicrobial detection tool [14]. Multidrug resistance was defined as predicted resistance to ≥3 antimicrobial classes. PopPUNK v.1.1.6 was used to assign GPSC to the genomes [15]. We mapped reads to GPSC-specific reference sequences [GPSC46 (accession number ERS628712) and GPSC33 (accession number ERS566825)] using BWA v 0.7.17 to create an alignment and then assessed recombination within each GPSC using Gubbins v2.4.1. Phylogenetic trees for recombination-stripped alignments for each GPSC were generated using RAxML v 8.2.8. To investigate the emergence of serotype 16F lineages, we generated time trees using BactDating with a mixed gamma, relaxed clock model and visualized these trees using FigTree v.1.4.4 [16]. Using Gubbins output, we calculated and compared recombination rates of different lineages as follows: sum of the recombination base substitutions across all branches divided by the sum of the point mutations across all branches.
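The recombination rate calculation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the per-branch records and the field names recomb_subs and point_muts are hypothetical stand-ins for values that would be read from the per-branch output of Gubbins, and the numbers are invented.

```python
def recombination_ratio(branches):
    """Sum of recombination base substitutions across all branches
    divided by the sum of point mutations across all branches."""
    recomb = sum(b["recomb_subs"] for b in branches)
    point = sum(b["point_muts"] for b in branches)
    if point == 0:
        raise ValueError("no point mutations; ratio is undefined")
    return recomb / point


# Hypothetical per-branch counts (illustrative values only).
example_branches = [
    {"recomb_subs": 120, "point_muts": 20},
    {"recomb_subs": 30, "point_muts": 10},
    {"recomb_subs": 0, "point_muts": 30},
]

ratio = recombination_ratio(example_branches)
print(f"recombination ratio: {ratio:.2f}")
```

Note that this is a ratio of sums over the whole tree, which weights long branches more heavily than an average of per-branch ratios would.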

Descriptive and statistical analysis
We described the proportion of isolates which were serotype 16F using the Drakenstein and GPS datasets. We reported the total number of isolates and serotype 16F isolates within each lineage. We determined the association between serotype 16F lineages and carriage or disease status. This analysis was limited to GPS data from South Africa to control for data collection practices, which varied amongst locations. We restricted data to isolates from children <7 years as pneumococcal disease is most common in this population. For each lineage, we compared the proportions of pneumococcal disease and carriage isolates to those of other lineages, reporting an odds ratio with 95 % confidence interval (CI) by Fisher's exact test. We calculated the frequency of antimicrobial resistance (AMR) for different classes of antimicrobials in each GPSC as follows: (total number of isolates with predicted antimicrobial resistance within a specific lineage/total number of isolates within that lineage) multiplied by 100.
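The two calculations in this section can be illustrated with a short sketch using only the Python standard library. It is not the code used in the study. The function names are hypothetical, the 2×2 table is invented for illustration, and the odds ratio shown is the simple sample odds ratio (ad/bc), which can differ from the conditional maximum-likelihood estimate that some statistical packages report alongside Fisher's exact test. The confidence interval is omitted for brevity.

```python
from math import comb


def amr_frequency(n_resistant, n_total):
    """Percentage of isolates in a lineage with predicted resistance."""
    return 100 * n_resistant / n_total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the
    hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)  # tolerance for floating-point ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# AMR frequency for cotrimoxazole in GPSC46, using counts from the Results.
print(f"{amr_frequency(128, 716):.0f} %")

# Hypothetical table: rows = lineage of interest / other lineages,
# columns = disease / carriage.
odds_ratio, p = fisher_exact_two_sided(3, 1, 1, 3)
print(odds_ratio, round(p, 4))
```

An odds ratio above 1 in this layout indicates that the lineage in the first row is over-represented among disease isolates, and a value below 1 indicates over-representation among carriage isolates.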

RESULTS
The characteristics of isolates obtained from infants recruited from the Drakenstein community and isolates obtained from the GPS project are summarized in Table 1.

Global distribution and lineages associated with serotype 16F
We considered the overall geographical distribution of 1353 serotype 16F isolates from the Drakenstein and GPS datasets. Serotype 16F was identified in six continents, with the majority of 16F isolates collected from Africa [92 % (1248/1353)]. The proportions of serotype 16F isolates from other continents were as follows: 2 % (59/1353) were from Asia, 1.8 % (25/1353) were from North America, 1.4 % (19/1353) were from South America and 0.15 % (2/1353) were from Europe (Fig. 3). Serotype 16F was present in 28 distinct lineages, with GPSC33 and GPSC46 having the highest proportion of 16F strains, at 26 % (346/1353) and 53 % (716/1353), respectively (Fig. 3). Geographical structure was observed within the African continent, with GPSC103 and GPSC274 more commonly detected in West Africa, GPSC46, GPSC47 and GPSC207 in southern Africa, GPSC268 in East Africa, and GPSC33 and GPSC114 on the south and east coasts of Africa. In North America, the predominant lineages were GPSC135 and GPSC165. GPSC18, GPSC156 and GPSC104 were predominant in Latin America, Europe and Asia, respectively.

Antimicrobial resistance across serotype 16F lineages
We assessed the presence of AMR genes in serotype 16F lineages (Fig. 4). Ten per cent (37/346) and 15 % (53/346) of isolates within GPSC33 had genes associated with resistance to penicillin and cotrimoxazole, respectively, while 18 % (128/716) of isolates within GPSC46 had genes associated with resistance to cotrimoxazole. All isolates (n=30) within the GPSC103 lineage had genes associated with resistance to cotrimoxazole, tetracycline and doxycycline.

Lineage analysis of GPSC33 and GPSC46
We explored the global phylogeny of GPSC33 and GPSC46 given that they are the predominant lineages of serotype 16F (Fig. 5). Israel (n=3), Papua New Guinea Russia (n=3) and the USA (n=3)]. In GPSC33 there were two distinct clusters of isolates with penicillin resistance determinants and one cluster of isolates with cotrimoxazole resistance determinants; in GPSC46, there was one cluster of isolates with cotrimoxazole resistance determinants (Fig. 4).

GPSC33 lineage consists
Compared to GPSC46, the time to the most recent common ancestor (tMRCA) for GPSC33 is recent; serotype 16F isolates within GPSC33 likely originated in 1908 (95 % HPD 1898–1917), compared with 1896 (95 % HPD 1795–1926) for GPSC46. For isolates with resistance that formed clusters, which suggests emerging resistant lineages, the estimated tMRCA for the cluster with penicillin resistance in GPSC33 is 1937 (95 % HPD 1927–1946) and that for the cluster with cotrimoxazole resistance in GPSC46 is 1922 (95 % HPD 1821–1988). The tMRCA of these clusters is between the 1920s and the 1930s, coinciding with the discovery of penicillin and cotrimoxazole, but prior to their widespread use [17,18]. The tMRCA intervals for the cluster with resistance to cotrimoxazole are wide, and antimicrobial use may have contributed to the emergence of this cluster. We compared the genetic variation through recombination (a process in which exogenous DNA is acquired and incorporated into the genome [19]) of GPSC33 and GPSC46. GPSC33 had a higher recombination ratio (i.e. the ratio of the number of recombination events to point mutations on a branch) at 8.2 compared to 4.9 for GPSC46, which could be why there are more clusters with AMR in this lineage. We compared the association of serotype 16F isolates with carriage and disease to that of non-serotype 16F isolates within GPSC33 and GPSC46. This analysis included GPS project isolates from children <7 years old from South Africa. Isolates from the Drakenstein study were excluded due to the focus on carriage sampling. Compared to other serotypes within GPSC33 and GPSC46 lineages, serotype 16F isolates were associated with carriage in GPSC33 [OR (95 % CI) 0.24 (0.09–0.66); P=0.003], and with disease in GPSC46 [OR (95 % CI) 19.9 (2.56–906.50); P=0.0004] (Table 2).

DISCUSSION
Serotype replacement by non-vaccine serotypes continues to threaten the effectiveness of current pneumococcal vaccine interventions [11–15]. However, pneumococcal surveillance, including the use of genomics, has been able to inform vaccine strategies [16]. We show that serotype 16F remains a dominant non-vaccine type in both carriage and invasive disease in South Africa. We further placed our findings in a global context using the GPS Project Database [7], and we highlight that serotype 16F is prevalent in Africa. Serotype 16F lineages, including GPSC33 and GPSC46, have been highly successful in the region and are associated with antimicrobial resistance and pneumococcal disease. Together, these data highlight that non-vaccine serotype 16F has become important in South Africa, and therefore further surveillance is warranted to inform vaccine policy and expand vaccine valency.
We show that serotype 16F was the dominant serotype in our longitudinal carriage cohort, similar to other pneumococcal carriage surveillance findings from South Africa [17]. Previous work in the region described significant decreases in PCV7 serotypes (19F, 6B, 23F and 14) and an increase in non-vaccine serotypes (16F, 34, 35B and 11A) among colonized children <2 years of age [17]. Other regions, including West Africa [18] and the Middle East [19], have described the predominance of serotype 16F in carriage and disease among children and adults following pneumococcal vaccine rollout. Here we show that lineage GPSC33 was one of the dominant lineages in our cohort, and the lineage has been described as mostly prevalent on the African continent [18]. Longitudinal colonization studies have shown a typically high number of single-nucleotide polymorphisms among 16F isolates of the same GPSCs carried over multiple sampling visits, suggesting that divergent 16F strains may emerge over the course of carriage due to homologous recombination [20]. Further work is required to understand the genotypic and phenotypic traits of successful lineages such as GPSC33.
Antimicrobial selection and horizontal gene transfer could potentially facilitate the expansion of resistant serotype 16F lineages in South Africa. We identified cotrimoxazole-resistant sub-lineages in both GPSC33 and GPSC46. Although cotrimoxazole is not directly used to treat diseases caused by S. pneumoniae, it is widely used as prophylaxis for infants who are HIV-exposed but uninfected [21], and for adults living with HIV. Antimicrobial pressure due to cotrimoxazole use has therefore been suggested to contribute to the resistance patterns observed in the region, and further use of the antimicrobial may facilitate the expansion of cotrimoxazole-resistant GPSC33 and GPSC46 sub-lineages.
Although we established that GPSC33 and GPSC46 lineages were generally susceptible to first-line antimicrobials used to treat pneumonia, such as penicillin, we identified penicillin-resistant GPSC33 sub-lineages, highlighting the potential risk of this lineage expanding and limiting antimicrobial treatment options in South Africa. Penicillin-resistant serotype 16F isolates have been described in countries such as Japan following PCV7 rollout [22]. Previous work on horizontal gene transfer has shown that serotypes that frequently colonize the human nasopharynx, including serotype 16F, acquire penicillin-binding protein gene fragments from Streptococcus mitis [23]. The authors highlighted the presence of mosaic pbp2x among serotype 16F GPSC33 associated with reduced susceptibility [23], and modifications at the pbp2x gene are known to confer reduced susceptibility to a range of beta-lactam antimicrobials [24]. Due to the high rates of recombination seen among 16F isolates in longitudinal carriage and the acquisition of mosaic pbp fragments from other species [23], we hypothesize that serotype 16F may develop higher levels of beta-lactam resistance with subsequent recombination in other important pbp genes (pbp2b and pbp1a).
Serotype 16F lineages are an important cause of invasive disease in the post-PCV13 era, particularly in countries such as South Africa. Our results show that not only is serotype 16F the predominant non-vaccine serotype carried among vaccinated infants, but serotype 16F lineages that include GPSC33 and GPSC46 cause invasive disease. In South Africa, serotype 16F has been associated with the second highest case fatality ratio after serotype 6A [25]. Serotype 16F is an important cause of invasive disease in countries such as Ethiopia [26] and Denmark [27], and the serotype has increasingly become important among cases of meningitis through serotype replacement in Israel [19]. Previous meta-analyses have shown increased case fatality rates associated with serotype 16F, along with serotypes 3, 6B, 9N, 11A, 19F and 19A [28]. Furthermore, seven studies found an increase in serious clinical outcomes attributed to serotype 16F, which is not included in the PCV13 formulation [28]. It remains unclear why serotype 16F is associated with mortality, but infection among vulnerable populations may be of concern. There is a predominance of non-vaccine serotypes of the pneumococcus in carriage among HIV-infected children in Ghana, particularly serotype 16F [29]. Further genomic surveillance will be important to track serotypes and lineages that are associated with carriage, disease and elevated risk of serious outcomes for vaccine policy making and potentially expanding vaccine valency to include 16F. Serotype 16F is not included in the current PCV formulations (PCV10/13/15/20) approved for use in children but is included in upcoming formulations of PCV21 and IVT-25. Therefore, continuous surveillance to determine the long-term impact of serotype 16F lineages on vaccines and antimicrobial therapy remains essential.

Fig. 1. Flow chart showing how samples and sequences were included from the Drakenstein community.

Fig. 3. Global distribution of serotype 16F lineages. The pie charts show the proportions of serotype 16F lineages in different countries. GPSC, global pneumococcal sequencing clusters. Available on Microreact: serotype-16F. The table on the right summarizes the number of serotype 16F isolates within each lineage across different continents. GPSCs with fewer than five isolates (GPSC80, 313, 348, 386, 428, 434, 477, 485, 505, 542, 568, 583, 676, 846) are not shown in the table. One pneumococcal isolate from Oceania belonging to GPSC46 is not shown in the table.

Fig. 5. Global phylogeny of (a) GPSC33 and (b) GPSC46. Serotype within each lineage is shown in the first column on the right of the tree. Countries where the isolates within each lineage were obtained are shown in the second column on the right of the tree. Clinical manifestations of individuals from whom isolates were obtained are shown in the third column. AMR profiles of PEN (penicillin) and COT (cotrimoxazole) are shown in the fourth and fifth columns on the right of the tree, respectively. I, isolates with intermediate resistance; R, isolates with resistance; S, isolates that are susceptible to antimicrobials. tMRCA, time to the most recent common ancestor. The tMRCA is reported in years with 95 % highest posterior density (HPD). The pink asterisk (*) corresponds to the ancestral node for isolates with antimicrobial resistance. The black asterisk (*) represents the root/ancestral node of the entire phylogenetic tree. Link to access the GPSC33 phylogenetic tree is: GPSC33. Link to access the GPSC46 phylogenetic tree is: GPSC46.

Table 1. Characteristics of Streptococcus pneumoniae obtained from the Drakenstein community and GPS dataset
*Isolates obtained from sputum collected at the time of lower respiratory tract infection from children participating in the Drakenstein Child Health Study were classified as disease isolates. (-) indicates no data.
†This includes isolates whose disease or carriage status is unknown in the Global Pneumococcal Sequence (GPS) database.