Genomic surveillance of invasive Streptococcus pneumoniae isolates in the period pre-PCV10 and post-PCV10 introduction in Brazil

In 2010, Brazil introduced the 10-valent pneumococcal conjugate vaccine (PCV10) into the national children’s immunization programme. This study describes the genetic characteristics of invasive Streptococcus pneumoniae isolates before and after PCV10 introduction. A subset of 466 [pre-PCV10 (2008–2009): n=232, post-PCV10 (2012–2013): n=234; <5 years old: n=310, ≥5 years old: n=156] pneumococcal isolates, collected through national laboratory surveillance, were whole-genome sequenced (WGS) to determine serotype, pilus locus, antimicrobial resistance and genetic lineages. Following PCV10 introduction, the non-vaccine serotypes (NVT) 3 and 19A were the most frequent in the <5 years age group, and serotypes 12F, 8 and 9N were the most frequent in the ≥5 years age group. The study identified 65 Global Pneumococcal Sequence Clusters (GPSCs): 49 (88 %) were previously described GPSCs and 16 (12 %) were Brazilian clusters. In total, 36 GPSCs (55 %) were NVT lineages, 18 (28 %) were vaccine serotype (VT) lineages and 11 (17 %) were lineages with both VT and NVT. In both sampling periods, the most frequent lineage was GPSC6 (CC156, serotypes 14/9V). In the <5 years age group, a decrease in penicillin (P=0.0123) and cotrimoxazole (P<0.0001) resistance and an increase in tetracycline resistance (P=0.019) were observed. Penicillin non-susceptibility was predicted in 40 % of the isolates; 127 PBP combinations were identified (51 predicted MIC≥0.125 mg l−1); cotrimoxazole (folA and/or folP alterations), macrolide (mef and/or ermB) and tetracycline (tetM, tetO or tetS/M) resistance were predicted in 63, 13 and 21.6 % of the pneumococci studied, respectively. The main lineages associated with multidrug resistance in the post-PCV10 period were the NVT lineages GPSC1 (CC320, serotype 19A) and GPSC47 (ST386, serotype 6C). The study provides a baseline for future comparisons and identified important NVT lineages in the post-PCV10 period in Brazil.


INTRODUCTION
Streptococcus pneumoniae is the main cause of otitis media and community-acquired pneumonia as well as invasive pneumococcal disease (IPD), including meningitis, sepsis and bacteremia [1]. Antimicrobials and vaccines are tools currently available to treat and prevent pneumococcal diseases, respectively.

The indiscriminate use of antimicrobial agents in community settings selects for resistant pneumococcal strains and affects IPD outcomes by causing antibiotic treatment failure [2]. In addition, vaccine-induced selective pressure can change resistance patterns over time and between regions [1].
Pneumococcal conjugate vaccines (PCVs; 7-valent, 10-valent and 13-valent) are highly effective in preventing IPD caused by the serotypes included in their formulations [3,4]. Brazil was the first country to introduce the 10-valent pneumococcal conjugate vaccine (PCV10; target serotypes 1, 4, 5, 6B, 7F, 9V, 14, 18C, 19F and 23F) into its national childhood immunization programme, in March 2010 [5]. The vaccine schedule comprised three primary doses at ages 2, 4 and 6 months and a booster dose at 12-15 months. During the first year after PCV10 introduction, a catch-up campaign was adopted, with two primary doses for children aged 7 to 11 months plus a booster dose at 12-15 months, and a single dose for children aged 12 to <24 months [5]. In 2016, the schedule was changed to two primary doses at ages 4 and 6 months and a booster dose at 12 months [6].
It is well documented that PCV introduction is followed by changes in S. pneumoniae epidemiology. Vaccine pressure leads to serotype replacement through (i) capsular switching, (ii) expansion of common non-vaccine-type strains and (iii) increases in newly emerging non-vaccine-type strains [3,4], while reduced transmission of vaccine types (VT) provides indirect protection (herd immunity) to the unvaccinated population. Surveillance is therefore essential to detect temporal changes in the epidemiological and genetic characteristics of pneumococcal isolates [1,7,8]. Genomic methods have proved to be excellent tools for understanding the biology and epidemiology of important bacterial pathogens, including S. pneumoniae. Multi-locus sequence typing (MLST), a historically important molecular method that is still widely used in pneumococcal epidemiology, sequences seven housekeeping genes as a sample of genomic variation and uses their allele profile to define sequence types (STs) and clonal complexes (CCs) [1]. This study aimed to describe the genetic characteristics of invasive pneumococcal disease (IPD) isolates sampled from ongoing routine laboratory surveillance in Brazil during the pre- (2008-2009) and post- (2012-2013) PCV10 introduction periods. The whole-genome sequence data were used for in silico serotyping and antimicrobial resistance prediction, to identify possible changes in genetic lineages, and to track the circulation of multidrug-resistant clones following PCV10 introduction. The data generated will provide a baseline for continued vaccine impact monitoring and support future vaccination strategies for pneumococcal disease control.
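To make the link between STs and CCs concrete, the following minimal Python sketch groups STs into clonal-complex-like clusters by linking single-locus variants (SLVs), i.e. STs that share six of the seven MLST alleles. The seven loci are the standard S. pneumoniae MLST genes; the function names, ST numbers and allele profiles are hypothetical. This is a simplified SLV-linkage rule, not eBURST (which also identifies a founder ST) and not necessarily the rule the GPS project used to assign CCs.

```python
from itertools import combinations

# Standard S. pneumoniae MLST housekeeping loci (order defines the allele profile).
LOCI = ("aroE", "gdh", "gki", "recP", "spi", "xpt", "ddl")


def n_differences(p, q):
    """Number of loci at which two allele profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def slv_groups(profiles):
    """Group STs connected by single-locus-variant (SLV) links.

    profiles: dict mapping ST -> tuple of 7 allele numbers (one per LOCI entry).
    Returns a sorted list of sorted ST groups (transitive SLV clusters).
    """
    for st, profile in profiles.items():
        if len(profile) != len(LOCI):
            raise ValueError(f"ST {st}: expected {len(LOCI)} alleles")
    parent = {st: st for st in profiles}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, t in combinations(profiles, 2):
        if n_differences(profiles[s], profiles[t]) == 1:
            parent[find(s)] = find(t)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical profiles: 9001-9002 and 9002-9003 are SLVs; 9004 is unrelated.
example = {
    9001: (1, 1, 1, 1, 1, 1, 1),
    9002: (1, 1, 1, 1, 1, 1, 2),
    9003: (1, 1, 1, 1, 1, 2, 2),
    9004: (5, 6, 7, 8, 9, 10, 11),
}
print(slv_groups(example))  # [[9001, 9002, 9003], [9004]]
```

Because linkage is transitive, a chain of SLVs can join STs that differ at several loci (9001 and 9003 here), which is one reason founder-based methods such as eBURST are more conservative.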

Bacterial strain collection
The study collection consisted of a random subset of 466 IPD isolates recovered through a national laboratory surveillance network led by Institute Adolfo Lutz (IAL), the Brazilian National Reference Laboratory for Meningitis and Pneumococcal Infections. The study included pneumococcal isolates from the pre- (n=232, 2008-2009) and post- (n=234, 2012-2013) PCV10 introduction periods, collected in 20 of the 26 Brazilian States (Table S1, available in the online version of this article) and corresponding to 14 % (n=466/3342) of the total S. pneumoniae isolates received by IAL in the periods analysed. Isolates from 2010 and 2011, the first years of PCV10 introduction, were excluded from the study. Table S2 shows the study isolates stratified by age group, clinical diagnosis and vaccine period. Brandileone and collaborators [9] published a detailed phenotypic analysis of the pneumococcal serotypes that caused IPD before and after PCV10 introduction, using data from the Brazilian laboratory surveillance system over a longer period (2005 to 2015). From that dataset, the 3342 isolates from 2008 to 2009 and 2012 to 2013 were used in our study as the basis for selecting the 466-isolate whole-genome sequencing (WGS) subset and for comparing its serotype distribution (Figs S1 and S2). We present serotype data for the entire collection of 3342 isolates and restrict all other analyses to the 466 randomly sampled WGS isolates.
The IAL receives strains previously identified as S. pneumoniae by the laboratory of origin and confirms this identification using classical methodologies described by WHO [10]. For routine surveillance, serotyping was performed by the Quellung reaction, and antimicrobial susceptibility profiles were determined by disc diffusion and/or broth microdilution, with minimal inhibitory concentrations (MIC) interpreted according to Clinical and Laboratory Standards Institute (CLSI) breakpoints [11][12][13].
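Because the same penicillin MIC is interpreted differently depending on the clinical syndrome, a short sketch may help. The Python function below applies CLSI penicillin breakpoints for S. pneumoniae as we understand them (meningitis: S ≤0.06, R ≥0.12 mg l−1; non-meningitis parenteral: S ≤2, I 4, R ≥8 mg l−1; oral penicillin V: S ≤0.06, I 0.12-1, R ≥2 mg l−1). These values are an assumption to be checked against the CLSI M100 edition actually used, and the function name and syndrome keys are hypothetical. The printed breakpoints 0.06 and 0.12 are encoded as the doubling-dilution values 0.0625 and 0.125 so that exact dilution MICs compare correctly.

```python
# Hypothetical helper; breakpoints (mg/L) should be verified against the CLSI M100 edition in use.
# Each entry: (highest MIC still "S", lowest MIC called "R"); MICs in between are "I".
# CLSI prints 0.06 and 0.12, which correspond to the 0.0625 and 0.125 doubling dilutions.
PENICILLIN_BREAKPOINTS = {
    "meningitis": (0.0625, 0.125),  # parenteral penicillin, meningitis (no "I" category)
    "non_meningitis": (2.0, 8.0),   # parenteral penicillin, non-meningitis
    "oral": (0.0625, 2.0),          # oral penicillin V
}


def classify_penicillin(mic_mg_per_l: float, syndrome: str) -> str:
    """Return 'S', 'I' or 'R' for a penicillin MIC under the given syndrome breakpoints."""
    susceptible_max, resistant_min = PENICILLIN_BREAKPOINTS[syndrome]
    if mic_mg_per_l <= susceptible_max:
        return "S"
    if mic_mg_per_l >= resistant_min:
        return "R"
    return "I"


# An MIC of 0.125 mg/L is resistant under meningitis breakpoints but susceptible otherwise.
print(classify_penicillin(0.125, "meningitis"))      # R
print(classify_penicillin(0.125, "non_meningitis"))  # S
print(classify_penicillin(4.0, "non_meningitis"))    # I
```

Under these assumed breakpoints, the MIC ≥0.125 mg l−1 threshold used elsewhere in the study coincides with the meningitis resistance cut-off.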

Genome sequencing and analyses
The 466 IPD isolates were whole-genome sequenced on the Illumina HiSeq platform to produce 150 bp paired-end reads, and raw data were deposited in the European Nucleotide Archive (ENA) (Supplementary Material: Data_Summary_GPS_Brazil.xlsx). WGS data were processed as previously described [14]. We derived the virulence factors (serotype

Impact Statement
This study, based on WGS analysis, makes several noteworthy contributions to understanding the genetic structure of the S. pneumoniae population in Brazil. We analysed genomic data from invasive pneumococcal isolates collected in Brazil between 2008 and 2013 to provide a detailed description of the population structure during that sampling period. We identified globally spreading lineages that also included non-vaccine serotype components, suggesting that these lineages might contribute to vaccine evasion. The data generated by this study can be used as a baseline to determine vaccine impact in the following years.
The genetic structure was defined by assigning clonal complexes (CCs) to STs as previously described by the Global Pneumococcal Sequencing Project (GPS) [14], and by assigning each isolate to a Global Pneumococcal Sequence Cluster (GPSC) using PopPUNK [18] together with a reference set of pneumococcal isolates (n=13 454) in the GPS database (https://www.pneumogen.net/gps/assigningGPSCs.html). The STs and GPSCs described in this study were deposited in the PubMLST (https://pubmlst.org/organisms/streptococcuspneumoniae) and GPS (https://www.pneumogen.net/gps/assigningGPSCs.html) databases, respectively. Phylogenetic analysis was performed on all Brazilian isolates in this study by constructing a maximum-likelihood tree using FastTree [19]. In brief, the tree was built from a SNP alignment generated by mapping reads to the S. pneumoniae ATCC 700669 reference genome (NCBI accession number FM211187) using the Burrows-Wheeler Aligner (BWA). Putative capsular (serotype) switching was identified as isolates sharing the same ST but expressing different serotypes. For each such ST, we examined the genetic relatedness of isolates in lineage-specific phylogenies and placed the Brazilian lineage of interest in a global context by including other published GPS isolates belonging to the same GPSC [20]. The lineage-specific tree was constructed using Gubbins [21], which detects recombination regions and removes them when constructing the phylogeny. The recombination-free phylogeny created by Gubbins was used as input for BactDating [22], an R package that creates a time-measured phylogeny by Bayesian dating inference of the nodes of a bacterial phylogenetic tree, typically with simultaneous Bayesian estimation of the molecular clock rate and the coalescent rate, as previously described [20]. The time-measured tree was used to estimate when capsular switching occurred.
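The screening rule used to flag putative capsular switching (the same ST observed with more than one serotype) is simple to express in code. A minimal Python sketch follows, assuming a hypothetical list of (isolate ID, ST, in silico serotype) records; the function name and isolate IDs are invented for illustration. Flagged STs would still need the lineage-specific phylogenetic and dating analyses described above before a switch is inferred.

```python
from collections import defaultdict


def candidate_capsular_switches(records):
    """Flag STs observed with more than one serotype.

    records: iterable of (isolate_id, st, serotype) tuples.
    Returns {st: sorted list of serotypes} for STs with >1 serotype.
    """
    serotypes_by_st = defaultdict(set)
    for _isolate_id, st, serotype in records:
        serotypes_by_st[st].add(serotype)
    return {
        st: sorted(serotypes)
        for st, serotypes in serotypes_by_st.items()
        if len(serotypes) > 1
    }


# Hypothetical records (isolate IDs are invented for illustration).
records = [
    ("iso001", 386, "6B"),
    ("iso002", 386, "6C"),
    ("iso003", 156, "14"),
    ("iso004", 156, "14"),
]
print(candidate_capsular_switches(records))  # {386: ['6B', '6C']}
```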

Statistical analyses
Pneumococcal isolates were defined as vaccine serotype (VT) when their predicted serotype was included in PCV10 (1, 4, 5, 6B, 7F, 9V, 14, 18C, 19F and 23F), and as non-vaccine serotype (NVT) when their predicted serotype was not included in PCV10, including the additional PCV13 serotypes 3, 6A and 19A. We defined the status of a lineage (GPSC) as VT (100 % PCV10 serotypes), NVT (100 % non-PCV10 serotypes), or VT and NVT (containing both VT and NVT isolates), based on its serotype composition across the whole study period. The prevalence of in silico serotypes was stratified by age group (<5 years old and ≥5 years old).
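These definitions translate directly into code. In the following minimal Python sketch, the PCV10 serotype list is taken from the text, while the function names, the "VT+NVT" label and the example records are hypothetical. Serotype strings are assumed to match the in silico serotyping output exactly (e.g. "9V", not "9 V").

```python
# PCV10 serotypes as listed in the text; every other serotype (including 3, 6A, 19A) is NVT.
PCV10_SEROTYPES = frozenset({"1", "4", "5", "6B", "7F", "9V", "14", "18C", "19F", "23F"})


def vaccine_status(serotype: str) -> str:
    """Classify a predicted serotype as 'VT' (in PCV10) or 'NVT'."""
    return "VT" if serotype in PCV10_SEROTYPES else "NVT"


def gpsc_status(serotypes) -> str:
    """Classify a GPSC from the serotypes of all its isolates across the study period."""
    statuses = {vaccine_status(s) for s in serotypes}
    if not statuses:
        raise ValueError("a GPSC needs at least one isolate")
    if statuses == {"VT"}:
        return "VT"
    if statuses == {"NVT"}:
        return "NVT"
    return "VT+NVT"


def classify_lineages(records):
    """records: iterable of (gpsc, serotype) pairs -> {gpsc: status}."""
    by_gpsc = {}
    for gpsc, serotype in records:
        by_gpsc.setdefault(gpsc, []).append(serotype)
    return {gpsc: gpsc_status(sts) for gpsc, sts in by_gpsc.items()}


# Hypothetical isolate-level records.
print(classify_lineages([(6, "14"), (6, "9V"), (1, "19A"), (47, "6B"), (47, "6C")]))
# {6: 'VT', 1: 'NVT', 47: 'VT+NVT'}
```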
As population denominators are unavailable, we evaluated significant changes in VT/NVT within each GPSC lineage as a proportion of all VT/NVT isolates, respectively, using Fisher's exact test. This approach was used to avoid overestimating the NVT increase. The prevalence of antibiotic resistance between vaccine periods, overall and by GPSC, was also compared using Fisher's exact test. Two-sided P-values <0.05 were considered statistically significant. Before applying Fisher's exact tests to compare variables (e.g. VT or penicillin resistance) between the pre- and post-PCV10 periods, we calculated the number of samples needed to achieve 80 % statistical power at a significance level of P<0.05 using the R package pwr, which contains functions for basic power calculations [27]. When a variable (VT and/or antibiotic resistance) lacked sufficient statistical power, Fisher's exact test was not performed. Multiple testing was adjusted using the Benjamini-Hochberg procedure with a false discovery rate of 5 %. Statistical analysis was carried out in R version 3.5.2, and the R scripts used for the analyses were deposited at GitHub (https://github.com/StephanieWLo/Genomic-Surveillance).
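The three statistical steps described above can be sketched in dependency-free Python: a two-sided Fisher's exact test for a 2x2 table (e.g. pre- vs post-PCV10 by resistant vs not resistant), Benjamini-Hochberg adjustment, and a sample-size calculation based on Cohen's h, the effect size used by pwr's two-proportion functions (ES.h, pwr.2p.test). This is an illustrative re-implementation, not the authors' deposited R scripts. The example table, P-values and proportions are hypothetical, and n_per_group uses the usual normal approximation that ignores the negligible opposite tail.

```python
from math import asin, ceil, comb, sqrt
from statistics import NormalDist


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one (as in R's fisher.test).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (same rule as R's p.adjust(method='BH'))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n to detect p1 vs p2 (Cohen's h, two-sided test)."""
    h = abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))
    z = NormalDist().inv_cdf
    return ceil(((z(1 - alpha / 2) + z(power)) / h) ** 2)


print(fisher_exact_two_sided(8, 2, 1, 5))      # ~0.035
print(benjamini_hochberg([0.01, 0.04, 0.03]))  # ~[0.03, 0.04, 0.04]
print(n_per_group(0.48, 0.20))                 # 22
```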

Serotype distribution
No discrepancies were observed between the 466 predicted serotypes and the Quellung results, and the frequency of predicted serotypes in our study subset reflected the serotype frequency in the larger collection of 3342 isolates. Figs S1 and S2 show the serotype distribution by vaccine period (pre-PCV10 and post-PCV10) for the age groups <5 years and ≥5 years, respectively, for both the larger collection and the WGS subset.
As expected, VT isolates were more frequent in the pre-PCV10 period, mainly in the <5 years age group, whereas NVT (including the additional PCV13 serotypes 3, 6A and 19A) were more frequent in the post-PCV10 period (Fig. S1). Before vaccination, serotype 14 was very common in both age groups. After vaccine introduction, serotype 3 was the most frequent in children aged <5 years, followed by serotypes 19A, 6A, 12F and 6C (Fig. S1); in the ≥5 years age group, serotypes 12F and 3 were the most frequent, followed by serotypes 8 and 9N (Fig. S2).
Compared with the pre-PCV10 period, we did not detect any significant changes in the frequency of VT and NVT within individual GPSCs, and we did not have sufficient statistical power to detect changes in individual serotypes. However, the proportion of isolates belonging to VT GPSCs decreased from 48 % to 20 % (P<0.0001) and from 32 % to 24 % (P=0.2871) in the <5 and ≥5 years age groups, respectively, while the proportion belonging to NVT GPSCs increased from 17 % to 39 % (P<0.0001) and from 34 % to 52 % (P=0.0246). The five most frequent GPSCs by age group are listed in Tables 1 and 2. GPSC6 and GPSC16 were among the top lineages in both age groups in both the pre-PCV10 and post-PCV10 periods. GPSC6, composed of CC156 and VT 9V and 14, remained a predominant lineage throughout the study period, although it showed a decreasing trend in the <5 years age group (Fig. S4). In contrast, the GPSC16 lineage persisted, with its NVT components 9N and 15A frequently observed in the post-PCV10 period. In the post-PCV10 period, the NVT lineages GPSC1 (CC320, serotype 19A), GPSC12 (CC180, serotype 3) and GPSC51 (CC458, serotype 3) were mainly associated with children aged <5 years (Fig. 1, Table 1). In the ≥5 years age group, GPSC3, expressing serotypes 8 (CC53) and 11A (CC62), became the predominant lineage 2-3 years after PCV10 introduction, although it was not among the top five lineages before vaccine roll-out (Fig. 2, Table 2).

Capsular switch variants
Among the STs identified, nine occurred with more than one serotype, suggesting capsular switching events: STs 66, 156, 193, 199, 338 and 386. Four were NVT switches observed in the post-PCV10 period (ST66 serotype 9N, ST199 serotypes 19A/15B/15C, ST338 serotypes 15B/15C, and ST386 serotype 6C).
ST386 (GPSC47) was represented by serotype 6B in the pre-PCV10 period and by serotype 6C in the post-PCV10 period (Fig. 1). A time-measured phylogeny of GPSC47, built with isolates from the GPS database together with the ST386 isolates from this study, showed the serotype 6C isolates grouped separately from the ST386 serotype 6B isolates (Fig. 3). Using BactDating, we estimated that the capsular switch between serotype 6B and 6C isolates may have occurred around 1994 (95 % credible interval: 1990-1997). This suggests that the capsular switch occurred in the pre-PCV era and that PCV introduction then selected for the NVT GPSC47 variant (ST386, serotype 6C) over the VT variant (ST386, serotype 6B).

Antimicrobial resistance
The antimicrobial non-susceptibility patterns between the pre- and post-PCV10 periods showed a significant increase in tetracycline resistance (P=0.0019) and decreases in penicillin (P=0.0123) and cotrimoxazole (P<0.0001) resistance among isolates from children aged <5 years after vaccination. No significant difference (P≥0.05) was observed in the frequency of predicted non-susceptibility to chloramphenicol and erythromycin, or of MDR, among isolates from children aged <5 years, nor for any of the studied antibiotics among isolates from the ≥5 years age group (… NVT and capsular switching described in our study (Table S4).
The frequency of the ermB plus mef gene combination increased from 2 (1 %) to 11 (7 %) isolates between the pre-PCV10 and post-PCV10 periods in the <5 years age group and was associated with GPSC1 (CC320, serotype 19A). Full resistance to cotrimoxazole (MIC ≥4 mg l−1) was characterized by the presence of alterations in both folA (all of these isolates carried the I100L substitution) and folP, which were detected in a large proportion of the isolates (n=210, 46 %). As previously shown [16], a mutation within folA or folP alone conferred intermediate cotrimoxazole resistance, while mutations within both folA (I100L) and folP (1-2 codon insertions) conferred full resistance. For tetracycline, the most frequent resistance gene detected was tetM (n=97, 21 %), mainly identified in the post-PCV10 period (n=62/234, 26 %) and associated with GPSC1 (CC320, serotype 19A) and GPSC16 (CC66, serotype 9N). We also observed lower frequencies of tetO and of the tetS and tetM gene combination, which also predict tetracycline resistance. The cat gene conferring chloramphenicol resistance, substitutions in rpoB (P15A, H21N or K22N) predicting rifampicin resistance (MIC >2 mg l−1) and substitutions in parC (S79C, S79F or S79Y) predicting fluoroquinolone resistance were identified in only a few isolates (Table 4).
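The folA/folP rule and the gene-presence rules summarized above can be written as small decision functions. This Python sketch is a simplification under stated assumptions: it takes already-called variants and genes as input, the function names and the "tetS/M" gene label are hypothetical, and it does not reproduce the PBP-based penicillin MIC prediction or the full GPS resistance pipeline.

```python
def predict_cotrimoxazole(fola_i100l: bool, folp_insertion: bool) -> str:
    """Rule described in the text: folA I100L plus a folP 1-2 codon insertion -> 'R';
    either alteration alone -> 'I'; neither -> 'S'."""
    if fola_i100l and folp_insertion:
        return "R"
    if fola_i100l or folp_insertion:
        return "I"
    return "S"


# Acquired genes named in the text as predicting resistance (simplified presence/absence rules).
TETRACYCLINE_GENES = frozenset({"tetM", "tetO", "tetS/M"})
MACROLIDE_GENES = frozenset({"ermB", "mef"})


def predict_by_genes(detected_genes, resistance_genes) -> str:
    """'R' if any of the resistance genes is present, otherwise 'S'."""
    return "R" if resistance_genes & set(detected_genes) else "S"


print(predict_cotrimoxazole(True, True))                       # R
print(predict_cotrimoxazole(True, False))                      # I
print(predict_by_genes({"tetM", "ermB"}, TETRACYCLINE_GENES))  # R
print(predict_by_genes({"cat"}, MACROLIDE_GENES))              # S
```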
This study observed MDR in 57 (12.2 %) pneumococcal isolates: 22 (9.5 %) in the pre-PCV10 period and 35 (14.9 %) in the post-PCV10 period (Table 3). Figs 1 and 2 illustrate the overall resistance among the GPSCs in each age group. The <5 years age group showed higher levels of MDR, associated mainly with GPSCs 1, 10, 23, 47 and 341. In the post-PCV10 period, 12 isolates belonged to the GPSC1 lineage (CC320, serotype 19A), with high-level penicillin resistance (MIC=4 mg l−1) plus resistance to cotrimoxazole, erythromycin and tetracycline, and five belonged to GPSC47 (ST386, serotype 6C), with lower-level penicillin resistance (MIC=0.125 mg l−1) and resistance to erythromycin and tetracycline.
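MDR is commonly defined as non-susceptibility to at least three antimicrobial classes; this section does not restate the study's definition, so the threshold below is an assumption. The Python sketch maps the drugs discussed here to classes (a hypothetical mapping) and applies that rule to the two post-PCV10 resistance profiles described above, treating the GPSC47 penicillin MIC of 0.125 mg l−1 as non-susceptible, as it would be under meningitis breakpoints.

```python
# Hypothetical drug-to-class mapping for the antimicrobials discussed in this section.
DRUG_CLASS = {
    "penicillin": "beta-lactam",
    "erythromycin": "macrolide",
    "cotrimoxazole": "folate pathway inhibitor",
    "tetracycline": "tetracycline",
    "chloramphenicol": "phenicol",
    "rifampicin": "rifamycin",
}


def is_mdr(nonsusceptible_drugs, min_classes=3):
    """Assumed MDR rule: non-susceptible to >= min_classes distinct antimicrobial classes."""
    classes = {DRUG_CLASS[drug] for drug in nonsusceptible_drugs}
    return len(classes) >= min_classes


gpsc1_profile = ["penicillin", "cotrimoxazole", "erythromycin", "tetracycline"]
gpsc47_profile = ["penicillin", "erythromycin", "tetracycline"]
print(is_mdr(gpsc1_profile), is_mdr(gpsc47_profile))  # True True
print(is_mdr(["erythromycin", "tetracycline"]))       # False
```

Counting distinct classes rather than drugs means two drugs from the same class never push an isolate over the threshold on their own.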

DISCUSSION
Our study analysed the genomic characteristics of a selected subset of invasive pneumococcal isolates obtained through national laboratory-based surveillance in Brazil. The serotype distribution predicted by WGS represented the national serotype distribution in the pre- (2008-2009) and post-PCV10 (2012-2013) periods as closely as possible. As expected, the lower prevalence of VT in the post-PCV10 period reflected the substantial impact of PCV10 on IPD in the country [28,29]. Because this VT reduction was also observed in the ≥5 years age group, which was not targeted for PCV10 vaccination, herd immunity appears to have been detectable even within a short period after vaccine introduction. However, some increase in NVT was also documented. A previous Brazilian study [9] analysed a large collection (n=8971) of IPD isolates, compared the prevalence of VT and NVT over a longer period, pre-PCV10 (2005-2009) and post-PCV10 (2010-2015), and showed a large reduction in VT IPD among both children and adults, documenting direct and indirect vaccine effects. That study also showed a shift to NVT as the main cause of IPD in the post-PCV10 period and concluded that in Brazil there is evidence of cross-protection between serotypes 6B and 6A, which was not observed between serotypes 6B and 6C or between 19F and 19A. In the post-PCV10 period, our study observed NVT 3, 19A, 6A, 12F and 6C in the <5 years age group, and serotypes 12F, 3, 8 and 9N in the ≥5 years age group. As observed by Brandileone et al. [9], this suggests an absence of cross-protection between the PCV10 serotypes 6B and 19F and the serotypes 6C and 19A, respectively, and indicates that the burden of pneumococcal disease in the country could be further reduced by introducing into the national childhood immunization programme PCV13 or newer-generation PCVs (new-PCV10, PCV15 and PCV20), which include serotypes 3, 6A (with cross-protection to serotype 6C) and 19A in their composition. Bacterial molecular typing is essential in the surveillance of infectious diseases [20]. This study used whole-genome sequencing to characterize the baseline pneumococcal population structure for continued vaccine impact monitoring. Genome data not only allow public health-relevant data (e.g. serotype and antibiotic resistance profile) to be extracted from a single experiment, but also delineate genetic lineages using both a whole-genome clustering method (GPSC) and multi-locus sequence typing (MLST). Our findings showed good concordance between these two typing methods. The whole-genome clustering method further revealed relationships between strains over a longer timescale by accounting for genetic variation across the whole genome [14]. The GPSC characterization of the 466 IPD isolates revealed a genetic structure similar to the global GPSC population described by the GPS project [14], with the majority of isolates belonging to previously described GPSCs. Most GPSC lineages in the study collection were NVT lineages, and a smaller proportion expressed both VT and NVT serotypes.
Fig. 3. A time-measured phylogeny of GPSC47 (ST386) isolates from Brazil and 15 other countries from Gladstone et al. [14]. The phylogeny was built using BactDating [22] with 100 000 000 generations on a recombination-free SNP alignment generated by Gubbins [21].
The post-PCV10 period was marked by an increase in NVT lineages in both the <5 and ≥5 years age groups. We observed the emergence of the NVT lineages GPSC1 (CC320, serotype 19A), GPSC12 (CC180, serotype 3) and GPSC51 (CC458, serotype 3) in the <5 years age group and GPSC3 (CC53, serotype 8 and CC62, serotype 11A) in the ≥5 years age group, suggesting that expanding PCV coverage with a higher-valency vaccine, such as PCV13 or others still in development (new-PCV10, PCV15 and PCV20), would help to further reduce pneumococcal disease in Brazil.
The molecular characterization of isolates enabled the identification of several possible capsular switching events from VT to NVT. Initial molecular studies of serotype 6C attributed its origin to independent recombination events involving serotype 6A isolates [59], but later studies reported possible recombination events involving other serotypes, such as 6B [60,61], suggesting multiple genetic origins for serotype 6C. In our study, we estimated that a switch from serotype 6B to 6C occurred in GPSC47 (ST386) before vaccine implementation, followed by expansion of the serotype 6C clone in the post-PCV10 period. This clonal expansion is consistent with previous data from Lo et al. [37] suggesting that serotype replacement following vaccine implementation is mostly mediated by the expansion of NVT within VT lineages.
The use of PCVs in routine immunization has had a significant effect on the prevalence of antimicrobial resistance, as their formulations include the serotypes most often associated with penicillin resistance and multidrug resistance [62][63][64]. Beta-lactams are widely used and generally effective for the treatment of pneumococcal infections. Penicillin is recommended to treat non-meningitis pneumococcal infections caused by strains with penicillin MIC <8 mg l−1, instead of broad-spectrum antimicrobials such as third-generation cephalosporins [65]. In Brazil, third-generation cephalosporins are the standard choice for the empiric treatment of meningitis, independent of antimicrobial susceptibility testing results, and the combination of a cephalosporin and vancomycin is used when patients fail to respond to initial treatment. In the present study, we observed significant reductions in penicillin and cotrimoxazole resistance and an increase in tetracycline resistance in the post-PCV10 period in the <5 years age group. Using WGS, we identified resistance determinants commonly conferring resistance to penicillin, macrolides, cotrimoxazole, tetracycline and chloramphenicol [45]. Consistent with our findings, a recent Brazilian study [2] observed a reduction in isolates with penicillin MIC ≥0.125 mg l−1 in the first 3 years after PCV10 introduction (2011 to 2013); cotrimoxazole non-susceptibility rates were high throughout that study (2007 to 2019) but declined after PCV10 implementation, while non-susceptibility to erythromycin and tetracycline increased gradually, reaching high rates in 2017-2019.
We showed that the lineages most frequently associated with MDR in the post-PCV10 period were the NVT GPSC1 (CC320, serotype 19A), with high-level penicillin resistance (MIC=4 mg l−1) and resistance to cotrimoxazole, erythromycin and tetracycline, which was also the only lineage carrying both pilus islets PI-1 and PI-2; and GPSC47 (ST386, serotype 6C), with lower-level penicillin resistance (MIC=0.125 mg l−1) and resistance to erythromycin and tetracycline. We recommend continued genomic surveillance for long-term monitoring, given the data presented by Brandileone et al. [2] showing that the early impact of PCV10 in reducing non-susceptibility to beta-lactam antibiotics was eroded by increases in penicillin resistance, mainly associated with NVT S. pneumoniae, which reached the highest rates in 2017-2019.
In addition to the capsular polysaccharide, which is associated with nasopharyngeal colonization, S. pneumoniae can express pili, which are involved in bacterial adhesion to and invasion of human respiratory epithelial cells [66,67]. Some studies have demonstrated an association between antimicrobial resistance and pilus presence, suggesting that pili may have a role in the spread of antimicrobial-resistant lineages [66,68,69]. One-third of our isolates carried pilus target sequences, mostly of the PI-1 type (74 %), which was associated with the penicillin-resistant GPSC6 (CC156, serotype 14) lineage. A recent review [66] of the role of pilus islets in S. pneumoniae reported similar overall pilus rates, with a predominance of PI-1, the presence of both PI-1 and PI-2 in the CC320 serotype 19A lineage, and an association of these genes with antimicrobial resistance and MDR. The presence of both pilus islets PI-1 and PI-2 in the MDR GPSC1 (CC320, serotype 19A) lineage may provide an additional advantage to these isolates, as pili are thought to enhance adherence and colonization [66]. The fact that this lineage is also predominantly MDR could explain the successful establishment of this NVT lineage in the post-PCV10 period.
Although our WGS study was limited to a subset of the invasive isolates from the pneumococcal laboratory-based surveillance system in Brazil, we showed that the sampling was representative of the overall collection. We also provided serotype data for the entire collection and used the subset to identify major antibiotic resistance mechanisms and important genetic lineages, highlighting the importance of specific NVT lineages in the post-PCV10 period.
Despite the widespread global use of PCVs, S. pneumoniae remains a major bacterial cause of community-acquired pneumonia [70] and one of the main bacterial agents associated with viral co-infections. The 1918 influenza pandemic [71], the H1N1 pandemic almost a century later in 2009 [72] and the current COVID-19 pandemic [73] all highlight continued surveillance and monitoring of S. pneumoniae as a priority. This study provides detailed genomic data on invasive pneumococcal isolates from national surveillance in Brazil, generating a baseline that can support long-term surveillance to monitor vaccine impact and inform public health strategies.

Funding information
This study was co-funded by the Bill and Melinda Gates Foundation (grant code OPP1034556), the Wellcome Sanger Institute (core Wellcome grants 098051 and 206194), and the US Centers for Disease Control and Prevention. The funding sources had no role in isolate selection, analysis, or data interpretation.

Acknowledgements
We would like to thank all microbiologists from the laboratories in the Brazilian States for sending the invasive isolates of S. pneumoniae to the IAL and for supporting IPD surveillance in the country. At the IAL, our thanks to the laboratory staff: Lincoln S. do Prado, Maria Luiza L. S. Guerra, Rosemeire C. Almendros, and Ueslei J. Dias for carrying out the phenotypic characterization of the isolates. We would like to thank all members of the Global Pneumococcal Sequencing Consortium (www.pneumogen.net/gps/partners.html) for creating a rich database, the team at CDC for DNA extraction, the Wellcome Sanger Institute sequencing facility for whole-genome sequencing, and the Pathogen Informatics Team at the Wellcome Sanger Institute for technical support of bioinformatic analyses. The corresponding author had full access to the data and is responsible for the final decision to submit for publication.

Ethical statement
Isolates for this study were selected from a retrospective bacterial collection from Institute Adolfo Lutz. The patient data were fully anonymized and obtained as part of the routine clinical care procedures. No tissue material or other biological material was obtained from humans. The study was approved by the Technical and Scientific Council (CTC70H-2015) and Ethics Committee (CAAE: 54891516.4.0000.0059) of Institute Adolfo Lutz (São Paulo, Brazil). The findings and conclusions in this manuscript are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.