Systematic Review and Meta-analysis of the Prevalence of Group A Streptococcal emm Clusters in Africa To Inform Vaccine Development

Low vaccine coverage is of grave public health concern, particularly in developing countries where epidemiological data are often absent. To inform vaccine development for group A Streptococcus (GAS), we report on the epidemiology of the M protein emm clusters from GAS infections in Africa, where GAS-related illnesses and their sequelae, including rheumatic fever and rheumatic heart disease, carry a high burden. This first report of emm clusters across the continent indicates a high probability of coverage by the M protein-based vaccine currently undergoing testing, were an emm cluster-based approach to be used.

as sepsis, necrotizing fasciitis, pneumonia, and streptococcal toxic shock syndrome (STSS) in children and adults (2), with a high fatality rate; furthermore, it is a leading cause of maternal death in some regions (3). Noninvasive GAS infections mostly affect young children and women living in developing countries (4), while severe invasive infections are more common in adults (with incidence increasing with age) and more common in men than in women (5). The estimated annual incidence of symptomatic GAS pharyngitis is 0.4 cases per person-year, with more than 423 million cases in children residing in developing countries (1).
The dire complications and huge economic burden of GAS infections and their sequelae, with an estimated global annual incidence of 616 million pharyngitis and 1.78 million severe cases (1), support the urgent need for an effective vaccine that would provide broad coverage of circulating GAS strains (6). One of the GAS vaccine strategies targets the M protein on the bacterial surface, which has thermal stability, antiphagocytic properties, and the capacity to evoke antibodies with the greatest bactericidal activity (7). The hypervariable N-terminal region of the M protein displays extensive nucleotide differences, giving rise to various M protein amino acid sequences that impart serological specificity (8). The 5′ emm sequence encoding the mature protein is the basis for categorizing different GAS strains through molecular typing methods, which aid in defining the epidemiology of GAS infections.
A 30-valent N-terminal M protein-based vaccine (9) is undergoing clinical trials (10). The vaccine composition was based on extensive GAS surveillance data from developed regions such as the United States and Europe on those isolates that are involved in invasive disease, those associated with superficial infections, and those causing autoimmune diseases (9,11). However, given the >200 GAS emm types characterized to date (12), the absence of highly prevalent GAS subtypes in the current vaccine formulation may diminish the coverage of at-risk populations outside western countries. An emm clustering system was introduced by Sanderson-Smith and colleagues, who phylogenetically analyzed the whole M protein sequences, organizing emm types into clusters that have the same or similar sequences and host protein binding properties (13). This proposed classification allows the previously identified GAS emm types to be categorized into 48 discrete emm clusters (13), where more than one emm type may be contained within a cluster (Table 1). The emm cluster system complements the emm typing system and may serve to enhance studies relating to M protein function, streptococcal virulence, epidemiological surveillance, and vaccine development (13). emm clusters E1 to E6 were placed into clade X, binding to immunoglobulin and C4BP, while A-C1 through A-C5 and D1 to D5 were grouped into clade Y, with a host protein tropism toward plasminogen and fibrinogen.
To date, significant emm cluster data have been produced through emm typing of GAS, with recent studies reporting on emm cluster epidemiology. Shulman et al. documented the most prevalent emm clusters in the United States as E4 (27.16%), A-C3 (17.78%), and A-C4 (17.56%) among 7,040 isolates (14). The prevalence of emm clusters in three Pacific countries, viz. Australia, Fiji, and New Caledonia, illustrated that 70% to 84% of clusters from isolates were shared, whereas emm types showed only 14% to 30% commonality between countries (15). In a third study by Chiang-Ni et al. in Taiwan, an analysis of both invasive and noninvasive strains revealed that cluster E6 was associated with both types of infections, while clusters D4, E2, and E3 were associated with invasive isolates in their population (16). Recently, Frost et al. demonstrated that M type-specific and cross-reactive immune responses frequently align with emm clusters, raising new opportunities to design multivalent vaccines with broad coverage (17).
A thorough review of emm cluster data from Africa has not yet been undertaken. A study that aggregates the African data on clusters is essential to contribute to the growing literature in efforts to develop a GAS vaccine on a global scale, particularly in low-income countries where the burden of disease is greatest. Therefore, this review sought to provide an evidence-based distribution of GAS emm clusters in Africa.

RESULTS
The literature search for articles was reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) statement (18). Figure 1 details the search results with the retrieval of 121 articles for consideration from the respective electronic databases. After title screening and the removal of duplicates, we excluded 23 articles. We reviewed the remaining abstracts and excluded a further 81 articles, leaving 17 articles requiring full-text evaluation. Finally, eight articles met the inclusion criteria and were included in the review. A list of the excluded studies with reasons is detailed in Table S4 in reference 19.
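The screening counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the text; the variable names are illustrative, and the number of articles excluded at the full-text stage is derived here rather than stated in the text.

```python
# Counts taken from the screening narrative in the text.
retrieved = 121
excluded_titles_and_duplicates = 23
excluded_abstracts = 81
included = 8

# Articles remaining for full-text evaluation.
full_text_reviewed = retrieved - excluded_titles_and_duplicates - excluded_abstracts

# Derived value (not stated in the text): articles excluded after full-text review.
excluded_full_text = full_text_reviewed - included

print(full_text_reviewed)   # 17, matching the text
print(excluded_full_text)   # 9
```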
Characteristics of included studies. The included articles were published between 2004 and 2019 with sample sizes ranging from 43 to 396 total isolates. Of these, two articles had cross-sectional study designs, while the remaining studies took a prospective passive surveillance approach. The ages of participants included in the studies were also recorded; six articles studied isolates obtained from children (range, 0 to 18 years old) and two studied patients of all ages. Studies were conducted in local and university hospitals, clinics, outpatient departments, and schools situated in the study areas (Table 2).
The country of each article was recorded, with two articles obtained from Ethiopia (20,21), two from South Africa (22,23), two from Tunisia (24,25), and one article each from Kenya (26) and Mali (27). All the studies included in this review made use of the gold-standard emm-typing molecular procedure proposed by Beall et al. (28) and the CDC (29). (Table 3). The remaining 12 emm types (55 isolates) are among those as yet not classified. The predominant clusters were E6 with 294 isolates (18.4%), followed by E3 (n = 243, 15.2%) and E4 (n = 225, 14.1%). The emm clusters with the fewest isolates were D1 and E5, each having a single isolate. emm cluster A-C1 was not represented. Sixty-three isolates were reported as "untypeable" by authors, thus not assigned an emm type, or as an "old" emm type that does not correspond with the CDC classification.
There were four regions represented across Africa. Variation of clusters across the regions was not remarkable. Interestingly, the single-emm-type cluster M55, containing 16 isolates, was specific to Mali in West Africa. The largest single-emm-type cluster, M18 (n = 41 isolates), was not represented in South Africa. Where the age of participants in studies was provided, there was no difference between children (<18 years of age) and adults in terms of cluster prevalence. By clinical manifestation, isolates from invasive disease numbered 516 (32.4%) (Figure S1 in reference 19), with the most prevalent clusters being E3 (n = 91 isolates) followed by E6 (n = 82) and D4 (n = 77). No variation in emm clusters by socioeconomic status was apparent.
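As a plausibility check on the crude percentages reported above, the short sketch below recomputes them from the isolate counts. The denominator is an assumption: it is taken as the 1,532 typed isolates cited in the vaccine coverage results plus the 63 untypeable isolates (1,595 in total), which is not stated explicitly in the text but reproduces the reported figures.

```python
# Assumed denominator: typed isolates plus untypeable isolates (not stated explicitly in the text).
TYPED_ISOLATES = 1532
UNTYPEABLE_ISOLATES = 63
TOTAL_ISOLATES = TYPED_ISOLATES + UNTYPEABLE_ISOLATES  # 1595

# (count, percentage as reported in the text)
reported = {
    "E6": (294, 18.4),
    "E3": (243, 15.2),
    "E4": (225, 14.1),
    "invasive disease": (516, 32.4),
}

def crude_percentage(count, total):
    """Crude (unpooled) percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)

for label, (count, stated) in reported.items():
    computed = crude_percentage(count, TOTAL_ISOLATES)
    print(f"{label}: computed {computed}%, reported {stated}%")
```

These crude proportions differ slightly from the pooled meta-analytic estimates, which weight individual studies rather than dividing summed counts.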
Overall prevalence of GAS emm clusters represented by the emm types included in the 30-valent vaccine. Cluster E6 was the most represented emm cluster (17.97%; 95% confidence interval [CI], 12.6% to 24.0%) among African isolates (Fig. 3A). This was followed by E3 (14.17%; 95% CI, 11.2% to 17.4%), E4 (12.6% of isolates; 95% (Table 4).
M protein vaccine coverage. Slightly more than 80% (80.3%) of African GAS isolates are classified in clusters included in the 30-valent vaccine (Fig. 4). However, based on emm types within the vaccine, together with emm types known to be cross-opsonized, the number of African GAS isolates that potentially could be covered by the 30-valent vaccine amounts to 892 of the 1,532 isolates, corresponding to 58.22% (comprising 599 isolates of vaccine emm types and 293 isolates of nonvaccine emm types) (9). For the emm types representing the remaining 640 isolates (41.78%), there is either no information yet available about possible cross-protection or the emm types would not be expected to cross-react with the 30-valent vaccine antisera because they are in single-emm clusters or in clusters not represented by the vaccine. Interestingly, isolates classified as emm30 (A-C2), emm36 (D1), emm51 (E5), and emm97 (D5), despite not being in a cluster represented in the vaccine, are nevertheless afforded cross-protection.
With regard to invasive emm types in Africa, the overall potential coverage of the vaccine (based on published results of cross-opsonization) was 54.1% for clusters included in the meta-analyses (Table 4). More specifically, coverage for clusters E6, E4, and E2 ranged from 69% to 74% of invasive isolates; only ~50% of strains would be protected in E3, and coverage for the remaining clusters was below 47% except for A-C5 (100% coverage), as there were only two invasive strains reported. Interestingly, the 30-valent vaccine would potentially provide only 12% coverage to invasive isolates belonging to the fourth highest cluster, D4 (n = 28 emm types).
Assessment of risk of bias of included studies. The results of the assessment are presented in Table 5, with two studies having a low risk of bias (23,27) and the remaining studies having a moderate risk of bias. All the articles narrowed down their target population by focusing on a specific age group, clinical manifestation, or geographical area. The data in all included studies were collected directly from the study participants as opposed to by proxy, confirming the reliability of sample collection and patient demographics. The included studies clearly described the phenotypes of patients, providing an acceptable case definition or diagnostic algorithm. Studies focusing on invasive GAS infections isolated GAS from normally sterile sites such as blood, cerebrospinal fluid, joints, bones, or synovium, among others. Noninvasive GAS was isolated from skin or throat via swabs of the infected area.
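The coverage figures in this paragraph follow from simple ratios, sketched below with the counts given in the text. The alternative denominator of 1,595 (typed plus 63 untypeable isolates) is an assumption; it is included because it reproduces the 55.92% coverage quoted in the Discussion, which suggests that the two figures differ only in the denominator used.

```python
# Isolate counts quoted in the text.
vaccine_type_isolates = 599
cross_opsonized_isolates = 293
typed_isolates = 1532

# Assumption: total isolates including the 63 reported as untypeable.
all_isolates = typed_isolates + 63

covered = vaccine_type_isolates + cross_opsonized_isolates
not_covered = typed_isolates - covered

def percent(part, whole):
    """Percentage rounded to two decimal places, as reported in the text."""
    return round(100 * part / whole, 2)

print(covered, percent(covered, typed_isolates))          # 892 58.22
print(not_covered, percent(not_covered, typed_isolates))  # 640 41.78
print(percent(covered, all_isolates))                     # 55.92
```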

DISCUSSION
This systematic review provides evidence for the distribution of emm clusters of GAS in Africa, specifically focusing on the epidemiological differences within Africa and the added value of the emm clustering system in assisting with vaccine development. Using prevalence data obtained from eight studies representing five countries within Africa, this report identified the predominant emm clusters in Africa, namely, E6 followed by E3, E4, and D4. We further report that the emm clusters contained in the current 30-valent vaccine could provide slightly more than 80% coverage across the diversity of emm cluster types in Africa.
Table footnotes: (a) Those emm types that have not been assigned to a particular clade by Sanderson-Smith et al. (13). (b) Sixty-three isolates were reported as "untypeable" by the authors and were not assigned an emm type, or were reported as an "old" emm type that does not correspond with the CDC classification.
A comparison of our results with other emm cluster epidemiology studies makes it clear that the dominant emm clusters vary between regions. Only cluster E3 in the present study is common with the Pacific region (15). When the current data are compared with those from the United States, E4, the highest cluster in the United States, is the third highest cluster, whereas A-C3 and A-C4 together amount to only ~2% of the total strains isolated in Africa (14). This study emphasizes that emm clusters E6, E3, and D4, prevalent in the African populations where the burden of GAS infections is highest (30), should take prominence alongside clusters E4, A-C3, and A-C4.
We note that there are a number of emm clusters containing a single emm type, as they do not share similar binding properties or sequences. Also, there are many emm types that have not yet been categorized into a particular cluster, possibly because they emerged after the cluster system was proposed. This should be the focus of future studies in which more associations with human host protein binding could be tested to determine any other similarities between single-emm clusters. Steer et al. reported that the African and Asian regions had the greatest diversity of emm types (12). This could be due to a variety of factors causing site-tissue tropism and disease manifestation, promoting the dominance of heterologous emm types in different regions (31). Our review provides no evidence for marked variation across the continent among most of the more prominent emm clusters. When considering the ages of participants infected with GAS, there appear to be no differences from the overall estimates. There is an increased risk for the transmission of GAS in poorer countries due to household crowding and the lack of income for proper health care (32). Evaluating socioeconomic status among our studies revealed little to no difference in emm cluster data.
With regard to clinical manifestation, cluster E6 was the most abundant cluster among noninvasive infections. This is in accordance with previous studies conducted in similarly impoverished areas in India (33,34) and Brazil (35), which identified emm types belonging to cluster E6 (emm75 and emm81) as the predominant isolates. However, in invasive disease, the predominant emm cluster is E3, followed by E6 and D4, which is comparable to the emm cluster data shown in Southern Taiwan (16). A study conducted on invasive isolates in the United States suggests that clusters E3 and E6 are the third and fifth highest clusters, respectively (36).
In terms of the current 30-valent vaccine (9), with the assumption that the emm type prevalence data from the eight included studies could be generalized for the entire continent, vaccine coverage would be 55.92% of strains isolated in Africa. Frost et al. showed that a single emm type can afford cross-reactive protection against the remaining emm types within the same cluster, specifically cluster E4 (17). Thus, under the hypothetical assumption that a single emm type in the 30-valent vaccine provides cross-protection against the remaining isolates within its cluster, an emm cluster-based vaccine would extend coverage to ~80% (Fig. 4). Of interest, cluster D4, which comprises 28 heterologous emm types and ranked high in this analysis, has only a single representative (emm83) included in the vaccine. If cross-protection were to occur within clusters, more emm types belonging to cluster D4 ought to be considered for inclusion in new vaccines, especially since D4 (10.9% of isolates) is the fourth most abundant cluster within Africa. It is also important to note that coverage extended to invasive isolates was suboptimal (n = 219, 54.1%, inclusive of cross-protection afforded to nonvaccine types).
One of the main strengths of this review is the search of multiple databases, using an African search filter and a robust approach to the meta-analysis of the data. We systematically and purposefully assessed all the data available with no language exclusions or restrictions to a clinical manifestation of disease, using the most recently published standard quality assessment tools for prevalence studies. We also assessed the risk of bias present in the individual articles, showing that the quality was reasonably high, thus allowing for comparisons across the studies.
The main limitations of the review are due to the lack of epidemiological data obtained from low- to middle-income countries in Africa, especially given their relatively high burden of GAS infections. The inclusion of more articles reporting on the prevalence of GAS may further assist in distinguishing differences among the geographical locations, ages, and socioeconomic categories. A further limitation to the results of our systematic review is the significant heterogeneity in the prevalence estimates produced in the meta-analysis; however, this is expected when pooling prevalence studies. We made use of the Freeman-Tukey double arcsine transformation to stabilize the variance of primary studies before pooling, thus limiting the impact of studies with either small or large prevalence on the overall pooled estimates as well as across major subgroups (37).
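To illustrate the variance-stabilizing step mentioned above, the following is a minimal sketch of the Freeman-Tukey double arcsine transformation with the commonly used back-transformation of Miller (1978). It is not the Stata metaprop routine used in this review: the pooling here is a simple fixed-effect, inverse-variance average chosen for brevity, the harmonic mean of the sample sizes is used for back-transformation by assumption, and the study counts are hypothetical.

```python
import math

def ft_transform(events, n):
    """Freeman-Tukey double arcsine transform and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    variance = 1.0 / (n + 0.5)  # depends only on n, not on the proportion
    return t, variance

def ft_back_transform(t, n):
    """Back-transform a value on the double arcsine scale to a proportion."""
    s = math.sin(t)
    inner = s + (s - 1.0 / s) / n
    inner = max(-1.0, min(1.0, inner))  # guard against rounding error
    sign = math.copysign(1.0, math.cos(t))
    return 0.5 * (1.0 - sign * math.sqrt(1.0 - inner * inner))

def pooled_prevalence(studies):
    """Fixed-effect inverse-variance pooling on the transformed scale (sketch only)."""
    transformed = [ft_transform(x, n) for x, n in studies]
    weights = [1.0 / v for _, v in transformed]
    t_pooled = sum(w * t for w, (t, _) in zip(weights, transformed)) / sum(weights)
    # Assumption: back-transform with the harmonic mean of the study sizes.
    n_harmonic = len(studies) / sum(1.0 / n for _, n in studies)
    return ft_back_transform(t_pooled, n_harmonic)

# Hypothetical (events, sample size) pairs, not data from the included studies.
example_studies = [(30, 100), (45, 200), (12, 60)]
print(round(pooled_prevalence(example_studies), 3))
```

Because the variance on the transformed scale depends only on the sample size, studies with prevalences close to 0 or 1 do not receive disproportionately large weights, which is the property the text refers to.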
In conclusion, this systematic review provides the latest evidence for the distribution of emm clusters of GAS in Africa. We show that there is negligible variation in emm clusters with regard to regions, age, and socioeconomic status across the continent. We further report that the current 30-valent vaccine will provide considerable coverage across the diversity of emm cluster types in Africa, thus providing direction for future work to include coverage of clusters D4, E2 to E4, and E6, given that they comprise 83% of the total isolates obtained in Africa.

MATERIALS AND METHODS
This study employed rigorous methods drawn from the scientific techniques and guidelines offered by the Cochrane Collaboration (38) and by reviews published previously (39,40). The review protocol has been registered in the PROSPERO International Prospective Register of Systematic Reviews as CRD42017062485.
Review question. This review asks the following questions. What is the prevalence of GAS emm clusters in Africa in the current available literature? Is there variation in emm cluster prevalence based on geography, age, clinical manifestation, or socioeconomic status? We further sought to explore the potential coverage of the current 30-valent vaccine using a cluster-based approach.
Search strategy. A comprehensive strategy was developed to search electronic databases to maximize sensitivity (Table S1 in reference 19). The search strategies incorporated both free-text terms and medical subject headings (MeSH), each adapted to suit the individual databases. A combination of terms relating to "emm typing," "emm clusters," "emm/M protein," and "streptococcal diseases" was used, focusing on the African continent by applying the African search filter previously used by Pienaar and colleagues (41). The following databases were searched as of 29 April 2020: PubMed, Scopus, and Google Scholar for gray literature. The search was not restricted to any publication dates or language; however, abstracts had to be clearly written in English for the study to be considered. Published and unpublished data, such as gray literature including theses and conference proceedings, were also considered for inclusion.
Inclusion criteria. All studies that described the prevalence of emm clusters or emm types within a given population were included in the review. Participants were restricted to the African continent but were not restricted by clinical manifestation of GAS or site of GAS isolation. All laboratory-confirmed GAS isolates were molecularly characterized by the emm typing method as developed by Beall et al. (28) and in alignment with the Centers for Disease Control and Prevention (29) to classify GAS according to sequence analysis of the 5′ hypervariable region of the M protein gene.
Two reviewers independently applied the search strategy to the relevant databases in which the titles and abstracts were evaluated to exclude studies that did not describe the prevalence of GAS. Thereafter, full texts of the included titles and abstracts were retrieved and further evaluated against the inclusion criterion (Table S2 in reference 19). A comparison was made between individual lists: if the reviewers' lists were not concurrent, discrepancies were discussed and an arbitrator (third reviewer) was contacted to resolve any disagreements.
Exclusion criteria. Case reports, narrative reviews, opinion pieces, publications lacking primary prevalence data, and those that referenced methodology that was not according to Beall et al. (28) were excluded from the review. Duplicated studies of the same data sets and participants were removed, and the final most recent publication of the data was considered for inclusion.
Data extraction and management. Two reviewers extracted data using a standardized data extraction form, and any contradictions were resolved through discussion or by a third reviewer. Search results from the databases listed above and published and unpublished studies were managed with EndNote X9 referencing software. Briefly, data extraction consisted of recording the study demographics (number of study participants, the geographical region, age group of enrolled participants, the clinical manifestation of disease, and socioeconomic status) along with the relevant emm type/cluster distributions within the population. Socioeconomic status for the study settings was determined at a country level, according to The World Bank (42).
Quality assessment. The risk of bias assessment established by Hoy et al. (43) and modified by Werfalli et al. (40) was adapted into questions specific for use in this review (Table S3 in reference 19). Using a quantitative scoring system, studies were characterized as being of a low, moderate, or high risk of bias. A study with a low risk of bias is of high quality, and a low-quality study is associated with a higher risk of bias. Assessing the risk of bias informs the evaluation of heterogeneity in the pooled analyses.
Analysis. Data synthesis included three steps: (i) characterizing the study demographics, (ii) documenting emm types for emm cluster calculations, and (iii) assessing potential vaccine coverage. In each study, the prevalence of emm types was recalculated by analyzing figures and tables to confirm the authors' results and findings and to document the numerators and denominators. In older studies, emm typing information needed to be updated using the CDC database (44). Where emm cluster information was not reported, the CDC classification system (https://www.cdc.gov/groupastrep/lab.html) as well as the original cluster descriptions (13) were used to augment missing data.
To calculate potential coverage, three tiers were assessed: (i) M peptides in the vaccine, (ii) emm types that have been shown to be cross-opsonized, and (iii) emm types that belong to a cluster represented by one or more vaccine emm types. Quantitative data analysis was completed using Stata version 14.1 (StataCorp, College Station, TX, USA). We applied the Freeman-Tukey double arcsine transformation option using the metaprop routine to describe the combined prevalence estimates of all included studies with the standard error across the unadjusted estimates (37). emm cluster distribution was correlated against different variables (resource setting, clinical manifestation, and age group) in each of the studies. Lastly, we determined the theoretical protective coverage by emm cluster cross-opsonization for emm types included in the M protein-based vaccine (9).
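The three-tier coverage assessment described above can be sketched as a simple classification followed by a cumulative tally. All emm type names, cluster labels, and isolate counts below are hypothetical placeholders and do not reflect the actual vaccine composition or the data of this review; the tiers are evaluated in order, so each emm type is counted once.

```python
from collections import Counter

# Hypothetical reference data (placeholders, not the real vaccine composition).
VACCINE_TYPES = {"emmA", "emmB"}
CROSS_OPSONIZED_TYPES = {"emmC"}
CLUSTER_OF = {"emmA": "X1", "emmB": "X2", "emmC": "X1", "emmD": "X2", "emmE": "X3"}

# Hypothetical isolate counts per emm type.
ISOLATE_COUNTS = {"emmA": 40, "emmB": 10, "emmC": 20, "emmD": 15, "emmE": 15}

def coverage_tier(emm_type):
    """Assign an emm type to the first coverage tier it satisfies."""
    if emm_type in VACCINE_TYPES:
        return "tier 1: vaccine type"
    if emm_type in CROSS_OPSONIZED_TYPES:
        return "tier 2: cross-opsonized"
    vaccine_clusters = {CLUSTER_OF[t] for t in VACCINE_TYPES}
    if CLUSTER_OF.get(emm_type) in vaccine_clusters:
        return "tier 3: cluster represented in vaccine"
    return "not covered"

def tally_by_tier(isolate_counts):
    """Sum isolate counts within each coverage tier."""
    tally = Counter()
    for emm_type, count in isolate_counts.items():
        tally[coverage_tier(emm_type)] += count
    return tally

tally = tally_by_tier(ISOLATE_COUNTS)
total = sum(ISOLATE_COUNTS.values())
cumulative = 0
for tier in ("tier 1: vaccine type", "tier 2: cross-opsonized",
             "tier 3: cluster represented in vaccine"):
    cumulative += tally[tier]
    print(f"{tier}: cumulative coverage {100 * cumulative / total:.1f}%")
```

In this layout, the cumulative value after tier 2 corresponds to coverage based on vaccine types plus cross-opsonized types, and the value after tier 3 corresponds to the cluster-based estimate.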

ACKNOWLEDGMENTS
This work is based on the research supported in part by the National Research