Systematic review and meta-analysis of the prevalence of Strep A emm clusters in Africa to inform vaccine development

Background An emm-cluster based system was proposed as a standard typing scheme to facilitate and enhance future studies of Group A Streptococcus (Strep A) epidemiological surveillance, M protein function and vaccine development strategies. We provide an evidence-based distribution of Strep A emm clusters in Africa and assess the potential coverage of the new 30-valent vaccine in terms of an emm cluster-based approach. Method Two reviewers independently assessed studies retrieved from a comprehensive search and extracted relevant data. Meta-analyses were performed (random effects model) to aggregate emm cluster prevalence estimates. Results Eight studies (n=1,595 isolates) revealed the predominant emm clusters as E6 (18%, 95% confidence interval (CI), 12.6; 24.0%), followed by E3 (14%, 95%CI, 11.2; 17.4%) and E4 (13%, 95%CI, 9.5; 16.0%). There is negligible variation in emm clusters as regards regions, age and socio-economic status across the continent. Considering an emm cluster-based vaccine strategy, which assumes cross-protection within clusters, the 30-valent vaccine currently in clinical development, would provide hypothetical coverage to 80.3% of isolates in Africa. Conclusion This systematic review indicates the most predominant Strep A emm cluster in Africa is E6 followed by E3, E4 and D4. The current 30-valent vaccine would provide considerable coverage across the diversity of emm cluster types in Africa. Future efforts could be directed toward estimating the overall potential coverage of the new 30-valent vaccine based on cross-opsonization studies with representative panels of Strep A isolates from populations at highest risk for Strep A diseases. Importance Low vaccine coverage is of grave public health concern, particularly in developing countries where epidemiological data are often absent. 
To inform vaccine development for group A streptococcus (Strep A), we report on the epidemiology of the M protein emm clusters from Strep A infections in Africa, where Strep A-related illnesses and their sequelae, including rheumatic fever and rheumatic heart disease, carry a high burden. This first report of emm clusters across the continent indicates a high probability of coverage by the M protein-based vaccine currently undergoing testing, were an emm cluster-based approach to be used.


Introduction
Group A Streptococcus (Strep A) causes a range of human infections including pharyngitis and impetigo, which can lead to non-suppurative (immune-mediated) sequelae such as acute rheumatic fever (ARF) and rheumatic heart disease (RHD) if not properly managed (2). Additionally, Strep A has the ability to cause invasive infection such as sepsis, necrotizing fasciitis, pneumonia, and streptococcal toxic shock syndrome (STSS) in children and adults (3) with a high fatality rate; furthermore, it is a leading cause of maternal death in some regions (4). Strep A infections mostly affect young children and women living in developing countries (5). The estimated symptomatic Strep A pharyngitis annual incidence rate is 0.4 cases per person-year, with over 423 million cases, in children residing in developing countries (2).
The dire complications and huge economic burden of Strep A infections support the urgent need for an effective vaccine that would provide broad coverage of circulating Strep A strains (6). One of the Strep A vaccine strategies targets the M-protein on the bacterial surface, which has thermal stability, anti-phagocytic properties and the capacity to evoke antibodies with the greatest bactericidal activity (7). The hypervariable N-terminal region of the M-protein displays extensive nucleotide differences, thus giving rise to various M-protein amino acid sequences, which impart serological specificity (8). The 5' emm sequence encoding the mature protein is the basis for categorizing different Strep A strains through molecular typing methods, which aid in defining the epidemiology of Strep A infections. A 30-valent N-terminal M protein-based vaccine (9) is undergoing clinical trials (10).
The vaccine composition was based on extensive Strep A surveillance data from developed regions such as the USA and Europe, covering isolates involved in invasive disease, those associated with superficial infections and those causing autoimmune diseases (11,12). However, given the >200 Strep A emm types characterized to date (13), it is not surprising that there are highly prevalent Strep A subtypes absent from the current vaccine formulation, thus possibly excluding at-risk populations outside of western countries (14). An emm clustering system was introduced by Sanderson-Smith and colleagues, who phylogenetically analyzed whole M protein sequences and organized emm types into clusters that have the same or similar sequences and host protein binding properties (15). This proposed classification allows the previously identified Strep A emm types to be categorized into 48 discrete emm clusters (15), where more than one emm type may be contained within a cluster (Table 1). The emm cluster system complements the emm typing system and may serve to enhance studies relating to M protein function, streptococcal virulence, epidemiological surveillance, and vaccine development (15). Emm clusters E1-E6 were placed into clade X, binding to immunoglobulin and C4BP, while A-C1 through A-C5 and D1-D5 were grouped into clade Y, with a host protein tropism towards plasminogen and fibrinogen.
To date, significant emm cluster data have been produced through emm typing of Strep A, with recent studies reporting on emm cluster epidemiology. Shulman documented the most prevalent emm clusters in the USA as E4 (27.16%), A-C3 (17.78%) and A-C4 (17.56%) amongst 7,040 isolates (16). A study of emm clusters in three Pacific countries, viz. Australia, Fiji and New Caledonia, showed that 70%-84% of clusters from isolates were shared, whereas a comparison of emm types found only 14%-30% commonality between countries (17). In a third study, by Chiang-Ni in Taiwan, an analysis of both invasive and non-invasive strains revealed that cluster E6 was associated with both types of infections, while clusters D4, E2 and E3 were associated with invasive isolates in their population (18). Recently, Frost demonstrated that M type-specific and cross-reactive immune responses frequently align with emm clusters, raising new opportunities to design multivalent vaccines with broad coverage (19).
A thorough review of emm cluster data from Africa has not yet been undertaken. A study that aggregates the African data on clusters is essential to contribute to the growing literature in efforts to develop a Strep A vaccine on a global scale, particularly in low-income countries where the burden of disease is greatest. Therefore, this review sought to provide an evidence-based distribution of Strep A emm clusters in Africa.

Methods
This study employed rigorous methods drawn from the scientific techniques and guidelines offered by the Cochrane Collaboration (20) and by reviews published previously (21,22). The review protocol has been registered in the PROSPERO International Prospective Register of Systematic Reviews (CRD42017062485).

Review Question
This review asks the following questions: What is the prevalence of Strep A emm clusters in Africa in the currently available literature? Is there variation in emm cluster prevalence based on geography, age, clinical manifestation or socio-economic status?
We further sought to explore the potential coverage of the current 30-valent vaccine using a cluster-based approach.

Search Strategy
A comprehensive strategy was developed to search electronic databases to maximize sensitivity (Table S1-Appendix). The search strategies incorporated both free-text terms and Medical Subject Headings (MeSH), each adapted to suit the individual database. A combination of terms relating to "emm typing", "emm clusters", "emm/M protein" and "streptococcal diseases" was used, focusing on the African continent by applying the African search filter previously used by Pienaar and colleagues (23). The following databases were searched as at 29 April 2020: PubMed, Scopus and Google Scholar (for grey literature).
The search was not restricted to any publication dates or language (however, abstracts must be clearly written in English for the study to be considered). Published and unpublished data were also considered for inclusion.

Inclusion criteria
All studies that described the prevalence of emm clusters or emm types within a given population were included in the review. Participants were restricted to the African continent but were not discriminated by clinical manifestation of Strep A or site of Strep A isolation. All laboratory-confirmed Strep A isolates were molecularly characterized by the emm typing method to ascertain serotypes as this is the gold standard technique (24). The emm typing method as developed by Beall (25)  were not concurrent, discrepancies were discussed and an arbitrator (third reviewer) was contacted to resolve any disagreements.

Exclusion criteria
Case reports, narrative reviews, opinion pieces and publications lacking primary prevalence data, or lacking a referenced methodology according to Beall (25), were excluded from the review. Duplicated studies of the same datasets and participants were removed, and the most recent publication of the data was considered for inclusion.

Data extraction and management
Two reviewers extracted data using a standardized data extraction form, and any discrepancies were resolved through discussion or by a third reviewer. Search results from the databases listed above, covering published and unpublished studies, were managed with Endnote X9 referencing software. Briefly, data extraction consisted of recording the study demographics (number of study participants, the geographical region, age group of enrolled participants, the clinical manifestation of disease and socio-economic status) along with the relevant emm type/cluster distributions within the population. Socio-economic status for the study settings was determined at a country level, according to The World Bank (27).

Quality assessment
The risk of bias assessment established by Hoy (28) and modified by Werfalli (22) was adapted with questions specific for use in this review (Table S3-Appendix). Using a quantitative scoring system, studies were characterized as being of low, moderate or high risk of bias. A study with a low risk of bias is of high quality, whereas a low-quality study is associated with a higher risk of bias. Assessing the risk of bias informs the evaluation of heterogeneity in the pooled analyses.

Analysis
Data synthesis included three steps: (1) characterizing the study demographics, (2) documenting emm types for emm cluster calculations, and (3) assessing potential vaccine coverage. In each study, the prevalence of emm types was recalculated by analyzing figures and tables to confirm the authors' results and findings and to document the numerators and denominators. In older studies, emm typing information needed to be updated using the CDC database (29). Where emm cluster information was not reported, the CDC classification system (https://www.cdc.gov/groupastrep/lab.html) and the original cluster descriptions (15) were used to fill in the missing data.
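The step of converting emm type counts into emm cluster prevalences can be illustrated with a short Python sketch. This is not the authors' code. The lookup table is a three-entry excerpt using only assignments named in this review (emm75 and emm81 in E6, emm83 in D4), and the isolate counts and the `emm_hypothetical` label are invented for illustration.

```python
from collections import Counter

# Excerpt of an emm type -> emm cluster lookup. Only the three assignments
# named in this review are listed; a real analysis would load the full
# published classification instead.
EMM_TO_CLUSTER = {"emm75": "E6", "emm81": "E6", "emm83": "D4"}


def cluster_prevalence(type_counts):
    """Aggregate per-type isolate counts into per-cluster proportions."""
    cluster_counts = Counter()
    for emm_type, n in type_counts.items():
        # Types without a cluster assignment are kept, not silently dropped.
        cluster_counts[EMM_TO_CLUSTER.get(emm_type, "unclassified")] += n
    total = sum(cluster_counts.values())
    return {cluster: n / total for cluster, n in cluster_counts.items()}


# Hypothetical numerators for a single study (denominator = 30 isolates).
study = {"emm75": 12, "emm81": 8, "emm83": 5, "emm_hypothetical": 5}
prevalence = cluster_prevalence(study)
for cluster, p in sorted(prevalence.items()):
    print(f"{cluster}: {p:.1%}")
```

Keeping an explicit "unclassified" bucket preserves the study denominator, so the cluster proportions still sum to one.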
To calculate potential coverage, three tiers were assessed: 1) M peptides in the vaccine, 2) emm types that have been shown to be cross-opsonized, and 3) emm types that belong to a cluster represented by one or more vaccine emm types. Quantitative data analysis was completed using Stata version 14.1 (StataCorp, College Station, TX, USA). We applied the Freeman-Tukey double arcsine transformation option using the metaprop routine to describe the combined prevalence estimates of all included studies with the standard error across the unadjusted estimates (30). Emm cluster distribution was correlated against different variables (resource setting, clinical manifestation and age group) in each of the studies. Lastly, we determined the theoretical protective coverage by emm cluster cross-opsonization for emm types included in the M protein-based vaccine (11).
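To make the pooling step concrete, the following Python sketch applies the Freeman-Tukey double arcsine transformation and then pools the transformed values under a random-effects model. It is an illustration of the general technique, not a reproduction of the Stata metaprop routine: the DerSimonian-Laird estimator for between-study variance and the simple sin²(t/2) back-transformation are assumptions made here, and the study counts are hypothetical.

```python
import math


def ft_transform(x, n):
    """Freeman-Tukey double arcsine transform of x/n and its approximate variance."""
    t = math.asin(math.sqrt(x / (n + 1))) + math.asin(math.sqrt((x + 1) / (n + 1)))
    return t, 1.0 / (n + 0.5)


def back_transform(t):
    """Simple inverse, sin^2(t/2), with t clamped to [0, pi]."""
    t = min(max(t, 0.0), math.pi)
    return math.sin(t / 2) ** 2


def pool_random_effects(studies):
    """Pool (events, total) pairs; returns (estimate, lower, upper) as proportions."""
    ts, vs = zip(*(ft_transform(x, n) for x, n in studies))
    w = [1 / v for v in vs]
    fixed = sum(wi * ti for wi, ti in zip(w, ts)) / sum(w)
    # Cochran's Q and the DerSimonian-Laird between-study variance (assumed estimator).
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, ts))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for v in vs]
    pooled = sum(wi * ti for wi, ti in zip(w_star, ts)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return tuple(back_transform(pooled + z * se) for z in (0.0, -1.96, 1.96))


# Hypothetical (isolates in cluster, total isolates) for three studies.
studies = [(20, 100), (30, 150), (10, 80)]
estimate, lower, upper = pool_random_effects(studies)
print(f"pooled prevalence {estimate:.1%} (95% CI {lower:.1%}; {upper:.1%})")
```

Working on the transformed scale keeps the variance of each study roughly independent of its prevalence, which is why studies with very small or very large proportions do not dominate the pooled estimate.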

The literature search for articles was reported according to the Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA) Statement (1). Figure 1 details the search results with the retrieval of 121 articles for consideration from the respective electronic databases. After title screening and the removal of duplicates, we excluded 23 articles. We reviewed the remaining abstracts and excluded a further 81 articles, leaving 17 articles requiring full-text evaluation. Finally, eight articles met the inclusion criteria and were included in the review. A list of the excluded studies, with reasons, is provided in Table S4 (Appendix).

Characteristics of included studies
The included articles were published between 2004 and 2019, with sample sizes ranging from 43 to 396 total isolates. Of these, two articles had cross-sectional study designs, while the remaining studies took a prospective passive surveillance approach. The ages of participants included in the studies were also recorded; six articles studied isolates obtained from children (range 0-18 years old) and two studied patients of all ages. Studies were conducted in local and university hospitals, clinics, outpatient departments and schools situated in the study areas (Table 2).
The country of each article was recorded, with two articles each from Ethiopia (24,31), South Africa (14,32) and Tunisia (33,34), and one article each from Kenya (35) and Mali (36). All the studies included in this review made use of the gold-standard emm typing molecular procedure proposed by Beall (25) and the CDC (26).

Prevalence of Strep A emm clusters
Five countries within Africa contributed emm cluster data to this review (Figure 2). No variation in emm clusters by socio-economic status was apparent (Figure 3B-D).

Overall Prevalence of Strep A emm clusters represented by the emm types included in the 30-valent vaccine
Clusters A-C3, A-C4, A-C5 and E1 each have an effect size of ~2% (Table 4). Isolates from invasive disease were abundant in clusters D4, E2, E3 and E4 while only E6 had a preponderance of strains from non-invasive disease.

Assessment of risk of bias of included studies
The results of the assessment are presented in Table 5, with two studies having a low risk of bias (32,36).

Discussion
This systematic review provides evidence for the distribution of emm clusters of Strep A in Africa, specifically focusing on the epidemiological differences within Africa and added value of the emm clustering system in assisting with vaccine development.
Using prevalence data obtained from eight studies representing five countries within Africa, this report identified the predominant emm clusters in Africa, namely E6 followed by E3, E4 and D4. We further report that the emm clusters contained in the current 30-valent vaccine could provide considerable coverage across the diversity of emm cluster types in Africa.
Comparing our results with other emm cluster epidemiology studies, it is clear that the dominant emm clusters vary between regions. Only cluster E3 in the present study is common with the Pacific region (17). E4, the most prevalent cluster in the USA, is the third highest cluster in Africa, whereas A-C3 and A-C4 together only amount to ~2% of the total strains isolated in Africa (16). This study emphasizes that emm clusters E6, E3 and D4, prevalent in the African populations where the burden of Strep A infections is highest (37), should take prominence alongside clusters E4, A-C3 and A-C4. We note that a number of emm clusters contain a single emm type, as these types do not share similar binding properties or sequences with others. Also, many emm types have not yet been categorized into a particular cluster, which may be due to their emergence after the cluster system was proposed. This should be the focus of future studies in which more associations with human host protein binding could be tested to determine any other similarities between single-emm clusters.
Steer reported that the African and Asian regions had the greatest diversity of emm types (13). This could be due to a variety of factors causing site-tissue tropism and disease manifestation, promoting the dominance of heterologous emm types in different regions (38). Our review provides no evidence for marked variation across the continent amongst most of the more prominent emm clusters. When considering the ages of participants infected with Strep A, there appear to be no differences compared with the overall estimates. There is an increased risk for the transmission of Strep A in poorer countries due to household crowding and the lack of income for proper healthcare (39). Evaluating socio-economic status amongst our studies revealed little to no difference in emm cluster data.
Amongst non-invasive infections, cluster E6 was the most abundant cluster. This is in accordance with previous reports by Sagar (40), Dhanda (41) and Arêas (42), which identified emm types belonging to cluster E6 (emm75, emm81) as the predominant isolates obtained from countries closely relating to the impoverished environments within Africa, India and Brazil respectively. However, when referring to invasive disease, the predominant emm clusters are E3, followed by E6 and D4, which is complementary to the emm cluster data shown by Chiang-Ni (18).
In terms of the current 30-valent vaccine (11), with the assumption that the emm type prevalence data from the eight included studies could be generalised for the entire continent, vaccine coverage would be 55.92% of strains isolated in Africa. Frost showed cross-reactive protection of a single emm type with the remaining emm types within the same cluster, specifically that of E4 (19). Thus, on the hypothetical assumption that a single emm type in the 30-valent vaccine provides cross-protection against the remaining isolates within its cluster, an emm cluster-based vaccine would extend coverage to ~80% of Strep A isolates (Figure 4). Of interest, cluster D4, which comprises 28 heterologous emm types and ranked high in this analysis, has only a single representative (emm83) included in the vaccine. If cross-protection were to occur within clusters, more emm types belonging to cluster D4 ought to gain particular importance for inclusion in new vaccines, especially since D4 (10.9% of isolates) is the fourth most abundant cluster within Africa. It is also important to note that coverage extended to invasive isolates was sub-optimal (n=219, 54.1%).
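The difference between type-based and cluster-based coverage can be shown with a small Python sketch of a three-tier tally (vaccine emm types, cross-opsonized types, and types sharing a cluster with a vaccine type). It is a toy illustration under stated assumptions: only emm83 and its D4 assignment come from this review, while the other type labels, the cross-opsonized set and all counts are hypothetical and do not reproduce the 55.92% or ~80% figures.

```python
def tiered_coverage(type_counts, vaccine_types, cross_opsonized, emm_to_cluster):
    """Return cumulative coverage after tier 1, tier 2 and tier 3 as proportions."""
    total = sum(type_counts.values())
    # Clusters that contain at least one vaccine emm type.
    vaccine_clusters = {
        emm_to_cluster[t] for t in vaccine_types if t in emm_to_cluster
    }
    tier1 = tier2 = tier3 = 0
    for emm_type, n in type_counts.items():
        if emm_type in vaccine_types:  # tier 1: type is in the vaccine
            tier1 += n
        elif emm_type in cross_opsonized:  # tier 2: shown to be cross-opsonized
            tier2 += n
        elif emm_to_cluster.get(emm_type) in vaccine_clusters:  # tier 3: same cluster
            tier3 += n
    return (
        tier1 / total,
        (tier1 + tier2) / total,
        (tier1 + tier2 + tier3) / total,
    )


# Hypothetical inputs; only emm83 -> D4 is taken from the review text.
emm_to_cluster = {"emm83": "D4", "hypo_d4_type": "D4", "hypo_xo_type": "E4"}
counts = {"emm83": 10, "hypo_xo_type": 20, "hypo_d4_type": 30, "hypo_unclustered": 40}
tiers = tiered_coverage(
    counts,
    vaccine_types={"emm83"},
    cross_opsonized={"hypo_xo_type"},
    emm_to_cluster=emm_to_cluster,
)
print(", ".join(f"{t:.0%}" for t in tiers))
```

Each isolate is counted in the first tier it qualifies for, so the three values are cumulative and the last one corresponds to the cluster-based estimate that assumes cross-protection within clusters.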
One of the main strengths of this review is its search of multiple databases using an African search filter, together with a robust approach to the meta-analysis of the data. We systematically and purposefully assessed all the data available with no language exclusions, or restrictions to a clinical manifestation of disease, using the most recently published standard quality assessment tools for prevalence studies. We also assessed the risk of bias present in the individual articles, showing that the quality was reasonably high, thus allowing for comparisons across the studies. The main limitation of the review is the lack of epidemiological data from low- to middle-income countries in Africa, especially given their relatively high burden of Strep A infections. The inclusion of more articles reporting on the prevalence of Strep A may further assist in distinguishing differences amongst the geographical location, age and socio-economic categories. A further limitation of our systematic review is the significant heterogeneity in the prevalence estimates produced in the meta-analysis; however, this is expected when pooling prevalence studies. We made use of the Freeman-Tukey double arcsine transformation to stabilize the variance of primary studies before pooling, thus limiting the impact of studies with either small or large prevalence on the overall pooled estimates, as well as across major subgroups (30).

Conclusion
In conclusion, this systematic review provides the latest evidence for the distribution of emm clusters of Strep A in Africa. We show that there is negligible variation in emm clusters as regards regions, age and socio-economic status across the continent. We further report that the current 30-valent vaccine will provide considerable coverage across the diversity of emm cluster types in Africa, thus providing direction for future work to include coverage of clusters D4, E2-E4 and E6, given that they comprise 83% of the total isolates obtained in Africa.