Group A Streptococcus emm Gene Types in Pharyngeal Isolates, Ontario, Canada, 2002–2010

Determination of emm variations may help improve vaccine design.

Group A Streptococcus (GAS) is a gram-positive bacterial pathogen responsible for ≈600 million cases of pharyngitis each year worldwide (1). The widespread prevalence of this disease results in considerable costs, estimated to exceed $200 million annually in the United States alone (2). In addition to acute pharyngitis, GAS causes several other human diseases, ranging from relatively mild to more severe, such as necrotizing fasciitis, soft tissue infections, glomerulonephritis, acute rheumatic fever, and streptococcal toxic shock syndrome. Thus, infections caused by GAS are a major public health concern in the United States and Canada and throughout the world.
GAS strains are classified mainly on the basis of variation in a cell-surface molecule known as M protein, encoded by the emm gene (3,4). M protein is a critical virulence factor and a major site of the human antibody response against GAS. M type-specific immunity develops in persons recovering from some GAS infections (5,6). As a result, the portion of the emm gene that encodes the amino-terminal 100 residues of M protein is under strong diversifying selection pressure, and this region is hypervariable in terms of GAS types (7). Currently, >120 distinct emm types of GAS are recognized.
Despite the considerable diversity of emm types of GAS isolates, epidemiologic studies have found that relatively few emm types tend to predominate within a local population; most isolates are composed of a small number of emm types (8,9). In distinct geographic areas, the predominant emm types often vary in frequency from year to year for reasons not fully understood. In addition, sizeable outbreaks can be caused by strains of a single emm type or of a small number of emm types. Overall, this combination of factors results in a complex epidemiologic situation for GAS pharyngitis.
Recently, vaccine candidates have been identified in an effort to reduce the prevalence of GAS disease and the number of human deaths it causes (10). Some of these experimental vaccines are based on the amino-terminus of M protein because of the type-specific immunity that may develop after GAS infection. A multivalent vaccine has been developed that exploits the amino-terminus of the M protein from many different emm types (11). In principle, the effectiveness of this type of M-protein vaccine may be highly dependent on how well the M proteins selected for the vaccine match the emm types of locally circulating strains. Thus, a more complete understanding of geographic and temporal variation in emm type may be useful for vaccine design. Furthermore, the emergence of new variants of known M types has been documented. Knowledge of the rate and patterns of emergence of distinct emm types and their alleles may be critical for understanding how GAS may "escape" the immune response generated by a vaccine based on the amino-terminus of M protein.
We investigated the distribution of GAS emm types causing pharyngitis in Toronto, Ontario, Canada, during 2002-2010. We also examined the temporal change in emm types in pharyngitis cases and compared this distribution with data from a comprehensive population-based study of GAS emm types that were causing invasive infections in Ontario. Finally, we studied the emm types causing pharyngitis in multiple geographic locations across the province of Ontario in 2009 and 2010.

Collection of Isolates
Isolates collected from throat specimens of patients with acute pharyngitis were identified as GAS from primary media by a variety of standard methods. These GAS isolates (hereafter also referred to as pharyngeal isolates) were collected from 2002 through 2010 from multiple Ontario laboratories. GAS isolates, stripped of patient identifiers, were forwarded to Mount Sinai Hospital in Toronto for confirmation of identity and shipped to The Methodist Hospital Research Institute in Houston, Texas, for emm gene typing. Basic demographic information, including location where collected, age and sex of patient, and specimen collection date, was provided for isolates.

Geographically Diverse Ontario Strains
Additional isolates were obtained from outlying locations of the LifeLabs and Gamma Dynacare laboratory chains. Laboratories in London, Sudbury, and Thunder Bay, Ontario, each provided 100 consecutive GAS isolates per center during July-September 2009. The Gamma Dynacare Ottawa laboratory provided consecutive isolates up to 100 per month from July 2009 through July 2010, for a total of 659 isolates. The Gamma Dynacare London branch provided 219 isolates from July through September 2009, and the distantly located North Bay and Elliot Lake branches together provided 36 GAS isolates from July through October 2009.

emm Type Assignment
GAS isolates were grown overnight at 37°C with 5% CO2 on trypticase soy agar plates containing 5% sheep blood (TSAII; Becton Dickinson, Franklin Lakes, NJ, USA). Genomic DNA was obtained by boiling a sample obtained by streaking from multiple colonies in 0.05 mol/L NaOH for 2 min. The crude cell lysates were centrifuged for 2 min at 2,000 × g, and 2 μL of the lysate was used in PCRs. The hypervariable region of the emm gene that encodes the amino-terminus of M protein was amplified by PCR by using primers emm1 5′-TATT(C/G)GCTTAGAAAATTAA-3′ and emm2 5′-GCAAGTTCTTCAGCTTGTTT-3′. PCR products were purified by using 96-well ultrafiltration plates (EdgeBio, Gaithersburg, MD, USA), according to the manufacturer's instructions; products were suspended in 100 μL distilled water. Cycle sequencing was performed with the Big Dye version 3.1 dye-terminator kit (Applied Biosystems, Foster City, CA, USA) by using primer emm1. Unincorporated fluorescent dye terminators were removed with 96-well gel-filtration cartridges (EdgeBio). Sequencing reactions were analyzed with a 3730xl DNA sequencer (Applied Biosystems), and chromatograms were analyzed with Sequencher version 4.9 (GeneCodes, Ann Arbor, MI, USA).
High-quality sequences were trimmed to 220 nt in length and compared with reference sequences in the Centers for Disease Control and Prevention (CDC) emm database (ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm) by using the BLAST algorithm (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Data analysis and graphing were performed with the GraphPad software package (Prism, La Jolla, CA, USA). The invasive index of each emm type was calculated by dividing its frequency in invasive infections by its frequency in pharyngitis infections.
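The invasive index described above is a ratio of two relative frequencies. The following Python sketch shows one way to compute it, assuming that isolate counts per emm type are available as dictionaries. The function name, the placeholder type labels, and the toy counts are hypothetical and are not data from this study.

```python
from typing import Dict, Optional


def invasive_index(
    invasive_counts: Dict[str, int],
    pharyngeal_counts: Dict[str, int],
) -> Dict[str, Optional[float]]:
    """Frequency among invasive isolates divided by frequency among
    pharyngitis isolates, for every emm type seen in either collection."""
    n_invasive = sum(invasive_counts.values())
    n_pharyngeal = sum(pharyngeal_counts.values())
    if n_invasive == 0 or n_pharyngeal == 0:
        raise ValueError("both collections must contain at least one isolate")

    index: Dict[str, Optional[float]] = {}
    for emm_type in sorted(set(invasive_counts) | set(pharyngeal_counts)):
        f_invasive = invasive_counts.get(emm_type, 0) / n_invasive
        f_pharyngeal = pharyngeal_counts.get(emm_type, 0) / n_pharyngeal
        # The ratio is undefined when a type is absent from the pharyngitis sample.
        index[emm_type] = f_invasive / f_pharyngeal if f_pharyngeal > 0 else None
    return index


# Hypothetical toy counts, for illustration only (not study data).
toy_invasive = {"emmA": 30, "emmB": 10, "emmC": 10}
toy_pharyngeal = {"emmA": 20, "emmB": 60, "emmC": 20}
toy_index = invasive_index(toy_invasive, toy_pharyngeal)
```

In this toy example, the placeholder type emmA has an index of about 3 (overrepresented among invasive isolates) and emmB an index of about 0.33. Types that are rare among pharyngeal isolates yield large and unstable index values, so such values warrant cautious interpretation.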

Overview of Pharyngitis Strains
We determined the emm type for 4,635 GAS isolates that were causing acute pharyngitis in the province of Ontario during 2002-2010. Of these, 3,209 isolates were collected from the greater Toronto metropolitan region, and 1,426 isolates were obtained from 5 sites located throughout Ontario (London, Ottawa, North Bay/Elliot Lake, Sudbury, and Thunder Bay). The mean age of patients was 16.1 years (range 8 months to 105 years).

Distribution of emm Types in Toronto GAS Pharyngitis Strains
Consistent with findings from previous surveys of GAS isolates that have caused pharyngitis in North America, Europe, and elsewhere (8,9,12), we found that a relatively small number of emm types dominated. For example, the 6 most prevalent emm types collected in Toronto during 2002-2010 were (in order of prevalence) emm12, emm1, emm4, emm28, emm2, and emm89 (Figure 1). These 6 emm types accounted for 68.9% of the pharyngeal isolates, whereas 29 emm types accounted for the remaining 31.1% of the isolates. Analysis of the annual change in emm type distribution indicated that, with few exceptions, these 6 types were consistently the most commonly collected. This finding suggests that the emm type population dynamic is relatively stable. However, emm89 strains were a key exception. The frequency of emm89 isolates increased 5-fold over the study period, increasing from 2.6% of isolates in 2002 to 14.7% in 2010 (Figure 2). In 2010, emm89 isolates were the second most common emm type among pharyngitis specimens in our sample. These data indicate a recent major expansion of type emm89 strains among isolates causing pharyngitis in Toronto.
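The prevalence ranking and the fold change reported above are simple tabulations. As a hedged illustration, the Python sketch below ranks emm types by isolate count with a cumulative percentage and computes a fold change between two percentages. The function names and the toy isolate list are hypothetical; only the values 2.6% and 14.7% are taken from the text.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_emm_types(emm_calls: Iterable[str]) -> List[Tuple[str, int, float, float]]:
    """Return (emm type, count, percent, cumulative percent), most common first."""
    counts = Counter(emm_calls)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    # Sort by descending count; ties are broken alphabetically for reproducibility.
    for emm_type, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        cumulative += n
        ranked.append((emm_type, n, 100.0 * n / total, 100.0 * cumulative / total))
    return ranked


def fold_change(start_percent: float, end_percent: float) -> float:
    """Ratio of the final frequency to the initial frequency."""
    if start_percent <= 0:
        raise ValueError("start_percent must be positive")
    return end_percent / start_percent


# Hypothetical toy isolate list, for illustration only (not study data).
toy_calls = ["emmA"] * 5 + ["emmB"] * 3 + ["emmC"] * 2
toy_ranking = rank_emm_types(toy_calls)

# emm89 frequencies reported in the text: 2.6% (2002) and 14.7% (2010).
emm89_fold = fold_change(2.6, 14.7)
```

With the reported percentages, the ratio is approximately 5.7, in line with the roughly 5-fold increase described for emm89.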

Identification of New emm Alleles
We identified 20 allelic variants of 8 GAS emm types that had not been previously described: emm1 (4 alleles), emm3 (3 alleles), emm5 (3 alleles), emm6 (4 alleles), emm8 (1 allele), emm11 (2 alleles), emm12 (2 alleles), and st106M (1 allele). Nucleotide sequences for these alleles have been submitted to the CDC Streptococcus pyogenes emm sequence database (designations listed in Table 1). Seventeen of these alleles differed from the most closely related reference sequence by 1 single-nucleotide polymorphism (SNP); 2 allelic variants differed by 2 SNPs; and 1 isolate had a 6-bp in-frame insertion, resulting in the addition of 2 amino acid residues. Of the 21 SNPs identified, all but 1 resulted in a predicted amino acid substitution in the translated M-protein sequence. This excess of nonsynonymous mutations underscores the effect of the strong diversifying selection pressure acting on the emm gene.
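Whether a SNP is predicted to change the amino acid sequence can be checked by translating the affected codon with the standard genetic code. The Python sketch below is a minimal illustration, not the procedure used in this study; it assumes two in-frame coding sequences of equal length that contain only A, C, G, and T, and it does not handle insertions such as the 6-bp allele noted above. The function name and toy sequences are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snps(reference: str, variant: str) -> List[Tuple[int, str]]:
    """Return (1-based nucleotide position, 'synonymous' or 'nonsynonymous')
    for every mismatch between two in-frame sequences of equal length."""
    reference, variant = reference.upper(), variant.upper()
    if len(reference) != len(variant) or len(reference) % 3 != 0:
        raise ValueError("sequences must have equal length that is a multiple of 3")

    calls = []
    for start in range(0, len(reference), 3):
        ref_codon = reference[start:start + 3]
        var_codon = variant[start:start + 3]
        for offset in range(3):
            if ref_codon[offset] != var_codon[offset]:
                # Evaluate each SNP on its own against the reference codon.
                single = ref_codon[:offset] + var_codon[offset] + ref_codon[offset + 1:]
                same_residue = CODON_TABLE[single] == CODON_TABLE[ref_codon]
                calls.append(
                    (start + offset + 1, "synonymous" if same_residue else "nonsynonymous")
                )
    return calls


# Hypothetical toy sequences (Ala-Lys-Glu versus Ala-Lys-Asp), not emm sequences.
toy_calls = classify_snps("GCTAAAGAA", "GCCAAAGAT")

# SNP tally reported in the text: 17 alleles with 1 SNP plus 2 alleles with 2 SNPs.
total_new_snps = 17 * 1 + 2 * 2
```

In the toy example, the change at position 3 (GCT to GCC) is synonymous and the change at position 9 (GAA to GAT) is nonsynonymous. The tally from the text sums to the 21 SNPs reported.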

Ontario emm Types in Relation to an Experimental 26-Valent GAS Vaccine
Overall, 18 of 57 emm types found in the Toronto pharyngeal isolates are represented in an experimental 26-valent GAS vaccine described elsewhere (11). These 18 emm types included 11 of the 12 most prevalent emm types that represent strains causing 78.5% of the pharyngitis cases we studied, a number similar to estimates for the US population (11). Notably, emm4, the most commonly observed GAS emm type not included in the 26-valent experimental vaccine, was very common in Toronto. For example, during 2002-2010, emm4 was the third most common emm type causing pharyngitis, and in 2007 and 2008, it was the most common emm type. This finding is not entirely unexpected because emm4 has been one of the most common serotypes identified by other pharyngitis surveys (8,9,12,13).
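Vaccine coverage of this kind can be expressed in 2 ways: the share of observed emm types that are in the vaccine and the share of isolates that belong to those types. The Python sketch below illustrates both calculations under the assumption that isolate counts per emm type and the set of vaccine types are known. The function name, placeholder type labels, and toy counts are hypothetical; only the figures 18 and 57 are taken from the text.

```python
from typing import Dict, Set, Tuple


def vaccine_coverage(
    isolate_counts: Dict[str, int], vaccine_types: Set[str]
) -> Tuple[float, float]:
    """Return (percent of observed emm types covered, percent of isolates covered)."""
    total_isolates = sum(isolate_counts.values())
    if not isolate_counts or total_isolates == 0:
        raise ValueError("at least one isolate is required")

    covered_types = [t for t in isolate_counts if t in vaccine_types]
    covered_isolates = sum(isolate_counts[t] for t in covered_types)
    return (
        100.0 * len(covered_types) / len(isolate_counts),
        100.0 * covered_isolates / total_isolates,
    )


# Hypothetical toy data, for illustration only (not study data).
toy_counts = {"emmA": 50, "emmB": 30, "emmC": 15, "emmD": 5}
toy_vaccine = {"emmA", "emmC", "emmZ"}  # emmZ is in the vaccine but was not observed
type_percent, isolate_percent = vaccine_coverage(toy_counts, toy_vaccine)

# From the text: 18 of the 57 observed emm types are represented in the vaccine.
percent_of_types_in_vaccine = 100.0 * 18 / 57
```

The 2 measures can differ widely: 18 of 57 types is about 32% of the types observed, yet those types account for most pharyngitis cases because they include nearly all of the prevalent types.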
Although a multivalent GAS vaccine based on the amino-terminus of M protein has theoretical promise, a potential concern is the detrimental effects of allelic variation on vaccine efficacy. Virtually all new and previously described emm alleles collected from pharyngitis patients in Toronto contained nucleotide changes that resulted in changes in amino acid sequence, with few alleles defined only by silent nucleotide polymorphisms. New emm alleles generated by strong diversifying selection pressure acting on the emm gene could provide the means by which GAS strains evade a vaccine that includes only a single variant of each M-protein serotype; that is, creating vaccine-escape mutants.
To determine how common allelic variation was in the Toronto GAS population, we examined the number of alleles for each emm type found in specimens from pharyngitis patients in Toronto. For the top 10 serotypes, the most prevalent allele was found in 87.2% of the isolates (range 42.7%-100%). Notably, several prevalent emm types had an extensive number of alleles that would encode variant M proteins. For example, of the 6 most common serotypes, 4 had >6 allelic variants, and the most common emm type (emm12) had 16 different alleles.

Comparison of emm Type Distribution in Pharyngitis and Invasive GAS Isolates
Previous studies have identified nonrandom associations between specific emm types and an increased risk for invasive infection (14-17) or increased severity of invasive infection (18,19). Thus, we tested the hypothesis that certain emm types were more prevalent in invasive disease isolates than in pharyngitis isolates in the Toronto region. Consistent with previous reports (9,20,21), we found that emm1 and emm3 strains each had an invasive index >1.0 (Table 2), which suggests that these emm types are overrepresented among invasive infections. Additionally, emm49 strains had an exceptionally high invasive index (16.7), largely because of the rarity of these strains among the pharyngeal isolates.
Comparison of the annual change in emm type frequencies in pharyngitis and invasive disease isolates indicated that certain emm types had highly variable frequencies, consistent with epidemic behavior. In particular, emm3 pharyngitis strains peaked in frequency in 2006 to become the fourth most common emm type that year. This timing corresponds with the observed peak in cases of invasive disease caused by emm3 strains in 2006 (Figure 3), which suggests a relationship between abundance of pharyngitis cases and invasive infections.

Variability in Frequency Distribution of emm Types from Diverse Localities
Several studies have reported that emm type distribution can vary geographically. Generally, however, these comparisons involved localities separated by large distances. To test the hypothesis that emm type distribution varied over a relatively small geographic distance, we analyzed GAS isolates from pharyngitis patients at 5 additional areas across Ontario (London, Ottawa, North Bay/Elliot Lake, Sudbury, and Thunder Bay) (Table 3). We discovered striking inter-site variability in the distribution of emm types. For example, in 2009, emm89 was the most common emm type identified at 4 of the 6 localities, but emm89 strains were not among the 6 most common emm types found in Ottawa. Additionally, only 2 emm types were shared among the 5 most common emm types collected in North Bay and Sudbury, an unexpected result (p = 0.0007; Fisher exact test), given that these locations are separated by only ≈120 km. We also observed apparent local outbreaks of certain emm types in some locations. In 2009 and 2010, emm3 strains were among the 6 most prevalent emm types in Ottawa but were rarely observed elsewhere. Most (33/43 [77%]) of the Ottawa emm3 strains had the emm3.53 allele, which differs from the emm3.2 allele by a single nucleotide change. Isolates with the emm3.2 allele are otherwise the most abundant emm3 strains in Ontario. We note that 1 isolate with the emm3.53 allele was found in Toronto in 2009, where it had not been observed in previous years, suggesting recent introduction. Subsequent studies will be required to determine whether the emm3.53 strain expands across Ontario and whether it has increased invasive potential.
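The Fisher exact test cited above evaluates a 2 × 2 contingency table against the hypergeometric distribution. The text does not state how the table behind the reported p value was constructed, so the Python sketch below does not attempt to reproduce that value. It only illustrates a two-sided test on a generic table, using the common convention of summing the probabilities of all tables that are no more likely than the observed one. The function name and the toy table are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = table_probability(a)
    # Small relative tolerance so that ties are not lost to floating-point rounding.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical toy table, for illustration only (not study data).
toy_p = fisher_exact_two_sided(3, 1, 1, 3)
```

For the toy table, the possible top-left cell values 0 through 4 have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the two-sided p value is 34/70, or about 0.49.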

Discussion
In this large study of emm type distribution among GAS pharyngitis strains in Canada, we identified a pattern of emm type distribution similar to that reported in previous surveys and also observed that strains of relatively few emm types dominate. The most abundant emm types were similar to those reported in previous studies of GAS pharyngitis strains from North America and Europe (8,9,12).
Of note, we found that emm89 strains have recently increased in frequency in Ontario. Specifically, over a 9-year period, emm89 strains increased 5-fold and in 2010 were the second most common emm type in the Toronto sample. The increase in emm89 strains among pharyngeal isolates paralleled an increase in the frequency of emm89 strains among invasive GAS isolates from 2003 through 2010 (Figure 4). This finding suggests that a marked expansion of emm89 strains has occurred in Ontario. Regional outbreaks of emm89 strains have been documented previously, including a clonal epidemic that occurred in northern Italy (22). Surveillance conducted by CDC also has reported similar increases in emm89 strains among invasive infections in New York and Maryland during 2007-2009, and emm89 was among the top 5 invasive serotypes collected by CDC in 1998, 2001-2003, and 2007-2009. Thus, emm89 strains may commonly contribute to local epidemics of pharyngitis and invasive disease.
Although variation in the frequency distribution of GAS pharyngitis emm types across localities separated by large distances has been well described, we have shown that emm type distributions also can vary over relatively small distances. This finding expands knowledge of GAS epidemiology. Thus, GAS pharyngitis strains circulating in Ontario are a collection of distinct populations, apparently characterized by relatively limited transmission between distant locations. Much of our understanding of GAS epidemiology has been based on the characterization of strains causing local outbreaks of invasive disease and large surveillance networks encompassing geographically expansive catchment regions. This circumstance has led to the belief that GAS exists mostly as large, homogeneous populations. Our findings suggest that GAS populations are much more complex. This conclusion is supported by our previous genomewide analysis of invasive emm3 isolates from Ontario, which found that genetic distance and geographic proximity were strongly correlated and that groups of clonally related isolates were frequently limited to discrete geographic locations (24). Our longitudinal analysis of emm types in Toronto indicated that several emm types (including emm1, emm2, emm3, and emm77) varied substantially in annual frequency, which suggests features of epidemic behavior. Comparison of yearly frequencies of emm3 isolates from patients with pharyngitis and invasive disease showed a nearly superimposable pattern, with coincident peaks of infection occurring in 2006 (Figure 3). This finding is consistent with a model in which many invasive GAS strains originate from the local pharyngitis strains and in which cyclical outbreaks of invasive infection coincide with or follow recent outbreaks of pharyngitis infections. A similar conclusion was reported by Hoe et al. (25), whose analysis of pharyngitis and invasive isolates from Finland showed that novel streptococcal inhibitor of complement (sic) alleles first appeared in local pharyngitis strains before their appearance in invasive isolates. Furthermore, a GAS clone responsible for a local outbreak of invasive disease in Minnesota was common among pharyngeal isolates from school-aged children living in the outbreak area (26). Additional investigation into the genetic relationship between pharyngitis and invasive disease strains conducted at the full-genome level may provide useful information about the molecular events that contribute to invasive GAS disease.
The large size and longitudinal nature of our strain sample enabled us to obtain extensive information about the level of emm allelic variation and the rate of emergence of new emm alleles in Ontario. Previous experiments conducted by Dale et al. on 3 serotypes included in the 26-valent GAS vaccine found that slight allelic variants had little influence on bactericidal killing activity during in vitro assays, leading the researchers to conclude that variant subtypes might not affect vaccine efficacy (27). However, this finding contrasts with several other reports that observed a variable response to allelic variants (28,29). Whether the findings of Dale et al. are applicable to all 26 serotypes included in the vaccine is unknown (27). Despite its potential, albeit unproven, relevance to GAS vaccine design, we have a relatively limited understanding about this subject. We observed extensive emm allelic variation in Ontario, with most common emm types possessing >6 different alleles. We also found that strong selective pressure was driving the emergence of new M-protein variants, with all but one of the new alleles encoding amino acid substitutions. The observed ratio of synonymous to nonsynonymous nucleotide substitutions indicates that allelic variation most likely is shaped by selective pressure, perhaps immune mediated. Previous investigators have also reported that the N-terminal regions of M proteins possess functional domains in addition to opsonic epitopes that might constrain the amount of variability within an M type (30-33). Whether allelic variation may eventually result in escape mutants in a population with high levels of immunity, resulting from administration of an M-protein-based vaccine, is not known but should be considered. We believe this allelic variation might pose a challenge to GAS vaccine designs that rely on recombinant portions of the M-protein amino-terminus.
GAS M-protein serotypes are often regarded as genetically homogeneous populations composed of a single or relatively few clones. The remarkable level of emm type allelic diversity we observed in Ontario contrasts with this view. We found extensive diversity not only in the distribution of GAS serotypes but also on the allelic level and between geographic locations separated by short distances. Our recent genomewide analysis of invasive M3 isolates in Ontario revealed a strikingly complex genetic structure (24). Given the relationship between these populations, GAS pharyngeal isolates probably harbor an additional layer of genetic diversity that remains to be elucidated through whole-genome sequencing of a population of pharyngeal isolates.