Population and Whole Genome Sequence Based Characterization of Invasive Group A Streptococci Recovered in the United States during 2015

ABSTRACT Group A streptococci (GAS) are genetically diverse. Determination of strain features can reveal associations with disease and resistance and assist in vaccine formulation. We employed whole-genome sequence (WGS)-based characterization of 1,454 invasive GAS isolates recovered in 2015 by Active Bacterial Core Surveillance and performed conventional antimicrobial susceptibility testing. Predictions were made for genotype, GAS carbohydrate, antimicrobial resistance, surface proteins (M family, fibronectin binding, T, R28), secreted virulence proteins (Sda1, Sic, exotoxins), hyaluronate capsule, and an upregulated nga operon (encodes NADase and streptolysin O) promoter (Pnga3). Sixty-four M protein gene (emm) types were identified among 69 clonal complexes (CCs), including one CC of Streptococcus dysgalactiae subsp. equisimilis. emm types predicted the presence or absence of active sof determinants and were segregated into sof-positive or sof-negative genetic complexes. Only one “emm type switch” between strains was apparent. sof-negative strains showed a propensity to cause infections in the first quarter of the year, while sof+ strain infections were more likely in summer. Of 1,454 isolates, 808 (55.6%) were Pnga3 positive and 637 (78.9%) were accounted for by types emm1, emm89, and emm12. Theoretical coverage of a 30-valent M vaccine combined with an M-related protein (Mrp) vaccine encompassed 98% of the isolates. WGS data predicted that 15.3, 13.8, 12.7, and 0.6% of the isolates were nonsusceptible to tetracycline, erythromycin plus clindamycin, erythromycin, and fluoroquinolones, respectively, with only 19 discordant phenotypic results. Close phylogenetic clustering of emm59 isolates was consistent with recent regional emergence. This study revealed strain traits informative for GAS disease incidence tracking, outbreak detection, vaccine strategy, and antimicrobial therapy.


IMPORTANCE
The current population-based WGS data from GAS strains causing invasive disease in the United States provide insights important for prevention and control strategies. Strain distribution data support recently proposed multivalent M type-specific and conserved M-like protein vaccine formulations that could potentially protect against nearly all invasive U.S. strains. The three most prevalent clonal complexes share key polymorphisms in the nga operon encoding two secreted virulence factors (NADase and streptolysin O) that have been previously associated with high strain virulence and transmissibility. We find that Streptococcus pyogenes is phylogenetically subdivided into loosely defined multilocus sequence type-based clusters consisting solely of sof-negative or sof-positive strains, with sof-negative strains demonstrating differential seasonal preference for infection, consistent with the recently demonstrated differential seasonal preference based on phylogenetic clustering of full-length M proteins. This might relate to the differences in GAS strain com-

RESULTS
Potential coverage of a combined 30-valent type-specific M and Mrp protein vaccine. Of the 1,454 iGAS isolates in this study, 1,290 (88.7%) had 1 of 27 emm types targeted by an experimental 30-valent type-specific vaccine shown to elicit >50% killing of strains expressing individual vaccine M types (7,8) (Fig. 1A; Table 1). In addition, a single query identifying the three classes of the M-like Mrp vaccine candidate (9) was positive for 140 (85.4%) of the 164 iGAS isolates of emm types not covered by the 30-valent vaccine. A majority of the strains (778/1,454 [53.5%]) would theoretically be targeted by both components of a potential combination vaccine (Fig. 1B; Table 1), while 98.3% (1,430/1,454) of the strains may be covered by at least one of the two vaccine components.
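The coverage percentages above follow directly from the reported counts. As a minimal sketch (variable names are illustrative and not part of the study's pipeline), the arithmetic can be reproduced as follows:

```python
# Counts quoted from the paragraph above; names are illustrative assumptions.
total_isolates = 1454
covered_by_30_valent = 1290            # isolates with 1 of 27 vaccine emm types
mrp_positive_among_uncovered = 140     # Mrp query positive, emm type not in vaccine
covered_by_both = 778                  # targeted by both vaccine components


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


not_in_30_valent = total_isolates - covered_by_30_valent
covered_by_either = covered_by_30_valent + mrp_positive_among_uncovered

print("30-valent coverage:", percent(covered_by_30_valent, total_isolates))
print("Mrp coverage of remainder:", percent(mrp_positive_among_uncovered, not_in_30_valent))
print("Both components:", percent(covered_by_both, total_isolates))
print("At least one component:", percent(covered_by_either, total_isolates))
```

The sum 1,290 + 140 = 1,430 is the numerator of the 98.3% figure, since the 140 Mrp-positive isolates are drawn only from the 164 isolates outside the 30-valent emm types.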
Resistance. Of the 1,454 isolates used in this study, 328 (22.6%) carried one or more accessory genes or chromosomal signatures associated with resistance or decreased susceptibility to antimicrobials (Table 2). Only 19 isolates (1.3%) with discrepant phenotypes were observed. Among these 19 isolates, there were 14 instances of undetected resistance that included 12 tetracycline-resistant group A S. dysgalactiae subsp. equisimilis (see Table S1 in the supplemental material for accession numbers) and 2 erythromycin-nonsusceptible (MICs of 0.5 to 1 μg/ml) isolates. Five isolates associated with false predictions of resistance are described in Table 2 footnote b.
Although β-lactam antibiotic resistance in S. pyogenes has not been reported, resistance to this class of antibiotics in this species would have a profoundly negative public health impact. For this reason, we incorporated a WGS-based monitoring system. Determination of PBP2x transpeptidase sequence types (STs) of group B streptococci (GBS) and pneumococci is a very sensitive mechanism for detecting potential first-step mutations conferring decreased susceptibility or intermediate resistance to penicillin and other β-lactam antibiotics (16-18). We used this same approach with the corresponding PBP2x region from S. pyogenes. The MICs of the six β-lactams for all of the isolates, each of which corresponded to 1 of 16 PBP2x types (Table S1), were below the values previously flagged for GBS. Of the 1,454 isolates, 1,107 (76.1%) shared type PBP2x-1, which served as the reference sequence.
Macrolide resistance was predicted in 210 (14.4%) of the isolates and in most cases (184/210, 87.6%) was associated with erm methylase genes (ermT, ermB, or ermTR) and either inducible or constitutive coresistance to clindamycin. Most isolates positive for ermT or ermTR (>80%) were inducibly clindamycin resistant, while most ermB-positive isolates (>90%) were constitutively resistant (data not shown). The most frequently occurring macrolide resistance determinant was ermT, because of its association with emm92 as previously described (19) and possibly more recent associations with the emm4/ST39 and emm77/ST399 lineages (Table 1). As previously noted in GBS, the ermT determinant was detected at an approximately 10-fold greater read depth than other markers, consistent with its presence on a multicopy plasmid (17). Analysis of high-quality assemblies from two randomly selected ermT-positive strains (one type emm4 [isolate 20155033] and one type emm92 [isolate 20154014]) revealed that the single ermT-positive contig in each strain, compared to the previously described plasmid pRW35 (4,968 bp) (17), had 100% coverage, >99% identity, and a length of ~4,970 bp.
One strain contained the putative efflux determinant lsaC and also ermB. In GBS, we found that this combination confers decreased susceptibility to quinupristin-dalfopristin, presumably because of streptogramin A resistance conferred by lsaC and streptogramin B resistance conferred by ermB (17). In this iGAS isolate, the MIC of quinupristin-dalfopristin was somewhat higher (1 μg/ml) than the average (~0.3 μg/ml) but still below the MIC (2 μg/ml) indicative of intermediate resistance (20).
Approximately 95% of the GAS isolates tested had a ciprofloxacin MIC of ≤2 μg/ml and a levofloxacin MIC of ≤1 μg/ml. For this reason, we considered the 2-μg/ml MIC of both antibiotics an indicator of reduced susceptibility to fluoroquinolones. We found nine substitutions in ParC and/or GyrA that were highly associated with reduced susceptibility or nonsusceptibility to fluoroquinolones (Tables 2 and 3). Current guide- [...] (Table 3). Four instances of chloramphenicol resistance, corresponding to the presence of a cat gene, were found (Table 2). We found no resistance to rifampin, gentamicin, or vancomycin, consistent with finding no previously described rpoB substitutions associated with rifampin resistance (17,21,22) and no genes for gentamicin or vancomycin resistance (23,24). All isolates were also susceptible to daptomycin and linezolid.
S. pyogenes strain diversity as assessed by MLST and emm type. The 1,454 isolates comprised 70 different MLST complexes (MCs) and 64 different emm types (Table 1). On the basis of a looser criterion than MCs (described in Materials and Methods), there were 15 clonal groups and 28 singletons (Fig. 2). There were generally strong associations of genetic features with the MCs depicted in Table 1.
emm types. Type emm1 was the most frequently occurring type overall (Table 1), accounting for 21.7% of the isolates, and was among the three most common emm types at each ABCs site (https://www.cdc.gov/abcs/reports-findings/survreports/gas15.html). Types emm89 and emm12 were also frequent (12.9 and 9.3% of the isolates tested, respectively) and widely distributed. With the exception of emm82, the five most frequently occurring emm types in 2015 (emm1, emm89, emm12, emm82, and emm28) have each been among the five most frequently occurring types each year since 2012 (emm1 was the most frequent each year).
Table 1 footnotes.
a [...] (10) or emm cluster (12). For some parameters (e.g., sda1) there was >25% variability in a given strain complex. Exotoxin genes for which ≥75% of the isolates were positive (spe genes and smeZ are represented by the last letter, and ssa is represented by S) are capitalized. For all of the parameters whose presence (+) or absence (−) is indicated, the symbols indicate ≥90% positivity or negativity, respectively.
b In-frame deletion derivative of precursor emm type. emm227 is a 17-codon deletion derivative of emm1.0. emm151 is an 11-codon deletion derivative of emm49. All emm subtype sequences are available at ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm/.
c Contains a rocA null mutation that is conserved in the lineage of emm3 and emm18 strains indicated (3).
d There were six single-locus variants of ST101 (five ST407 and one ST910) that differed from the other isolates in this strain complex in being T11, hasA positive, and Pnga3 negative.
e This S. dysgalactiae subsp. equisimilis isolate contains the gacI gene required for group A carbohydrate synthesis (28). Se128 refers to ST128 in the S. dysgalactiae MLST database (https://pubmlst.org/sdysgalactiae/).
Recent regional emergence of emm59. Regional instances of emm type emergence were apparent. For example, emm59 was predominant in New Mexico in 2015 but infrequent elsewhere (https://www.cdc.gov/abcs/reports-findings/survreports/gas15.html). emm59 was also predominant in New Mexico in 2014 (https://www.cdc.gov/abcs/reports-findings/survreports/gas14.html). In 2015, an emm59 clone had also emerged in Arizona, a non-ABCs state (25). We were unable to directly compare the phylogeny of New Mexico primary and secondary emm59 clades with recent Arizona emm59 isolates (25), since WGS data from these 18 isolates were not provided.
However, two genomic sequences closely related to the Arizona cluster were available from two New Mexico isolates recovered in 2011 and 2012 (SRR1574573 and SRR1574608; a single Colorado outlier from this study is also shown) and included in our phylogenetic analysis of ABCs 2015 type emm59 isolates (Fig. 3). The close similarity of the two recently described New Mexico isolates to the 2015 clade 1 isolates (clade 1 is ST864, while clade 2 is ST172 and speC positive, with the exception of 20154161, which was used to root the tree) indicates very close relatedness (compare Fig. 3 with Fig. 2 in reference 25); however, these two Arizona isolates differed from the ABCs 2015 New Mexico clade 1 isolates by the presence of speC. The five New Mexico clade 2 isolates from the 2015 ABCs also form a very tight phylogenetic cluster.
Table 2 footnotes.
a Abbreviations: Tet, tetracycline; Ery, erythromycin; Cli, clindamycin; Fq, ciprofloxacin and levofloxacin; Syn, quinupristin plus dalfopristin (Synercid); Chl, chloramphenicol.
b These 18 isolates consisted of the following. There were 12 tetracycline-nonsusceptible (MICs, 2 to 6 μg/ml) group A S. dysgalactiae subsp. equisimilis isolates without detected tetracycline resistance determinants (9 lacked any resistance determinants, and 3 were positive for mef and msrD). There were two erythromycin-nonsusceptible isolates (MICs, 0.5 and 1 μg/ml) without detected erythromycin resistance determinants (isolates 20152764 and 20161074 lacked detectable resistance determinants). Three tetM-positive isolates (20154605, 20154042, and 20160491) were tetracycline susceptible (two had tetM+, and one had tetM+ and ermTR+). One ermT-positive isolate (20160450) was susceptible to erythromycin and clindamycin. One isolate (ermB lsaC tetM; 20156709) predicted to be intermediately resistant to quinupristin-dalfopristin had an E-test value of 1 μg/ml (below the cutoff of 2 μg/ml).
c Results of phenotypic testing of these 48 isolates are shown in Table 3.
Table 3 footnote. a Abbreviations: Cip, ciprofloxacin; Lev, levofloxacin.
Potential emm type switching. There were only four instances of a single S. pyogenes ST (ST28, ST433, ST12, and ST36) associated with more than a single emm type, only one of which, emm82 in the common emm12 genotype ST36, appeared representative of an emm gene switching event (described below). The predominant global emm1 lineage is ST28. The single emm227/ST28 strain is a deletion derivative of emm1 (predicted to lack processed M protein residues 17 to 24). Similarly, the three emm151 isolates were ST433, as expected of a derivative of M49/ST433 with mature M protein residues 3 to 13 deleted. The relationship between the two unrelated emm types, emm29 (one isolate) and emm91 (three isolates), in ST12 is not straightforward. ST12 [...] (Fig. 4). Both strains revealed the same likely crossover point in the isp gene 5.8 kb upstream of emm82 (at base 1682938 relative to the emm12/ST36 reference genome [accession no. CP000259]). The downstream crossover point for the switching event appears to have been in a histidine triad gene approximately 8.4 kb downstream of the emm82 gene (base 1665610 relative to the emm12/ST36 reference genome). Following this double-crossover event, we predict that there was homologous excision of the enn82 gene that normally lies downstream of emm82, facilitated by the near sequence identity between the emm82 and enn82 3′ regions (see GenBank accession number CP007561 from an emm82/ST334 strain for comparison). All of the emm12/ST36 strains analyzed to date contain a conserved single base deletion in sof12 predicted to result in a truncated nonfunctional 746-residue protein lacking its fibronectin-binding repeats (26).
Both emm82/ST36 strains carried the same highly conserved sof gene found in emm12 strains (14,26); however, their sof12 allele no longer contained the inactivating single base deletion and was predicted instead to encode a full-length 1,019-residue Sof12 protein inclusive of fibronectin-binding repeats and the C-terminal membrane anchor. Consistent with this observation, the two emm82/sof12 recombinant progeny strains were found to be serum opacity factor positive.
One isolate (20160179), also of major emm12 lineage ST36, was not emm typeable. Surprisingly, this strain lacked both the emm12 and sic genes, corresponding to a precise deletion of a 4,892-bp region between mga and scpA (relative to the recipient strain shown in Fig. 4, bases 1673889 to 1678781 relative to the emm12/ST36 reference genome [GenBank accession number CP000259]).
emm types present among multiple genetic backgrounds in S. pyogenes. While only one instance of emm gene switching between different lineages was clear (two emm82/ST36 strains described above), there were 10 additional instances of the same emm type distributed among two or three completely unrelated MLSTs (Table 1). There is insufficient information to track the origins of these different emm/ST combinations. An example that shows the genetic diversity in certain emm types is type emm77, which is shared among three unrelated multi-isolate MLST-based lineages. The differences between these distinct MLST-based lineages are reflected in their different genetic features (Table 1). The common emm77.0/ST63 lineage (35 isolates found in nine ABCs states) has long been documented in Germany, Poland, and the United States (https://pubmlst.org/spyogenes/). The emm77.0/ST399 lineage (12 isolates, includes one single-locus variant, ST904) was also found in 2015 in nine ABCs states, with the only known documentation of ST399 being a single emm77 isolate recovered in Thailand (https://pubmlst.org/spyogenes/). Finally, the sole association of emm77/ST133, shared by seven isolates recovered in Tennessee (Table S1), is actually with the original tee (T) type 5 Lancefield M27 reference strain recovered more than 60 years ago. Briefly, this type was subsequently redesignated emm77 by the CDC emm database curator in the 1990s (B. Beall, unpublished data) because of its emm sequence identity to the prevalent emm77/ST63 lineage and its sequence dissimilarity from the original Griffiths M27 strain, which is the current established type emm27 (13,27).
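The CP000259 coordinates quoted earlier in this section, for the two emm82/ST36 crossover points and for the deletion in isolate 20160179, can be sanity-checked with simple interval arithmetic. The following is a minimal sketch; the helper names are hypothetical and the script is not part of the study's analysis.

```python
# Coordinates quoted in the text, all relative to reference genome CP000259.
UPSTREAM_CROSSOVER = 1682938      # crossover point in the isp gene
DOWNSTREAM_CROSSOVER = 1665610    # crossover point in a histidine triad gene
DELETION_START = 1673889          # mga-scpA deletion in isolate 20160179
DELETION_END = 1678781


def span_bp(a, b):
    """Absolute distance in base pairs between two genome coordinates."""
    return abs(a - b)


def lies_within(start, end, bound_a, bound_b):
    """True if [start, end] falls inside the interval bounded by the two coordinates."""
    low, high = sorted((bound_a, bound_b))
    return low <= start <= end <= high


recombinant_span = span_bp(UPSTREAM_CROSSOVER, DOWNSTREAM_CROSSOVER)
deletion_length = span_bp(DELETION_START, DELETION_END)
deletion_inside = lies_within(
    DELETION_START, DELETION_END, UPSTREAM_CROSSOVER, DOWNSTREAM_CROSSOVER
)

print(recombinant_span, deletion_length, deletion_inside)
```

Under these assumptions, the two crossover points lie about 17.3 kb apart on the reference, the difference of the deletion endpoints reproduces the reported 4,892 bp, and the deleted region falls inside the interval bounded by the crossover points. Counting both endpoints inclusively would give 4,893 bp, so the reported length appears to use the simple difference.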
Surface protein determinants. In general, MCs were strongly associated with specific emm and T types and the presence or absence of additional surface protein genes. These included fibronectin-binding repeat motif-containing genes (sof, fbaA, prtF2, sfb1) (48) and the emm-like mrp and enn virulence genes that flank emm in many strains (9). These two emm-like genes show much less interstrain variation than emm genes and are also virulence factors. All sof-negative S. pyogenes strains were associated with previously described emm cluster A-C or D types (or patterns) (10), with the exception of a single emm15 (pattern E/cluster E3) isolate. Type emm15 is the only cluster/pattern E emm type in this study that is also historically associated with the serum opacification-negative phenotype (13).
GAS pili are important virulence factors that function in epithelial adhesion (reviewed in reference 3). Most (1,388/1,454, 95.5%) isolates had 1 of 21 different pilus (tee) types, corresponding to different pilus backbone protein subunit genes (29) and classical T agglutination types (13). In individual MCs, most tee types (based on 120- to 240-bp gene segment queries) were generally predictive of highly conserved 950- to 1,800-bp open reading frames that shared >95% sequence identity within the type (data not shown). The single exception was the tee gene of an emm106/ST338 isolate that shared only 82.5% sequence identity with the previously described reference tee3 gene. As previously described (29), tee genes were highly diverse but exhibited differing regions of inter-tee gene homology, and all contained signal sequence motifs situated near the 5′ ends with SrtB sortase family wall attachment motifs near the 3′ ends.
A query for the streptococcal inhibitor of complement (sic) gene (31, 32) employed a short sequence targeting the derivatives found in the emm region of the prevalent emm1/MC28 and emm12/MC36 lineages, which were primarily positive for the query (309/316 for emm1, 114/135 for emm12). In addition, an emm227/ST28 deletion derivative of emm1 (Table 1) and two emm228 isolates (both double-locus variants of ST28) were sic positive.
Exotoxin gene patterns were generally highly associated with GAS emm/ST-defined lineages. For example, this was evident in emm1/MC28, where 306/316 isolates were positive for speA, speG, speJ, and smeZ. Three emm1/ST28 isolates were positive for additional exotoxin genes besides these four. One emm1/ST28 isolate was additionally positive for speK, and another was additionally positive for speC. Finally, emm1/ST28 isolate 20156011 was additionally positive for speC and ssa. Examination of the genome revealed that spd1 (DNase gene), speC, and ssa were tandemly situated in a prophage sequence, as in the highly related HKU488.vir prophage from an antimicrobial-resistant emm1 strain causing scarlet fever in Hong Kong (33). Unlike the Hong Kong emm1 strain, ABCs strain 20156011 was susceptible to antibiotics and lacked ermB and tetM determinants.
nga operon markers previously associated with strain emergence. The variant 3 nga promoter (Pnga3) has been associated with increased transcription of the nga operon (4). Pnga3 is a promoter sequence associated with increased transcriptional activity relative to the previously described less active emm89 clade 1 and 2 promoters situated upstream of the genes (nga and slo) encoding the extracellular toxins NADase and streptolysin O (5). The presence of Pnga3 was invariably linked to the putatively active NADase 330G query among the study isolates (4). Consistent with Pnga3 and NADase 330G being associated with transmissibility and virulence, these two features were evident in the three most frequently occurring strain complexes. These three strain complexes (emm1/MC28, emm89/MC101, and emm12/MC36) accounted for 44% (636 isolates) of the entire iGAS sample. Overall, about 56% of the isolates (808/1,454) contained these two nga operon markers. NADase 330G was found in 326 isolates unlinked to Pnga3, with 320 isolates containing the inactive NADase (G330D substitution) and also lacking the more active Pnga3 promoter (Table 1).
emm89 emergence. Of the 185 emm89 isolates recovered in 2015, 178 were acapsular (hasA negative) and positive for the previously described clade 3 promoter Pnga3 (5). These data are consistent with recent studies that also employed ABCs emm89 isolates (recovered from 1995 to 2013), as well as emm89 isolates recovered in Finland, Iceland, and the United Kingdom (4,5,35). Recent studies correlated the acquisition of Pnga3 and the acapsular genotype with the increase in infections caused by emm89 GAS recovered through ABCs in the mid-2000s (4,5).
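The nga operon marker counts reported above partition the full isolate set into three categories, which can be verified with a short check. This is an illustrative sketch only; the category labels are assumptions made for readability.

```python
# Counts quoted from the text; dictionary keys are illustrative labels.
total_isolates = 1454
nga_categories = {
    "Pnga3 and NADase 330G": 808,
    "NADase 330G without Pnga3": 326,
    "NADase G330D without Pnga3": 320,
}
major_complex_isolates = 636     # emm1/MC28, emm89/MC101, and emm12/MC36 combined
emm89_total = 185
emm89_acapsular_pnga3 = 178


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


category_sum = sum(nga_categories.values())
both_markers_pct = percent(nga_categories["Pnga3 and NADase 330G"], total_isolates)
major_complex_pct = percent(major_complex_isolates, total_isolates)
emm89_clade3_pct = percent(emm89_acapsular_pnga3, emm89_total)

print(category_sum, both_markers_pct, major_complex_pct, emm89_clade3_pct)
```

The three categories sum to 1,454, so every isolate is accounted for, and the rounded percentages agree with the "about 56%" and "44%" figures quoted in the text.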
It is interesting that the only emm89 isolates found in ABCs from 1995 to 1999 were serologically T type 11 (Fig. 5). WGS obtained from two T11/emm89 isolates recovered in 1995 and 2000 revealed that both were single-locus variants of emm89/ST101 (emm89/ST407), were hasA+, and contained Pnga1 (5) (Fig. 5). As recently described (36), we found that the emergent clade 3 emm89 also acquired a tee gene distinct from that of clade 1 strains. The T89 genetic marker, which we find to be associated with the serological T13 type, was first detected in 2000 in ABCs emm89 isolates (Fig. 5), associated with less active Pnga2 (5), and was also hasA+ (Fig. 5). In our 2015 isolate set, we found that only 6 of the 185 emm89 isolates were of the emm89/ST407/T11(tee11) lineage, were hasA+, and contained Pnga1 (Fig. 5; Table 1), corresponding to previously described clade 1 (5).
sof gene relationships with different emm clusters/patterns, emm-like genes, and MCs. The majority (884/1,454, 60.8%) of the isolates tested corresponded to pattern E strains on the basis of 3′ sequences of emm family genes at the mga locus (10). Nearly all of the pattern E strains contained emm genes of different E clusters according to the recently described clustering scheme (12). Only one pattern E emm15 isolate (emm cluster E3) was sof negative (Table 1), consistent with previous emm15 associations (13). There were 39 emm types corresponding to E clusters or patterns. Nearly all of the other strains contained 1 of 18 A-C or D cluster emm types (equating to patterns C and D), all of which were sof negative or historically serum opacity factor negative. Unlike pattern D and E strains, pattern A-C strains were negative for the emm-like genes mrp and enn, with the sole exceptions of the two type emm18 isolates (Table 1).
Serum opacity factor is an important hypervariable virulence factor expressed by a large percentage of GAS strains (37). Positivity for the sof gene fragment query was predictable by previously established associations of sof genes with emm types or of previously established associations of the opacity factor phenotype with M serotypes and/or emm types (13,14). The single exception was that type emm12 strains were positive for the sof query; however, emm12 strains are invariably opacity factor negative according to decades of published data (13). The emm12/ST36 lineage contains a conserved frameshift mutation in sof12 that prematurely truncates the protein (found in all 20 randomly selected emm12/ST36 strains in this study and the CDC M12 reference strain isolated >60 years ago). Otherwise, the presence or absence of sof completely conformed to previously observed associations (13,14).
A loose definition of MLST groups allowing any ST related by four or more alleles to any other in the group divided the 1,442 S. pyogenes isolates into 15 groups of 2 to 24 STs (10 to 346 isolates each) and 28 "singleton" STs (1 to 35 isolates each) not related to other STs by four or more alleles (Fig. 2). Also indicated are the 12 ST128 (S. dysgalactiae MLST scheme) S. dysgalactiae subsp. equisimilis strains included in the 2015 ABCs. It is striking that 14 of the 15 groups consisted solely of either sof-positive or sof-negative strains, with the exception of group 4, which consisted of 136 sof-negative isolates and 3 sof-positive isolates. These three sof-positive isolates consisted of the two unusual emm switching emm82/ST36 strains (described above and in Fig. 4) and the single emm113/ST148 isolate. The single ST148 isolate recorded at https://pubmlst.org/spyogenes/ was an emm113 strain recovered in New Zealand in 1997.
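The loose grouping rule described above (an ST joins a group if it shares four or more alleles with any other ST in that group) amounts to single-linkage clustering of allelic profiles. The sketch below illustrates that idea under stated assumptions: seven-locus profiles, made-up ST names, and made-up allele numbers. It is not the procedure from Materials and Methods, which is not reproduced here.

```python
from itertools import combinations


def shared_alleles(profile_a, profile_b):
    """Count loci at which two allelic profiles carry the same allele."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a == b)


def group_sts(profiles, min_shared=4):
    """Single-linkage grouping: link STs that share >= min_shared alleles."""
    parent = {st: st for st in profiles}

    def find(st):
        # Follow parent links to the group representative (with path halving).
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for st_a, st_b in combinations(profiles, 2):
        if shared_alleles(profiles[st_a], profiles[st_b]) >= min_shared:
            parent[find(st_a)] = find(st_b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), []).append(st)
    return sorted((sorted(members) for members in groups.values()), key=lambda g: g[0])


# Hypothetical seven-locus allelic profiles, for illustration only.
example_profiles = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 1, 1, 2, 2, 2),   # shares 4 alleles with ST_A
    "ST_C": (3, 3, 1, 1, 2, 2, 2),   # shares 5 with ST_B but only 2 with ST_A
    "ST_D": (9, 9, 9, 9, 9, 9, 9),   # shares none: a singleton
}

example_groups = group_sts(example_profiles)
print(example_groups)
```

In this toy input, ST_C joins the group through ST_B even though it shares only two alleles with ST_A, which is the chaining behavior implied by "related by four or more alleles to any other in the group"; ST_D remains a singleton.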
Fig. 5 legend. [...] (4,5) associating the emergence of emm89 with the appearance of the acapsular clade 3 strain containing the upregulated nga operon promoter Pnga3 and a different tee gene (36). According to our data, the clade 3 strain is T serotype 13 (corresponding to the tee89 gene) and associated with the decline of the T serotype 11 (corresponding to the tee11 gene) clade 1 emm89 strain (5) that expressed capsule and contained the less active nga promoter Pnga1. Although only two T11 (tee11) isolates (recovered in 1995 and 2000) and one T13 (tee89) isolate (recovered in 2000) were sequenced in this study prior to 2015, where all 185 isolates were sequenced, our data are entirely consistent with the previous reports and publicly available WGS data that include a large number (870) of these isolates recovered from 1995 to 2013 (4,5,36). Compare Fig. 1 in reference 36, which describes the international emergence of clade 3 emm89, to the data presented here.
Contrasting seasonality of infections shown by sof-negative and sof-positive strains. Recently, it was observed that infections due to emm A-C cluster (or pattern A-C) strains peaked in the winter (first quarter, from January to March), while E cluster (or pattern E) strain cases were disproportionately represented in the summer (third quarter, from July to September) (15). Figure 6 shows that the seasonal relationship of emm clustering (or emm locus patterns) is reflected by the presence or absence of an active sof determinant. Among the S. pyogenes isolates studied, sof-negative isolates accounted for only 38.8% (559/1,442) of the total yet accounted for 45.5% (199/437) of the S. pyogenes cases in quarter 1 (P < 0.0005). In quarter 3, sof-negative isolates accounted for only 29.2% (78/267) of the cases (P < 0.0005). This marked fluctuation of sof-negative iGAS incidence between quarters 1 and 3 contrasts with the relatively stable incidence of sof-positive iGAS in these periods (Fig. 6).
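The seasonal contrast above can be made concrete by recomputing the quarterly proportions and the quarter 1 to quarter 3 ratio for each group from the reported counts. This is a minimal sketch with illustrative names; it does not reproduce the statistical test behind the reported P values, which is not specified in this section.

```python
# Counts quoted from the text (S. pyogenes isolates only).
all_isolates = 1442
sof_negative_all = 559
q1_cases, q1_sof_negative = 437, 199
q3_cases, q3_sof_negative = 267, 78


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# sof-positive counts are obtained by subtraction from the quarterly totals.
q1_sof_positive = q1_cases - q1_sof_negative
q3_sof_positive = q3_cases - q3_sof_negative

overall_pct = percent(sof_negative_all, all_isolates)
q1_pct = percent(q1_sof_negative, q1_cases)
q3_pct = percent(q3_sof_negative, q3_cases)

# Ratio of quarter 1 to quarter 3 case counts within each group.
sof_negative_q1_to_q3 = round(q1_sof_negative / q3_sof_negative, 2)
sof_positive_q1_to_q3 = round(q1_sof_positive / q3_sof_positive, 2)

print(overall_pct, q1_pct, q3_pct)
print(sof_negative_q1_to_q3, sof_positive_q1_to_q3)
```

On these counts, sof-negative cases in quarter 1 are roughly 2.5 times those in quarter 3, whereas sof-positive cases differ by a factor of about 1.3, which matches the contrast described in the text.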
As with emm clusters and emm patterns, the presence or absence of sof is nearly always predicted by identification of the emm type (13,14). We used emm typing to determine that emm type-based predictions of sof presence/absence resulted in the same seasonality pattern in 2012 to 2014 that was seen in 2015 on the basis of the actual presence of an intact sof gene (Fig. 6).
Invasive group A S. dysgalactiae subsp. equisimilis. The ABCs program is based on the identification of iGAS isolates without identification to the species level. Almost all of the isolates (1,442/1,454; 99.2%), including 12 S. dysgalactiae subsp. equisimilis isolates, reported to ABCs in 2015 were gacI positive, which is predictive of group A carbohydrate production (28). A single gacI-negative isolate of emm type stG643.0 was subsequently found to be serogroup G S. dysgalactiae subsp. equisimilis and was removed from the study. Twelve S. pyogenes isolates (0.8%) were negative for the gacI query; however, these were found to be serogroup A.
All 12 group A S. dysgalactiae subsp. equisimilis isolates identified through phylogenetic analysis (38) were of ST128 according to the S. dysgalactiae MLST scheme at https://pubmlst.org/sdysgalactiae/. The recovery of these 12 ST128 isolates of three different emm types from four different states suggests that this is a long-standing group A lineage of this species. The single group G 2015 ABCs isolate of this species was found to be ST48 (S. dysgalactiae MLST scheme). Analysis of the gac (group A carbohydrate) operon from these strains revealed a hybrid structure with an upstream crossover point in gacE and a downstream crossover point in the second open reading frame immediately downstream of the gacA-gacL operon (data not shown). This approximately 11,500-bp recombinational fragment apparently originating from S. pyogenes corresponds to coordinates 609389 to 620916 of the S. pyogenes sequence with GenBank accession number CP000017. This fragment encompasses the gacI, gacJ, and gacK genes, which shared 99.4 to 99.7% sequence identity with counterparts in S. pyogenes. These three genes were recently shown to be essential for expression of the immunodominant N-acetylglucosamine side chain of the Lancefield group A carbohydrate (28).

DISCUSSION
While emm typing and antimicrobial resistance phenotyping have served as the basis of ABCs iGAS strain surveillance for the past 2 decades (2,43,44), the addition of WGS-based strain characterization to this population-based surveillance system encompassing nearly 34 million individuals provides much more insight into underlying strain features and strain emergence. We found in invasive GBS that PBP2x typing was actually more reliable and sensitive for detecting first-step mutations leading to β-lactam nonsusceptibility (17), and having this system in place for iGAS allows us greater vigilance for this potential threat. We now see that ermT, discovered in GAS only in the last decade (19), actually accounts for the major percentage of emerging GAS resistance to macrolides and lincosamides. Through our current WGS pipeline data, we have several additional parameters to evaluate in association with disease manifestations and virulence and as vaccine components. Nearly all (~98%) of the study isolates would be covered by a combination M-Mrp vaccine (7-9), with more than half of this isolate set putatively targeted by both vaccine components. This is an important observation, since recent work indicated that the combination vaccine would provide more effective opsonization than either vaccine alone (9). We were able to quantitate MLST-defined diversity and to determine the extent of emm type switching in the same manner that pneumococcal strains have been assessed for capsular serotype switching in the past 15 to 20 years. From the results shown here, it appears that emm type switching is rare and might not be a significant immune escape mechanism should an M protein-based vaccine be implemented. In the entire sample set, we detected only one example of a past switching event, represented in two isolates, where the emm12 gene in the ST36 genetic background was replaced with the emm82 gene.
In addition, we detected only one emm-negative iGAS isolate (also in the ST36 background), consistent with the M protein's historical role as an essential virulence factor. Nonetheless, the detection of these three unusual invasive isolates does present the possibility that such variant strains could emerge as successful pathogens in the presence of selection exerted by an M protein-based vaccine.
Increased documentation of GAS strain parameters may hasten the understanding of features that affect pathogenic potential. In particular, the association of the three major iGAS lineages (emm1/ST28, emm89/ST101, emm12/ST36) with an upregulated nga operon is compelling, especially since this feature correlated directly with the marked emergence of emm89 in ABCs isolates over the past decade (4, 5, 36). Individual strain parameters may provide greater understanding of GAS tissue tropism and disease manifestations. For example, emm28 has been shown to be significantly associated with postpartum iGAS infections (39, 40). Vaginal tissue tropism could be influenced by expression of the R28 determinant, detected primarily in the emm28 isolates in this study. This possibility is further suggested by the existence of a close R28 homolog in group B streptococci that commonly colonize the vaginal epithelium (30). The acquisition of the tee89 gene in emergent clade 3 emm89 may have conferred new functional adherence or immune evasion properties (3, 36, 41). The recently described expansion of the superantigen complement in emm1 subclones in China, in which emm12 strains facilitated the horizontal transfer of scarlet fever-associated mobile elements carrying speC and ssa to the emm1/ST28 lineage (33), is reason for increased awareness that already impactful iGAS strains can gain enhanced virulence potential. In our strain set, we observed a single emm1 isolate (20156011) that was positive for speC and ssa in addition to the usual emm1/ST28 superantigen complement (speA, speG, speJ, speZ); these two genes were situated on a prophage highly related to the previously described HKU488.vir (33).
Notably, in the two recombinant (emm type switching) emm82/ST36 strains, the normally inactive sof12 gene reverted to an active allele upon the insertion of a single nucleotide. This observation is compatible with previously established emm type associations with sof (13, 14). It is plausible that some biologically defined barrier prevents the presence of an active sof gene in association with cluster A-C and D emm types. The reverse association also seems to be indicated, in that the combination of an active sof gene and most cluster E emm types might be essential for strain success. The association of the emm type with the presence or absence of the multifunctional sof virulence gene (37) appears to have an underlying clonal basis, since MLST divides isolate sets into defined sof-negative and sof-positive groups. The observed differences in seasonality between sof-negative and sof-positive strains could be based on the presence or absence of sof or could be based on other, unknown, clonal features.
The appearance of a specific group A lineage of the diverse subspecies S. dysgalactiae subsp. equisimilis widely spread among different ABCs sites is indirectly indicative of the considerable disease burden attributable to this subspecies (42), which is almost always associated with group C or G carbohydrates (38). The acquisition of the ability to express the group A antigen, itself a virulence factor (28), is reason for continued close monitoring of this iGAS subspecies. We provide evidence here that the ST128 iGAS lineage arose through a single interspecies gene replacement event. The association of group A S. dysgalactiae subsp. equisimilis ST128 with three distinct emm types is indicative of a successful longstanding lineage. We have previously shown that the association of multiple emm types in a single ST is not unusual in S. dysgalactiae subsp. equisimilis (38), although our data indicate that it is extremely rare in S. pyogenes.
An important aspect of WGS-based strain surveillance in ABCs is the ability to deduce close temporal and geographic relatedness between GAS isolates. The predominance of emm59 in New Mexico in 2015 shows the potential of WGS to elucidate disease transmission patterns and thereby to guide efforts to control disease. We are working toward faster identification of such clusters in ABCs and toward identifying potential outbreaks for which public health intervention may be effective.
To summarize, through WGS, we have examined several aspects of iGAS strains that we were previously unable to explore in a systematic population-based manner. We provide our basic WGS genetic data in association with the genomic accession data from a full year (2015) of ABCs isolates, along with lab identifiers, in Table S1. These isolates and some accompanying epidemiological data can be acquired at https://www.cdc.gov/abcs/pathogens/isolatebank/overview.html for further investigation.

MATERIALS AND METHODS
Isolates. ABCs conducts active laboratory and population-based surveillance for iGAS infections (including necrotizing fasciitis, streptococcal toxic shock syndrome, and other infections associated with GAS isolated from a normally sterile site) in geographic areas of 10 states, representing 33.7 million persons. The 1,454 available isolates, representing 89.6% of the cases that occurred in 2015, were subjected to WGS and antimicrobial susceptibility testing. Key features of ABCs iGAS surveillance from 1997 to 2015 have been described previously (2, 43, 44; https://www.cdc.gov/abcs/reports-findings/surv-reports.html).
Whole-genome sequencing. GAS chromosomal DNA preparation, library construction, and WGS generation for the 1,454 isolates were performed as previously described (16).
Conventional MIC determinations. Isolates were subjected to broth dilution testing (BDT) for determination of MICs with the panel previously described for GBS that included a well containing both erythromycin and clindamycin to detect inducible clindamycin resistance (17). Discordant results where WGS-based predictions differed from BDT results by ≥2 dilutions (≥4-fold MIC differences) were retested by E test as described by the manufacturer (BioMérieux) or by D test (20).
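The discordance criterion stated above is a comparison on the doubling-dilution (log2) scale: a difference of at least 2 dilutions is the same as at least a 4-fold difference in MIC. A minimal Python sketch of that arithmetic follows. The function names and the small numerical tolerance are assumptions made for illustration and are not taken from the pipeline used in this study.

```python
import math

def dilution_difference(mic_a: float, mic_b: float) -> float:
    """Absolute difference between two MICs in doubling dilutions (log2 units)."""
    if mic_a <= 0 or mic_b <= 0:
        raise ValueError("MICs must be positive")
    return abs(math.log2(mic_a / mic_b))

def is_discordant(predicted_mic: float, measured_mic: float,
                  min_dilutions: float = 2.0) -> bool:
    """True when the two MICs differ by at least `min_dilutions` dilutions.

    A threshold of 2 dilutions corresponds to a >=4-fold MIC difference.
    The 1e-9 tolerance (an assumption) guards against floating-point rounding.
    """
    return dilution_difference(predicted_mic, measured_mic) >= min_dilutions - 1e-9
```

Under this sketch, a hypothetical predicted MIC of 0.25 and measured MIC of 1.0 differ by 2 dilutions and would be flagged for retesting, whereas 0.5 versus 0.25 (1 dilution) would not.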
Serum opacity factor determination. Serum opacity factor determination was performed with bacterial supernatants from specific isolates as previously described (45).
WGS GAS typing pipeline. Bioinformatics methods are described and updated at https://github.com/BenJamesMetcalf. emm subtypes were obtained on the basis of a database of defined 180-bp sequences maintained at the CDC (ftp://ftp.cdc.gov/pub/infectious_diseases/biotech/tsemm/). This subtyping scheme is based on a sequence that consists of 10 codons corresponding to the C-terminal end of the M protein signal sequence and 50 codons corresponding to the N terminus of the mature M protein (46). The WGS emm typing scheme employs de novo assembly and queries sequences closely linked to 21-bp emm typing primer 1 (27) situated adjacent to the emm type-specific region.
MLST. MLST relied upon SRST2 and the database at http://pubmlst.org/spyogenes/.
MLST-based CCs and groups. CCs were defined as isolates sharing at least five alleles with the reference ST, which represented the major ST found in an emm type. An eBurst (53) group was defined as an ST set where each member shared at least four alleles with one or more other members of the set.
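The two grouping rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: profiles are assumed to be tuples of allele numbers at the seven MLST loci, the eBurst-style grouping is interpreted here as the connected components of a graph linking STs that share at least four alleles, and all ST labels and allele numbers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Profile = Tuple[int, ...]  # allele numbers at the MLST loci (assumed seven)

def shared_alleles(a: Profile, b: Profile) -> int:
    """Number of loci at which two allelic profiles carry the same allele."""
    return sum(x == y for x, y in zip(a, b))

def clonal_complex(profiles: Dict[str, Profile], reference_st: str,
                   min_shared: int = 5) -> Set[str]:
    """STs sharing at least `min_shared` alleles with the reference ST."""
    ref = profiles[reference_st]
    return {st for st, p in profiles.items()
            if shared_alleles(p, ref) >= min_shared}

def eburst_groups(profiles: Dict[str, Profile],
                  min_shared: int = 4) -> List[Set[str]]:
    """Connected components where linked STs share >= `min_shared` alleles.

    An ST linked to no other ST is returned as a singleton set, which by the
    stated definition is not a group with other members.
    """
    unvisited = set(profiles)
    groups: List[Set[str]] = []
    while unvisited:
        seed = unvisited.pop()
        group, stack = {seed}, [seed]
        while stack:
            current = stack.pop()
            linked = {st for st in unvisited
                      if shared_alleles(profiles[current], profiles[st]) >= min_shared}
            unvisited -= linked
            group |= linked
            stack.extend(linked)
        groups.append(group)
    return groups

# Hypothetical profiles, for illustration only.
example = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 1, 1, 1, 2, 2),  # shares 5 alleles with ST_A
    "ST_C": (1, 1, 1, 1, 3, 2, 2),  # shares 4 with ST_A, 6 with ST_B
    "ST_D": (9, 9, 9, 9, 9, 9, 9),  # unrelated
}
```

With these hypothetical profiles, `clonal_complex(example, "ST_A")` contains ST_A and ST_B, while `eburst_groups(example)` places ST_A, ST_B, and ST_C in one group and leaves ST_D as a singleton. This illustrates that the four-allele grouping is the more inclusive of the two thresholds.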

Statistical analyses. A chi-square test was performed to evaluate differences in seasonality between sof-negative and sof-positive groups.
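As an illustration of how such a comparison can be set up, the following Python sketch computes the Pearson chi-square statistic and degrees of freedom for a contingency table. The layout assumed here (rows for sof-negative and sof-positive groups, columns for quarters of the year) and the counts in the example are hypothetical; the source does not state how the table was constructed or which software was used.

```python
from typing import Sequence, Tuple

def chi_square_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row total and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows = sof-negative, sof-positive; columns = Q1..Q4.
hypothetical_counts = [
    [120, 90, 70, 100],
    [60, 80, 95, 75],
]
stat, dof = chi_square_statistic(hypothetical_counts)
```

For a 2 x 4 table the test has 3 degrees of freedom, so the statistic would be compared with the chi-square distribution with 3 degrees of freedom (critical value 7.815 at the 0.05 level).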
Accession number(s). Accession numbers for the 1,454 fastq files used in this work are provided in Table S1, along with lab identifiers, WGS-generated genetic data, and quality metrics.