Improved Subtyping of Staphylococcus aureus Clonal Complex 8 Strains Based on Whole-Genome Phylogenetic Analysis

Staphylococcus aureus is a major human pathogen worldwide in both community and health care settings. Surveillance for S. aureus strains is important to our understanding of their spread and to informing infection prevention and control. Confusion surrounding the strain nomenclature of one of the most prevalent lineages of S. aureus, clonal complex 8 (CC8), and the imprecision of current tools for typing S. aureus make surveillance and source tracing difficult and sometimes misleading. In this study, we clarify the CC8 strain designations and propose a new typing scheme for CC8 isolates that is rapid and easy to use. This typing scheme is based on relatively stable genomic markers, and we demonstrate its superiority over traditional typing techniques. This scheme has the potential to greatly improve epidemiological investigations of S. aureus.

strains in clonal complex 8 (CC8) (1)(2)(3). CC8 methicillin-susceptible S. aureus (MSSA) strains are also common agents of infection (4)(5)(6). Lineages within CC8 include the major so-called epidemic "clones" USA300, USA500, Archaic, Iberian, and the lineage identified by multilocus sequence typing as sequence type 239 (ST239) (7). ST239 is an HA lineage with distinct populations distributed throughout Asia, in Eastern Europe, South America, and Australia (1,8,9). ST239, a hybrid of strains ST8 and ST30 (10), is often classed in CC30, given its distant relationship to the rest of CC8 and its spa gene type similarity to ST30 isolates. The Archaic (ST250) and Iberian (ST247) strains are also HA; the Archaic clone was widespread in parts of Europe decades ago; however, it has largely disappeared with the appearance of other more antimicrobial-resistant CC8 lineages such as USA500 (11). Additional strains have also emerged and waned over time, often in geographically limited areas (12,13) (e.g., the Hanover clone, ST254), adding to CC8's epidemiological complexity. The CA-MRSA strain USA300 emerged clinically only around 2000 and is currently the most prevalent pathogenic strain circulating in the United States (2,3). Despite its relatively recent emergence, USA300 has diverged into lineages distinct from its early branching ancestors (called the "Early Branching" clade here) (14). USA300 variants that lack the arginine catabolic mobile element (ACME) characteristic of USA300 strains have more recently been isolated in South American countries. These USA300-LV (for Latin American variant) isolates instead carry a copper and mercury resistance cassette, COMER, and were shown to belong to a monophyletic "sister" lineage, named USA300-SAE (for South American epidemic strain), to the USA300 North American epidemic (USA300-NAE) strain (14).
Distinguishing among the sublineages of CC8 is critical for purposes of epidemiology and surveillance, especially as the epidemiological separation between HA and CA strains disappears (1). Although strain-typing techniques have improved over time, they still have many limitations. Pulsed-field gel electrophoresis (PFGE), the method by which the "USA" strains were originally defined (15), is laborious, and determination of a strain type can be subjective. Heterogeneity in banding patterns and discordance with other typing methods are not uncommon (16). Sequencing and interpretation of the spa gene is relatively expensive, and spa types are not always consistent with evolutionary lineages (8,16,17,18,19). Furthermore, PFGE and spa typing alone are often not able to distinguish among sublineages within CC8 or other clonal complexes (20). Currently, many laboratories use PCR typing that targets factors located on mobile genetic elements: e.g., Panton-Valentine leukocidin (PVL) genes, ACME genes, enterotoxin genes, and the SCCmec variants.
Confounding the issue is the multitude of names given to a strain type (21) and the use of one name for divergent strain types. The Iberian name identified a CC8 lineage that circulated in Europe in the 1990s, ST247 (22), but due to some shared genetic elements used for strain typing, "Iberian" has been used more recently to identify an ST8 strain closely related to USA500 (4). This confusion extends to the phylogenetic relatedness among the major strain types in CC8. Relatively imprecise methods of strain characterization and lack of consistency with regard to reference isolates have resulted in variation in the classification of the CC8 sublineages. Most strains were originally defined and deposited in repositories prior to the routine use of whole-genome sequencing (WGS) and WGS-based phylogenies, and relatedness to these type strains was inferred based on various criteria, resulting in inconsistent application of strain nomenclature. An influential study by Li et al. (7) on the evolution of virulence in CC8 suggested that USA300 is a lineage derived from USA500. In that study, the authors identified a now widely used set of genetic markers to distinguish between USA500 and Iberian strains, using a USA500 reference isolate called BD02-25. Two recent studies refuted the idea that a USA500 strain is the progenitor to USA300 using different USA500 isolate genomes as references: Jamrozy et al. (23) (15) (deposited at BEI Resources as USA500, catalog no. NR-46071). We postulate that not all of these isolates belong to the same phylogenetic clade, although they were previously described as the same strain, USA500.
In this study, our first goal was to closely examine the cladistics of CC8 with whole-genome sequence (WGS) data, illustrating the issues that have arisen from lack of consistency in type nomenclature, in the hope of more clearly defining CC8 sublineages. Second, as significantly fewer studies address MSSA than MRSA, our goal was to gain a better understanding of MSSA's role in CC8 epidemiology and its placement in the CC8 population structure. Our third goal was to develop a rapid and simple, yet robust strain-typing scheme based on more stable genomic markers: i.e., real-time PCR assays targeting canonical single nucleotide polymorphisms (canSNPs) or SNPs that define a lineage (20,26). We hypothesized that as S. aureus shows clonal evolution with little evidence of recombination within lineages (8,17,18,19), we could identify canSNPs from our CC8 phylogeny to target each of the major lineages, including the widely circulating USA300 subtype USA300-0114, an oft-cited etiologic cause for MRSA clusters (27). A canSNP-based approach eliminates the lineage confusion seen with PFGE, spa typing, and mobile genetic marker typing, as SNPs are relatively stable and quantify relatedness among strains.

RESULTS
Whole-genome phylogenetic analysis. The overall S. aureus phylogeny ( Fig. 1) shows the context of CC8 among other S. aureus lineages and shows that the CC8 strains in this tree all belong to one of three main lineages, ST239 (the HA SCCmec type III-carrying MRSA), ST630 (a lineage that branches off basal to the rest of CC8 and comprises five MSSA strains), and "Inner CC8" comprising the other known lineages. Table 1 shows common characteristics of these strain types. This phylogeny comprises 1.84 Mb shared by each genome and includes large regions exchanged among lineages that resulted in hybrid strains (e.g., ST34 and ST42 of CC30 and ST239 [10]). This tree, therefore, illustrates sum total relationships among lineages within S. aureus rather than within-lineage evolutionary history, as removal of these regions would imply a closer than actual relationship between a hybrid strain and one of its parent lineages.
The topology of our Inner CC8 SNP-based phylogeny (excluding ST239 and ST630) comprising 348 genomes is similar to those reported recently (23,28), showing multiple, distinct nested clades, with MSSA (orange branches) interspersed among the MRSA isolates ( Fig. 2; Table 1). CC8a, which includes the Archaic and Iberian strains, is the most basal CC8 lineage, which supports the early circulation then disappearance of this lineage over time. All but one MRSA strain in CC8a carry SCCmec type I. To our knowledge, CC8b has not been characterized previously and contains the old strain NCTC 8325 and the Brazilian vancomycin-susceptible and resistant S. aureus isolates BR-VSSA and BR-VRSA, respectively, thought to be closely related to USA300 due to their carriage of SCCmec subtype IVa (29). The majority of the isolates in this clade are MSSA, a few of which carry ACME (suggesting previous SCCmec carriage [30]) or sea, and one of which has the PVL genes. Our phylogeny also shows that isolates known as USA500 fall into two distinct clades separated by CC8d, the Canadian HA-MRSA lineage, CMRSA9 (31): clade CC8c contains NRS385 (15) and BAA-1763 (ATCC), while group CC8e contains BD02-25 (7). This suggests that the CMRSA9 strains might be defined as USA500 by traditional typing methods. The CC8c clade includes an apparent rapidly expanded lineage (containing BAA-1763), illustrated as shallow branches with low bootstrap support, and several of these isolates were collected in Georgia in the United States. This clade is now known to be an epidemic lineage in Georgia (see the companion paper by Frisch et al. [32] and Fig. S1 in the supplemental material).
Our data support the idea that USA500 in CC8e and USA300 share a direct common ancestor (Fig. 2). The WGS phylogeny indicates that the PVL genes were acquired by an early branching USA300 (14) ancestor (nested within CC8e) and passed down to the USA300 lineage, as most USA300 strains carry PVL, including USA300-SAE (14). As a predictor of USA300, the PVL genes have high sensitivity (97%) and specificity (99%) in our data; however, these genes are not confined to CC8 (see Table S1 in the supplemental material). The phylogeny also confirms that ACME was acquired by the USA300-NAE ancestor and passed vertically, as noted previously (14). ACME is present in six MSSA isolates in CC8f. As ACME is closely associated with SCCmec (30), Fig. 2 suggests at least four losses of SCCmec while retaining ACME. Spread across the CC8f USA300-NAE clade are 80 subtype USA300-0114 isolates interspersed with 41 non-0114 isolates, indicating that this important PFGE pattern subtype (27) is not a distinct lineage. Therefore, 0114 strains cannot be phylogenetically distinguished from other USA300 strains, and no canSNP marker can differentiate the 0114 strain type from non-0114 strains.
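The sensitivity and specificity quoted above for PVL as a USA300 predictor follow the usual definitions from a two-by-two comparison of marker status against phylogenetic placement. A minimal sketch of that arithmetic is given below; the function names and the counts in the usage lines are hypothetical placeholders and are not taken from this study's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true USA300 isolates that are marker positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-USA300 isolates that are marker negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (NOT the study's data):
# 45 PVL-positive and 5 PVL-negative isolates inside the target clade,
# 10 PVL-positive and 90 PVL-negative isolates outside it.
example_sensitivity = sensitivity(true_pos=45, false_neg=5)
example_specificity = specificity(true_neg=90, false_pos=10)
print(f"sensitivity={example_sensitivity:.2f}, specificity={example_specificity:.2f}")
```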
The incorporation of a significant number of MSSA genomes in the CC8 phylogeny makes it apparent that MSSA was the founder of several of these CC8 strains. A majority of CC8b is MSSA, and the five MRSA isolates in this clade carry four different SCCmec types, suggesting independent acquisitions of the SCCmec cassettes and that much of CC8e remains or has reverted to MSSA. The mostly MRSA clades are each dominated by a single, different SCCmec type, indicating acquisition by the common ancestor to the clade, except in the Early Branching USA300 group, in which several different SCCmec types exist. All SCCmec types in the Early Branching USA300 group, however, are SCCmec IV subtypes. The MRSA strains in this clade could be a result of one acquisition event followed by recombination (33), or several separate SCCmec acquisitions. USA300-SAE comprises two SCCmec types, IV and IVc; however, it is not clear whether the typing schemes used always included an IVc subtype test. Although USA300-SAE is made up entirely of MRSA strains, this could be a sampling artifact. Besides their importance in CC8b and CC8e, MSSA genomes are interspersed with the MRSA genomes throughout CC8. The appearance of MSSA dispersed across the CC8 phylogeny supports the idea that the SCCmec cassette is highly mobile and upholds the notion that MSSA plays a principal role in S. aureus evolution and pathology.
GMI typing. In general, the clades ( Fig. 2) correlate with strain type inferred by the genetic marker inference (GMI) method (see Materials and Methods and Fig. 3), except in the case of CC8b and the USA500 groups. CC8b comprises mostly CC8-Unknown isolates, owing to our unfamiliarity with this clade and the clade's makeup as mostly MSSA, for which GMI typing uses few markers. Using GMI, USA500 appears in two separate lineages, CC8c and CC8e, while GMI USA500/Iberian isolates fall within CC8c, with only one exception in CC8a. CC8c also contains two GMI USA300 isolates (positive for SCCmec subtype IVa) and three CC8-Unknown isolates. In group CC8e, the GMI typing was not consistent with WGS. For the 12 (out of 14 total) MRSA isolates characterized as USA500 by GMI, 5 were in the USA500 group that includes BD02-25, the reference USA500 isolate from Li et al. (7), 5 were in the Early Branching USA300 group, and 2 were in the USA300-SAE clade. Among the 37 isolates in CC8c, GMI inferred 16 as strain type USA500, while 15 were called USA500/Iberian. Evidence is inconclusive whether the CC8c lineage inherited the sea and seb genes from an Archaic or Iberian ancestor. The sea and seb genes are presumably frequently gained or lost. The mixture of enterotoxin-positive and -negative isolates in one lineage and the presence of sea or seb in other clades demonstrate that the sea and seb markers are not reliable indicators of a phylogenetically related group.
Within CC8f, almost all of the isolates typed by GMI in this study (157 of 158) typed as USA300 (Fig. 2). Only one isolate within CC8f was inferred as CC8-Unknown, an MSSA isolate lacking both PVL and ACME genes. Two other isolates in CC8f are negative for PVL, but both carry SCCmec subtype IVa and thus were typed as USA300. Four of the GMI-typed isolates outside CC8f carry SCCmec IVa: two in CC8e and two in CC8c. Interestingly, the addition of publicly available sequence data added eight genomes that carry SCCmec IVa that fall into CC8e in the early branching USA300 group and two that fall in the CC8b clade (BR-VSSA and BR-VRSA). Although SCCmec subtype IVa has been a hallmark of USA300-NAE, it is clear this trait is not unique to USA300-NAE. CC8e contains 10 isolates called by GMI as USA300. Besides the two that carry SCCmec subtype IVa, the remaining isolates are MSSA and PVL positive, which is the only criterion other than PFGE used to determine a USA300 strain type in MSSA (Fig. 3).
Assay screening. The phylogenetically informative canSNPs identified using the genomic data presented above and used to design the assays are represented in Fig. 2. All assays (Table 2) can be used as stand-alone typing assays for any S. aureus, except for the CC8b assay, which must be used in combination with either the CC8 assay or the "Inner CC8" assay to confirm the phylogenetic placement of an isolate. Although the allelic state that the CC8b assay targets is unique within CC8, some isolates outside CC8 share this SNP state with the CC8b isolates, possibly due to recombination; therefore, an isolate positive for the CC8b SNP state should be screened across the CC8 or Inner CC8 assay to confirm (or refute) that it falls in CC8b.
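The confirmation rule for the CC8b assay can be written as a small decision function. The sketch below is illustrative only: the function name, argument names, and returned labels are hypothetical, and `confirm_positive` stands for the result of whichever confirmatory assay (CC8 or Inner CC8) was run.

```python
from typing import Optional


def interpret_cc8b(cc8b_positive: bool,
                   confirm_positive: Optional[bool] = None) -> str:
    """Interpret a CC8b canSNP result together with a confirmatory assay.

    cc8b_positive: isolate carries the SNP state targeted by the CC8b assay.
    confirm_positive: result of the CC8 or Inner CC8 assay, or None if
        neither confirmatory assay has been run yet.
    """
    if not cc8b_positive:
        return "not CC8b"
    if confirm_positive is None:
        # The CC8b SNP state also occurs outside CC8, so it cannot be
        # interpreted on its own.
        return "CC8b SNP state present; confirmation required"
    return "CC8b" if confirm_positive else "not CC8b"


print(interpret_cc8b(True, True))
print(interpret_cc8b(True, False))
print(interpret_cc8b(True))
```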
Each assay was first validated across a set of isolates used to generate the original phylogeny (WGS followed by SNP assay in Table S1). In short, the SNP assays performed well and their results always agreed with the phylogeny (Table S1). A second set of 208 isolates that had not been sequenced was then screened, and results from here onward refer to this second set. Here, 144 MRSA and 64 MSSA isolates were compared between GMI and the SNP assay panel (Table 3). Out of the MRSA samples, both methods' distinctions between CC8 and non-CC8 isolates were in full agreement; the PFGE/spa strain typing matched the CC8 SNP assay, where 114 isolates fell within CC8 while 30 were outside. Out of the MSSA samples, 61 were in agreement that all were CC8, but 3 isolates called CC8-Unknown by GMI were non-CC8 by the SNP assay (Table 3).
Comparison of subtyping results within CC8 by GMI and the SNP assay panel gave fairly concordant results for MRSA isolates ( Table 3). Out of the 114 CC8 isolates screened, 93 fell into their expected clade. Of the other 21, 11 were USA500 (SCCmec type IV, negative for sea and seb genes) and 2 were CC8-Unknown by GMI and typed as CC8c by the SNP panel. Eight isolates were typed as a strain for one method for which there was no assay by the other method: seven were CC8-Unknown by GMI and CC8a by the SNP panel, and one was CMRSA9 by GMI and CC8-Other by the SNP panel. Six of the seven CC8a MRSA isolates were collected in the 1960s and were SCCmec type I positive. This is the SCCmec type observed in the first Archaic and Iberian strains (11) ( Table 1), but as these strains seem to have disappeared from circulation, the GMI approach does not account for them. For the 57 isolates typed as USA300 by GMI, all were typed in CC8f as expected ( Table 2). All USA500/Iberian isolates by GMI were typed as CC8c by the SNP panel, and although testing was limited, all four ST239 isolates were concordant between the two typing methods. For MSSA, 45 of the total 64 isolates were typed as CC8-Unknown by GMI. These 45 by the SNP panel were typed as CC8f, CC8e, CC8c, non-CC8, or CC8-Other. No MSSA isolates were typed as non-CC8 by GMI, although three were by the SNP panel (Table 3).
A subset of isolates (n = 71) were sequenced and added to the CC8 or S. aureus overall phylogeny to determine their true strain type (see the supplemental material). All samples in agreement between the two tests also agreed by WGS phylogenetic analysis (n = 7). For MRSA, the 11 samples called USA500 by GMI that were CC8c by the SNP panel were all typed as CC8c in the phylogeny. CC8-Unknown (GMI)/CC8a (SNP panel) isolates, of which five of the six typed in this study were sequenced, all fell into CC8a. Of the 45 MSSA samples that were labeled as CC8-Unknown by GMI, all the strain types called by the SNP panel were corroborated by phylogenetic analysis. The three non-CC8 isolates fell outside CC8 and were sequence typed as ST6. Of the four CC8-Unknown (GMI)/CC8-Other (SNP panel) isolates, two were sequence typed as ST630 (Fig. 1). The other two diverged after CC8b but before CC8c in the phylogeny (one of which is shown in Fig. 2), confirming that both GMI and SNP assay methods were correct but creating previously unseen lineages. It is likely that as we sequence more S. aureus strains, especially more MSSA strains, we will see additional CC8 lineages and a more complex CC8 tree topology develop.
Overall, the SNP assays were 100% specific and sensitive on the set of unknown isolates, according to the phylogeny generated through WGS; this result is expected due to the stability of SNPs. The genetic marker inference assay performed fairly well, except in the cases of the USA500 and USA500/Iberian types, as well as for MSSA isolates where the only genetic marker for CC8 subtyping was the PVL genes.

DISCUSSION
S. aureus remains an important pathogen in health care institutions as well as in healthy populations in the community. CC8 strains are among the most prevalent in both environments, especially USA300, and each sublineage has different clinical and pathological characteristics (1,11,25,34,35). Strain typing of S. aureus is important because of these phenotypic differences and their implications for virulence potential, and tracking strains and their prevalence in a health care system or network informs epidemiology and infection control practices to help focus resources effectively. Unfortunately, typing is not a routine practice in clinical microbiology laboratories, in part because of the cost, time, and expertise required, as well as the frequent inconclusiveness of results. PFGE, spa typing, and multilocus sequence typing (MLST) often do not provide the scale of resolution required to determine relationships among a given set of samples, and the presence of particular virulence factors, often located on mobile elements, can be misleading (16). The simple typing system we have developed here, based on presumably stable canSNPs, allows for wide use in clinical laboratories for robust tracking of both MRSA and MSSA infections. Additionally, this method can rapidly and inexpensively assess the possibility of an outbreak or transmission event. Isolates of the same strain type should be investigated further (by WGS), while isolates of different strain types would preclude an outbreak or transmission event, which is just as important (36).
The S. aureus CC8 strain nomenclature, including Iberian, Archaic, USA500, and USA300, was originally based on PFGE typing schemes that used an 80% banding pattern similarity threshold to classify isolates (15). Although adopted for tracking purposes, the continuous evolution and diversification of S. aureus over the years have rendered PFGE a misleading tool for this application. Strains that are within 80% banding pattern similarity may belong to multiple genetic lineages, as shown in this study. USA500 comprises at least two well-established lineages (see the companion paper by Frisch et al. [32]) and may encompass the Canadian CMRSA9 lineage. Strain BD02-25, called USA500 by Li et al. (7) and currently the CDC's USA500 reference isolate (L. McDougal, unpublished observation), is not in the same lineage as strains NRS385 (the USA500 reference in the article by McDougal et al. [15]) and ATCC BAA-1763, although it is ≥80% similar, suggesting USA500 encompasses a wider genomic range than previously appreciated. Additionally, NRS385 and BAA-1763, which are sea and seb positive, share their clade with several isolates negative for these genes, which were used in the GMI typing scheme. It is necessary to exercise caution in interpretation of typing via mobile elements, as their sensitivity and specificity are not ideal. Likewise, the GMI typing system, although sensitive and specific for USA300-NAE, has limitations. The presence of SCCmec subtype IVa can be used for MRSA but not MSSA isolates, and we show that SCCmec subtype IVa is often found outside USA300-NAE. The presence of PVL, apparently vertically passed to USA300 from its progenitor (19), is a good predictor of USA300, as shown in other studies (16) as well as this one. However, the sequencing of the Early Branching USA300 and USA300-SAE genomes shows that PVL is inclusive of these newly understood strains and not specific to the highly clonal USA300-NAE (14).
Also, we show that MSSA isolates are easily mistyped this way, and PVL is found in other CC8 strains as well as other clonal complexes (16,37,38,39). The topologies of several whole-genome phylogenies recently generated for CC8 are in agreement (23,25,28), despite the differences in interpretations. Li et al. concluded that the USA500 strain is the progenitor of the widespread USA300 strain. Recent studies show that genomes labeled as USA500 fall into a more distant clade from USA300 (CC8c) but that there is an additional clade that shares an ancestor with USA300 (23,25). We show here that both of these clades contain USA500 and surround the CMRSA9 clade, suggesting CMRSA9 might be considered a USA500 strain. By traditional typing methods, USA500 and other strains named for pulsed-field patterns do not represent monophyly. Future studies should note that different lineages contain "USA500" strains and use WGS phylogenetics or the assays presented here (or the SNPs they target) for strain typing within CC8. The importance of MRSA is well known. MSSA, on the other hand, continues to have a critical impact on public health (5, 6, 40) and remains understudied. MRSA evolution evidences local selection and spread of particular strain types originating from successful MSSA lineages (12), and we demonstrate this within the CC8 lineage. Additionally, diverse MSSA strain types appear to be ubiquitous (6,19,41), and we show that MSSA strains are present in every major CC8 clade, advancing our understanding of the highly significant role that MSSA plays in S. aureus population structure. Importantly, MSSA may ultimately prove more of a challenge to clinically manage, as infection prevention measures targeting particular strain types of MRSA will be less effective against the more diverse MSSA strains (6). The MSSA strains in CC8 are interspersed with MRSA, further evidencing the significant mobility of SCCmec (12).
Other species of Staphylococcus are likely active reservoirs of SCCmec, including the SCCmec subtype IVa strains characteristic of USA300 (42). The human carriage rate of SCCmec-positive, coagulase-negative staphylococcus (CoNS) can be relatively high, and cocolonization of MSSA and SCCmec-positive CoNS has been observed (42). Regardless of the directionality of SCCmec exchange among species and strains of Staphylococcus, the rate of SCCmec acquisition and/or excision may be higher than previously believed, and isolation of only MRSA in health care settings will not reveal the entire potential for MRSA carriage or infection.
Additionally, characterization of only MRSA isolates in CC8 (i.e., sampling bias) will give an incomplete evolutionary history of this important clonal complex. In our CC8 phylogeny, MSSA genomes add lineages not represented by MRSA alone, consistent with previous findings in CC8 (19). In our collection, ST630 comprises strictly MSSA isolates. ST630 may be an emerging strain of S. aureus, especially in China, where recently it reportedly caused a bloodstream infection (as MRSA) (43), endocarditis in a healthy person (as MRSA) (44), and several skin infections (as MSSA) (43,45). CC8b comprises mostly MSSA, and the three MRSA strains appear to have emerged separately from different MSSA strains. This clade includes NCTC 8325, a strain isolated in 1943. The ancestor of CC8b diverged early in CC8 evolution, like the Archaic lineage. While the Archaic lineage expanded with SCCmec type I and has since apparently declined, CC8b does not appear to have acquired and maintained SCCmec, yet contains extant members that cause disease (included in this study). The study and WGS of more MSSA isolates will likely add complexity and clarity to the story of CC8 evolution.
Almost all of the USA300 isolates fall into a distinct clade with distinct features. PFGE profiling of USA300, which was not performed on many isolates in this study, in contrast with our genetic marker-inferred typing, may indeed be 100% concordant with our USA300 SNP-based assay currently. However, USA300 is a relatively young "clone," and as more S. aureus lineages develop, a PFGE profiling system using similarity thresholds may soon prove obsolete as it has for other strains and species (46)(47)(48). Furthermore, we demonstrate that the PFGE type USA300-0114 is not a "clone" in the phylogenetic sense, as 0114 isolates do not form a monophyletic clade with a common ancestor as was previously believed (49). WGS is irreplaceable to determine if strains of the USA300-0114 PFGE type are part of a single outbreak.
The declining costs and increasingly common use of WGS and phylogenetic analysis allow for discovery of more phylogenetically informative and stable targets that can be used in rapid, relatively simple assays (28,50,51). Several advantages to the use of lineage-specific canSNPs as targets include the following: (i) they show stability over time, as they are passed vertically through generations, (ii) different SNPs provide different scales of resolution for identifying particular strains (e.g., a CC8-specific SNP versus a USA300-specific SNP) or even species in a given set of samples (51), or for use in global epidemiology (52), regional epidemiology (53), or local cluster analyses (36), and (iii) identification of canSNPs is a straightforward process using whole-genome sequence data and publicly available SNP matrix generators (e.g., NASP [50]), followed by parsing the SNPs by sample sets of interest. Here we use real-time PCR assays targeting canSNPs based on WGS to classify isolates into clear evolutionary lineages of CC8, and we illustrate their robustness (working with crude bacterial lysates) and high sensitivity and specificity. Inclusion of assays for SNPs on other branches in a hierarchical fashion, as we have done here, adds confidence to any typing scheme. The hierarchical scheme also provides opportunity to screen clinical or other complex specimens, which may harbor multiple strain types. Although WGS and phylogenetic analysis are irreplaceable in true outbreak situations, WGS is still relatively time-consuming and the analysis is complex. Robust real-time PCR assays can screen for isolates that may need further investigation with WGS. While WGS gains a foothold in both the public health laboratory and clinical laboratory, real-time PCR is a rapid, robust, easy, and therefore universal tool for clinical molecular biology and provides an excellent vehicle for the assays described here.
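The step of "parsing the SNPs by sample sets of interest" can be illustrated with a short sketch. It assumes a SNP matrix held as a dictionary from reference position to per-sample base calls, and it treats a locus as a candidate canSNP for a clade when every clade member shares one allele that no isolate outside the clade carries. The function name, data layout, and sample names are hypothetical; this is not the code used in the study or part of NASP.

```python
from typing import Dict, Set

SnpMatrix = Dict[int, Dict[str, str]]  # position -> {sample name: base call}


def find_cansnps(matrix: SnpMatrix, clade: Set[str]) -> Dict[int, str]:
    """Return {position: clade allele} for loci that define the clade."""
    hits: Dict[int, str] = {}
    for position, calls in matrix.items():
        inside = {base for sample, base in calls.items() if sample in clade}
        outside = {base for sample, base in calls.items() if sample not in clade}
        # One allele shared by all clade members and absent everywhere else.
        if len(inside) == 1 and inside.isdisjoint(outside):
            hits[position] = next(iter(inside))
    return hits


# Toy matrix with hypothetical sample names and positions.
toy_matrix: SnpMatrix = {
    101: {"s1": "A", "s2": "A", "s3": "G", "s4": "G"},  # defines {s1, s2}
    202: {"s1": "C", "s2": "T", "s3": "T", "s4": "T"},  # splits the clade
    303: {"s1": "G", "s2": "G", "s3": "G", "s4": "A"},  # allele shared outside
}
print(find_cansnps(toy_matrix, {"s1", "s2"}))
```

A strict rule like this one would reject a locus such as the CC8b target if isolates outside CC8 that share the allelic state were in the matrix, which is one reason a hierarchical set of assays is useful.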

MATERIALS AND METHODS
Sample collection. This study's S. aureus isolates, mostly obtained from the CDC's collection, were selected to represent the diversity of known CC8 strains, including USA300, USA500, Iberian, Archaic, Canadian MRSA9 (CMRSA9), and ST239 types, and to encompass both MRSA (313 isolates) and MSSA (119 isolates). Intentionally included were FPR3757 and TCH1516 (prototype USA300 isolates), BD02-25 (the USA500 reference isolate from Li et al. [7] and used in the CDC's quality management system protocols), NRS385 (15) and ATCC BAA-1763 (two publicly available isolates typed as USA500), and the genomes of COL (an Archaic isolate from 1960 [11]), HPV107 and E2125 (ST247 Iberian strains from the 1960s [54,55]), and NCTC 8325 (a laboratory strain originally isolated from a septic patient also around 1960). Also included were genomes belonging to the USA300 South American epidemic (USA300-SAE) strain type as well as samples considered Early Branching USA300 (14,56,57), and the Brazilian MRSA-turned-VRSA samples BR-VSSA and BR-VRSA (29). Table 1 lists several of the traditional CC8 strains and their characteristics. Table S1 in the supplemental material describes the isolates used in this study that were whole-genome sequenced.
Sequencing, SNP detection, and phylogenetic analysis. Genome libraries for 288 S. aureus isolates were prepared with a 500-bp insert size using the Kapa library preparation kit with standard PCR library amplification (Kapa Biosystems) and sequenced on a 101-bp-read, paired-end Illumina GAIIx run or a 2× 250-bp Illumina MiSeq run (Tables S1 and S2). Additionally, 311 S. aureus genomes published in previous studies selected for sequence type diversity were used to generate the CC8 phylogeny and an overall S. aureus phylogeny encompassing several clonal complexes (Table S2) (18,58).
The bioinformatics pipeline NASP (50) was used to detect SNPs among genomes. In brief, reads were aligned to the finished genome FPR3757 (GenBank accession no. CP000255) using Novoalign (Novocraft.com) and SNPs were called with GATK (59). Data filtered out included SNP loci with less than 5× coverage or less than 80% consensus in any one sample, SNP loci that were not present in all genomes in the data set, and any regions duplicated in the reference genome as identified by NUCmer (60). The results were formatted in an SNP matrix from a core genome common to all isolates in the analysis. Phylogenetic analysis model selection and generation of trees from the NASP SNP matrices were performed using IQ-TREE (61) and subsequently plotted with genetic marker data by means of ITOL v3 (62).
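The coverage, consensus, and core-genome filters described above can be sketched as a per-locus predicate. This is an illustration of the stated thresholds only, not NASP's implementation: the data layout, function names, and toy values are hypothetical, and the removal of duplicated reference regions (identified by NUCmer) is not modeled.

```python
from typing import Dict, Optional, Tuple

# Per-sample call at one locus: (read depth, fraction of reads supporting
# the called base), or None when the locus is absent from that genome.
Call = Optional[Tuple[int, float]]

MIN_DEPTH = 5          # loci with less than 5x coverage are dropped
MIN_CONSENSUS = 0.80   # loci with less than 80% consensus are dropped


def keep_locus(calls: Dict[str, Call]) -> bool:
    """True only if every sample passes all filters at this locus."""
    for call in calls.values():
        if call is None:               # not present in all genomes
            return False
        depth, consensus = call
        if depth < MIN_DEPTH or consensus < MIN_CONSENSUS:
            return False
    return True


def filter_loci(loci: Dict[int, Dict[str, Call]]) -> Dict[int, Dict[str, Call]]:
    """Keep only loci that pass the filters in every sample."""
    return {pos: calls for pos, calls in loci.items() if keep_locus(calls)}


# Toy input with hypothetical positions and sample names.
toy_loci = {
    10: {"s1": (30, 0.99), "s2": (12, 0.85)},  # passes
    20: {"s1": (4, 1.00), "s2": (40, 1.00)},   # low coverage in s1
    30: {"s1": (25, 0.70), "s2": (40, 1.00)},  # low consensus in s1
    40: {"s1": (25, 0.95), "s2": None},        # missing from s2
}
print(sorted(filter_loci(toy_loci)))
```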
S. aureus typing. The methods used for molecular typing of S. aureus were adopted from those previously described (4). These methods are based on a study conducted by the CDC (McDougal, unpublished) in which >350 CC8 isolates were tested for multiple genotypic and phenotypic markers, including the SCCmec type and the IVa subtype, Staphylococcus enterotoxin genes sea, seb, sek, and seq, PVL genes, ACME genes, and trimethoprim-sulfamethoxazole resistance. The markers with the greatest sensitivity and specificity for strain typing comprise the original typing algorithm (4).
For the purposes of this study, our modified genetic marker typing algorithm is shown in Fig. 3. In brief, traditional PFGE or spa type was used to infer clonal complex. Strain types of CC8 MRSA isolates were inferred based on SCCmec types and toxin gene profiles: SCCmec subtype IVa-positive isolates were called USA300, sea- and seb-negative isolates with SCCmec type IV (other than IVa) were called USA500, and isolates with SCCmec type VIII were called CMRSA9. We inferred that the presence of the sea and seb genes was indicative of a separate lineage, called Iberian by Li et al. in 2009 (7) and by the CDC in previous surveillance studies (4). However, as the SCCmec type I characteristic of the original Iberian strain has largely been replaced by SCCmec type IV, and because recent studies have referred to "Iberian" isolates (positive for sea or seb) as USA500 (NRS385 and BAA-1763), we called CC8 isolates positive for sea or seb that carry SCCmec type IV (other than subtype IVa) USA500/Iberian to distinguish them from the original Iberian clone. Isolates spa typed as CC30 with SCCmec type III were inferred to be ST239. CC8 MSSA isolates were called USA300 if they were PVL positive and called CC8-Unknown if they were PVL negative. Lastly, we noted whether the USA300 isolates were PF type 0114. This strain typing approach is herein termed the genetic marker inference (GMI) assay.
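The GMI rules described in this paragraph can be written as a small decision function. The sketch below is an illustration of the rules as stated in the text, not a transcription of Fig. 3: the function name, the argument encoding (for example, `mec_type=None` for MSSA), and the `UNRESOLVED` fallback for marker combinations the text does not address are all assumptions.

```python
from typing import Optional

# Hypothetical label for marker combinations not covered by the rules in the text.
UNRESOLVED = "Unresolved"


def gmi_strain_type(
    clonal_complex: str,
    mec_type: Optional[str],
    pvl: bool = False,
    sea: bool = False,
    seb: bool = False,
) -> str:
    """Infer a strain type from genetic markers.

    clonal_complex: complex inferred from PFGE or spa type, e.g. "CC8" or "CC30".
    mec_type: SCCmec type or subtype ("I", "III", "IV", "IVa", "VIII", ...),
              or None for an MSSA isolate (assumed encoding).
    """
    # Isolates spa typed as CC30 with SCCmec type III are inferred to be ST239.
    if clonal_complex == "CC30":
        return "ST239" if mec_type == "III" else UNRESOLVED
    if clonal_complex != "CC8":
        return UNRESOLVED
    # CC8 MSSA: PVL status alone decides the call.
    if mec_type is None:
        return "USA300" if pvl else "CC8-Unknown"
    # CC8 MRSA: SCCmec type first, then enterotoxin genes.
    if mec_type == "IVa":
        return "USA300"
    if mec_type.startswith("IV"):  # type IV other than subtype IVa
        return "USA500/Iberian" if (sea or seb) else "USA500"
    if mec_type == "VIII":
        return "CMRSA9"
    return UNRESOLVED
```

For example, `gmi_strain_type("CC8", "IVb", sea=True)` yields `"USA500/Iberian"`, whereas the same isolate without sea or seb would be called `"USA500"`.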
Multilocus sequence types (MLSTs) and spa types were determined by traditional Sanger sequencing analysis; when typing had not been performed and genomic sequence data were available, MLST was performed with SRST2 (63). SCCmec cassette typing using conventional methods was performed on a subset of isolates depending on the time of their collection (7,64). To determine SCCmec types for isolates that did not have PCR results and to confirm previous conventional typing, WGS data were used: reads were assembled using SPAdes Genome Assembler (65), and an in silico PCR script using the BioPerl (66) toolkit was used to search for SCCmec typing PCR primer sequences (67) and analyze in silico amplicons. For 10 isolates for which conventional typing and WGS typing results were discordant, raw read data were aligned to sequences of several SCCmec cassette types using SeqMan NGen v.12.1.0 (DNAStar, Madison, WI). Types were confirmed by read coverage breadth and depth against the reference SCCmec type sequences.
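The idea behind in silico PCR can be sketched in a few lines of Python. The study used a BioPerl-based script whose details are not given here, so the following is a simplified stand-in rather than the authors' method: it requires exact primer matches, searches only the given strand of a contig, and uses an assumed maximum amplicon length. The sequences in the usage note are toy strings, not SCCmec typing primers.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_amplicons(
    contig: str,
    forward: str,
    reverse: str,
    max_len: int = 3000,  # assumed upper bound on a plausible amplicon
) -> List[Tuple[int, int]]:
    """Return (start, length) for each predicted amplicon on the given strand.

    An amplicon runs from an exact forward-primer match to the nearest
    downstream site matching the reverse complement of the reverse primer.
    """
    contig = contig.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    hits: List[Tuple[int, int]] = []
    start = contig.find(fwd)
    while start != -1:
        end = contig.find(rev_site, start + len(fwd))
        if end != -1:
            length = end + len(rev_site) - start
            if length <= max_len:
                hits.append((start, length))
        start = contig.find(fwd, start + 1)
    return hits
```

With the toy contig `"TTTACGTACGGGGGTTGACCAAA"`, forward primer `"ACGTAC"`, and reverse primer `"GGTCAA"`, the function reports one amplicon starting at index 3 with a length of 17 bases. To cover both orientations of an assembled contig, the same call can be repeated on `reverse_complement(contig)`; predicted amplicon sizes would then be compared with those expected for each SCCmec type.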
SNP assays. SNPs that differentiate specific clades of S. aureus (canSNPs), identified by NASP and phylogenetic analysis, were exploited for assay design. From the CC8 phylogenetic analysis, SNP loci at which the SNP state differed between a target lineage and the rest of the complex were selected. These loci were then checked in genomes from other clonal complexes to ensure the SNP state was unique to the targeted lineage. In this way, the potential for a shared SNP state across clonal complexes due to recombination (as has been observed [18]) was avoided. Eight sets of primers and probes targeting eight canSNPs were designed with Biosearch Technologies' RealTimeDesign software (Biosearch Technologies, Petaluma, CA). Assay information is provided in Table 2.
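The canSNP selection logic described above can be expressed as a short Python sketch. The matrix layout (position mapped to a dictionary of genome name and allele) and the function name are assumptions for illustration; the criterion itself follows the text: the target lineage must share a single SNP state that no other genome carries, whether that genome belongs to CC8 or to another clonal complex.

```python
from typing import Dict, Set


def find_cansnps(
    matrix: Dict[int, Dict[str, str]],
    target: Set[str],
) -> Dict[int, str]:
    """Return {position: allele} for loci diagnostic of the target lineage.

    A locus qualifies when every target genome carries the same allele and
    no non-target genome in the matrix carries that allele.
    """
    diagnostic: Dict[int, str] = {}
    for pos, alleles in matrix.items():
        target_states = {base for genome, base in alleles.items() if genome in target}
        other_states = {base for genome, base in alleles.items() if genome not in target}
        # One fixed state within the lineage, absent everywhere else.
        if len(target_states) == 1 and not (target_states & other_states):
            diagnostic[pos] = next(iter(target_states))
    return diagnostic
```

Running the function first on a CC8-only matrix and then on a matrix that also includes genomes from other clonal complexes mirrors the two-step check in the text: a locus that passes the first run but fails the second shares its SNP state across complexes and would be discarded.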
Cell lysates of 311 isolates were prepared as previously described (68) and used to validate the assays. Reactions were run in 10 μl on the Applied Biosystems 7500 Fast real-time PCR instrument (Thermo Fisher Scientific) with 5 μl of 2× TaqMan Universal PCR master mix (Thermo Fisher Scientific), 80 nM forward and reverse primers, 20 nM each probe, and 1 μl of DNA template. Thermal conditions included denaturation at 95°C for 10 min and 40 cycles of 95°C for 15 s and 60°C for 1 min.
Accession number(s). BioProject accession no. PRJNA374337 contains the whole-genome sequence read data generated in this study.

ACKNOWLEDGMENTS
We gratefully acknowledge the institutions that share their genome sequence data and the support staff that maintain the databases. We also thank the operations and administrative staff at TGen and the CDC for their support.
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
This study was funded by NIH grant 5R01AI90782-5 and contract 200-2016-92313 with the Centers for Disease Control and Prevention under their Advanced Molecular Detection Initiative.