CRISPR Typing and Subtyping for Improved Laboratory Surveillance of Salmonella Infections

CRISPR typing and subtyping for improved laboratory surveillance of salmonella infections. PLoS One, Volume 7 (Number 5). Article number e36995. This article is made available under the Creative Commons Attribution-3.0 Unported (CC BY 3.0) license and may be reused according to the conditions of the license.

Abstract
Laboratory surveillance systems for salmonellosis should ideally be based on the rapid serotyping and subtyping of isolates. However, current typing methods are limited in both speed and precision. Using 783 strains and isolates belonging to 130 serotypes, we show here that a new family of DNA repeats named CRISPR (clustered regularly interspaced short palindromic repeats) is highly polymorphic in Salmonella. We found that CRISPR polymorphism was strongly correlated with both serotype and multilocus sequence type. Furthermore, spacer microevolution discriminated between subtypes within prevalent serotypes, making it possible to carry out typing and subtyping in a single step. We developed a high-throughput subtyping assay for the most prevalent serotype, Typhimurium. An open web-accessible database was set up, providing a serotype/spacer dictionary and an international tool for strain tracking based on this innovative, powerful typing and subtyping tool.

Copyright: © 2012 Fabre et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Veille Sanitaire.
The MLST data are publicly available from http://mlst.ucc.ie, which is currently funded by a grant from the Science Foundation of Ireland (05/FE1/B882). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have read the journal's policy and have the following conflicts: Some of the authors (FXW, LF, VG, LD, and SB) have filed an international patent application (no. PCT/IB2008/004004) for the molecular typing and subtyping of Salmonella by identification of the variable nucleotide sequences of the CRISPR loci. This does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials.


Introduction
Salmonellosis is one of the most common causes of food-borne diarrheal disease worldwide. Most infections are zoonotic and are transmitted from food animals to humans through the ingestion of contaminated food. In the United States, 1.4 million nontyphoidal Salmonella infections are thought to occur in humans annually, resulting in approximately 15,000 hospitalizations and 400 deaths [1]. An efficient surveillance system for salmonellosis is therefore crucial. Various non-exclusive strategies have been developed, including sentinel surveillance, periodic population-based surveys, and laboratory-based surveillance. Laboratory-based approaches are a key component of monitoring strategies in developed countries. They require a network of clinical laboratories covering the population and referring isolates or information to a central public health reference laboratory. The speed with which public health laboratories obtain information after the onset of symptoms and the regular sharing of information between public health laboratories and epidemiologists are critical for the successful use of information to detect outbreaks early and to identify their source. The basic information currently provided by laboratories is the serotype of the isolates. Hence, each year, more than 200,000 human isolates of Salmonella are serotyped in the United States and Europe [2,3]. Serotyping, the reference method for Salmonella typing since the 1930s, is based on the determination of two surface antigens (O-polysaccharide and flagellin proteins) by agglutination with a large set of polyclonal rabbit antisera. This technique can recognize more than 2,500 serotypes [4], but its discriminatory capacity is limited, because two serotypes, Typhimurium and Enteritidis, are highly prevalent worldwide and account for most outbreaks. The sensitivity of serotyping for the detection of outbreaks involving these common serotypes, even with the use of cluster-detection algorithms, is therefore unsatisfactory [5].
Differentiation between isolates within the most common serotypes requires the use of subtyping methods, which were initially based on determination of the sensitivity of certain Salmonella serotypes to several bacteriophage suspensions (phage typing) [6]. DNA-based subtyping methods were subsequently developed, including, in particular, pulsed-field gel electrophoresis (PFGE) [7], which is based on analysis of the restriction pattern of high-molecular-weight DNA digested with a rare-cutting restriction enzyme. Real-time subtyping methods have increased the power of laboratory-based surveillance to detect outbreaks, distinguishing them from the background of sporadic cases by identifying the phage type or molecular "fingerprint" of an outbreak strain. PFGE is currently the gold standard method for this purpose. This real-time subtype surveillance has been implemented in the US through PulseNet, an internet-based network of public health and food regulatory agency laboratories that perform real-time standardized PFGE and submit normalized PFGE patterns or raw TIFF gel images electronically to a national database. Regular searches of this database are made, with a view to identifying clusters of identical patterns. However, PFGE has several limitations: it is a technically demanding, non-automated method. This may explain why, in a study of outbreaks of food-borne infection occurring in the US in 2002, the median interval from the onset of symptoms to PFGE results was 18 days, with a period of 10 days elapsing between the submission of isolates to public health laboratories and PFGE results [8]. Furthermore, the interpretation and comparison of banding profiles is not straightforward, even with standard protocols and analysis software. The discovery of short DNA sequence repeats in the genomes of prokaryotic organisms has recently led to the development of new subtyping methods.
Multilocus variable number of tandem repeats (VNTR) analysis (MLVA) is based on the number of contiguous DNA repeats present at several loci. Following a repeat-spanning PCR for each locus, the number of repeats can be determined by sequencing or inferred from electrophoresis (molecular weight being correlated with the number of repeats). An MLVA scheme for serotype Typhimurium based on the analysis of five loci (with repeat units of 6 to 33 bp) has been established and evaluated [9]. Unlike PFGE, MLVA is rapid, technically simple and suitable for the processing of large numbers of isolates. It can also distinguish between clonal isolates indistinguishable by PFGE, such as those belonging to the multidrug-resistant DT104 strain. However, this method has several drawbacks. MLVA schemes have been validated (i.e., shown to meet performance and convenience criteria, including the epidemiological concordance required for typing methods for use in bacterial epidemiology [10]) for only two Salmonella serotypes, Typhimurium and Enteritidis [11,12,13]. MLVA requires a capillary electrophoresis system and it is difficult to size fragments accurately, as observed in multicenter studies. Finally, these repetitive DNA sequences may evolve too rapidly, leading to changes in repeat numbers during the course of an outbreak [12,13]. MLVA is therefore often used in addition to existing subtyping methods, such as PFGE or phage typing. Jansen et al. identified a new family of repeated DNA sequences, named CRISPR (clustered regularly interspaced short palindromic repeats) in many prokaryotes [14]. This family is characterized by 24-47 bp DNA direct repeats (DRs), separated by variable 21-72 bp sequences called "spacers" [15,16]. A "leader sequence" and cas (CRISPR-associated sequence) genes are often identified adjacent to the CRISPR locus.
Since the middle of the 1990s, the CRISPR locus of Mycobacterium tuberculosis has been extensively studied and the high degree of polymorphism of its spacer content has led to the development of a subtyping method known as spoligotyping [17]. Subtyping methods based on analyses of the spacers of CRISPR loci have since been developed for bacteria of medical interest, such as Yersinia pestis [18], Corynebacterium diphtheriae [19] and Campylobacter [20]. CRISPR seem to confer resistance to foreign DNA, such as plasmids and phages, and the newly integrated spacers are derived from the invading DNA [21,22]. Interestingly, these spacers are integrated into the CRISPR locus in a polarized manner [18,21]. The spacer content of a strain therefore reflects previous DNA introductions and can provide evolutionary information.
Several studies have reported the presence of two CRISPR loci in Salmonella [14,23,24]. We previously showed, in a preliminary study of 400 Salmonella enterica and Salmonella bongori reference strains and isolates from 56 serotypes, that CRISPR polymorphisms (i.e., spacer content) were strongly correlated with serotype and subtype [25]. Two studies recently suggested that CRISPR loci might provide information useful for typing [26,27]. However, these studies considered only a limited number of serotypes from a single geographic area.
We aimed to demonstrate that CRISPR polymorphism analysis is an efficient and powerful alternative to both serotyping and PFGE methods. We first analyzed the spacer content of the two Salmonella CRISPR loci in a large global collection of reference strains and well documented isolates belonging to 130 serotypes of all species and subspecies, focusing particularly on the serotypes most frequently involved in human infections. Analysis of the distribution of the >3,800 unique spacers identified showed that spacer content was strongly correlated with both serotype and multilocus sequence typing (MLST) type. Furthermore, the microevolution of spacer content facilitated the robust discrimination of subtypes within most serotypes, including the most prevalent serotypes, Typhimurium and Enteritidis.
We also present here three applications of CRISPR polymorphisms for Salmonella surveillance. In particular, we describe a novel high-throughput subtyping assay for serotype Typhimurium (and its emerging monophasic 1,4,[5],12:i:- variant). This bead-based liquid hybridization assay is both rapid and easy to carry out, and is therefore highly suitable for use in public health laboratories.

Results
In silico Analysis of the Organization and Structure of CRISPR Loci in Salmonella
Two CRISPR loci, CRISPR1 and CRISPR2, were separated by less than 20 kb in all 39 complete genomes of S. enterica and S. bongori analyzed (Figure 1, Table 1). The CRISPR1 locus was located downstream from the iap gene, whereas CRISPR2 was located upstream from the ygcF gene. The ordered CRISPR-associated (cas) genes belonging to the Ecoli subtype defined by Haft et al. [28] were located between the CRISPR loci: cas2, cas1, cse3, cas5e, cse4, cse2, and cas3. Following the cas genes were sopD (encoding a secreted effector protein), cysH (encoding a phosphoadenosine phosphosulfate reductase), cysI, cysJ (both encoding sulfite reductase subunits), ptpS (encoding pyruvyl tetrahydrobiopterin synthase) and an ORF encoding a putative metal-dependent hydrolase. Structure A was the most frequent, and was observed in 26 (67%) genomes of S. enterica subsp. enterica (including representative serotype Typhimurium strain LT2), S. enterica subsp. diarizonae and S. bongori. Structure B, which was found only in serotype Choleraesuis SC-B67, differed from structure A by an insertion sequence, ISSen1, immediately upstream from CRISPR2. Structure F, which was found in nine genomes of S. enterica subsp. enterica (including representative serotype Typhi strain Ty2), differed from structures A and B in having a different orientation of the cas3 gene and in terms of the degree of similarity of Cas proteins (40 to 85%, depending on the Cas proteins considered; data not shown) [23]. In structures C, D and E (found in S. enterica serotype Paratyphi B strain SPB7, S. enterica subsp. arizonae serotype 62:z4,z23:- strain CDC346-86, and S. enterica serotype Javiana strain GA_MM04042433, respectively), there was a deletion beginning at the end of the last DR of CRISPR1 and encompassing all cas genes with the exception of a 5′ remnant of cas3, which was in the same orientation as that in structure A.
The DRs of both CRISPR loci were conserved. They were 29 bp long and had the consensus sequence 5′-CGGTTTATCCCCGCTGGCGCGGGGAACAC-3′. However, some DR variants carrying single-nucleotide polymorphisms (SNPs) with respect to the consensus sequence were observed (Table S1).
There were 705 unique spacers between the DRs in the two loci from the 39 available genomes (Table S2). Depending on the genome, the number of CRISPR1 spacers varied from 1 to 55 (mean 18.6 ± standard deviation 13.6) and the number of CRISPR2 spacers varied from 0 (subsp. arizonae) to 32 (15.0 ± 9.8). Spacers were typically 32 bp long (681/705). One was 29 bp long, two were 31 bp long, sixteen were 33 bp long, one was 38 bp long (spacer STM18var2, which contained a VNTR), one was 50 bp long, one was 72 bp long and two were 74 bp long (spacers STM7A/7B and STM7A/7Bvar2 of serotype Typhimurium) (see below). Some spacers were common to different serotypes.
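As a rough illustration of how a spacer inventory can be read off a sequenced locus, the sketch below splits a sequence on exact copies of the 29 bp consensus DR given above and returns what lies between them. It is a minimal sketch, not the procedure used in this study: the function name `extract_spacers` and the toy spacer sequences are hypothetical, and exact matching would miss the SNP-variant or degenerate DRs mentioned in the text, so manual inspection would still be needed.

```python
import re

# Consensus direct repeat (DR) reported in the text for both Salmonella CRISPR loci.
DR_CONSENSUS = "CGGTTTATCCCCGCTGGCGCGGGGAACAC"


def extract_spacers(locus_seq, dr=DR_CONSENSUS):
    """Return the sequences lying between consecutive exact copies of the DR.

    Assumption: DRs match the consensus exactly. Variant DRs are not found,
    so a locus containing them would be reported with fused or missing spacers.
    """
    seq = locus_seq.upper()
    # Start positions of every exact, non-overlapping DR copy, left to right.
    positions = [m.start() for m in re.finditer(re.escape(dr), seq)]
    spacers = []
    for left, right in zip(positions, positions[1:]):
        # A spacer runs from the end of one DR to the start of the next.
        spacers.append(seq[left + len(dr):right])
    return spacers


if __name__ == "__main__":
    # Hypothetical 32 bp spacers, invented for illustration only.
    toy_spacer_1 = "ACGT" * 8
    toy_spacer_2 = "TTGACCAG" * 4
    toy_locus = (DR_CONSENSUS + toy_spacer_1 +
                 DR_CONSENSUS + toy_spacer_2 + DR_CONSENSUS)
    for spacer in extract_spacers(toy_locus):
        print(len(spacer), spacer)
```

Splitting on exact DR copies keeps the example short; a tolerant search (allowing a few mismatches per DR) would be the natural next step for real data.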
While our study was underway, two CRISPR databases (CRISPRdb and CRISPI) went online [15,29]. These generalist databases containing >1,500 prokaryote genomes incorporate various bioinformatics tools, including one for identifying CRISPR sequences in a selected genome. The application of this tool to Salmonella genomes gave incorrect results for four to six of the 17 genomes present in both databases. CRISPRfinder did not detect the short CRISPR2 locus of serotype Typhi strains Ty2 and CT18, which have a unique spacer (EntB0var1) between two DRs (DR27 and DR), one of which is degenerate (identity of 20/29 bp). CRISPRfinder detected three CRISPR in serotype Typhimurium strain LT2 and serotype Heidelberg strain SL476. The CRISPR1 locus was actually artificially split into two CRISPR, due to the presence of an unusual fused spacer-DR unit (STM7A/7B, see below). The CRISPI tool detected no CRISPR in four genomes and only one CRISPR in two others. The CRISPR loci identified in all six genomes were short (1 to 6 spacer-DR units) in our study, confirming that the CRISPI tool is not suitable for detecting short CRISPR loci. Thus, although bioinformatics tools are undoubtedly useful for screening for CRISPR within genomes, careful manual inspection is required to complete the analysis for a given species.

Spacer Content is Strongly Associated with Serotype and MLST
PCR amplification of CRISPR1 with primers A1 and A2 generated a product of between 400 bp and 3 kb in size, in 639 of 744 strains and isolates. By contrast, PCR amplification of CRISPR2 with primers B1, B2 and B3 generated a product of between 500 bp and 3 kb in size in all but subsp. arizonae strains and isolates (Figure 1, Tables 2 and S2). Various deletions downstream from CRISPR1 or upstream from CRISPR2 were responsible for amplification failure (see below and Table 3). In one reference strain of serotype Mbandaka, PCR was unsuccessful because the CRISPR1 locus was very large (>6 kb) and contained 124 spacer-DR units.
More than 3,800 different spacers (mean length of 32 nucleotides) were identified in the 39 available genomes and 744 strains and isolates tested (Tables S2, S3, S4). The number of spacers present in a given strain ranged from 1 to 124 for CRISPR1, and from 0 (subsp. arizonae) to 50 for CRISPR2 (Table S3). Two rare groups of strains displayed low levels of correlation between spacer content and serotype or MLST type. First, all the reptile-associated subsp. arizonae strains had the same single CRISPR1 spacer, despite the diversity of their serotypes and STs. Second, some reptile-associated subsp. enterica serotypes belonging to a highly recombinogenic group known as clade B [30] or lineage 3 [31], such as Urbana, Johannesburg, Reading, Pomona, Gueuletapee, Rubislaw, Goettingen and Sandiego, had a limited set of spacers shared between these serotypes. Both groups of strains displayed deletions (ΔA10 for subsp. arizonae serotypes and ΔF4 and ΔF5 for clade B subsp. enterica serotypes) of the cas genes (see below). However, with the exception of these two groups, spacer content was strongly correlated with serotype and/or MLST type for 730 of 744 (98.1%) strains. Moreover, for polyphyletic serotypes comprising unrelated MLST groups, spacer content was strongly correlated with the population structure defined by MLST (Table S2). For example, the CRISPR data for three recently described genetic lineages of serotype Newport [32] gave correct serotype recognition and genetic lineage assignment (Tables 4 and S5).
Figure 1. CRISPR/Cas system structures from 39 available genome sequences for S. enterica and S. bongori. Two CRISPR loci (CRISPR1 and CRISPR2) are present in all genomes. The CRISPR-associated (cas) genes cas2, cas1, cse3, cas5e, cse4, cse2, and cas3 of the "Ecoli" subtype [28] are located between the CRISPR loci. The most frequent structure, A, is represented by S. enterica serotype Typhimurium strain LT2. Structures B to E are represented by S. enterica serotypes Choleraesuis strain SC-B67, Javiana strain GA_MM04042433, Paratyphi B strain SPB7, and S. enterica subsp. arizonae serotype 62:z4,z23:- strain CDC346-86, respectively. Structure F is represented by S. enterica serotype Typhi strain Ty2. Black diamonds represent direct repeats, with colored diamonds indicating spacers. The CRISPR1 locus of serotype Typhi strain Ty2 is enlarged. The primers used to amplify and sequence the CRISPR loci for the spacer inventory are indicated by horizontal arrows. doi:10.1371/journal.pone.0036995.g001
Most of the spacers were unique to particular serotypes. The degree of spacer sharing varied among groups of serotypes identified as closely related on the basis of MLST. For example, in serotypes such as Typhimurium (4:i:1,2), Heidelberg (4:r:1,2) and Kisangani (4:a:1,2), all the spacers on the iap gene side tended to be the same whereas the spacers present on the leader side tended to be more serotype-specific. For isolates of the ST11 group of serotype Enteritidis (9,12:g,m:-) and the closely related Gallinarum (9,12:-:-) and Dublin (9,12:g,p:-) serotypes, only a subset of common spacers was identified (Tables 5 and S6). This was also the case for the complex group of "bioserotypes" Paratyphi C, Choleraesuis (sensu stricto and variant Kunzendorf), and Typhisuis, which have the antigenic formula 6,7:c:1,5 in common, indicating descent from a common ancestor, consistent with MLST data (Figure 2). The presence of ISSen1 at the same position upstream from CRISPR2 in these bioserotypes is also consistent with the hypothesis of a common ancestor. By contrast, the polyphyletic serotype Decatur (formerly known as serotype Choleraesuis variant Decatur, which also has an antigenic formula of 6,7:c:1,5) did not have an ISSen1 element upstream from CRISPR2 and included various spacers not found in Paratyphi C, Choleraesuis and Typhisuis (Table S2).

Deletions Downstream from CRISPR1 and Upstream from CRISPR2 in some Salmonella Populations
For representative strains or isolates for which no PCR product was obtained from CRISPR1 or CRISPR2, we carried out a long-range PCR encompassing both CRISPR loci, with primers A1 and B3. Amplicons were obtained from all isolates and were between 10 kb and 20 kb in size. DNA sequencing of the ends of the PCR products showed that amplification failure resulted principally from large deletions affecting the cas genes, ending at the cas3 gene and preventing the annealing of the A2 primer (Table 3). For subsp. arizonae and serotypes Paratyphi B and Javiana, these deletions were consistent with data for the three corresponding available genome sequences and from Fricke et al. [24]. Such deletions were also observed in other serotypes or populations within a single serotype, but they were of different types. We therefore designed and validated new CRISPR1 reverse primers binding to the residual region of the cas3 gene from structure A (A3 and A4) or F (A5 to A7) (Table 2). Due to a deletion upstream from CRISPR2, the B1 primer did not bind to DNA from subsp. arizonae. However, the available genome sequence and sequencing of the long PCR fragment generated with the A1 and B3 primers from four other isolates showed that there was only one DR (DR68) and no CRISPR2 spacer in this subspecies.
Table 2 (fragment). Reverse primer for single amplification of CRISPR2 for the isolates belonging to all subspecies or reverse primer for combined amplification of both CRISPR1 and CRISPR2 loci. Footnotes: 1 Degenerate positions: R = G or A; Y = T or C; M = A or C; K = G or T; D = G or A or T; B = G or T or C. 2 AE006468, serotype Typhimurium LT2 strain; AE014613, serotype Typhi Ty2 strain. 3 The primer pairs used for CRISPR1 amplification for each of the 744 strains are indicated in Table S2. doi:10.1371/journal.pone.0036995.t002

Microvariations of the Spacer Content Discriminate below the Serotype Level
We observed stable microvariation (duplication, triplication, loss or gain of spacers, presence of SNP variant spacers or VNTR variant spacers) within the strains of monophyletic serotypes. This was the case for the most prevalent serotype worldwide, Typhimurium, and its monophasic 1,4,[5],12:i:- variant, for which we analyzed eight genomes and 150 well characterized isolates collected between 1947 and 2010 (Table S7).
In silico analysis of genome sequences identified 28 unique spacers within CRISPR1. This number increased to 40 after analysis of the additional 150 isolates (between 6 and 31 spacers per isolate). The order of spacers was strictly conserved. Most were 32 bp (31/40) or 33 bp (2/40) long. Four of the 40 spacers had SNP variants, one (STM18) had four VNTR variants (26 to 50 bp), and the 74 bp STM7A/7B and STM7A/7Bvar2 spacers contained a 28 bp spacer fused to 14 bp from the end of a DR followed by a classical 32 bp spacer (Figure S1). This fused spacer-DR unit may have been generated accidentally during the process of spacer acquisition.
In silico analysis identified 36 unique spacers within CRISPR2. This number increased to 39 upon sequencing of the additional 150 isolates (between 4 and 40 spacers per isolate). All spacers were 32 bp (38/39) or 33 bp (1/39) long. The 39 spacers included only two variant spacers (SNP variants, all the other spacers being unrelated). As for CRISPR1, the variability of CRISPR2 was due to duplication of a single spacer (STMB13)-DR unit and/or to deletion of single or contiguous spacer-DR units.
The order of the spacers was strictly conserved in all but four alleles of CRISPR2 (8 isolates). The variability of CRISPR1 and CRISPR2 spacer content resulted from the duplication of single spacer (STM5, STM8, STM22, STM28, and STMB13)-DR units and/or the deletion of single or contiguous spacer-DR units. This microvariation resulted in 57 CRISPR1 and 62 CRISPR2 alleles, or 83 CRISPR1-CRISPR2 combined alleles, thus providing a higher resolution than other subtyping methods, such as PFGE. Particular populations, such as multidrug-resistant (MDR) DT104 isolates [33], African MDR ST313 isolates [34], and ciprofloxacin-resistant isolates [35], each had typical CRISPR alleles.
Table 3 footnotes. 1 The coordinates of the deleted regions of isolates with type A CRISPR structure and those of isolates with type F CRISPR structure are based on S. enterica serotype Typhimurium strain LT2 (GenBank AE006468) and serotype Typhi strain Ty2 (GenBank AE014613) genomes, respectively. The reverse primers used for these isolates are indicated in Table 2. 2 The cas3 gene of serotype Typhimurium LT2 strain is 2663 bp in size, whereas that of serotype Typhi strain Ty2 is 2207 bp in size, due to a frameshift leading to a premature stop codon. The sizes of the cas3 gene remnant are shown in brackets, not taking into account the serotype Typhi-specific frameshift. doi:10.1371/journal.pone.0036995.t003
In vitro stability experiments showed no difference in spacer content for either of the CRISPR loci between the five original serotype Enteritidis isolates and their derived cultures after one month or two months with daily passages. Furthermore, the genome sequences of widely used laboratory strains of serotype Typhimurium, LT2 (isolated in 1947) and SL1344 (isolated in the 1970s), showed these strains to have the same spacer content (15 to 32 per locus) as strains LT2 and SL1344 available in our laboratory. The stability of this marker was also assessed by performing the microbead-based CRISPOL assay (see below) one year later on fresh cultures (grown from single colonies) of the 150 serotype Typhimurium and monophasic 1,4,[5],12:i:- isolates (mean of 41 ± 9 spacers per isolate). One isolate had a discordant CT with respect to the initial spacer content determined by sequencing. We resequenced both CRISPR loci in this subcultured isolate and found that a single CRISPR1 spacer-DR unit had been lost. During the systematic CRISPOL testing of all serotype Typhimurium and monophasic isolates obtained between January 1 2010 and July 7 2010, 43 duplicate and four triplicate isolates obtained from the same patients on different days were analyzed (see below). All but one had concordant CTs. The final isolate had lost a CRISPR1 spacer-DR unit that was present in the other two isolates from the same patient. Thus, spacer content is, at least in serotypes Enteritidis and Typhimurium, stable enough for use in surveillance and outbreak investigation. However, although rare and often minor (single spacer variant of the original CT in both cases detected), CRISPR variation may occur and, whatever its origin, occurring before or during carriage in the patient or subculture in the laboratory, should be taken into account when defining outbreak-related CTs.
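The recommendation above, that rare single-spacer variants be taken into account when defining outbreak-related CTs, can be sketched as a comparison of presence/absence profiles. This is an illustrative sketch only: the function names and the example profiles are hypothetical, and a presence/absence comparison cannot see duplications of a spacer.

```python
def spacer_difference(profile_a, profile_b):
    """Return the spacers present in exactly one of the two profiles."""
    return set(profile_a) ^ set(profile_b)


def is_single_spacer_variant(profile_a, profile_b):
    """True when the two profiles differ by exactly one spacer (gain or loss)."""
    return len(spacer_difference(profile_a, profile_b)) == 1


if __name__ == "__main__":
    # Hypothetical profiles; the spacer names are used only as labels here.
    original_isolate = {"STM5", "STM8", "STM22", "STMB13"}
    later_isolate = {"STM5", "STM22", "STMB13"}  # one spacer-DR unit lost
    print(spacer_difference(original_isolate, later_isolate))
    print(is_single_spacer_variant(original_isolate, later_isolate))
```

Whether a single-spacer variant should be grouped with an outbreak CT remains an epidemiological judgment; the function only flags candidates for review.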
We assessed the discriminatory power of the method, an important parameter for surveillance purposes, by comparing CRISPR spacer diversity with classical first-line (PFGE, phage typing) and second-line (MLVA) subtyping methods for a subset of isolates. For prevalent MDR DT104 isolates, the discriminatory power was higher for combined CRISPR analysis (5 profiles, DI = 0.64) than for PFGE (also 5 profiles, but DI = 0.38). The best discrimination was that achieved with the five-locus MLVA method, for which all the DT104 isolates had different profiles (DI = 1). However, it was not possible to amplify some MLVA loci from some isolates (null alleles were seen for STTR3 in 4 isolates, STTR5 in 1 isolate, STTR6 in 3 isolates and STTR10 in 3 isolates) and variations in the number of repeats of some loci were observed in outbreak-related isolates, indicating lower levels of epidemiological concordance, possibly due to the very rapid evolution of these markers or to outbreaks being caused by more than two MLVA types (Table S7). The spacer content of serotypes Typhimurium and Enteritidis isolates from 10 documented outbreaks was studied by sequencing (Tables S7 and S8) or with the microbead-based CRISPOL assay (see below). In all cases, the outbreak isolates had the same CRISPR type. Epidemiological concordance was thus complete.
Table 4. Comparison of CRISPR1 spacer content with the population structure of S. enterica serotype Newport, as assessed by MLST. [Table body not recovered; surviving headers and labels: Lineage; Strain; Other STs.] The deletions are named according to Table 3. Ent20-//-Ent35: 15 unique spacers are located between Ent20 and Ent35 (see Table S2).
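The DI values quoted above are not defined in this section. Assuming DI denotes Simpson's index of diversity as formulated by Hunter and Gaston for typing methods, which is consistent with DI = 1 when every isolate has a different profile, it can be computed as in the following sketch. The function name and the example labels are hypothetical.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form) for a typing method.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where N is the number of isolates and n_j the number of isolates of type j.
    Assumption: this is the index behind the DI values in the text.
    """
    n_isolates = len(type_labels)
    if n_isolates < 2:
        raise ValueError("at least two isolates are needed")
    counts = Counter(type_labels).values()
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (n_isolates * (n_isolates - 1))


if __name__ == "__main__":
    # Hypothetical profile labels for six isolates typed by some method.
    profiles = ["P1", "P1", "P1", "P2", "P3", "P3"]
    print(round(discriminatory_index(profiles), 3))
```

The index is the probability that two isolates drawn at random without replacement fall into different types, which is why two methods yielding the same number of profiles (five each for CRISPR and PFGE here) can still have different DI values.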
Four strains (SARA8, 81-784, 02-7015 and 07-1777) with a known spacer content covering all the spacers identified were tested in every microbead-based CRISPOL experiment. Their CRISPR types were identical in all cases.
In addition to 100% typeability, the other performance criteria [10] for CRISPR analysis, such as stability, discriminatory power, epidemiological concordance and reproducibility, indicated that this was a very powerful method for use in the molecular epidemiology of Salmonella.

Applications of CRISPR Polymorphisms
There are at least three applications of CRISPR polymorphisms of potential interest in clinical microbiology or public health laboratories.
Application 1: CRISPR sizing by PCR for the rapid comparison of Salmonella spp isolates. The first application is a double-locus PCR assay for the rapid comparison of Salmonella isolates. We demonstrate above that variation in the number and type of spacers can be used to track strains, given the discrimination between the most prevalent Salmonella serotypes.
Remarkably, simple PCR amplification of CRISPR1 and CRISPR2 loci, followed by agarose gel electrophoresis and sizing of the PCR products, differentiated outbreak isolates from non-outbreak isolates and was therefore found to be a useful screening approach. For example, for the eight isolates of serotype Typhimurium isolated from the same city during a single week in 2005 (cluster E in Table S7), it was possible to discriminate between four isolates from the same food poisoning cluster and four other isolates unrelated to this cluster (Figure 3). This size variation results from variation in the number of spacer-DR units (total 60 bp), which thus provides some discrimination even in the absence of qualitative information (i.e., the spacer type). Another advantage of this approach is that it does not require prior serotype identification, as sequences from isolates of all serotypes can be amplified by at least one of the two primer pairs used for CRISPR amplification. A different amplicon size for one or both loci demonstrates that the analyzed isolates are unrelated, but it should be borne in mind that a similar amplicon size for both loci does not necessarily imply that the two isolates belong to the same strain. Two unrelated isolates could have the same number of spacers but of different types, and low-level variation in the number of spacers might not be detected on agarose electrophoresis of large PCR products. However, this simple screening approach is suitable for low-capacity public health or hospital laboratories, including those in developing countries, which need to be able to compare several Salmonella isolates rapidly in a single experiment and that cannot afford complete serotyping or subtyping by PFGE.
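The asymmetric interpretation described above (a size difference at either locus rules out relatedness, whereas similar sizes at both loci prove nothing) can be written as a small decision rule. This is a sketch under stated assumptions: the function name, the example sizes and the 100 bp tolerance standing in for gel resolution are all hypothetical and are not values from this study.

```python
def compare_by_amplicon_size(sizes_a, sizes_b, tolerance_bp=100):
    """Compare two isolates by their (CRISPR1, CRISPR2) amplicon sizes in bp.

    Returns "different" if either locus differs by more than the tolerance,
    otherwise "inconclusive": similar sizes do not show that the spacer
    types are the same. The tolerance is an assumed stand-in for what an
    agarose gel can resolve on large PCR products.
    """
    for size_a, size_b in zip(sizes_a, sizes_b):
        if abs(size_a - size_b) > tolerance_bp:
            return "different"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical sizes estimated from a gel, as (CRISPR1, CRISPR2) in bp.
    isolate_1 = (1500, 2100)
    isolate_2 = (1500, 1700)
    isolate_3 = (1520, 2080)
    print(compare_by_amplicon_size(isolate_1, isolate_2))
    print(compare_by_amplicon_size(isolate_1, isolate_3))
```

Returning "inconclusive" rather than "same" keeps the caveat from the text explicit: isolates that pass this screen would still need spacer-level analysis.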
Figure 2 (caption truncated): [...] Table S2. doi:10.1371/journal.pone.0036995.g002
Application 2: High-throughput method for subtyping serotype Typhimurium or its monophasic variant in real time: the CRISPOL assay. We present below a second application resulting from the development of a high-throughput method for the real-time subtyping of serotype Typhimurium and its monophasic variant. Based on the 83 CRISPR1-CRISPR2 combined alleles identified above, we developed a bead-based liquid hybridization assay (Luminex® technology), CRISPOL (for CRISPR polymorphism; Figure 4). A 25 to 32 bp capture probe was designed for each of 72 of the 79 spacers identified (Table 6); it was not possible to distinguish between some of the remaining seven spacers by this approach (for example, STMB8var1 has a single SNP located in position 1 of the spacer). Each capture probe was coupled to a defined xMAP bead. We used thermolysates as the DNA template and a single primer pair (including a biotinylated primer) hybridizing to DR sequences to amplify the spacer content of the two CRISPR loci rapidly. The PCR mixture was hybridized with the 72 probe-coupled beads and incubated with streptavidin-phycoerythrin for detection. The Luminex® platform was then used to measure the fluorescence associated with each bead (corresponding to a unique probe/spacer). This method gave a highly robust readout, with mean fluorescence signals of 709 to 5,707 in the presence of the spacer, and of 52 to 193 in the absence of the spacer (Table 7). The positive/negative ratios were between 13 (for a bead for which coupling was not optimal) and 92 (mean of 50.8). It was also easy to identify the four SNP-variant spacers (Table 8). One probe, pSTMB26, had a trimodal distribution, due to an intermediate population (MFI between 300 and 1,200), whereas the positive population had an MFI of more than 3,500.
This intermediate population (consisting mostly of emerging European monophasic isolates) contained spacer STMB34, which is partly complementary to pSTMB26. We designed a new probe targeting the other side of spacer STMB26, but this probe was also partly complementary to another spacer, STM28. We resolved this problem by subtracting the value for the control strain 02-7015 (STMB34 positive) from that for probe pSTMB26 in each experiment.
The repeatability of the CRISPOL assay was assessed by running 30 isolates in triplicate and was high (data for five strains provided in Table S9), with low standard deviations. The assay was also highly reproducible, based on the results for the four control strains (which, together, contained all the known spacers) analyzed in each experiment. A cutoff of five times the value for the background sample (consisting of all reaction components except template DNA, which was replaced with water), which had an MFI of approximately 300, was used to determine whether the result was positive or negative for a given spacer. The distribution of crude MFI values in a typical experiment with 65 isolates showed the clear-cut distinction between negative and positive results for spacers (Figure S2).
The concordance of the CRISPOL assay and sequencing results was 100% for the spacer content of the 150 sequenced isolates. Despite the presence of some genetic variation, such as duplications of spacers, VNTR variation of STM18 and the appearance of SNP variants, which might result in spacers being missed by the CRISPOL assay, no such effect was observed in practice as we found no isolates with identical CTs but with alleles with different sequences.
This method has two major advantages: its rapidity and low cost. It requires 5.5 hours in total, with 2.75 hours of hands-on time, to test 65 isolates from bacterial colonies, at an estimated cost of €4 per sample for reagents and consumables.
The application of this method to almost 2,000 serotype Typhimurium and monophasic variant isolates led to the identification of 245 different CRISPOL types (CTs; Figure 5). CT21 and its variants were strongly associated with the MDR DT104 serotype Typhimurium clone (MLST type ST19). Similarly, CT1 and its variants were strongly associated with the emerging European monophasic 1,4,[5],12:i:- variant (MLST type ST34). In total, 1,084 isolates (one per patient) were received at the French National Reference Center for Salmonella (FNRC-Salm) between January 1 and July 4, 2010; 89 CTs were observed among the serotype Typhimurium isolates (n = 677). The two most prevalent types were CT21 (33% of isolates) and CT30 (11.5%); both were associated with the DT104 clone. For monophasic isolates (n = 407), we identified 39 CTs, of which CT1 (50%) and CT9 (14.7%) were the most prevalent. During this period, a steady increase in the prevalence of CT1 and three peaks of CT21, CT136 and CT62 (two of which corresponded to documented outbreaks) [36] were observed (Figure 6).
Application 3: Development of PCR assays targeting specific serotypes or particular strains. The presence of unique, constant spacers in certain serotypes, such as Typhi and Paratyphi A, should make it possible to develop PCR assays specific for these serotypes. As a proof-of-principle, we have successfully developed and validated such PCRs for serotypes Typhi and Paratyphi A (manuscript in preparation). Moreover, it would be possible to develop a PCR assay for the detection of any strain of interest with a particular spacer content, provided that a culture of this strain was available. For strains with no specific spacers (common spacers only), we can use other stable characteristics of the strain, such as the absence of a spacer-DR unit between STM06 and STM24 (e.g., an MDR serotype Typhimurium DT104 strain), to design primers yielding a PCR product of known size.

Discussion
We demonstrate here that the assessment of CRISPR spacer content is a robust, highly discriminatory and practical method for typing Salmonella isolates. Serotyping has been the reference method for Salmonella typing for almost 80 years. However, this technique has a number of drawbacks, including low throughput, high costs due to the need for highly trained staff and expensive antisera, and accreditation problems. It is also of limited value for strain discrimination, given the predominance of a small number of serotypes. There is therefore a clear need for improved methods [37].
Figure 3 (end of caption only). [...] Table S7). Three cases were from the same food poisoning cluster (the food isolate was also tested), whereas the other cases were unrelated. doi:10.1371/journal.pone.0036995.g003
Recently developed ''molecular serotyping'' methods have been proposed as an alternative. These methods mimic serotyping in that they target the genes involved in biosynthesis of the flagellar (fliC and fljB) [38] and/or the O-polysaccharide (encoded by the rfb locus) antigens [2]. However, due to the complexity of the rfb locus (8 to 23 kb, including more than 10 open reading frames), it is currently possible to identify only a minority of the 46 O serogroups of Salmonella by PCR. These approaches are also subject to all the limitations inherent to serotyping in terms of a lack of discrimination and a lack of polyphyletic group recognition.
MLST is a promising method for defining evolutionarily and epidemiologically meaningful groups of Salmonella. A publicly accessible database (http://mlst.ucc.ie/mlst/dbs/Senterica) includes data for more than 4,250 isolates from more than 500 serotypes [39]. Analyses of these data have revealed that most serotypes are probably polyphyletic and therefore do not correspond to natural groups descended from a single ancestor and sharing important host association or virulence features. This recent study highlights the importance of using phylogenetically informative methods recognizing natural groups rather than serotypes. However, MLST has a low discriminatory power and is not suitable for the detection or investigation of outbreaks due to highly prevalent monophyletic serotypes.
The data presented here for a global collection of 783 reference strains and isolates from 130 serotypes of Salmonella, including the most common serotypes involved in human infections, show a high degree of CRISPR polymorphism. This polymorphism makes it possible to distinguish between most serotypes and between MLST groups within polyphyletic serotypes. Furthermore, microvariations, such as the loss, acquisition or duplication of spacers or point mutations within spacers, have a strain discrimination capacity similar to that of current gold standard methods, such as PFGE. The CRISPR method can therefore be used for simultaneous typing (defined as the determination of serotype or MLST group) and subtyping. It therefore represents a single alternative to several widely used reference methods: serotyping, PFGE and phage typing. This genetic marker is based on polymorphic DNA sequences of limited length, 0.5 to 3 kb. It therefore has major advantages, in terms of analysis, throughput, standardization, interpretation and data exchange, over current typing and subtyping methods. We believe that this novel approach will constitute a real improvement in the monitoring of Salmonella infections, by making it possible to obtain results more rapidly, thereby optimizing surveillance and outbreak detection. We propose several different strategies for CRISPR genotyping in Salmonella. First, determination of the sizes of the two CRISPR loci by PCR can be used as an initial screen that is easy to implement, even in low-capacity laboratories. This approach requires no preliminary serotyping. Second, when more precise discrimination is required, the spacer content can then be investigated by Sanger sequencing of the PCR products, which has the additional advantage of facilitating the detection of new putative spacers.
However, once the spacer diversity within a serotype is known (i.e., after the analysis of a representative collection of isolates of this serotype), higher throughput is required for daily surveillance. We have developed a Luminex®-based approach that is suitable for serotype Typhimurium, which accounts for 50% of all cases of human salmonellosis in France and is one of the two major serotypes worldwide. PFGE is currently recommended for the real-time surveillance of this serotype, but is technically demanding and poorly standardized in many laboratories. It is therefore difficult to use PFGE in many countries in which a single reference laboratory processes a large number of isolates. The CRISPOL assay developed here covers both serotype Typhimurium and its emerging monophasic variant. It provides an excellent alternative to PFGE, being cheaper, less technically demanding and yielding data that are easy to interpret and exchange. An approach based on the initial use of the CRISPOL assay, followed by MLVA for genetically homogeneous populations, such as the DT104 clone (CT21) or the emerging monophasic strain (CT1), would be highly effective. However, due to limitations in the number of beads that can be mixed (500 for the latest Luminex® platform), the universal use of a Luminex®-based approach is not possible (the Salmonella serotypes analyzed to date include 3,800 different spacers). Whole-genome sequencing (WGS) is a possible alternative, provided costs and analysis times can be decreased. WGS could be customized to focus exclusively on the two CRISPR sequences. The known spacer sequences would be extracted and compared with the contents of a CRISPR/serotype database. Another alternative would involve the use of a microarray approach based on DNA oligonucleotides corresponding to highly informative spacers. In the meantime, subtyping applications remain to be developed for the most epidemiologically important serotypes or MLST groups, such as Enteritidis and Newport. For this purpose, it should be straightforward to apply the strategy presented here for serotype Typhimurium.
Table 7 (footnotes only). MFI, median fluorescence intensity (raw data); SD, standard deviation. 2 The ratio is the positive average median fluorescence intensity (MFI) divided by the negative average MFI. 3 Due to partial identity with spacer STMB34, the signal of pSTMB26 is stronger in isolates containing spacer STMB34 (such as the emergent monophasic population). This has been taken into account by subtracting the MFI of control strain #02-7015 from that of pSTMB26 in each experiment. The corrected median is 183646 with a ratio of 20. 4 Median calculated for 24 isolates. 5 Median calculated for 20 isolates. doi:10.1371/journal.pone.0036995.t007
Clearly, there is also a need to extend the serotype coverage of the spacer content inventory, as only the 130 most important serotypes have been investigated so far. We hope to capture all the diversity of Salmonella (>2,500 serotypes) in the next 10 years, and the CRISPR/serotype dictionary available from our open-access website will be updated accordingly. This web tool can be used to extract and identify spacers from a submitted DNA sequence and for comparisons with a well-curated database (i.e., containing accurately serotyped isolates). The application of this tool to CRISPR sequences identified as corresponding to Enteritidis isolates by Liu et al. [27] showed that 10 of the 27 considered actually corresponded to Typhimurium (EST21, EST22), Infantis (EST17), Kentucky (EST23), and Heidelberg (EST15), rather than Enteritidis. All these discrepancies related to isolates obtained from a local diagnostic laboratory and not from the reference laboratory participating in the study, which suggests that serotyping errors were the cause.
In conclusion, we have demonstrated that CRISPR is a powerful method suitable for use in the molecular typing and subtyping of Salmonella isolates. We believe that, given its combined advantages, CRISPR strain characterization is an excellent potential alternative to both serotyping and PFGE, the current gold standard methods. Given the rapidity of this method, in particular, it should have a major impact on surveillance and outbreak investigation and is likely to be of benefit to public health.

Salmonella Strains and Isolates
In the first part of this study (spacer content inventory and comparison with current typing and subtyping methods), we used 744 Salmonella reference strains or isolates belonging to 130 serotypes (including those most frequently identified in humans and food products). These serotypes belonged to the two species of the Salmonella genus, S. enterica and S. bongori, and to the six subspecies of S. enterica: enterica, salamae, arizonae, diarizonae, indica and houtenae (Tables S2 and S4). The Salmonella serotype reference strains were obtained from the World Health Organization Collaborative Center for Reference and Research on Salmonella (WHOCC-Salm). Most of the isolates were from the French National Reference Center for Salmonella (FNRC-Salm). Both these centers are located at the Institut Pasteur, Paris. Other strain and isolate providers are acknowledged at the end of the manuscript. The strains and isolates studied were obtained from around the world, between 1885 and 2010. Larger subsets of isolates from prevalent serotypes were assembled to reflect as accurately as possible the diversity of these populations: Typhimurium or its monophasic variant with antigenic formula 1,4,[5],12:i:- (n = 150), Enteritidis (n = 187), polyphyletic serotypes such as Newport (n = 21) and Paratyphi B (n = 36), and serotypes with the antigenic formula 6,7:c:1,5 (n = 34), or clinically important serotypes, such as Typhi (n = 20) and Paratyphi A (n = 14). This test population was generally well defined in terms of its epidemiological context, antimicrobial resistance phenotype, phage type, PFGE type, MLVA type, haplotype and MLST type, as determined by methods described elsewhere [9,32,33,40,41].
Figure 5 (beginning of caption missing). [...] treated as a numerical character indicating absence (0) or presence (1 for all spacers except BraB14, for which an arbitrary value of 10 was assigned) in BioNumerics 6.5 software (Applied Maths, Sint-Martens-Latem, Belgium). Similarities between CTs were assessed by calculating the Pearson product-moment correlation coefficient, and a dendrogram was constructed by the unweighted pair group method with arithmetic mean (UPGMA). The four SNP-variant spacers targeted by probes 69 to 72 are shown but were excluded from the phylogenetic analysis, as they were not independent. A indicates a group of profiles derived from CT1, the main type of emerging monophasic isolates. B indicates a group of profiles derived from CT21, which is associated with multidrug-resistant DT104 serotype Typhimurium isolates. C indicates a group of serotype Typhimurium isolates of ST36 that may have one or two specific spacers on the leader side of CRISPR1 (BraB14) and CRISPR2 (STMB35). doi:10.1371/journal.pone.0036995.g005
Figure 6. Distribution of selected CRISPOL types of S. enterica serotype Typhimurium and S. enterica with antigenic formula 1,4,[5],12:i:- isolated from humans in France between January 1 and July 4, 2010. Over this period, all 1,084 isolates (one per patient) were CRISPOL-typed and two outbreaks were investigated. Outbreak A (approximately 40 cases) was due to the consumption of a raw milk cheese contaminated with a CT62 highly multidrug-resistant S. enterica serotype Typhimurium strain, whereas outbreak B (approximately 50 cases) was caused by the consumption of a dried pork sausage contaminated with a CT136 S. enterica 4,12:i:- strain. The third peak, corresponding to CT21, in May was neither detected nor investigated, as CRISPOL typing was carried out retrospectively. doi:10.1371/journal.pone.0036995.g006
In the second part of the study, we validated the CRISPOL method on 150 serotype Typhimurium or 1,4,[5],12:i:- isolates for which both CRISPR loci were sequenced. The method was then applied to a collection of 1,900 isolates from the WHOCC-Salm and from the FNRC-Salm, including all isolates received by the FNRC-Salm from January 1, 2010 to July 4, 2010 (n = 1,131 isolates from 1,084 patients).

Inventory of the Spacer Content of 744 Salmonella Strains and Isolates and 39 Genomes of 130 Serotypes
In silico analysis. We analyzed CRISPR spacer content in 39 full genome sequences of S. enterica and S. bongori (Table 1). Regions containing CRISPR sequences were identified by a BLAST (NCBI) search of the 29 bp DR consensus sequence of S. enterica serotype Typhi strain Ty2 (5′-CGGTTTATCCCCGCTGGCGCGGGGAACAC-3′) [14]. Regions downstream from the iap gene and upstream from the ygcF gene were downloaded and the spacer-DR units of each CRISPR locus were extracted manually.
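The manual extraction step can be approximated programmatically. The sketch below is an illustration under stated assumptions rather than the procedure used in this study: it replaces the BLAST search with a simple sliding-window comparison against the Ty2 DR consensus, tolerates a small number of mismatches (the `max_mismatches` parameter is hypothetical), and returns the sequence found between consecutive repeats. The toy locus and its spacers are invented for the example.

```python
# 29 bp direct repeat (DR) consensus of serotype Typhi strain Ty2 (from the text).
DR_CONSENSUS = "CGGTTTATCCCCGCTGGCGCGGGGAACAC"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def find_repeats(seq, dr=DR_CONSENSUS, max_mismatches=2):
    """Return start positions of non-overlapping DR-like windows in seq."""
    hits = []
    i = 0
    while i <= len(seq) - len(dr):
        if hamming(seq[i:i + len(dr)], dr) <= max_mismatches:
            hits.append(i)
            i += len(dr)  # skip past this repeat
        else:
            i += 1
    return hits

def extract_spacers(seq, dr=DR_CONSENSUS, max_mismatches=2):
    """Return the sequences lying between consecutive DR hits."""
    hits = find_repeats(seq, dr, max_mismatches)
    return [seq[a + len(dr):b] for a, b in zip(hits, hits[1:])]

# Toy locus with two invented 32 bp spacers (not real Salmonella spacers).
spacer1 = "ATTA" * 8
spacer2 = "TTAA" * 8
locus = ("A" * 20 + DR_CONSENSUS + spacer1 + DR_CONSENSUS
         + spacer2 + DR_CONSENSUS + "A" * 20)

print(extract_spacers(locus) == [spacer1, spacer2])
```

Only the forward strand is scanned here; a real locus would also need to be checked in reverse-complement orientation, and degenerate terminal repeats may require a looser mismatch threshold.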
DNA extraction. Total DNA was extracted with the InstaGene matrix (BioRad, Marnes la Coquette, France) or the Wizard kit (Promega, Madison, WI, USA) in accordance with the manufacturer's recommendations.
PCR and sequencing of the CRISPR1 and CRISPR2 loci. Oligonucleotide primers for amplification of the CRISPR1 and CRISPR2 loci from all Salmonella spp. were designed on the basis of consensus alignments of the available Salmonella genomes (Tables 1 and 2). The CRISPR1 locus was amplified with the forward primer A1 (binding 74 bp upstream from the CRISPR1 of serotype Typhimurium strain LT2) and the reverse primer A2 (binding 130 bp downstream). Alternative reverse primers, such as A3 to A7, were required for some isolates. The CRISPR2 locus was amplified with the forward primer B1 (binding 110 bp upstream from CRISPR2) and the reverse primers B2 and B3 (binding 45 bp and 324 bp downstream from the CRISPR2 locus of strain LT2, respectively). The B3 primer was designed because no region homologous to B2 was found in subsp. diarizonae. The primers were synthesized by MWG-Biotech (Ebersberg, Germany). Single-locus amplifications were performed in a total volume of 50 µl containing DNA (2.5 µl from InstaGene matrix or 2 µl diluted 10-fold from Wizard), primers (10 pmol each), deoxynucleoside triphosphates (100 µM), Taq DNA polymerase (0.85 U of GoTaq Flexi DNA polymerase; Promega) and its buffer, MgCl2 (1.5 mM) and dimethylsulfoxide (5%). The cycling conditions were as follows: 2 min for denaturation at 94°C (1 cycle), followed by 35 cycles of 1 min at 94°C for denaturation, 1 min at 59°C (61°C when using the A1-A4 pair) for annealing, and 90 s at 72°C for polymerization, followed by an additional 10 min at 72°C for extension.
The entire region spanning both CRISPR loci was amplified with primers A1 and B3. For this purpose, DNA was extracted with the Promega Wizard kit and PCR was carried out with the Expand Long Template PCR System kit (Roche).
Both strands of purified amplicons were sequenced with Big Dye Terminator version 3.1 (Applied Biosystems, Foster City, CA) on an ABI 3730XL apparatus (Applied Biosystems). BioNumerics 6.5 software (Applied Maths, Sint-Martens-Latem, Belgium) was used to analyze nucleotide sequences.

Development of Web-accessible Tools and CRISPR/serotype Dictionary
We have developed a web tool for the creation and storage of catalogues of spacers and DR variants. This "Institut Pasteur CRISPR database for Salmonella" can be queried online at http://www.pasteur.fr/recherche/genopole/PF8/crispr/CRISPRDB.html. The content of the catalogue is used to identify known spacers and DRs in a submitted DNA sequence, which is coded into a succession of DR and spacer identifiers by the query "Search spacers composition for query". If part of the sequence has no exact matches in the DR and spacers dictionary, a BLAST query can then be used to obtain the nearest match ("Blast unknown spacer sequence against dictionary"), to identify new spacers or new DR variants. Isolates analyzed at the FNRC-Salm and coded as spacer-DR arrays within the CRISPR/serotype dictionary can be downloaded with the "Browse spacers composition for the published strains" query.
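The coding step performed by the dictionary can be pictured as an exact-match lookup. The sketch below is only an illustration of that idea and does not reproduce the web tool: the function name, the toy spacer sequences and the identifiers (`toyA1`, `toyA2`) are all hypothetical, and spacers with no exact match are simply set aside, as they would be before a BLAST search.

```python
def code_spacer_array(spacers, dictionary):
    """Translate spacer sequences into identifiers by exact match.

    Returns (coded, unknown): coded is the ordered list of identifiers,
    with "?" for any spacer absent from the dictionary; unknown lists
    those unmatched sequences, which would then go to a similarity search.
    """
    coded, unknown = [], []
    for seq in spacers:
        name = dictionary.get(seq)
        if name is None:
            coded.append("?")
            unknown.append(seq)
        else:
            coded.append(name)
    return coded, unknown

# Hypothetical dictionary: invented sequences and invented identifiers.
toy_dictionary = {
    "ATTA" * 8: "toyA1",
    "TTAA" * 8: "toyA2",
}

array = ["ATTA" * 8, "GGCC" * 8, "TTAA" * 8]
coded, unknown = code_spacer_array(array, toy_dictionary)
print("-".join(coded))
print(unknown)
```

Keeping the unmatched sequences separate makes it easy to decide later whether each one is a new spacer, an SNP variant of a known spacer, or a sequencing error.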
Spacer nomenclature. The spacer names start with a three- to four-letter prefix indicating the serotype from which the spacer was extracted for the first time. The suffix B indicates spacers found in the CRISPR2 locus. Spacers were numbered consecutively in order of discovery. The start of spacer arrays is described, starting downstream from the iap gene. SNP or VNTR variants are denoted as "var" (e.g., EntB0var1, STM18var2).
Calculation of discrimination indices. The discriminatory abilities of the CRISPR, PFGE and phage typing methods were assessed by calculating Simpson's index of diversity (D value), as previously described [42].
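As a worked illustration, the D value can be computed from a list of type assignments. The sketch below assumes the commonly used form of Simpson's index of diversity for typing methods, D = 1 - Σ n_j(n_j - 1) / [N(N - 1)], where N is the number of isolates and n_j the number of isolates of type j; whether reference [42] uses exactly this form is an assumption here, and the example type labels are invented.

```python
from collections import Counter

def simpson_diversity(types):
    """Simpson's index of diversity for a list of type labels.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), i.e. the probability
    that two isolates drawn at random without replacement have different
    types. Requires at least two isolates.
    """
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(types).values()
    return 1.0 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

# Invented example: four isolates falling into three types.
print(round(simpson_diversity(["A", "A", "B", "C"]), 4))
```

A value of 0 means that all isolates share one type and a value of 1 means that every isolate has its own type, so the same function can be applied to CRISPR, PFGE or phage type labels for the same set of isolates to compare the methods.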
Statistical analysis. The mean number and standard deviation of the spacers in the CRISPR loci were calculated with Excel (Microsoft).
Nucleotide sequence accession numbers. The nucleotide sequences of the CRISPR loci have been assigned GenBank accession numbers JF724159 to JF725640.

Development of a High-throughput Subtyping Method (CRISPOL) for Serotype Typhimurium
DNA extraction. We increased the throughput of this method by using thermolysates as the DNA template. Briefly, we suspended a 10 µl loop of bacteria in 200 µl of molecular biology-grade water. The suspension was vortexed for 10 s, incubated at 95°C for 10 min and then centrifuged for 5 min at 13,000 rpm in a Jouan A14 centrifuge. The supernatant was transferred to a 1.5 ml microtube and stored at −20°C until use.
PCR amplification. We followed the strategy used for Mycobacterium spoligotyping, based on the use of two primers hybridizing to the DRs in opposite directions, one of which was 5′-biotinylated [17]. The primers were DRSTMA (5′-CCGCTGGCGCGGGGAACA-3′) and DRSTMB (5′-Biot-CGCCAGCGGGGATAAACC-3′). Amplifications were carried out in a volume of 50 µl containing 1 µl of thermolysate (or 1 µl of molecular biology-grade water for the blank), primers (50 pmol each), deoxynucleoside triphosphates (200 µM), Taq DNA polymerase (0.85 U of GoTaq; Promega) and its buffer, and MgCl2 (1.5 mM). The cycling conditions were as follows: 2 min at 95°C for initial denaturation, followed by 20 cycles of 1 min at 95°C for denaturation, 30 s at 59°C for annealing and 15 s at 72°C for polymerization. The PCR products were checked by electrophoresis in 1.2% agarose gels and were stored at −20°C for no more than three days before use in the Luminex® assay.
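The orientation of the two primers relative to the DR can be checked directly from the sequences given in the text. The short sketch below is only a consistency check, not part of the published protocol; it uses the Ty2 DR consensus quoted in the in silico analysis section, and the helper names are hypothetical.

```python
# Sequences as given in the text (5' to 3').
DR_CONSENSUS = "CGGTTTATCCCCGCTGGCGCGGGGAACAC"
DRSTMA = "CCGCTGGCGCGGGGAACA"
DRSTMB = "CGCCAGCGGGGATAAACC"

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def locate(primer, template):
    """Return (strand, 0-based start) of an exact primer match, or None.

    "+" means the primer is identical to the template strand as written;
    "-" means its reverse complement is, so it primes in the other direction.
    """
    pos = template.find(primer)
    if pos >= 0:
        return ("+", pos)
    pos = template.find(reverse_complement(primer))
    if pos >= 0:
        return ("-", pos)
    return None

print(locate(DRSTMA, DR_CONSENSUS))  # ('+', 10)
print(locate(DRSTMB, DR_CONSENSUS))  # ('-', 1)
```

DRSTMA matches the 3′ part of the repeat on the written strand and therefore extends into the spacer that follows, whereas DRSTMB matches the opposite strand near the 5′ end of the repeat and extends into the spacer that precedes it, which is consistent with primers "hybridizing to the DRs in opposite directions".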
Probes and microbead coupling. We designed 72 spacer-derived oligonucleotide probes (including four SNP-variant probes) of between 25 and 32 nucleotides in size (Table 6). These probes were synthesized by Eurogentec (Angers, France), with a 5′-terminal amino group modification, using a 12-carbon spacer linker. The 72 Luminex xMAP microbeads (L100-C129 to L100-C200) were coupled to the 72 probes, as previously described [43]. Each type of coupled microbead was resuspended in 1× TE (10 mM Tris, 1 mM EDTA, pH 8) at a final concentration of approximately 50,000 microbeads per ml. We then combined equal volumes of each type of coupled microbead in Protein LoBind tubes (Eppendorf, Hamburg, Germany). The mixture was stored in the dark at 4°C before and after use.
Hybridization. Hybridization was performed in a polycarbonate plate with 96 conical wells (Corning, Corning, NY, USA), to which we added 10 µl of PCR product, 7 µl of 1× TE and 33 µl of probe-coupled microbeads diluted in 1. The plates were sealed with adhesive PCR film (ABgene, Epsom, UK) and heated to 94°C for 3 min for initial denaturation, followed by hybridization at 59°C for 20 min. The plate was centrifuged at 4,000 g for 3 min and the supernatant was carefully discarded. A reporter mix consisting of 90 µl of streptavidin-R-phycoerythrin (1.25 mg/ml in 1× TMAC; Invitrogen, Carlsbad, CA, USA) was added to each well and the microbead pellet was resuspended. The microplate was then incubated for 5 min and analyzed on the Luminex® platform.
Analysis on the Luminex platform. The microplate was analyzed in a Luminex® 200 system at a temperature of 50°C. Analyses were based on counts for 50 beads per set.
Data analysis. Four strains (SARA8, 81-784, 02-7015 and 07-1777) with a known spacer content covering all the spacers in the assay and a blank (in which DNA was replaced with water) were analyzed in each experiment. For all probes except pSTMB26, relative fluorescence unit values were corrected by subtracting the value for the blank. In the case of negative corrected values, an arbitrary value of 25 was attributed. For pSTMB26, values were corrected by subtracting those for 07-1777, due to a weak cross-reaction observed in emerging monophasic variant strains (07-1777 being a monophasic variant isolate). A cutoff value five times higher than the corrected value was defined. For pSTM03, pSTM07, pSTM12 and pSTMB9 probes and their SNP variant probes pSTM03var1, pSTM07var2, pSTM12var1, and pSTMB9var1, assignment to the wild-type spacer or its SNP variant was based on the ratio of crude values for each probe. If the wild-type probe/SNP variant probe ratio was >1.1, then the wild-type spacer was attributed to the isolate. If the wild-type probe/SNP variant probe ratio was <0.9, then the SNP variant spacer was attributed to the isolate. An R tool was developed to automate the analysis. For each strain, an allelic pattern, referred to as the CRISPOL type (CT), consisting of the presence or absence of the 68 ordered CRISPR1, then CRISPR2 probes, followed by the four SNP-variant probes, was generated. Data were incorporated into a dedicated CRISPOL database with BioNumerics® software. A provisional number was assigned to each strain with a new CT (e.g., CT62prov) until both CRISPR regions had been sequenced, to check for consistency with the CT and for an absence of new spacers.
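The decision rules above can be sketched in a few functions. This is a hedged illustration written in Python, not the R tool mentioned in the text: the function names, the example MFI values and the probe labels are hypothetical, the cutoff is read as five times the blank value for the same probe (one possible reading of the description), and the "undetermined" outcome for ratios between 0.9 and 1.1 is an assumption, since the text does not state how such cases were handled.

```python
def corrected_value(raw, blank, floor=25.0):
    """Subtract the blank; negative results are replaced by 25, as in the text."""
    value = raw - blank
    return value if value >= 0 else floor

def is_positive(raw, blank, factor=5.0):
    """Call a spacer present if the raw MFI reaches factor x the blank MFI.

    Assumption: the cutoff is applied per probe to raw values.
    """
    return raw >= factor * blank

def call_snp_variant(wild_type_raw, variant_raw):
    """Assign wild-type or SNP-variant spacer from the ratio of crude values."""
    if variant_raw <= 0:
        raise ValueError("variant probe signal must be positive")
    ratio = wild_type_raw / variant_raw
    if ratio > 1.1:
        return "wild-type"
    if ratio < 0.9:
        return "variant"
    return "undetermined"  # assumption: not specified in the text

def presence_pattern(raw_by_probe, blank_by_probe, probe_order):
    """Build a presence/absence string (1/0) in the given probe order."""
    return "".join(
        "1" if is_positive(raw_by_probe[p], blank_by_probe[p]) else "0"
        for p in probe_order
    )

# Hypothetical MFI values for three invented probes.
raw = {"probe1": 1500.0, "probe2": 100.0, "probe3": 4200.0}
blank = {"probe1": 60.0, "probe2": 60.0, "probe3": 55.0}
print(presence_pattern(raw, blank, ["probe1", "probe2", "probe3"]))  # 101
```

The resulting string plays the role of the allelic pattern from which a CRISPOL type is assigned; the probe-specific correction for pSTMB26 would need an extra step that subtracts the control-strain signal instead of the blank.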