Identification and Validation of Specific Markers of Bacillus anthracis Spores by Proteomics and Genomics Approaches*

Bacillus anthracis is the causative bacterium of anthrax, an acute and often fatal disease in humans. The infectious agent, the spore, represents a real bioterrorism threat and its specific identification is crucial. However, because of the high genomic relatedness within the Bacillus cereus group, it is still a real challenge to identify B. anthracis spores confidently. Mass spectrometry-based tools represent a powerful approach to the efficient discovery and identification of protein markers. Here we undertook comparative proteomics analyses of B. anthracis, B. cereus, and B. thuringiensis spores to identify proteoforms unique to B. anthracis. The marker discovery pipeline developed combined peptide- and protein-centric approaches using liquid chromatography coupled to tandem mass spectrometry experiments on a high resolution/high mass accuracy LTQ-Orbitrap instrument. By combining these data with those from complementary bioinformatics approaches, we were able to highlight a dozen novel proteins consistently observed across all the investigated B. anthracis spores while being absent in B. cereus/thuringiensis spores. To further demonstrate the relevance of these markers and their strict specificity to B. anthracis, the number of strains studied was extended to 55, by including closely related strains such as B. thuringiensis 9727, and above all the B. cereus biovar anthracis CI and CA strains that possess pXO1- and pXO2-like plasmids. Under these conditions, the combination of proteomics and genomics approaches confirms the pertinence of 11 markers. Genes encoding these 11 markers are located on the chromosome, which provides additional targets complementary to the commonly used plasmid-encoded markers. Last but not least, we also report the development of a targeted liquid chromatography coupled to tandem mass spectrometry method involving the selected reaction monitoring mode for the monitoring of the 4 most suitable protein markers.
Within a proof-of-concept study, we demonstrate the value of this approach for the further high throughput and specific detection of B. anthracis spores within complex samples.

Bacillus anthracis is a highly virulent bacterium, which is the etiologic agent of anthrax, an acute and often lethal disease of animals and humans (1). According to the Centers for Disease Control and Prevention, B. anthracis is classified as a category A agent, the highest rank of potential bioterrorism agents (http://www.bt.cdc.gov/agent/agentlist-category.asp). The infectious agent of anthrax, the spore, was used as a bioterrorism weapon in 2001 in the United States when mailed letters containing B. anthracis spores caused 22 cases of inhalational and/or cutaneous anthrax, five of which were lethal (2). These events have emphasized the need for rapid and accurate detection of B. anthracis spores.
Bacillus anthracis is a member of the genus Bacillus, which comprises Gram-positive, rod-shaped bacteria characterized by the ability to form endospores under aerobic or facultative anaerobic conditions (3). The genus Bacillus is a widely heterogeneous group encompassing 268 validly described species to date (http://www.bacterio.net/b/bacillus.html, last accessed on August 9th, 2013). B. anthracis is part of the B. cereus group, which consists of six distinct species: B. anthracis, B. cereus, B. thuringiensis, B. mycoides, B. pseudomycoides, and B. weihenstephanensis (4,5). The latter three species are generally regarded as nonpathogenic whereas B. cereus and B. thuringiensis could be opportunistic or pathogenic to mammals or insects (5,6). B. cereus is a ubiquitous species that lives in soil but is also found in foods of plant and animal origin, such as dairy products (7). Its occurrence has also been linked to food poisoning and it can cause diarrhea and vomiting (6,8). B. thuringiensis is primarily an insect pathogen, also present in soil, and often used as a biopesticide (9).
B. anthracis is highly monomorphic, that is, shows little genetic variation (10), and primarily exists in the environment as a highly stable, dormant spore in the soil (1). Specific identification of B. anthracis is challenging because of its high genetic similarity (sequence similarity >99%) with B. cereus and B. thuringiensis (5,11). The fact that these closely related species are rather omnipresent in the environment further complicates identification of B. anthracis. The main difference among these three species is the presence in B. anthracis of the two virulence plasmids pXO1 and pXO2 (1), which are responsible for its pathogenicity. pXO1 encodes a tripartite toxin (protective antigen (PA), lethal factor (LF), and edema factor (EF)) which causes edema and death (1), whereas pXO2 encodes a poly-γ-D-glutamate capsule which protects the organism from phagocytosis (1). B. anthracis identification often relies on the detection of the genes encoded by these two plasmids via nucleic acid-based assays (12-14). Nevertheless, the occasionally observed loss of the pXO2 plasmid within environmental species may impair the robustness of detection (1). In addition, in recent years a series of findings has shown that the presence of pXO1 and pXO2 is not a unique feature of B. anthracis. Indeed, Hu et al. have demonstrated that ~7% of B. cereus/B. thuringiensis species can have a pXO1-like plasmid and ~1.5% a pXO2-like plasmid (15). This was particularly underlined for some virulent B. cereus strains (i.e. B. cereus strain G9241, B. cereus biovar anthracis strains CA and CI) (16-20).
Because of these potential drawbacks, the use of chromosome-encoded genes would be preferable for the specific detection of B. anthracis. Such genes (rpoB, gyrA, gyrB, plcR, BA5345, and BA813) have been reported as potential markers (21-25), but concerns have also been raised about their ability to discriminate B. anthracis efficiently from closely related B. cereus strains (26). Ahmod et al. have recently pointed out, by in silico database analysis, that a specific sequence deletion (indel) occurs in the yeaC gene and exploited it for the specific identification of B. anthracis (27). Nevertheless, a few B. anthracis strains (e.g. B. anthracis A1055) do not have this specific deletion and so may lead to false-negative results (27).
In the last few years, protein profiling by MS, essentially based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF MS), has emerged as an alternative (or a complement) to genotypic or phenotypic methods for the fast and efficient identification of microorganisms (28,29). Such an approach is based on the reproducible acquisition of global bacterial protein fingerprints/patterns. The combination of MS-based protein patterns and chemometric/bioinformatic tools has been demonstrated to efficiently differentiate members of the B. cereus group from other Bacillus species (30).
However, the specific discrimination of B. anthracis from the closely related B. cereus and B. thuringiensis remains difficult (30). This study of Lasch and coworkers, performed on vegetative cells, identified a few ribosomal and spore proteins as being responsible for this clustering (30). Closer inspection of the data revealed that B. anthracis identification was essentially based on one particular isoform of the small acid-soluble spore protein B (SASP-B)¹ (30-34), which is exclusively expressed in spores, as the samples were shown to contain residual spores. However, the specificity of SASP-B has recently been questioned as the published genomes of B. cereus biovar anthracis CI and B. thuringiensis BGSC 4CC1 strains have been shown to share the same SASP-B isoform as B. anthracis (35). Altogether these results underline that the quest for specific markers of B. anthracis needs to be pursued.
Mass spectrometry also represents a powerful tool for the discovery and identification of protein markers (36,37). In the case of B. anthracis, this approach has more commonly been used for the comprehensive characterization of given bacterial proteomes. For example, the proteome of vegetative cells with variable plasmid contents has been extensively studied (38-40), as have the proteomes of mature spores (41,42) and of germinating spores (43,44). Only one recent study, based on a proteo-genomic approach, was initiated to identify protein markers of B. anthracis (45). In this work, potential markers were characterized, but using a very limited number of B. cereus group strains (three B. cereus and two B. thuringiensis). Moreover, this study was done on vegetative cells, whereas the spore proteome is drastically different. To our knowledge, no study has characterized and validated relevant protein markers specific to B. anthracis spores, which constitute the dissemination form of B. anthracis and are often targeted by first-line immunodetection methods (46).
Here we report comparative proteomics analyses of Bacillus anthracis/cereus/thuringiensis spores, undertaken to identify proteoforms unique to B. anthracis. Preliminary identification was performed on a limited set of Bacillus species both at the peptide (after enzymatic digestion) and protein levels by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) using a high resolution/high mass accuracy LTQ-Orbitrap instrument. The pertinence of 11 markers was further demonstrated using proteomics and genomics approaches on a representative larger set of up to 55 different strains, including the closely related B. cereus biovar anthracis CI, CA, and B. thuringiensis 9727. Lastly, as a proof-of-concept study, we also report for four B. anthracis markers the implementation of a targeted LC-MS/MS method using selected reaction monitoring (SRM), based on the extension of a previous one focused on SASP-B (35). Preliminary results regarding the usefulness of the method for the high-throughput and accurate detection of B. anthracis spores in complex samples were also obtained and are reported herein.
¹ The abbreviations used are: SASPs, small acid-soluble spore proteins; SRM, selected reaction monitoring; LC-MS/MS, liquid chromatography coupled to tandem mass spectrometry; IAA, iodoacetamide; RS9, 10, 30S ribosomal protein S9, S10; RL5, 28, 50S ribosomal protein L5, L28; GSP26, General Stress Protein 26; DPS1, DNA protection during starvation protein 1; IS, Internal Standard; MLST, multilocus sequence typing; MLEE, multilocus enzyme electrophoresis; AFLP, amplified fragment length polymorphism.
Safety Considerations-Bacillus anthracis is a highly virulent bacterium and all strains were handled according to safety considerations applicable at the Institut Pasteur (Biosafety Level 3 laboratory). After protein extraction, complete inactivation and elimination of spores were systematically checked (see below). After this control, protein samples can be handled safely using standard laboratory practice.
Bacterial Strains Included in the Study and Preparation of Spores-An overview of the 55 strains from the B. cereus group used throughout this study is provided in Table I. Presence or absence of pXO1 and pXO2 plasmids was evaluated by PCR (47). Most of the bacterial strains originated from the strain collection of the laboratory of Pathogénie des Toxi-Infections Bactériennes (Institut Pasteur, Paris, France). B. cereus biovar anthracis CI and CA were from the Robert Koch Institute (17,19). All strains were handled according to safety considerations applicable at the Institut Pasteur (Biosafety Levels 2 and 3). B. anthracis 9602R, RA3R, 7611R, and 6183R are derivatives of the virulent natural isolates 9602, RA3, 7611, and 6183, cured of plasmid pXO2; they therefore do not produce a capsule but still produce toxins (i.e. phenotypic equivalent of the Sterne strain) (48). The CAR-P strain is derived from the virulent natural isolate CA, cured of plasmid pXO2 and deleted of the pagA gene. Strains were grown in BHI medium and spore production was performed for 48 strains (see Table I) on NBY medium as previously described (49). Sporulation could not be achieved for the seven remaining strains, which were therefore only used for the genomic approach. Spores from these 48 strains were then purified on Radioselectan (Renografin 76%) using previously described methods, and were stored at 4°C in sterile deionized water until use (49).
Preparation of Protein Extracts From Bacillus Spores-Proteins were extracted from the spores by using the trifluoroacetic acid (TFA) protocol for highly pathogenic microorganisms described by Lasch et al. (50). Briefly, TFA solution was added to ~5 × 10^7 spores (to a final concentration of 80% TFA) and the mixture was vortexed for 10 min. The total spore extract was then centrifuged at 14,000 × g, 4°C for 15 min. Finally, the supernatant was transferred into Millipore's Ultrafree MC filter tubes of 0.22 µm pore size (Millipore, Billerica, MA) and spun at 10,000 × g for 5 min. The filtrate containing the total protein extract was stored at −20°C until use. The total inactivation and elimination of spores were systematically checked after neutralization of TFA with 9% Tween-80, 0.9% Lecithin, 3% Histidine in TSB and verification of the absence of viable bacteria in each extract after cultivation on BHI agar for 3 days, as described by Lasch et al. (50). The authors demonstrated that spores of 14 strains were reliably sterilized in each of the 67 separate inactivation replicates performed with the TFA inactivation protocol. We have confirmed the full efficiency of this protocol for spore inactivation in the course of experiments performed on 55 strains. One may consider that spore extracts treated with the current protocol could be handled safely in further experiments (and potentially without any systematic control by cultivation).
Extracted samples could then be handled safely outside the Biosafety Level 3 laboratory. Each protein extract was dried by vacuum centrifugation to eliminate TFA. Proteins were then resuspended in 20 µl of 50 mM ammonium bicarbonate (NH4HCO3) buffer, pH 8.0. Each sample was then reduced by the addition of 5 µl of 45 mM dithiothreitol (DTT) solution (in 50 mM NH4HCO3 buffer) at 55°C for 30 min and alkylated by the addition of 5 µl of 100 mM iodoacetamide (IAA) solution (in 50 mM NH4HCO3 buffer) at room temperature for 45 min. Finally, enzymatic digestion was performed by the addition of 1.5 µl of 1 µg/µl trypsin or Glu-C (in 50 mM NH4HCO3 buffer) with overnight incubation at 37°C.
Marker Discovery by Untargeted Bottom-up Approach-Enzymatically digested samples were diluted in 0.1% formic acid (1:5, v/v) and injected into the LC-MS/MS system. LC-MS experiments were performed using an Accela HPLC system coupled to an LTQ-Orbitrap Discovery mass spectrometer, both from Thermo Scientific (San Jose, CA). Chromatographic separation was performed on a Zorbax SB-C18 column (150 × 2.1 mm i.d., 5 µm particle size, 300 Å porosity) from Agilent Technology (Palo Alto, CA). Peptides were eluted from the column at a flow rate of 200 µl/min using water as mobile phase A and acetonitrile as mobile phase B, both containing 0.1% formic acid. A linear gradient from 5 to 60% B in 40 min was performed. Then the column was washed for 10 min at 95% B. The equilibration time before the next analysis was set at 10 min, so that the time between two experimental cycles was 60 min. Column temperature was maintained at 40°C. The column effluent was directly introduced into the electrospray source of the LTQ-Orbitrap Discovery mass spectrometer. Analyses were performed in the positive ion mode, and sequential MS2 experiments, based on collision-induced dissociation (CID), were performed using the data-dependent acquisition method. The five most abundant ions (threshold 500 counts, charge states higher than +1) were selected and CID fragment ions were analyzed in the linear ion trap. The source conditions were as follows: capillary temperature, 275°C; sheath gas flow, 35 arbitrary units; auxiliary gas flow, 20 arbitrary units; capillary voltage, 35 V; ESI spray voltage, 5 kV. The MS survey scan was performed from m/z 300-1500 in the Orbitrap, using a resolution set at 30,000 (full width at half maximum, at m/z 400), and the ion population was held at 1 × 10^6 through the use of automatic gain control. A normalized collision energy of 35% was used, with an activation q of 0.25 and an activation time of 30 ms.
For tandem mass spectrometry in the linear ion trap, the ion population was set to 1 × 10^4 and the precursor isolation width was set to 2 m/z units. FTMS data were collected in the profile mode and ion trap MS/MS data in the centroid mode. Data were acquired using the XCalibur software (version 2.0.7).
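The gradient program of the discovery method can be written as a small piecewise-linear table, which makes the 60 min cycle time easy to check. This is a minimal sketch only: the segment boundaries are taken from the text, whereas the instantaneous steps to 95% B and back to 5% B for re-equilibration are assumptions (ramp times are not stated), and `GRADIENT`, `percent_b`, and `cycle_time` are hypothetical names.

```python
# Discovery LC gradient as (start_min, end_min, start_%B, end_%B) segments.
# Assumption: the wash and re-equilibration steps begin with instantaneous jumps.
GRADIENT = [
    (0.0, 40.0, 5.0, 60.0),    # linear gradient, 5 to 60% B in 40 min
    (40.0, 50.0, 95.0, 95.0),  # column wash, 10 min at 95% B
    (50.0, 60.0, 5.0, 5.0),    # re-equilibration, 10 min (assumed at 5% B)
]


def percent_b(t_min):
    """Return the programmed %B at time t_min by linear interpolation."""
    for start, end, b0, b1 in GRADIENT:
        if start <= t_min < end:
            return b0 + (b1 - b0) * (t_min - start) / (end - start)
    raise ValueError("time outside the programmed cycle")


def cycle_time():
    """Total time between two injections, in minutes."""
    return GRADIENT[-1][1] - GRADIENT[0][0]
```

With these segments, `cycle_time()` gives the 60 min stated above, and `percent_b(20.0)` gives the mobile phase composition at mid-gradient.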
Raw files were converted to Mascot compatible .mgf files using Bioworks browser version 3. The parameters for searching were: either trypsin or Glu-C (V8-DE) as digestion enzyme, two missed cleavages allowed, charge states 2+ and 3+, fixed modification of cysteine residues (carbamidomethyl), variable oxidation of methionine residues, deamidation of asparagine and glutamine residues, Gln→pyro-Glu (for N-terminal glutamine residues) and loss of N-terminal methionine (−131.04 Da), parent ion tolerance of 10 ppm and fragment mass tolerance of 0.7 amu. Protein identifications were validated if proteins were identified with at least two distinct peptides with a score > 20 and a protein score > 40.
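The acceptance rule above can be illustrated with a short sketch. It is not the Mascot implementation: it assumes that the score threshold of 20 applies to each peptide individually, and the `PeptideMatch` record, the function names, and any sequences or masses passed to them are hypothetical.

```python
from dataclasses import dataclass

PARENT_TOL_PPM = 10.0      # parent ion tolerance stated in the text
MIN_PEPTIDE_SCORE = 20.0   # a peptide score must exceed this value
MIN_PROTEIN_SCORE = 40.0   # the protein score must exceed this value
MIN_DISTINCT_PEPTIDES = 2  # at least two distinct peptides are required


@dataclass(frozen=True)
class PeptideMatch:
    sequence: str
    score: float
    observed_mz: float
    theoretical_mz: float


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def protein_is_validated(matches, protein_score):
    """Apply the two-peptide and score thresholds to one protein hit."""
    good_sequences = {
        m.sequence
        for m in matches
        if m.score > MIN_PEPTIDE_SCORE
        and abs(ppm_error(m.observed_mz, m.theoretical_mz)) <= PARENT_TOL_PPM
    }
    return (len(good_sequences) >= MIN_DISTINCT_PEPTIDES
            and protein_score > MIN_PROTEIN_SCORE)
```

Collecting sequences in a set ensures that repeated identifications of the same peptide count only once toward the two-peptide requirement.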
Marker Discovery by Untargeted Top-down Approach-For intact protein analysis, each protein extract was diluted in 0.1% formic acid (1:10, v/v) and analyzed in reverse-phase liquid chromatography coupled to an LTQ-Orbitrap Discovery mass spectrometer (LC-MS) from Thermo Scientific. The source conditions were as follows: capillary temperature, 275°C; sheath gas flow, 30 arbitrary units; auxiliary gas flow, 3 arbitrary units; capillary voltage, 49 V; ESI spray voltage, 4.5 kV. The MS survey scan was performed in the positive ion mode from m/z 500-2000 using a resolution set at 30,000 at m/z 400 and FTMS data were collected in profile mode. Top-down MS/MS experiments were also performed to further confirm protein identity. The precursor isolation width was set to 2 m/z units. A normalized collision energy of 35% was used, with an activation q of 0.25 and an activation time of 30 ms. Linear ion trap MS/MS data were collected in the centroid mode. These data were manually interpreted using the XCalibur software (version 2.0.7).
Multiplex SRM Assay-For each spore sample, trypsin and Glu-C digests (obtained as described above, 20 µl of each) were mixed together. Stable isotope-labeled peptides corresponding to each targeted marker were synthesized to be used as internal standards (supplementary Table S1). They were mixed and diluted in 10% acetonitrile, 0.1% formic acid to prepare the Internal Standard (IS) stock mixture. The N- and C-terminal peptides of the SASP-B protein were at 100 and 400 ng/ml, respectively, and the peptides from the other proteins at 500 ng/ml. Ten microliters of IS stock mixture and 50 µl of 0.1% formic acid solution were added for a final sample volume of 100 µl (IS mixture at a final concentration of 10 ng/ml for the N-terminal peptide of SASP-B, 40 ng/ml for the C-terminal peptide of SASP-B and 50 ng/ml for the 5 other targeted peptides). Thirty microliters of this sample mixture were used for LC-MS/MS analysis.
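The final internal standard concentrations follow directly from the volumes mixed, as the short calculation below shows. All values are taken from the text; the dictionary keys and the function name are illustrative only.

```python
# Stock concentrations of the labeled peptides in the IS mixture (ng/ml).
STOCK_NG_PER_ML = {
    "SASP-B N-terminal peptide": 100.0,
    "SASP-B C-terminal peptide": 400.0,
    "other marker peptides": 500.0,
}

# Volumes combined for one sample (µl).
VOLUMES_UL = {
    "trypsin digest": 20.0,
    "Glu-C digest": 20.0,
    "IS stock mixture": 10.0,
    "0.1% formic acid": 50.0,
}


def final_is_concentrations():
    """Dilute each IS stock concentration by (IS volume / total volume)."""
    total_ul = sum(VOLUMES_UL.values())
    is_ul = VOLUMES_UL["IS stock mixture"]
    return {name: conc * is_ul / total_ul
            for name, conc in STOCK_NG_PER_ML.items()}
```

The four volumes sum to 100 µl, so the IS stock is diluted tenfold, which reproduces the 10, 40, and 50 ng/ml figures quoted above.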
LC-MS/MS experiments were performed using an HP 1100 HPLC system from Agilent (Palo Alto, CA, USA) coupled to a triple quadrupole TSQ Quantum Ultra mass spectrometer (Thermo Scientific, San Jose, CA, USA). Chromatographic separation was performed on a Zorbax SB-C18 column (150 × 2.1 mm i.d., 5 µm particle size, 300 Å porosity) from Agilent Technology. Mass spectrometry and HPLC conditions were optimized using the synthetic peptides of the targeted proteins. The column temperature was maintained at 60°C. Peptides were eluted from the column at a flow rate of 200 µl/min using water as mobile phase A and acetonitrile as mobile phase B, both containing 0.1% formic acid. A first linear gradient from 5 to 24% B in 13 min was performed followed by a rapid gradient from 13 to 13.5 min to reach 38% B. The third linear gradient ranged between 13.5 and 18 min to reach 43% B. The column was washed for 4 min at 95% B and the equilibration time before the next analysis was set at 8 min, so that the time between two cycles was 30 min. The column effluent was directly introduced into the electrospray source of the mass spectrometer. Analyses were performed in the positive ion mode. The electrospray voltage and the capillary voltage were set at 3.9 kV and 35 V, respectively. The sheath, ion sweep, and auxiliary gas flow rates (nitrogen) were optimized at 40, 20, and 15 (arbitrary units), respectively, while the drying gas temperature was set at 350°C. The SRM method had the following parameters: 0.100 m/z scan width, 0.1 s scan time, 0.7 Q1, 0.7 Q3, 1.5 mTorr Q2 pressure (using argon as the collision gas). Two SRM transitions were monitored for each peptide to improve detection specificity. The fragment ions for each transition were selected to carry the amino acids characteristic of the B. anthracis proteoform with the aim of increasing method specificity. Moreover, when possible, product ions were selected with an m/z ratio higher than that of the parent ion.
A signal was considered positive when peak intensity was three times above the noise level for both selected transitions. The SRM acquisition time was divided into five segments. For a given peptide, the retention time observed for the two transitions monitored and the ratio between their peak areas must match those obtained with their corresponding IS peptides. Supplementary Table S1 reports the SRM transitions designed and monitored for each peptide and the optimized analytical parameters (tube lens and collision energy values). Data were acquired and analyzed using the XCalibur software (version 1.4 SR1).
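The detection criteria described above can be sketched as a simple decision function. The signal-to-noise factor of three comes from the text; the retention time and area ratio tolerances (`rt_tol_min`, `ratio_tol`) are not given in the text and are purely illustrative assumptions, as are the `Transition` record and the function name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    peak_height: float    # peak intensity of the SRM trace
    noise: float          # noise level of the same trace
    retention_min: float  # retention time at the peak apex (min)
    area: float           # integrated peak area


def peptide_detected(analyte, standard, rt_tol_min=0.1, ratio_tol=0.2):
    """Decide whether a peptide is detected from its two SRM transitions.

    analyte and standard are (transition_1, transition_2) pairs for the
    endogenous peptide and its labeled internal standard. The tolerance
    defaults are hypothetical values, not those used by the authors.
    """
    # Both transitions must reach at least three times the noise level.
    if any(t.peak_height < 3.0 * t.noise for t in analyte):
        return False
    # Each transition must co-elute with the matching IS transition.
    for a, s in zip(analyte, standard):
        if abs(a.retention_min - s.retention_min) > rt_tol_min:
            return False
    # The ratio of the two peak areas must match that of the IS peptide.
    ratio_analyte = analyte[0].area / analyte[1].area
    ratio_standard = standard[0].area / standard[1].area
    return abs(ratio_analyte - ratio_standard) <= ratio_tol * ratio_standard
```

Requiring all three conditions at once means that a peptide with a single strong transition, or with the wrong area ratio, is not reported as positive.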
PCR and Sequencing of Markers-Total genomic DNA of the 55 strains was extracted from vegetative cells using the High Pure PCR Template Preparation Kit (Roche Diagnostics, Mannheim, Germany) according to the manufacturer's instructions. The design of the primers was based on the nucleotide sequence alignment obtained with different strains from the Bacillus cereus group available in GenBank, by targeting conserved regions framing each gene of interest so as to amplify, in general, the complete gene. If necessary, some primers were degenerated to optimize the amplification (supplementary Table S2). PCR was performed using Taq DNA polymerase (Invitrogen, Carlsbad, CA). Amplification involved a first denaturation step at 94°C for 2 min, followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 50°C for 30 s and elongation at 72°C for 1 min, and by a final cycle of elongation at 72°C for 2 min. PCR fragments were purified by ultrafiltration before sequencing (Millipore, Molsheim, France). Sequencing reactions were performed using the same PCR primers and the Big Dye Terminator v1.1 cycle sequencing kit (Applied Biosystems, Foster City, CA) and purified by ethanol precipitation. Sequence chromatograms were obtained on an ABI3730XL automated sequence analyzer (Applied Biosystems). All amplicons were sequenced on both strands. For sequence analysis, the software BioNumerics version 6.5 (Applied Maths, Sint-Martens-Latem, Belgium) was used to generate contig assembly, sequence alignments and the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) dendrogram.
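For planning purposes, the amplification program above can be encoded as data and its programmed hold time summed. This is a simple sketch: temperatures and durations are those given in the text, ramping between temperatures is ignored (an assumption), and the variable and function names are illustrative.

```python
# (step name, temperature in °C, hold time in seconds), as given in the text.
INITIAL_STEP = ("initial denaturation", 94, 120)
CYCLE_STEPS = [
    ("denaturation", 94, 30),
    ("annealing", 50, 30),
    ("elongation", 72, 60),
]
FINAL_STEP = ("final elongation", 72, 120)
N_CYCLES = 35


def total_hold_seconds():
    """Sum of programmed hold times, ignoring temperature ramping."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return INITIAL_STEP[2] + N_CYCLES * per_cycle + FINAL_STEP[2]
```

Each cycle lasts 2 min, so the 35 cycles plus the initial and final steps amount to 74 min of programmed hold time.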

Discovery of Candidate Protein Markers Specific to B. anthracis Spores
Marker discovery was first done on a limited but representative panel of 12 strains (5 B. anthracis versus 5 B. cereus and 2 B. thuringiensis), selected to reflect appropriately the natural diversity and relatedness of the B. cereus group (Table I and Fig. 1). The corresponding spore lysates were studied both at the peptide and protein levels using proteomics approaches to highlight B. anthracis-specific components (Fig. 2A).
Identification of B. anthracis-specific Peptides Using a Bottom-up Proteomics Approach-Protein extract from each spore sample was divided into two equal parts and subjected to either trypsin or Glu-C digestion. We postulated that using two enzymes with different specificities would increase both the number of peptides/proteins identified as well as the likelihood of observing discriminating peptides incorporating mutated amino acid residues.
Marker discovery was first initiated through the in-depth analysis of the proteomics data from the five B. anthracis strains, obtained in a rather shotgun manner oriented toward major spore proteins using a fast LC gradient going from 5 to 60% acetonitrile in only 40 min and a data-dependent "top 5" MS method (Fig. 2A). The first step of the analytical process was to perform a Mascot search against a database including solely protein sequences from B. anthracis. On average for the 5 B. anthracis strains, this database search led to the identification of 1252 tryptic and 580 Glu-C peptides per strain with an average mass accuracy better than 2 ppm for the parent ions. These peptide numbers correspond to 228 and 122 proteins, respectively, which results in 279 distinct proteins (Fig. 2A). The number of peptide identifications remained rather constant across strains, varying only by ±5 to 10%. Nevertheless, only a ~50% overlap in the peptides identified was observed between two given B. anthracis strains, as a consequence of biological variability or more probably of the well-known variability of peptide identifications among LC-MS/MS replicates (51). For instance, Tabb et al. observed similar ~45% overlaps in the peptides identified between pairs of replicates from both standard protein mixtures of rather low complexity and yeast extracts, independently of the mass spectrometer used (different LTQ and LTQ-Orbitrap instruments were used in this interlaboratory study) (51). This means that even rather simple mixtures yield more peptide ions than can be sequenced during an LC-MS/MS run. To overcome this limitation, peptides identified from the 5 different B. anthracis strains were merged into a single list of 3111 peptides (2053 tryptic and 1058 Glu-C peptides, overall corresponding to 389 distinct spore proteins) that was then used for the rest of the marker discovery process (Fig. 2A and supplementary Tables S3 and S4).
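The overlap and merging steps described above amount to elementary set operations, sketched below. The text does not define how the overlap was computed; the sketch assumes shared peptides divided by the union of the two lists, and the function names and input layout (one set of peptide sequences per strain) are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(peptides_a, peptides_b):
    """Shared peptides as a fraction of all peptides seen in either strain.

    Assumption: overlap is taken as intersection over union.
    """
    a, b = set(peptides_a), set(peptides_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(per_strain):
    """Average overlap over all pairs of strains."""
    pairs = list(combinations(per_strain.values(), 2))
    if not pairs:
        raise ValueError("at least two strains are required")
    return sum(pairwise_overlap(a, b) for a, b in pairs) / len(pairs)


def merge_peptides(per_strain):
    """Union of the peptide lists of all strains (the compiled list)."""
    merged = set()
    for peptides in per_strain.values():
        merged |= set(peptides)
    return merged
```

Because each run samples only part of the detectable peptides, the merged list is larger than any single-strain list, which is the rationale for compiling it before the specificity filtering.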
Peptides specifically belonging to B. anthracis must not be found in any B. cereus or B. thuringiensis strains. Therefore the compiled peptide list was further refined by excluding the peptides also returned by a Mascot search against a B. cereus/thuringiensis database. Only 16 peptides (11 tryptic and 5 Glu-C peptides) were consistently observed across all five B. anthracis strains while not being detected in the digests of the seven B. cereus/thuringiensis strains, as confirmed by the thorough manual interrogation of the corresponding LC-MS data (Fig. 2A).
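The selection criterion stated above (observed in every B. anthracis strain, detected in no B. cereus/thuringiensis strain) can be expressed as a set intersection followed by a set difference. The sketch below illustrates this logic only; it does not reproduce the Mascot searches or the manual inspection of the LC-MS data, and the function name and input layout are hypothetical.

```python
def specific_peptides(target_strains, other_strains):
    """Peptides seen in every target strain and in no other strain.

    Both arguments map a strain name to the set of peptide sequences
    identified in that strain.
    """
    if not target_strains:
        return set()
    # Peptides consistently observed across all target strains.
    shared = set.intersection(*(set(p) for p in target_strains.values()))
    # Peptides detected in at least one non-target strain.
    seen_elsewhere = set().union(*(set(p) for p in other_strains.values()))
    return shared - seen_elsewhere
```

Requiring presence in all target strains is the stricter half of the rule: a peptide missed in a single B. anthracis run is discarded even if it is truly specific.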
As expected, these 16 peptides do not derive from B. anthracis-specific proteins but rather from proteins that have orthologs in B. cereus/thuringiensis species, with specific sequence differences (Table II). As an example, Fig. 3 shows the MS/MS spectra obtained for the Glu-C peptides from SASP-gamma proteoforms characteristic of B. anthracis and B. cereus species, with a few specific fragment ions. An additional level of specificity was also reached by performing Blast similarity searches against the whole UniProt database. Thus, this step led to the identification of 15 peptides belonging to 11 distinct proteins, which could be regarded as potential candidate markers: small acid-soluble spore proteins gamma (SASP-γ) and H1 (SASP-H1), 30S ribosomal protein S10 (RS10), 50S ribosomal proteins L5 and L28 (RL5 and RL28, respectively), a putative lipoprotein, 60 kDa chaperonin, general stress protein 26 (GSP26), ribosomal silencing factor RsfS, pyrimidine-nucleoside phosphorylase, and ATP-dependent Clp protease proteolytic subunit 2 (Table II). Table III summarizes the identification data obtained for these proteins.
Identification of B. anthracis Marker Candidates by a Top-down Approach-In parallel to experiments performed on peptides, intact protein profiling by LC-MS was evaluated as a complementary approach. Such an approach has already proven valuable for classification of bacteria (52,53). Under these conditions, we were able to pinpoint ~140 distinct "proteoforms" (54) for each strain (Fig. 2A), and overall detected masses were below 20 kDa. Manual comparison of the protein fingerprints obtained for the five B. anthracis and seven B. cereus/thuringiensis strains highlighted five proteins as being conserved across all the B. anthracis strains: RS10, GSP26, SASP-H1, SASP-γ, and SASP-B (Table III). These proteins were first putatively identified by matching experimentally determined masses with theoretical ones, thanks to the high mass accuracy (<10 ppm) obtained under these analytical conditions (55), and further confirmed (when possible) by performing top-down MS/MS experiments. All these proteins, except SASP-B, overlapped the proteins already identified using the peptide-centric data. Three out of these five proteins, the SASP proteins, were successfully confirmed by top-down MS/MS. Among them, we identified SASP-B, a previously characterized marker of B. anthracis spores (30-34), which substantially validates the approach (supplementary Fig. S1). The B. anthracis-specific version of SASP-B is characterized by the simultaneous presence of two particular amino acids located at the N- and C-termini (35). Other B. cereus group bacteria have only one or none of them, which explains why this protein was not highlighted by the analysis performed at the peptide level. Intact protein analysis also gives complementary information on the protein sequence integrity and potential modifications.
RS10 and SASP-H1 were observed as predicted from the UniProtKB/TrEMBL database with an N-terminal methionine residue, whereas the GSP26, SASP-γ and SASP-B proteins were detected without initial methionine (mass shift of −131.04 Da). Altogether, the results obtained at both the peptide and protein levels in a preliminary set of five B. anthracis and seven B. cereus/thuringiensis strains led to the discovery of 12 candidate protein markers (Fig. 2A).
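The mass-matching step described above can be illustrated with a short sketch that compares an observed intact mass with the database mass, with and without the N-terminal methionine. The 10 ppm tolerance and the 131.04 Da shift are taken from the text; the function names and any masses passed to them are hypothetical, and the sketch ignores other possible modifications.

```python
MET_LOSS_DA = 131.04  # mass shift quoted in the text for N-terminal Met loss
TOL_PPM = 10.0        # mass accuracy reported for the intact protein data


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_proteoform(observed_mass, database_mass):
    """Match an observed mass to the database sequence mass.

    database_mass is the theoretical mass of the full sequence, including
    the initial methionine. Returns a label, or None if nothing matches.
    """
    candidates = {
        "with N-terminal Met": database_mass,
        "without N-terminal Met": database_mass - MET_LOSS_DA,
    }
    for label, mass in candidates.items():
        if abs(ppm_error(observed_mass, mass)) <= TOL_PPM:
            return label
    return None
```

At the 10 ppm level, a 10 kDa protein must be measured to within about 0.1 Da, so the 131.04 Da difference between the two forms is resolved unambiguously.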

Identification of Other Putative B. anthracis Markers by Bioinformatic Analysis of the Protein List Generated by the Bottom-up Approach-Although very efficient, bottom-up analysis is a peptide-centric marker discovery pipeline that may suffer from several drawbacks. For instance, one may imagine that even the use of two enzymes with different specificity (namely trypsin and Glu-C) would not exhaustively highlight the specific peptides belonging to proteoforms efficiently discriminating B. anthracis from the other members of the B. cereus group. In addition, B. anthracis-specific peptides may not be observed under the present LC-MS/MS conditions, as a consequence of their sequence or physicochemical properties (e.g. ionization efficiency, chromatographic behavior). Therefore, as a complementary approach, we also aligned the 389 B. anthracis spore protein sequences identified during the bottom-up proteomics approach against the UniProt database using the BlastP tool to further pinpoint other potential signature proteins. In a first round, 64 candidates were highlighted, including of course the 11 proteins already identified by the peptide-centric approach (Fig. 2A, supplementary Table S5). Such sequence-based analysis also has intrinsic limitations. Indeed, it may not predict accurately the specificity of a given protein because of the potential sequence mistakes existing in the databases, some amino acid sequences being only deduced from DNA. Therefore, additional filters were applied, requiring that newly identified proteins incorporate unique and specific amino acid substitutions and have a unique proteoform clearly identified for all the B. anthracis strains included in the databases. These criteria refined the list to 24 additional putative proteoforms (supplementary Table S5).
Candidate Marker Validation-Markers were validated in a two-step procedure using first a proteomics and then a genomics approach. Taking into account the 12 initial strains, 55 strains (18 B. anthracis, 28 B. cereus, and nine B. thuringiensis) were targeted for the whole validation process. However, sporulation could not be achieved for seven out of these 55 strains, which were therefore used only for genomics-based validation (Table I, Fig. 2B).
Optimal strain selection was achieved thanks to the HyperCAT database (http://mlstoslo.uio.no) (56, 57) (Fig. 1). The final strain list notably included the phylogenetically very close B. thuringiensis 9727 and B. cereus B06020 as well as recently described B. cereus var. anthracis carrying the plasmids pBCXO1 and/or pBCXO2, which are close to pXO1 and pXO2 of B. anthracis (i.e. B. cereus G9241, B. cereus biovar anthracis CI and CA) (16-20).

Validation of the Candidate Markers Exhibiting Specific Peptide/Protein Isoforms by LC-MS(/MS)-For
Eleven of the 12 candidate markers were confirmed as being B. anthracis-specific (Fig. 2B and Table II). As expected, the SASP-B proteoform of B. anthracis was also detected in CA, CARP, CI and B06020 strains, and so was eliminated at this step (supplementary Fig. S2). For each of the 11 protein markers, a constant and unique proteotypic peptide sequence was observed in all B. anthracis strains, except for the ribosomal silencing factor RsfS, the ATP-dependent Clp protease proteolytic subunit 2 and the pyrimidine-nucleoside phosphorylase, for which the unique peptide was not detected in 2/10, 2/10, and 1/10 supplementary B. anthracis strains, respectively (Table II). This was probably because of their poor detection sensitivity (as a direct consequence of the low expression levels of the corresponding proteins). It is of note that the corresponding peptides incorporating at least one amino acid modification were also detected in almost all B. cereus/thuringiensis strains (Table II).
Marker Validation by Gene Sequencing-To further confirm the data obtained at the peptide and protein levels, as well as the uniqueness of the identified proteoform to B. anthracis, PCR amplification followed by DNA sequencing of the 11 candidate markers was performed on 55 strains (Table I). Analyses were successfully performed for every marker, except for SASP-H1 and pyrimidine-nucleoside phosphorylase, for which no amplification was achieved. This may be linked to a higher than expected variability within the flanking regions of these particular genes. To be considered reliable marker candidates, highlighted proteins had to pass both the proteomics and genomics validation steps for every strain. For this reason, despite being confirmed by the proteomics approach, SASP-H1 and pyrimidine-nucleoside phosphorylase were excluded from the final list of markers, although this does not completely preclude their potential as markers. One interesting finding is that translation of the experimentally determined nucleotide sequences into amino acids fully confirmed the peptide sequences identified by proteomics for characteristic peptides originating from B. anthracis, B. cereus and B. thuringiensis (supplementary Table S6).
In addition, among the 24 putative proteins highlighted by a sequence similarity search as mentioned above, three supplementary markers were also selected for DNA sequencing: DNA protection during starvation protein 1 (DPS1), 30S ribosomal protein S9 (RS9) and phosphoglycerate kinase (supplementary Table S5, Table III). We reasoned that the protein candidates having the highest chance of success are the most abundant ones, and thus those exhibiting the highest sequence coverage (>50%) (Table S-3). Interestingly, the relevance of 2 of these proteins (DPS1 and RS9) was demonstrated by gene sequencing, thus showing the complementary nature of the investigated approaches.
For the sake of clarity, sequencing data are represented as dendrograms as shown in Fig. 4 for SASP-gamma (see supplementary

was reduced when translated into amino acids, as several nucleotide changes coded for silent mutations. For example, the 18 different SASP-gamma gene sequences coded for seven distinct proteoforms, as can also be deduced from the proteomics data (Fig. 4, supplementary Table S6). The majority of identified sequences matched with proteoforms previously described in databases; however a few new ones were highlighted (noted as "new" in supplementary Table S6).
Altogether these data demonstrate a perfect match between proteomics and genomics data for the 11 markers.
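The reduction from distinct gene sequences to a smaller number of proteoforms can be illustrated with a minimal sketch. It assumes the standard bacterial codon assignments and uses three short, purely hypothetical alleles (not sequences from this study); the function names `translate` and `group_by_proteoform` are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard codon assignments, listed in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def group_by_proteoform(genes: dict) -> dict:
    """Group allele names by the protein sequence they encode."""
    groups = defaultdict(list)
    for name, dna in genes.items():
        groups[translate(dna)].append(name)
    return dict(groups)


# Hypothetical toy alleles: allele_2 differs from allele_1 by a silent
# change (AAA -> AAG, both lysine); allele_3 carries a substitution (A -> V).
toy_genes = {
    "allele_1": "ATGAAAGCTTAA",
    "allele_2": "ATGAAGGCTTAA",
    "allele_3": "ATGAAAGTTTAA",
}
proteoforms = group_by_proteoform(toy_genes)
```

In this toy case, three DNA variants collapse into two proteoforms, which is the same effect as the 18 SASP-gamma gene sequences coding for seven proteoforms.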

Development of a Targeted Multiplex SRM Assay for Further Validation and Unambiguous Detection of B. anthracis Spores in Complex Samples
Development of the LC-SRM Assay-Among the 11 characterized markers, only nine exhibit specific trypsin or Glu-C peptides that could be further used for the development of a targeted SRM assay (Table II). Moreover, peptide selection had to be refined by retaining the peptides most suitable for robust and sensitive LC-SRM detection, that is, those containing eight to 25 amino acids and showing good chromatographic behavior and ionization efficiency, while not containing any methionine, cysteine, or tryptophan residues that could undergo oxidation during sample handling. Peptides with two neighboring basic amino acids or RP/KP sequences were also avoided (58). Such rather stringent criteria (especially when dealing with already filtered species-specific peptides) produced a short list of four peptides corresponding to the four following proteins: SASP-gamma, 30S ribosomal protein S10, putative lipoprotein, and 60 kDa chaperonin. The strategy was to monitor in a single analysis the proteotypic peptides corresponding to the B. anthracis unique proteoforms. We have recently described an LC-SRM approach targeting SASP-B for the specific detection of B. anthracis spores in complex environmental samples, following immunocapture of spores (35). B. anthracis spores have a common SASP-B proteoform, which could be targeted by monitoring a combination of two particular proteotypic peptides after a B. anthracis-specific immunocapture step (35), whereas B. cereus and B. thuringiensis share at least one of these different peptides (with some exceptions, as mentioned above). Hence, we extended the scope of this first LC-SRM approach by including the four new markers described above. Consequently, the final method involved the monitoring of five distinct proteins for the specific detection of B. anthracis spores.
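The sequence-based part of these selection criteria can be sketched as a simple filter. This is a minimal illustration, not the procedure actually used in this work: "basic" is assumed here to mean lysine or arginine, chromatographic behavior and ionization efficiency are not modeled because they cannot be read from the sequence alone, and the peptide strings in the example are hypothetical.

```python
import re

# Residues excluded because they may oxidize during sample handling.
OXIDATION_PRONE = set("MCW")


def is_srm_suitable(peptide: str, min_len: int = 8, max_len: int = 25) -> bool:
    """Apply the sequence-level SRM criteria to one candidate peptide."""
    peptide = peptide.upper()
    # Length window of eight to 25 amino acids.
    if not (min_len <= len(peptide) <= max_len):
        return False
    # No methionine, cysteine, or tryptophan.
    if OXIDATION_PRONE & set(peptide):
        return False
    # No two neighboring basic residues (assumed: K or R) and no RP/KP motif.
    if re.search(r"[KR][KR]", peptide) or re.search(r"[KR]P", peptide):
        return False
    return True


# Hypothetical candidates, for illustration only.
candidates = ["AGLSDEVTNLK", "AGLSK", "AGLSDMVTNLK", "AGLSDKRVTNLK", "AGLSDKPVTNLK"]
retained = [p for p in candidates if is_srm_suitable(p)]
```

Only the first hypothetical peptide passes; the others fail on length, methionine content, adjacent basic residues, and a KP motif, respectively.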
Following optimization of MS and MS/MS conditions on labeled and unlabeled synthetic peptides, two SRM transitions were monitored per peptide for improved specificity (supplementary Table S1). The final LC-MS/MS method was performed within 30 min and involves the monitoring of 28 different SRM transitions, corresponding to the five different markers (supplementary Fig. S4). Spore extracts from the 48 strains previously used for the marker discovery phase (Table I) were screened with this multiplex LC-SRM assay. Typical LC-MS/MS chromatograms obtained for spores from B. anthracis (i.e. B. anthracis 9602) and a closely related B. cereus strain (i.e. Bcbva CI) are shown in Fig. 5. As expected, positive signals corresponding to the 4 newly identified markers were observed for B. anthracis strains (Fig. 5A). In addition, no signal corresponding to these proteins was detected in any other strains, even those closely related to B. anthracis such as Bcbva CI (Fig. 5B). All these results confirm the high selectivity of the identified markers. The situation was quite different for SASP-B, because we experimentally confirmed that the closely related B. cereus strains (CI, CA and B06020) have the same SASP-B proteoform as B. anthracis (Fig. 5B). It is of note that these particular strains were not initially included in our first work focused on SASP-B (35). These data demonstrate that careful strain selection is mandatory for achieving reliable marker identification. Fig. S5A). In contrast, the four new markers clearly discriminated B. anthracis spores even when they were 100-fold less abundant than the B. cereus ones (supplementary Fig. S5B). Moreover, SRM detection proved to be linear over a 2-log concentration range, thus demonstrating that matrix components do not significantly impact the method's robustness, as illustrated for the SASP-gamma signature peptide (supplementary Fig. S6).

DISCUSSION
The discovery of protein markers specific to B. anthracis is hampered by the high genetic similarity existing within the B. cereus group (5). Such genetic relatedness is so high that some authors have raised the possibility that the different members of the B. cereus group could originate from one and the same species with varying plasmid contents (the virulence plasmids pXO1 and pXO2 encoding for the toxins and the capsule in B. anthracis strains, respectively, and the plasmid encoding for insecticidal toxins in B. thuringiensis strains) (59). These similarities further translated at the protein level, with the almost impossible distinction of B. anthracis from closely related B. cereus and B. thuringiensis by MALDI-TOF MS protein profiling (30). Therefore, the identification of specific and reliable markers of B. anthracis spores implies that a consistent panel of closely related strains of B. cereus and B. thuringiensis has to be studied, to accurately reflect the naturally occurring relatedness.
With this aim, we took special care to constitute a panel of 55 relevant strains, to ensure reliable identification of specific markers for the robust detection of B. anthracis. Two key features were considered: (i) the phylogenetic proximity to B. anthracis strains (5), and (ii) the presence of the virulence plasmids (16-18). The HyperCAT database is the most complete phylogenetic dataset for the B. cereus group with a total of 2297 bacterial isolates (as of March 17, 2012), confidently and thoroughly characterized by various gene-based approaches: MultiLocus Sequence Typing (MLST), MultiLocus Enzyme Electrophoresis (MLEE), and Amplified Fragment Length Polymorphism (AFLP) (56,57). Based on these analyses, all the B. anthracis strains fall in the same cluster (cluster III according to the HyperCAT database), along with a few closely related B. cereus and B. thuringiensis strains (Fig. 1) (5). Therefore, our selection primarily targeted this cluster (seven B. anthracis, 14 B. cereus, and one B. thuringiensis). It is of particular interest that this first set of strains included the B. cereus strain G9241, associated with severe pneumonia and carrying an almost complete pXO1-like plasmid (99.6% sequence similarity in shared regions) (16); and above all, the two recently described B. cereus biovar anthracis strains (CI and CA) recovered from great apes killed by an anthrax-like disease in tropical forests of Ivory Coast and Cameroon, and carrying two plasmids, pBCXO1 and pBCXO2, which are close to pXO1 and pXO2 of B. anthracis (17,18). These two plasmids carry the anthrax toxin and the poly-γ-D-glutamate capsule biosynthesis genes, respectively (17,18), which underlines the B. anthracis-like nature of these strains and the associated risk of false-positive detection if they are not taken into account. To the best of our knowledge, no other similar in-depth spore proteomics study including these two particular strains has ever been undertaken. Six other B. cereus strains belonging to the more distant clusters II and IV were also considered. Last but not least, 27 other strains from the Institut Pasteur collection (11 B. anthracis and 16 B. cereus/B. thuringiensis), for which no phylogenetic classification was available, were included. Another point of interest is that the different B. anthracis strains included in this study take into consideration the possible loss of the pXO2 plasmid and/or, less commonly, the pXO1 plasmid (1). Indeed, among the 18 B. anthracis strains, nine harbor pXO1 and pXO2, six only pXO1, two only pXO2, and one neither pXO1 nor pXO2 (Table I).
Figure legend (fragment): In each case, the signal of the corresponding isotopically labeled peptide is also shown and noted IS (internal standard). These data were obtained using 10^7 spores.
Overall, the total number of strains studied was 55 (Table I).
Taking into account the high homogeneity within the B. cereus group, and based on our first experimental data, it rapidly became obvious that we would not be able to highlight proteins expressed exclusively in B. anthracis spores, but rather proteins that differ by a few (or even one) amino acid residues from those of closely related species. TFA-extracted spore proteins were submitted to trypsin and Glu-C digestion, and then analyzed by a single-step LC-MS/MS approach performed on a high resolution/high mass accuracy LTQ-Orbitrap instrument. Although not initially intended to reach comprehensive spore proteome coverage, our shotgun bottom-up proteomics approach identified ~400 distinct proteins (with ≥ 2 unique peptides) in B. anthracis spores, well exceeding the ~200-250 distinct proteins reported in previous studies using slightly different sample preparation protocols and 2D gel electrophoresis/mass spectrometry (41,42,60). The most exhaustive proteomics study described ~750 proteins but involved multidimensional liquid chromatography coupled to tandem mass spectrometry to go deeper into the B. anthracis spore proteome and identify less abundant proteins (44). Although less exhaustive than the latter study, our results perfectly meet the initial objective of studying the most abundant spore proteins to highlight easily detectable targets for the most sensitive detection of B. anthracis spores.
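How a single amino acid substitution translates into a discriminating peptide after trypsin or Glu-C digestion can be sketched in silico. The cleavage rules below are simplified assumptions (trypsin: C-terminal to K or R unless followed by P; Glu-C: C-terminal to E only, without missed cleavages), and the two protein fragments are hypothetical, not sequences from this study.

```python
import re

# Simplified cleavage rules expressed as zero-width patterns (assumptions).
# Splitting on zero-width matches requires Python 3.7 or later.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K/R, not before P
    "glu-c": r"(?<=E)",            # after E
}


def digest(protein: str, enzyme: str) -> list:
    """Return the peptides of a complete digestion, in sequence order."""
    pattern = CLEAVAGE_RULES[enzyme.lower()]
    return [p for p in re.split(pattern, protein.upper()) if p]


def discriminating_peptides(target: str, relative: str, enzyme: str) -> set:
    """Peptides produced from the target protein but not from its relative."""
    return set(digest(target, enzyme)) - set(digest(relative, enzyme))


# Hypothetical fragments differing by one residue (T -> S at position 5).
target_fragment = "AAEGTKLLSR"
relative_fragment = "AAEGSKLLSR"

tryptic_signature = discriminating_peptides(target_fragment, relative_fragment, "trypsin")
gluc_signature = discriminating_peptides(target_fragment, relative_fragment, "glu-c")
```

The same substitution ends up in peptides of different length and composition depending on the enzyme, which is one reason two enzymes with different specificity give complementary coverage.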
On a preliminary set of 12 strains (five B. anthracis, five B. cereus, and two B. thuringiensis), LC-MS/MS analyses highlighted 15 peptides (representing 11 distinct proteins) that were consistently observed across all B. anthracis while not being detected in B. cereus/thuringiensis strains (Fig. 2).
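The selection rule applied here (observed in every B. anthracis strain, never detected in B. cereus/thuringiensis strains) amounts to a set intersection followed by a set difference. The sketch below assumes peptide identifications are available as one set per strain; the strain names and peptide labels are hypothetical placeholders.

```python
def candidate_marker_peptides(observed: dict, target_strains: set) -> set:
    """Peptides seen in all target strains and in no other strain.

    observed maps a strain name to the set of peptides identified in it.
    """
    targets = [s for s in observed if s in target_strains]
    others = [s for s in observed if s not in target_strains]
    if not targets:
        return set()
    # Consistently observed across all target strains.
    shared = set.intersection(*(observed[s] for s in targets))
    # Detected in at least one non-target strain.
    seen_elsewhere = set().union(*(observed[s] for s in others))
    return shared - seen_elsewhere


# Hypothetical identifications for four strains.
toy_observed = {
    "Ba_1": {"PEP_A", "PEP_B", "PEP_C"},
    "Ba_2": {"PEP_A", "PEP_B", "PEP_D"},
    "Bc_1": {"PEP_B", "PEP_D"},
    "Bt_1": {"PEP_C", "PEP_E"},
}
markers = candidate_marker_peptides(toy_observed, {"Ba_1", "Ba_2"})
```

Here only PEP_A survives: PEP_B is shared by both target strains but also found in a non-target strain, whereas PEP_C and PEP_D are not observed in every target strain.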
For diverse reasons linked to peptide detectability or enzyme specificity, such bottom-up conditions may not always pinpoint B. anthracis-specific peptides. To overcome this limitation, a complementary approach involving the profiling of protein species (with molecular weights up to ~20 kDa) was performed and identified only one additional protein, thus leading to a total of 12 distinct proteins potentially specific for B. anthracis (Fig. 2). Moreover, three additional protein candidates were retrieved from a bioinformatics-based screening of all the data gathered from proteomics (DPS1, RS9 and phosphoglycerate kinase).
Marker validation was performed by extending the number of strains to 55. This panel of strains was interrogated by a combination of bottom-up proteomics, intact protein profiling, bioinformatics-based and gene sequencing approaches, to guarantee the comprehensiveness and robustness of the marker discovery pipeline. This strategy confirmed the relevance of 11 proteins, but excluded phosphoglycerate kinase, pyrimidine-nucleoside phosphorylase, SASP-H1 and SASP-B proteins. Inefficient DNA amplification prevented SASP-H1, pyrimidine-nucleoside phosphorylase and phosphoglycerate kinase from being fully confirmed, but it does not completely preclude them from being regarded as potential B. anthracis markers. SASP-B is often used as a marker of B. anthracis spores (30 -34), but it was clearly invalidated when analyzing 3 closely related B. cereus strains (i.e. CA, CI, and B06020). The latter clearly demonstrates that marker specificity must be constantly re-evaluated, especially by working on closely related strains, to minimize the false-positive rate.
Gene sequencing confirmed marker identification and allowed us to go deeper into protein sequence characterization. For the 55 strains experimentally analyzed, the uniqueness of the B. anthracis proteoforms was demonstrated and perfectly matched with proteomics results. Furthermore, translation of the determined nucleotide sequences complemented the amino acid ones, by shedding light on unobserved peptide sequences in the bottom-up analysis. This is of great interest for the two proteins only retrieved from the sequence-based analysis (i.e. DPS1 and RS9), for which no proteotypic peptide was experimentally observed. These results well illustrate the complementarity between gene sequencing and proteomics.
All the data resulting from proteomics and genomics analyses confirmed that B. anthracis is highly monomorphic (Table II and supplementary Table S6). Conversely, a great diversity in sequences was observed for the B. cereus and B. thuringiensis strains. For instance, the 18 B. anthracis strains have exactly the same nucleic acid sequence for SASP-gamma whereas the 37 B. cereus and B. thuringiensis strains were divided into 18 different DNA clusters that further translate into seven distinct proteoforms (Fig. 4, supplementary Table S6). This was in concordance with previous DNA homology studies indicating that B. anthracis strains have a similarity greater than 90% and therefore are one of the most molecularly homogeneous bacteria known (59,61). The great homogeneity of B. anthracis in the natural environment may be the result of the organism spending the majority of its life as a dormant spore (62). In this particular state, B. anthracis is not exposed to DNA-altering events such as the presence of phages and constant replication of DNA (59,61); this considerably limits the amount of genetic diversity found among isolates of this species (62) compared with B. cereus, which more commonly exists in the environment as vegetative cells (63). Regarding the markers highlighted in this study, ~5 distinct B. cereus proteoforms exist on average for a given protein versus a single B. anthracis proteoform (supplementary Table S6). This great inter-strain homogeneity for B. anthracis is an important point as it intrinsically minimizes the risk of false-negative results (because of the misdetection of a potentially mutated peptide by targeted approaches) and increases confidence in marker robustness in the case of emerging strains.
Marker identification was performed on spores and therefore identified proteins expressed in this particular bacterial state. Among the 11 markers highlighted, we identified SASP-gamma, which is encoded by the sspE gene and is linked to amino acid storage (64). This protein is part of a protein group (SASPs) present in large amounts in the core of Bacillus spores, comprising ~15% of the protein content (60), and protects spores from a number of external factors such as chemical and enzymatic cleavage and UV light (65). SASPs are recognized as potential markers for bacterial identification (31). These proteins are produced exclusively at the bacterial sporulation stage and are degraded during spore germination (65). Their concentration is so high in spores that even traces of spores in huge amounts of vegetative cells would result in strong SASP signals. Indeed, Lasch et al. have previously reported that ~50% of their preparations of vegetative cells from 374 Bacillus strains contained detectable amounts of SASPs (30). This suggests that SASPs would represent particularly pertinent protein markers when B. anthracis detection is envisioned (whatever the targeted form, i.e. spores or cells).
We also identified 4 ribosomal proteins as potential markers, which is not surprising because up to 21% of the cell's overall protein content is ribosomal (66). This is also in good agreement with previous MALDI-TOF studies of intact microorganisms (30,67). Among the other markers, we found proteins involved in metabolism (putative lipoprotein, ribosomal silencing factor RsfS, CH60), a protease (ATP-dependent Clp protease proteolytic subunit 2), and proteins involved in the response to stress (GSP26, DPS1). One could easily imagine that some of the newly characterized B. anthracis markers could also be detected after spore germination, such as ribosomal proteins (RL5, RL28, RS9, and RS10) or other proteins involved in metabolism (CH60 or RsfS), because germination is accompanied by the resumption of cell growth and metabolic activity. For example, Clp-protease and CH60 were detected in similar amounts in dormant and germinating spores by Huang and colleagues (43), whereas CH60 has been previously identified in the proteome of B. anthracis vegetative cells (68,69). In addition, ribosomal proteins, more generally speaking, also represent abundant proteins in vegetative cells.
These 11 specific markers of B. anthracis spores are encoded by the chromosome and have never been previously characterized. With a view to accurate identification of B. anthracis spores, these markers provide complementary results to the virulence markers encoded in pXO1 and pXO2 plasmids.
In a subsequent step, a multiplex SRM assay was designed to monitor four of the particular proteoforms of B. anthracis spores that were selected for their ability to generate a proteotypic peptide on enzymatic digestion. The method confirmed the positive detection of the spores of the 15 B. anthracis strains and the absence of interfering signals for the 33 B. cereus/B. thuringiensis strains tested. This method represents a very efficient tool for the validation of marker specificity by screening hundreds or thousands of B. cereus/B. thuringiensis strains, such as those used, for example, in the reference phylogenetic classification of Kolsto et al. (http://mlstoslo.uio.no) (56,57). Furthermore, it could be used as a specific, sensitive and high-throughput detection method of B. anthracis spores in complex environmental samples. Indeed, targeted proteomics on triple quadrupole instruments (SRM acquisition mode) offers an attractive approach to the detection and/or quantification of proteins (70,71). We have previously demonstrated its high sensitivity in the context of B. anthracis spore detection by targeting SASP-B protein without any prior culture step (35). The value of such an approach was illustrated here with the specific multiplexed detection of B. anthracis spores in a 100-fold excess of complex spore mixtures containing several B. cereus strains.
CONCLUSION
We present here a proteomics (bottom-up and top-down) methodology combined with bioinformatics and genomics approaches applied to the identification of specific markers of B. anthracis spores. Using such an approach, 11 novel specific markers were successfully identified. Marker specificity was further validated by screening a large panel of 55 distinct strains chosen from B. anthracis and closely related B. cereus/thuringiensis species.
Interestingly, the chromosomal location of the genes encoding these new markers also provides the opportunity to develop a robust genomic assay that would constitute an attractive alternative or complement to the existing methods targeting the B. anthracis virulence plasmids. In continuity with our previous work focused on SASP-B, we sought to implement a multiplex SRM method by targeting the four most suitable (mainly in terms of MS detection) markers among the 11 identified for the rapid and robust screening of B. anthracis spores. In addition, such a method could also be extended to the direct sensitive detection of B. anthracis spores in environmental samples without any prior culture step.
* This project was supported by the joint ministerial program of R&D against Chemical, Biological, Radiological, Nuclear and Explosive (CBRNE) risks.
This article contains supplemental Figs. S1 to S6 and Tables S1 to S4.