O-Glycosylation of the N-terminal Region of the Serine-rich Adhesin Srr1 of Streptococcus agalactiae Explored by Mass Spectrometry

Serine-rich (Srr) proteins exposed at the surface of Gram-positive bacteria are a family of adhesins that contribute to the virulence of pathogenic staphylococci and streptococci. Lectin-binding experiments have previously shown that Srr proteins are heavily glycosylated. We report here the first mass spectrometry analysis of the glycosylation of Streptococcus agalactiae Srr1. After Srr1 enrichment and trypsin digestion, potential glycopeptides were identified in collision-induced dissociation spectra using X! Tandem. The approach was then refined using higher-energy collisional dissociation fragmentation, which led to the simultaneous loss of sugar residues, production of diagnostic oxonium ions, and backbone fragmentation of glycopeptides. This feature was exploited in a new open-source software tool (SpectrumFinder) developed for this work. By combining these approaches, 27 glycopeptides corresponding to six different segments of the N-terminal region of Srr1 [93–639] were identified. Our data unambiguously indicate that the same protein residue can be modified with different glycan combinations, including N-acetylhexosamine, hexose, and a novel modification that was identified as O-acetylated-N-acetylhexosamine. Lectin binding and monosaccharide composition analysis strongly suggested that HexNAc and Hex correspond to N-acetylglucosamine and glucose, respectively. The same protein segment can be modified with a variety of glycans, generating wide structural diversity in Srr1. Electron transfer dissociation was used to assign glycosylation sites, leading to the unambiguous identification of six serine residues and one threonine residue. Analysis of purified Srr1 produced in mutant strains lacking the genes encoding accessory glycosyltransferases demonstrates that O-GlcNAcylation is an initial step in Srr1 glycosylation that is likely required for subsequent decoration with Hex.
In summary, our data, obtained by combining several fragmentation mass spectrometry techniques with a new software tool, demonstrate the glycosylation heterogeneity of Srr1, characterize a new protein modification, and identify six glycosylation sites located in the N-terminal region of the protein.

Protein glycosylation also occurs in bacteria. A considerable amount of information on the biochemical pathways involved in protein glycosylation has been obtained in the genera Campylobacter and Neisseria (1-3). Functional studies performed in Gram-negative bacteria have thus revealed important properties of protein glycosylation pathways. As in eukaryotes, proteins can be glycosylated on asparagine residues (N-glycosylation) or on serine/threonine residues (O-glycosylation). Two modes of glycan transfer onto proteins have been characterized in Gram-negative bacteria. In the first, the activated glycan is synthesized on a lipid carrier on the cytoplasmic side of the membrane. After membrane translocation, the glycan portion of the polyprenyl-phosphate-glycan intermediate is transferred en bloc to the protein by an oligosaccharyltransferase active in the periplasm.
Such a mechanism has been demonstrated for N-glycosylation in Campylobacter jejuni (4-6) and O-glycosylation in Neisseria gonorrhoeae (7-9). These pathways are responsible for the glycosylation of more than 65 proteins in C. jejuni (10) and 12 different proteins in N. gonorrhoeae (11). The second well-characterized type of transfer is sequential and occurs in the cytoplasm: glycosyltransferases (GTs) catalyze the stepwise transfer of nucleotide-diphosphate-activated monosaccharide residues to the protein. This mechanism has long been considered to be restricted to O-glycosylation, as for the flagellar proteins of Campylobacter, Helicobacter, or Pseudomonas spp. (12-15). However, it was recently demonstrated that HMW1c, a GT of Haemophilus influenzae, is responsible for the cytoplasmic N-glycosylation of the adhesin HMW1 (16,17).
Important families of glycoproteins have also been characterized in Gram-positive bacteria. The glycosylation of flagellins has been demonstrated in Clostridium spp. and in Listeria monocytogenes (13,18); the modification is required for flagella assembly in C. difficile (13) and for motility in L. monocytogenes (19,20). In L. monocytogenes, the flagellin monomer is modified with N-acetylglucosamine (GlcNAc) residues at up to six different sites by GmaR, a cytoplasmic O-GlcNAc-transferase (OGT) homologous to eukaryotic OGTs (18). GmaR was the first prokaryotic enzyme of this type to be characterized (20). The type of sugar added to flagellins and the number of modified sites differ between species and are even strain-dependent in Clostridium difficile (13). Genomic analysis suggests that flagellin glycosylation in Clostridium spp. occurs through a sequential process in the cytoplasm (3). In mycobacteria and other actinomycetes, a general O-mannosylating system has been identified that involves a transmembrane O-mannosyltransferase similar to that found in eukaryotes (21-24). Importantly, in Mycobacterium tuberculosis, disruption of the pathway responsible for adhesin mannosylation was found to strongly attenuate virulence (21). Surface (S-) layer proteins, which cover the cell surface of a variety of bacterial species, are commonly glycosylated (25,26). Extensive MS analysis of the S-layer proteins of Geobacillus stearothermophilus and Paenibacillus alvei in mutant backgrounds made it possible to reconstruct the biosynthesis pathway of S-layer glycans and to propose a model for their export and transfer to S-layer proteins (27,28). Interestingly, the P. alvei study also revealed that tyrosine residues can be modified by a bacterial protein glycosylation pathway (28). More recently, cell wall autolysins expressed in different Lactobacillus spp. have been shown to be glycosylated (29-31); interestingly, the peptidoglycan hydrolytic activity of the autolysin Acm2 of Lactobacillus plantarum was found to be controlled by O-glycosylation (30). Although a large array of glycoproteins has been characterized, it is notable that only O-type glycosylation has been described so far in Gram-positive bacteria.
Recently, a family of serine-rich (Srr) glycosylated proteins has been discovered in Streptococcus species and Staphylococcus aureus (32,33). Well-studied members include the fimbriae-associated adhesin Fap1 of Streptococcus parasanguinis (34), the human platelet-binding protein GspB of Streptococcus gordonii (35), the SraP and PsrP adhesins of Staphylococcus aureus and Streptococcus pneumoniae, respectively (36,37), and the Srr1 adhesin of S. agalactiae (36,38-42). These proteins are exported to the cell surface through a dedicated transport system called SecA2 and are covalently anchored to peptidoglycan. Genome comparisons revealed a strikingly similar genetic organization of srr loci, characterized by a gene cluster encoding the Srr-dedicated secretion machinery (SecA2 secretion system) and a variable number of GTs (32,33,43). The role of Srr proteins as major bacterial adhesins was first demonstrated in the oral streptococci Streptococcus gordonii and Streptococcus parasanguinis (formerly S. sanguis), where they are required for full adhesion of bacterial cells to platelets and saliva-coated hydroxylapatite, respectively (34,44). In the human pathogens S. aureus and S. pneumoniae, the Srr proteins SraP and PsrP were shown to promote adhesion to platelets and lung cells, respectively (36,37,45). Importantly, Srr proteins have been associated with the virulence of all of these opportunistic pathogens (34,36,37,45,46).
Streptococcus agalactiae is the leading cause of sepsis and meningitis in neonates (47,48). The species asymptomatically colonizes the mucosa of 20-30% of the human population and is an increasing cause of invasive infections in immunocompromised adults (49). S. agalactiae Srr1 is involved in adhesion to endothelial and epithelial cells (38-42,50). Such surface proteins are attractive vaccine candidates, and an S. agalactiae Srr protein has been shown to confer protective immunity in mice (51).
Srr1 binds both keratin 4 and fibrinogen through the same N-terminal binding region, located between amino acid residues 303 and 641 (Fig. 1A) (39,42,52). Upstream of this binding domain, Srr1 displays two non-repeat regions (NR1 and NR2) with no homology to known proteins, separated by a first serine-rich region (SR1) in which Ser/Thr residues account for more than 40% of the sequence. The C-terminal part of Srr1 (SR2) contains more than 70% Ser/Thr residues, with 140 repetitions of the SASM/T motif extending over 700 residues (Fig. 1A and supplemental Fig. S1). The C terminus also contains an LPXTG motif that is recognized by sortase for covalent attachment of Srr1 to the peptidoglycan (53). The Srr proteins synthesized in Streptococcus spp. and S. aureus exhibit a similar organization, with variation in the functional binding domains and in the length of the C-terminal SR domain (32). We previously carried out an extensive functional analysis of the srr1-secA2 locus of S. agalactiae (38) (Fig. 1B). Genetic and biochemical data showed that the GTs GtfA and GtfB were essential for the addition of GlcNAc to Srr1 and that six other GTs (GtfC to GtfH) were potentially involved in additional glycan modifications (38). It was also demonstrated for Srr1 and GspB that glycosylation takes place in the cytoplasm (38,54).

The abbreviations used are: GT, glycosyltransferase; BR, binding region; BSN, Bjerrum and Schafer-Nielsen buffer; CID, collision-induced dissociation; ETD, electron transfer dissociation; Glc, glucose; GlcNAc, N-acetylglucosamine; HCD, higher-energy collisional dissociation; Hex, hexose; HexNAc, N-acetylhexosamine; ISF, in-source fragmentation; MS/MS, tandem mass spectrometry; NCE, normalized collision energy; NR, non-repeat region; O-AcHexNAc, O-acetylated-N-acetylhexosamine; PAGE, polyacrylamide gel electrophoresis; SF, SpectrumFinder; SR, serine-rich (region); Srr, serine-rich (protein); sWGA, succinylated wheat germ agglutinin; TFA, trifluoroacetic acid; TH, Todd-Hewitt broth; Tris, 2-amino-2-hydroxymethyl-1,3-propanediol; WGA, wheat germ agglutinin; XIC, extracted ion chromatogram.
We report here the first detailed analysis of Srr1 glycosylation at the molecular level by means of mass spectrometry. Lectin affinity chromatography was used to enrich Srr1 before trypsin digestion (Fig. 1C). Tryptic peptides were then analyzed by nanoLC-tandem MS (MS/MS) using various activation techniques. A dedicated software tool, SpectrumFinder (SF), which allows the retrieval of glycopeptides after higher-energy collisional dissociation (HCD) fragmentation, was developed for this study. This combination of approaches led to the identification of 27 glycopeptides in the N-terminal part of the protein, outside the region containing the highest serine density. We demonstrated that, besides glycosylation by N-acetylhexosamine (HexNAc) or hexose (Hex) residues, Srr1 can also be modified by an unexpected sugar characterized as O-acetylated-N-acetylhexosamine (O-AcHexNAc). To localize glycosylation sites, electron transfer dissociation (ETD) experiments were performed. We were able to identify six glycosylation sites, located on five serine residues and one threonine residue.

EXPERIMENTAL PROCEDURES
Chemicals-Culture media were from BD (Becton, Dickinson and Company, Sparks, MD). Ultra-pure water was obtained with a Milli-Q purifier (Millipore SAS, Molsheim, France). Lectins were from Vector Laboratories (Burlingame, CA, USA). Other chemicals were from Sigma-Aldrich (St. Louis, MO, USA) unless otherwise indicated. Antibodies were prepared as described (15).
Bacterial Strains, Culture Media and Growth Conditions-Streptococcus agalactiae H36B is a serotype Ib strain whose genome has been sequenced (29). This strain was chosen because of its higher Srr1 production level compared with S. agalactiae strain NEM316. To produce Srr1, we used the H36BSrtA* mutant strain, in which the catalytic residue Cys206 of sortase A was changed into alanine by engineering the srtA gene (gbs0949) as described (55). As a consequence, secreted proteins carrying an LPXTG motif, such as Srr1, were more efficiently released into the culture medium (53). This mutation does not affect Srr1 glycosylation, because this post-translational modification is a cytoplasmic event taking place before export (38). To study the role of accessory GTs (GtfC-H) and the sequence of glycosylation events, the strain H36BSrtA*ΔgtfCH was constructed as previously described (38). S. agalactiae strains were cultured in Todd-Hewitt (TH) broth in standing filled flasks at 37°C.

Srr1 Protein Production-Bacterial cells were removed from the culture medium (1 liter) by centrifugation (10 min at 6000 × g), and the supernatant was filtered through a 0.22 µm filter. Proteins released into the medium during growth were precipitated overnight with ammonium sulfate (30%) at 4°C under gentle agitation. The protein pellet obtained after centrifugation (15 min at 15,000 × g) was resuspended in 10 ml wheat germ agglutinin (WGA) buffer (20 mM Tris and 200 mM NaCl, pH 7.5) and dialyzed overnight against WGA buffer on a Float-a-lyzer G2 10 kDa device (Spectrum Labs, Breda, NL). For lectin affinity chromatography, a 5/50 column (Amersham Biosciences) was packed with 1 ml of WGA-agarose conjugate. The chromatographic steps were run on an ÄKTA purifier (Amersham Biosciences). The column was thoroughly washed with WGA buffer, and the protein sample was loaded at 0.1 ml/min at 4°C. After a washing step with 10 ml WGA buffer, WGA-bound proteins were eluted with 1 ml elution buffer (20 mM Tris, 200 mM NaCl, and 500 mM N-acetyl-D-glucosamine, pH 7.5) at 0.5 ml/min. Protein fractions were extensively dialyzed against 20 mM Tris, 200 mM NaCl, pH 7.5 on a 10 kDa cut-off membrane (Amicon Ultra, Millipore, Billerica, MA). Concentrated fractions were stored at −20°C until further analysis. For the comparative analysis of Srr1 produced in H36BSrtA* and H36BSrtA*ΔgtfCH, the culture supernatants were concentrated 100-fold and loaded on SDS-PAGE (see next section). The Coomassie-stained protein band corresponding to Srr1 was excised and trypsin-treated as described below (Srr1 Tryptic Digestion).

FIG. 1. … Light green, accessory GT-encoding genes (gtfC to gtfH); dark green, essential GT-encoding genes (gtfA and gtfB). In blue, the genes encoding proteins related to Srr1 secretion. In the opposite transcriptional orientation, in red, the rga gene encodes a positive transcriptional regulator of srr1 (gray). The brace highlights the region deleted in the H36BSrtA*ΔgtfCH mutant strain. C, SDS-PAGE of Srr1-containing fractions. Lane 1: culture supernatant after overnight culture of strain H36BSrtA*. Lane 2: supernatant concentrated 20-fold. Lane 3: Srr1-containing fraction after purification on a WGA-agarose column. Lane 4: Western blot analysis using polyclonal anti-Srr1 serum. Lane 5: lectin blot analysis using biotinylated WGA. Arrowheads point to the Srr1 band. The same proportion (5%) of each sample was loaded before (lane 2) and after (lane 3) purification to highlight the efficiency of the purification procedure.
SDS-PAGE, Blotting, Lectin Binding, and Immunodetection-Protein fractions were resolved on precast Criterion XT Bis-Tris 4-12% gels run in XT-MOPS buffer (Bio-Rad Life Sciences, Marnes-la-Coquette, France). Gels were stained with colloidal Coomassie dye G250 (GelCode, Thermo Fisher Scientific Inc., Rockford, IL, USA).

Compositional Analysis of Srr1-associated Monosaccharides-The monosaccharide composition of purified Srr1 was determined by gas chromatography-mass spectrometry (GC-MS) of trimethylsilyl derivatives after acid hydrolysis. Purified Srr1 was spotted on a PVDF membrane and extensively washed in water. Glycosidic bonds were hydrolyzed by transferring the membrane into 100 µl of 4 M trifluoroacetic acid (TFA) for 3 h at 110°C.
The hydrolysate was dried under vacuum and resuspended in 100 µl of derivatization mixture containing 1.34 M N-methyl-N-(trimethylsilyl)-trifluoroacetamide (MSTFA) (Sigma-Aldrich) and 0.5 mM xylose as internal standard in pyridine. The trimethylsilyl derivatives of carbohydrates were analyzed by GC-MS on an Agilent system (GC 6890+ and MS 5973N, Agilent Technologies, Santa Clara, CA). Samples were injected with an automatic injector (Gerstel PAL, Sursee, Switzerland). Gas chromatography was performed on a 30 m ZB-50 column with 0.25 mm inner diameter and 0.25 µm film thickness (Phenomenex, Torrance, CA). Helium was used as the carrier gas at a constant flow rate of 1.5 ml/min. The temperature program was 5 min isothermal at 80°C, followed by a 20°C/min ramp to 300°C and a final 3 min hold at 300°C. Compounds were identified by their retention time and by comparison of their electron ionization mass spectra with those of the NIST 05 mass spectral library (Scientific Instrument Services, Ringoes, NJ, USA). Quantification used external standard calibration curves established for each molecule (2.5-25 nmol injected) from the peak area of a specific ion, and results were expressed in nmol/mg cell walls.
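The external-standard quantification described above amounts to fitting one calibration line per monosaccharide and inverting it for the sample peak areas. The Python sketch below assumes an ordinary least-squares line, ignores the xylose internal standard, and uses purely hypothetical peak areas (`glcnac_areas`, `glc_areas`, and the sample areas are illustrative, not data from this study).

```python
# Minimal sketch of external-standard calibration for GC-MS quantification.
# Assumptions (not from the study): ordinary least-squares fit, arbitrary
# area units, and the hypothetical numbers below.


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def area_to_nmol(area, slope, intercept):
    """Invert the calibration line: peak area -> nmol injected."""
    return (area - intercept) / slope


# Hypothetical standards covering the 2.5-25 nmol range quoted in the text
std_nmol = [2.5, 5.0, 10.0, 25.0]
glcnac_areas = [1000.0, 2000.0, 4000.0, 10000.0]  # illustrative only
glc_areas = [1200.0, 2400.0, 4800.0, 12000.0]     # illustrative only

glcnac_fit = fit_line(std_nmol, glcnac_areas)
glc_fit = fit_line(std_nmol, glc_areas)

# Hypothetical sample peak areas for GlcNAc and Glc
glcnac_nmol = area_to_nmol(6000.0, *glcnac_fit)
glc_nmol = area_to_nmol(3600.0, *glc_fit)
molar_ratio = glcnac_nmol / glc_nmol  # GlcNAc:Glc molar ratio
print(f"GlcNAc {glcnac_nmol:.1f} nmol, Glc {glc_nmol:.1f} nmol, ratio {molar_ratio:.2f}:1")
```

A molar ratio such as the GlcNAc:Glc ratio reported in the Results is obtained this way from the two back-calculated amounts; the authors' exact normalization may differ.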
Srr1 Tryptic Digestion-Coomassie-stained Srr1 bands were excised from the gel and rinsed in water. Tryptic digestion was performed in-gel by adding 200 ng of sequencing-grade modified trypsin (Promega France, Charbonnières, France) diluted in 25 mM NH4HCO3, for 18 h at 37°C. Tryptic peptides were recovered by washing the gel pieces twice in 0.2% TFA-50% acetonitrile and once in 100% acetonitrile, and the supernatant was evaporated to dryness. Srr1 alkylation was deliberately omitted before mass spectrometry analysis to avoid the formation of S-carbamidomethylmethionine (CMM) on methionyl residues by iodoacetamide (Srr1 contains 45 methionines) (56). Upon collision-induced dissociation (CID), CMM gives a typical neutral loss of 105 Da that would interfere with our manual search for glycan neutral losses. Moreover, Srr1 contains only one cysteine residue.
Separated peptides were analyzed online with an LTQ-Orbitrap Discovery mass spectrometer (Thermo Electron SAS, Villebon, France) using a nanoelectrospray interface. Ionization (1.5 kV ionization potential) was performed in positive mode with a liquid junction and a capillary probe (10 µm i.d.; New Objective). Peptide ions were analyzed using Xcalibur 2.07 with the following data-dependent acquisition steps: 1) full MS scan in the Orbitrap (mass-to-charge ratio (m/z) 600 to 2000, profile mode, resolution 30,000 at m/z 400); and 2) MS/MS in the linear trap (qz 0.22, activation time 50 ms, normalized collision energy (NCE) 35%; centroid mode).
Step 2 was repeated for the four most intense ions detected in step 1. Singly charged ions were excluded, and the dynamic exclusion time was set to 60 s. To enhance mass accuracy in the Orbitrap mass analyzer, the lock mass option was activated on dimethylcyclosiloxane (m/z 667.1764).
In-Source Fragmentation (ISF) in the LTQ-Orbitrap-ISF was used to generate the oxonium ions of interest by applying a capillary voltage of 80 V. Oxonium ions were further fragmented by CID in the LTQ at an NCE of 40%.
LC-MS/MS (HCD) of Srr1 Tryptic Digest-LC was performed with a NanoLC-Ultra Eksigent system. The sample was loaded at a flow rate of 7.5 µl/min onto a precolumn cartridge (Biosphere C18, 5 µm, 20 mm, 100 µm i.d.; NanoSeparations, Nieuwkoop, NL) with 0.1% (v/v) formic acid and 2% acetonitrile. After 3 min, the precolumn cartridge was connected to the separating column (Biosphere C18, 3 µm, 150 mm, 75 µm i.d.; NanoSeparations, Nieuwkoop, NL). The buffers used were water (buffer A) and acetonitrile (buffer B), each containing 0.1% (v/v) formic acid. Peptides were separated using a linear gradient from 5% to 35% B over 37 min at 300 nl/min. A Q Exactive mass spectrometer (ThermoFisher Scientific) with a nanoelectrospray interface was used for peptide analysis. Ionization (1.5 kV ionization potential) was performed in positive mode using a liquid junction and a capillary probe (10 µm i.d.; New Objective). Peptide ions were analyzed using Xcalibur 2.2 with the following data-dependent acquisition steps: 1) full MS scan (m/z 600 to 2500, resolution 70,000 at m/z 400); and 2) MS/MS scan using quadrupole selection (3 Th window) and HCD (NCE 30, resolution 17,500).
Step 2 was repeated for the ten most intense ions (top 10) detected in step 1. Singly charged ions were excluded, and the dynamic exclusion time was set to 40 s. The lock mass option was activated using m/z 667.1764 of dimethylcyclosiloxane.
Two types of acquisition (using Xcalibur 2.1) were used. The first was a combined CID/ETD run based on (i) a full MS scan in the Orbitrap (m/z 300 to 2000) at a resolution of 15,000 (at m/z 400) and (ii) CID and ETD MS/MS spectra (maximum of 5 precursor ions) analyzed in the linear ion trap. In the second type of run, only ETD was used for MS/MS, with an inclusion list containing all glycopeptide ions previously identified. The maximum trapping time, the activation time with fluoranthene, and the NCE were fixed at 200 ms, 180 ms, and 35%, respectively. All ETD spectra were manually inspected.
X! Tandem Analysis-CID and HCD data were analyzed with X! Tandem CYCLONE (2011.12.01.1). The S. agalactiae H36B complete database (http://www.ncbi.nlm.nih.gov/genome/186?project_id=54315) and a contaminant database (trypsin, keratins, and other classical contaminants) were used. Enzymatic cleavage was set as a trypsin digestion ([RK] (15)) allowing multiple missed cleavages. Precursor and fragment mass tolerances were set to 10 ppm and 0.5 Da, respectively, for LTQ-Orbitrap data, and to 10 ppm for Q Exactive data. Methionine oxidation was considered as a possible modification. A refinement search was performed that accepted semi-enzymatic cleavage of peptides and N-terminal acetylation (+42.0105 Da). Only peptides with an E value below 0.05 were considered for identification.
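To put the 10 ppm precursor tolerance in absolute terms, the Python helpers below (hypothetical names, not part of X! Tandem) convert between ppm and m/z. At m/z 1163.5779, the precursor discussed in the Results, 10 ppm corresponds to about ±0.0116.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass measurement error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(mz: float, tol_ppm: float) -> tuple[float, float]:
    """Lower and upper m/z bounds of a symmetric ppm tolerance window."""
    half_width = mz * tol_ppm * 1e-6
    return mz - half_width, mz + half_width


low, high = ppm_window(1163.5779, 10.0)
print(f"10 ppm window: {low:.4f} - {high:.4f}")
# A 5 mDa deviation at this m/z is roughly 4.3 ppm, i.e. inside the window
print(f"{ppm_error(1163.5829, 1163.5779):.2f} ppm")
```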
To identify glycopeptides in CID data, HexNAc (+203.0793 Da), O-AcHexNAc (+245.0912 Da), and Hex (+162.0528 Da) were introduced in the search parameters as variable modifications of Ser/Thr/Tyr. In spectra dominated by neutral loss events, the b and y ions generated by peptide backbone fragmentation are much less intense than the ion formed by glycan loss. To take this effect into account, so that low-intensity backbone fragments are retained for scoring, the dynamic range for scoring spectra was set to 1000.
Development of a New Software Tool for Spectra Comparison (SpectrumFinder)-The main goal of SpectrumFinder is to compare pairs of MS/MS spectra, taking into account the fact that the same peptide sequence carrying different glycans leads to very similar fragmentation patterns in HCD after glycan loss.
SpectrumFinder v1.0 is open-source software developed for this study (Fig. 2). It is distributed under the terms of the GNU General Public License version 3 and is available for Linux and Windows from http://pappso.inra.fr/bioinfo/sf. SpectrumFinder is written in C++. It uses a simple XML parameter file to build a collection of reference spectra and compares them with other spectra in mzXML files. The software requires MS/MS spectra recorded on the same instrument. After an initial spectral cleaning step based on background subtraction and low cut-off selection performed on a user-defined number of peaks, all MS2 spectra of an LC run are compared with a reference spectrum of interest (Fig. 2A) to assess their similarity. Similarity is established on the basis of three criteria (Fig. 2B). First, a minimum number of shared peaks is required. Second, a cosine similarity is calculated between the two intensity vectors of paired peaks; when a peak is absent from one spectrum, a null intensity is used. The cosine similarity is sensitive to the orientation of the vectors but not to their magnitude. Consequently, it is a good indicator of common components between two vectors, independently of their absolute intensity. Third, a linear regression and a Pearson test are calculated between the intensities of all shared peaks. The resulting p value assesses the significance of the correlation between the two fragmentation patterns.
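The three similarity criteria can be sketched in a few lines of Python. This is a simplified re-implementation written for illustration, not SpectrumFinder's C++ code: peaks are paired greedily within a hypothetical m/z tolerance (`tol`), absent peaks get a null intensity for the cosine, the `min_shared` threshold is an assumed value, and only Pearson's r is computed (a p value could be obtained, for example, with `scipy.stats.pearsonr`). The spectral cleaning step is omitted.

```python
import math


def align_peaks(ref, cand, tol=0.02):
    """Pair peaks of two centroided spectra [(mz, intensity), ...] within `tol`.

    Returns two intensity vectors over the union of peaks (0.0 where a peak is
    absent) and the list of shared (ref_intensity, cand_intensity) pairs.
    """
    ref, cand = sorted(ref), sorted(cand)
    i = j = 0
    ref_vec, cand_vec, shared = [], [], []
    while i < len(ref) and j < len(cand):
        diff = ref[i][0] - cand[j][0]
        if abs(diff) <= tol:  # same fragment in both spectra
            ref_vec.append(ref[i][1])
            cand_vec.append(cand[j][1])
            shared.append((ref[i][1], cand[j][1]))
            i += 1
            j += 1
        elif diff < 0:  # peak only in the reference
            ref_vec.append(ref[i][1])
            cand_vec.append(0.0)
            i += 1
        else:  # peak only in the candidate
            ref_vec.append(0.0)
            cand_vec.append(cand[j][1])
            j += 1
    for _, inten in ref[i:]:
        ref_vec.append(inten)
        cand_vec.append(0.0)
    for _, inten in cand[j:]:
        ref_vec.append(0.0)
        cand_vec.append(inten)
    return ref_vec, cand_vec, shared


def cosine(u, v):
    """Cosine of the angle between two intensity vectors (magnitude-independent)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson_r(pairs):
    """Pearson correlation coefficient between paired intensities."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def compare_spectra(ref, cand, tol=0.02, min_shared=4):
    """Apply the three criteria; returns None if too few peaks are shared."""
    ref_vec, cand_vec, shared = align_peaks(ref, cand, tol)
    if len(shared) < min_shared:  # criterion 1
        return None
    return {
        "shared": len(shared),
        "cosine": cosine(ref_vec, cand_vec),  # criterion 2
        "pearson_r": pearson_r(shared),       # criterion 3
    }


# Hypothetical y-ion profiles of two glycoforms of the same peptide
reference = [(300.2, 10.0), (401.2, 20.0), (514.3, 30.0), (627.4, 40.0)]
candidate = [(300.21, 20.0), (401.2, 40.0), (514.3, 60.0), (627.4, 80.0), (204.09, 5.0)]
print(compare_spectra(reference, candidate))
```

Because the candidate's fragment intensities are uniformly scaled, the Pearson coefficient is 1 and the cosine stays close to 1 despite the extra low-intensity peak, which is the behavior the criteria are designed to reward.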
SpectrumFinder output is a text file (Fig. 2C). Each match between the reference spectrum and a candidate spectrum is reported on a separate line. Columns contain information on the computed spectrum (scan number, sample name), the cosine similarity, the linear regression coefficient, and the p value of the Pearson test.
In this study, the reference library was deliberately restricted to spectra corresponding to potential glycopeptides (Fig. 3). In each case, the reference was manually chosen as the best-quality spectrum (at least six fragment ions with a high signal-to-noise ratio). For each SpectrumFinder output, we systematically checked the corresponding MS2 spectra for the presence of diagnostic oxonium ions: m/z 163.0606 (Hex),

FIG. 2. Workflow of SpectrumFinder processing.
A, SpectrumFinder searches for similar fragmentation patterns among MS2 spectra. Peaks are extracted from a reference spectrum and compared with an experimental data set. B, Similarities between MS2 spectra are statistically evaluated and a correlation score is calculated. C, When a correlation is considered significant, the mass difference between the precursor ions of the reference and computed spectra is calculated. In contrast to other library search tools such as SpectraST (57) or X!Hunter (58), SpectrumFinder does not use any a priori knowledge of the precursor mass, because only fragment ions are compared.
Semi-quantitative Analysis of Glycopeptides-A glycoform is a characterized glycosylation variant of a peptide. The relative abundance of each glycoform of the [192-211] peptide of Srr1 produced in the H36BSrtA* and H36BSrtA*ΔgtfCH strains was calculated by averaging the corresponding XIC peak areas measured in three biological replicates. Normalization was done using the areas of non-glycosy-

To perform semi-quantitative analysis of peptide glycoforms, the area of each peak corresponding to a parent ion was computed from an extracted ion chromatogram (XIC) with MassChroQ, a software tool that performs alignment, XIC extraction, peak detection, and quantification on mzXML data (59), with a 10 ppm tolerance. The relative proportion of each glycopeptide is reported in Table II.
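Averaging replicate XIC areas and turning them into relative proportions can be sketched as follows. Because the normalization scheme is only partly described, this Python sketch assumes that each glycoform is expressed as a percentage of the summed mean areas of all forms of the peptide; the glycoform labels and areas are hypothetical, and MassChroQ's XIC extraction itself is not reproduced.

```python
from statistics import mean


def relative_abundance(replicate_areas):
    """Mean XIC area per glycoform, expressed as % of the summed mean areas.

    replicate_areas maps a glycoform label to its XIC peak areas in each
    biological replicate. Normalizing to the sum over all forms of the peptide
    is an assumption made for this sketch.
    """
    mean_areas = {form: mean(areas) for form, areas in replicate_areas.items()}
    total = sum(mean_areas.values())
    return {form: 100.0 * area / total for form, area in mean_areas.items()}


# Hypothetical XIC areas (arbitrary units) in three biological replicates
areas = {
    "unmodified": [1.0e5, 1.2e5, 0.8e5],
    "HexNAc1": [4.0e6, 4.4e6, 3.6e6],
    "HexNAc2": [3.0e6, 3.0e6, 3.0e6],
    "HexNAc4Hex1": [2.0e6, 1.8e6, 2.2e6],
}
for form, pct in relative_abundance(areas).items():
    print(f"{form:12s} {pct:6.2f} %")
```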

Enrichment and Monosaccharide Composition of Full-length Srr1-S. agalactiae Srr1 is a 1310-amino-acid protein with an expected molecular mass of 150 kDa. However, upon SDS-PAGE, Srr1 exhibits an apparent mass exceeding 250 kDa. This anomalous migration is likely a consequence of a high glycosylation level (Fig. 1C) combined with an unusual amino acid composition (50% serine and threonine residues), which can alter detergent binding compared with globular proteins (60,61). We previously demonstrated a strong reactivity of Srr1 toward the sWGA lectin, which is specific for GlcNAc moieties (38). We took advantage of this property to devise a lectin-based enrichment step using agarose-linked sWGA. The efficiency of this strategy is illustrated in Fig. 1C: the lectin-based enrichment was required to detect Srr1 by Coomassie Blue staining. The monosaccharide composition of purified Srr1 was determined after acid hydrolysis by GC-MS analysis of trimethylsilyl derivatives. Two monosaccharides were identified, GlcNAc and glucose (Glc), in a 1.8:1 ratio. Unlike what was reported for the Srr protein Fap1 of S. parasanguinis (34), we did not find any evidence for the presence of galactose or N-acetylgalactosamine associated with Srr1.
Glycopeptide Search in CID Spectra of Srr1 Tryptic Peptides-The Srr1 tryptic digest was first analyzed by LC-MS/MS on an LTQ-Orbitrap instrument using CID. Because the highly repetitive C-terminal domain does not contain any lysine or arginine residues, sequence coverage was restricted to the N-terminal region [93-639] (see supplemental Fig. S1). Because the first 90 residues of Srr1 correspond to the atypical signal peptide cleaved by the SecA2 secretion machinery (62), only residues 90 to 435 of the mature form are potentially accessible to trypsin. We obtained 83% sequence coverage for this region (supplemental Fig. S1).
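Why the K/R-free repeat region escapes detection can be illustrated with a minimal in-silico digest. The Python sketch below applies the classical trypsin rule (cleavage after K or R, not before P, no missed cleavages) to a hypothetical sequence, not the real Srr1 sequence: the repeat survives as a single 160-residue fragment, which is poorly suited to standard bottom-up LC-MS/MS identification.

```python
import re


def tryptic_peptides(sequence: str) -> list[str]:
    """In-silico trypsin digest: cleave after K or R unless the next residue is P.

    No missed cleavages are modeled; empty fragments are dropped.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


# Hypothetical sequence: a short K/R-containing segment followed by a
# SASM/T-type repeat of the kind found in the Srr1 C-terminal region.
n_terminal_part = "AEKDLSRPQTKGSAR"
repeat_part = "SASMSAST" * 20
peptides = tryptic_peptides(n_terminal_part + repeat_part)
for pep in peptides:
    print(len(pep), pep[:20])
```

Note that the R followed by P in "DLSRPQTK" is left uncleaved, as the rule requires.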
Because of the lability of O-glycosidic bonds during CID (63), glycan loss from the precursor ion is often the major fragmentation pathway observed in MS/MS spectra. Fig. 4A shows the loss of HexNAc from the [192-211] tryptic peptide at m/z 1163.5779, yielding an abundant ion at m/z 1062.00. An MS3 experiment performed on the deglycosylated form of the peptide at m/z 1062.00 confirmed its primary structure (Fig. 4B).
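The arithmetic behind this neutral loss is simple: a neutral loss leaves the charge unchanged, so the m/z drops by the lost mass divided by the charge. The observed shift from m/z 1163.5779 to 1062.00 (about 101.58) is close to half the HexNAc residue mass, consistent with a doubly charged precursor; the charge state is inferred in this Python sketch rather than stated in the text, and the function names are hypothetical. The roughly 0.04 gap between the computed and observed values is well within the 0.5 Da fragment tolerance used for LTQ data.

```python
HEXNAC = 203.0793  # HexNAc residue mass used in the search parameters (Da)


def mz_after_neutral_loss(precursor_mz: float, charge: int, loss: float) -> float:
    """A neutral loss leaves the charge unchanged, so m/z drops by loss / charge."""
    return precursor_mz - loss / charge


def infer_charge(precursor_mz: float, product_mz: float, loss: float, max_charge: int = 6) -> int:
    """Charge state whose expected m/z shift best matches the observed shift."""
    shift = precursor_mz - product_mz
    return min(range(1, max_charge + 1), key=lambda z: abs(loss / z - shift))


z = infer_charge(1163.5779, 1062.00, HEXNAC)
expected = mz_after_neutral_loss(1163.5779, z, HEXNAC)
print(z, f"{expected:.3f}")
```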
To identify Srr1 glycopeptides, the set of CID spectra was first analyzed with X! Tandem. The automated analysis of 900 spectra (generated in one LC-MS/MS run) led to the identification of three peptide sequences, [214-233], [234-247], and [275-285], carrying a monosaccharide (supplemental Fig. S2A-S2D). MS/MS data were also manually inspected for fragmentation spectra dominated by a single ion characteristic of a glycan loss event. This manual search led to the identification of 12 other glycopeptides (Table I). The highest diversity was observed for the [192-211] peptide, found associated with ten different glycan combinations, a feature illustrating the heterogeneity of Srr1 glycosylation (Table I and supplemental Fig. S3A-S3K). Glycosylation with Hex was found only on peptides carrying at least four HexNAc (or O-AcHexNAc, see below) residues, suggesting that the latter modification is a prerequisite for further Hex addition. The reactivity of Srr1 toward lectin and the monosaccharide composition analysis of the full-length protein showed that Srr1 is mainly modified with GlcNAc and Glc residues, strongly suggesting that HexNAc and Hex correspond to these two sugars, respectively.
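The manual screen for spectra dominated by one glycan-loss ion can be approximated programmatically. In the Python sketch below, the function name and the `min_fraction` threshold are hypothetical, the 0.5 tolerance mirrors the LTQ fragment tolerance quoted in EXPERIMENTAL PROCEDURES, and only a single glycan loss from the precursor is considered; consecutive losses would need additional combinations.

```python
GLYCAN_LOSSES = {  # neutral masses (Da) from the search parameters
    "Hex": 162.0528,
    "HexNAc": 203.0793,
    "O-AcHexNAc": 245.0912,
}


def dominant_glycan_loss(precursor_mz, charge, peaks, tol=0.5, min_fraction=0.5):
    """Flag CID spectra dominated by a single glycan neutral-loss ion.

    peaks is a list of (mz, intensity). The spectrum is flagged when its base
    peak carries at least `min_fraction` of the total ion current and lies at
    precursor_mz - loss / charge (within `tol`). Thresholds are hypothetical.
    """
    if not peaks:
        return None
    total = sum(intensity for _, intensity in peaks)
    base_mz, base_intensity = max(peaks, key=lambda p: p[1])
    if base_intensity < min_fraction * total:
        return None
    for name, loss in GLYCAN_LOSSES.items():
        if abs(precursor_mz - loss / charge - base_mz) <= tol:
            return name
    return None


# Hypothetical spectra: one dominated by a neutral-loss ion, one peptide-like
glyco_like = [(1062.0, 1000.0), (500.3, 50.0), (700.4, 80.0)]
peptide_like = [(600.0, 100.0), (700.0, 120.0), (800.0, 90.0)]
print(dominant_glycan_loss(1163.5779, 2, glyco_like))
print(dominant_glycan_loss(1163.5779, 2, peptide_like))
```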
An Unexpected Sugar Modification Identified on Srr1-We recurrently observed a neutral loss of 245.09 Da in CID spectra (see, for example, supplemental Figs. S3H and S3N), either alone or in association with HexNAc moieties on various peptides of Srr1. This mass did not correspond to any known post-translational protein modification but was recently associated with O-acetylated GlcNAc in Lactobacillus plantarum peptidoglycan (see proposed structure, Fig. 5A) (64). To gain more insight into the structure of this modification, CID spectra of oxonium ions at m/z 246.1 produced by ISF from either peptidoglycan fragments of L. plantarum (Fig. 5B) or Srr1 peptides (Fig. 5C) were generated and compared. The two fragmentation profiles matched perfectly and displayed ions at m/z 210.1 and 228.0, characteristic of acetylated fragments. We thus concluded that Srr1 can be modified by O-AcHexNAc (Fig. 5). The position of the additional acetyl group cannot be determined from CID spectra alone, but the striking similarity between the fragmentation profiles supports the hypothesis that the precursor ions share the same structure. Because the β-1,4-glycosidic linkage present in peptidoglycan makes modification at the C4 position unlikely, we propose the C6 position as a putative site for the additional acetylation.

Development of a New Software Tool to Analyze HCD MS/MS Data Obtained from Glycopeptides-As described above, CID spectra of glycopeptides are poorly informative because essentially only glycosidic bond cleavage is observed. To overcome this problem, we used HCD fragmentation, which is slightly more energetic and allows consecutive fragmentations. Digests previously analyzed on the LTQ were thus reanalyzed on a Q Exactive instrument (65). As expected, HCD MS/MS of glycopeptides produced fragmentation patterns different from those obtained by CID.
Interestingly, many y ions, characteristic of backbone cleavage, could now be observed concomitantly with the formation of oxonium ions. Furthermore, various glycoforms of a given peptide generated highly similar y-type ion profiles. Hence, glycopeptides sharing the same peptide backbone but differing in their glycosylation level display very similar fragmentation patterns while having different precursor masses. As an example, Fig. 6 shows the HCD fragmentation spectra of two different glycoforms of the [192-211] peptide (at m/z 925.4455 and m/z 1101.1749), illustrating the similarity of the y ion profiles and the detection of oxonium ions.
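Because glycoforms of one peptide share their y-ion profile but not their precursor mass, they can be grouped by comparing fragment spectra alone. The sketch below uses a generic cosine score on binned spectra; it is a stand-in for illustration, not SpectrumFinder's scoring (which is described in the experimental procedures). The 0.02 Da bin width and the 300 m/z cut-off used to drop the low-mass oxonium region are assumptions, and the spectra in the example are invented.

```python
import math
from collections import defaultdict

Peak = tuple[float, float]  # (m/z, intensity)


def binned(peaks: list[Peak], bin_width: float = 0.02, min_mz: float = 300.0) -> dict[int, float]:
    """Sum intensities into fixed-width m/z bins, ignoring the low-mass region."""
    vector: dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        if mz >= min_mz:  # hypothetical cut-off that drops the oxonium ions
            vector[round(mz / bin_width)] += intensity
    return vector


def cosine_similarity(a: list[Peak], b: list[Peak], **kwargs) -> float:
    """Cosine score between two binned fragment spectra (precursor m/z ignored)."""
    va, vb = binned(a, **kwargs), binned(b, **kwargs)
    dot = sum(value * vb.get(key, 0.0) for key, value in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented spectra: same y ions, different oxonium ions (and precursors).
glycoform_a = [(204.087, 500.0), (400.2, 100.0), (513.3, 200.0), (626.4, 150.0)]
glycoform_b = [(204.087, 300.0), (246.097, 200.0), (400.2, 100.0), (513.3, 200.0), (626.4, 150.0)]
print(round(cosine_similarity(glycoform_a, glycoform_b), 3))  # 1.0
```

Fixed-width bins can split a peak that sits on a bin boundary; a tolerance-based peak matcher avoids this at the cost of slightly more code.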
A software tool was developed to exploit the similarity between HCD spectra of different glycoforms of a peptide. This software, named SpectrumFinder, was dedicated to the automated search for a specific peptide fragmentation pattern without a priori knowledge of the precursor mass (for details, see experimental procedures and http://pappso.inra.fr/bioinfo/sf). Compared with X! Tandem, analysis of HCD data with SpectrumFinder revealed additional fragmentation spectra originating from the same peptide backbones. For each spectrum selected by SpectrumFinder, the mass difference between the precursor ion and the unmodified peptide was calculated and fitted to a glycan combination of Hex, HexNAc, and O-AcHexNAc (see experimental procedures and Figs. 2 and 3). All the glycopeptides previously identified in the CID experiments (listed in Table I) could be retrieved, demonstrating the robustness of this new approach. Furthermore, 12 novel glycopeptides could be identified: nine corresponded to a novel sugar combination on a peptide sequence already described as glycosylated, whereas two corresponded to peptide sequences that were not previously found to be glycosylated ([255-265] and [451-464]) (Table II).
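The fitting step described above, which explains the precursor-minus-peptide mass difference with a combination of Hex, HexNAc, and O-AcHexNAc, can be sketched as a brute-force enumeration. This is our own minimal version, not necessarily how SpectrumFinder performs it; the 0.01 Da tolerance and the cap of seven residues (the largest glycoform reported for the [192-211] peptide) are assumptions, and the peptide mass in the example is invented.

```python
from itertools import product

PROTON = 1.007276
SUGAR_MASSES = {"HexNAc": 203.079373, "Hex": 162.052824, "O-AcHexNAc": 245.089938}


def neutral_mass(precursor_mz: float, charge: int) -> float:
    """Neutral monoisotopic mass from an observed precursor m/z and charge."""
    return (precursor_mz - PROTON) * charge


def fit_glycan(delta_mass: float, max_residues: int = 7, tol: float = 0.01) -> list[dict[str, int]]:
    """All sugar combinations whose summed mass matches delta_mass within tol (Da)."""
    names = list(SUGAR_MASSES)
    matches = []
    for counts in product(range(max_residues + 1), repeat=len(names)):
        if not 0 < sum(counts) <= max_residues:
            continue
        mass = sum(n * SUGAR_MASSES[name] for n, name in zip(counts, names))
        if abs(mass - delta_mass) <= tol:
            matches.append(dict(zip(names, counts)))
    return matches


# Invented example: a 3+ precursor of a hypothetical 2000.0 Da peptide.
delta = neutral_mass(857.0778, 3) - 2000.0
print(fit_glycan(delta))  # [{'HexNAc': 2, 'Hex': 1, 'O-AcHexNAc': 0}]
```

Returning every match, rather than the first, makes it easy to spot ambiguous mass differences that would need fragment-level evidence to resolve.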
Using our strategy, 17 variants of the [192-211] peptide bearing one to seven sugar residues were identified. Importantly, the semi-quantitative analysis performed on all glycoforms of this peptide revealed that the non-glycosylated form represented less than 1%, whereas more than 80% of the glycosylated forms corresponded to modifications with one to four HexNAc residues (Table II). In this regard, it should be noted that HexNAc-modified peptide ions are considered to be suppressed relative to non-glycosylated peptides (66). As noted above, we confirmed that Hex was observed only on HexNAc-containing peptides.
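The semi-quantitative percentages above amount to expressing each glycoform's XIC area as a share of the summed area of all forms of the peptide. A minimal sketch, assuming one integrated XIC area per glycoform; the labels and areas below are invented, and, as the text warns, the calculation implicitly treats all glycoforms as ionizing equally well.

```python
def relative_abundance(xic_areas: dict[str, float]) -> dict[str, float]:
    """Percentage of the summed XIC area contributed by each glycoform.

    Assumes equal ionization efficiency for all glycoforms, which may not
    hold for HexNAc-modified peptides.
    """
    total = sum(xic_areas.values())
    if total <= 0:
        raise ValueError("summed XIC area must be positive")
    return {form: 100.0 * area / total for form, area in xic_areas.items()}


# Invented areas, for illustration only.
areas = {"unmodified": 2.0, "HexNAc1": 98.0, "HexNAc2": 60.0, "HexNAc4Hex1": 40.0}
shares = relative_abundance(areas)
print(shares["unmodified"])  # 1.0
```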
Two additional glycoforms carrying HexNAc or O-AcHexNAc were identified for the [214-233] peptide, and one additional form for [275-285]. Comparison of the extracted ion chromatogram (XIC) peak areas showed that the new glycopeptides retrieved by SpectrumFinder were of low abundance. These results highlight the gain in sensitivity provided by this approach. To check that none of the observed glycoforms resulted from ISF of larger ones, the retention times measured for 16 different glycoforms of the [192-211] peptide were compared (supplemental Fig. S6). Except for the +1 HexNAc and +2 HexNAc glycoforms, all glycoforms exhibited different retention times. This shows that ISF does not introduce a major bias in our measurements but must be taken into account in some cases.
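The ISF check described above rests on a simple criterion: a glycoform can only be an in-source fragment of a larger one if the two co-elute. The sketch below flags (smaller, larger) pairs whose compositions are nested and whose apex retention times agree within a tolerance. The 0.2 min tolerance, the helper names, and the retention times in the example are hypothetical; co-elution is compatible with ISF but does not prove it.

```python
Glycoform = tuple[str, dict[str, int], float]  # (label, composition, apex RT in min)


def is_sub_composition(small: dict[str, int], large: dict[str, int]) -> bool:
    """True if `large` carries every residue of `small` plus at least one more."""
    covers = all(large.get(sugar, 0) >= n for sugar, n in small.items())
    return covers and sum(large.values()) > sum(small.values())


def possible_isf_pairs(glycoforms: list[Glycoform], rt_tol: float = 0.2) -> list[tuple[str, str]]:
    """(smaller, larger) glycoform pairs that co-elute within rt_tol minutes."""
    pairs = []
    for small_label, small_comp, small_rt in glycoforms:
        for large_label, large_comp, large_rt in glycoforms:
            if is_sub_composition(small_comp, large_comp) and abs(large_rt - small_rt) <= rt_tol:
                pairs.append((small_label, large_label))
    return pairs


# Invented retention times, for illustration only.
forms = [
    ("+1 HexNAc", {"HexNAc": 1}, 20.0),
    ("+2 HexNAc", {"HexNAc": 2}, 20.1),
    ("+3 HexNAc", {"HexNAc": 3}, 19.2),
]
print(possible_isf_pairs(forms))  # [('+1 HexNAc', '+2 HexNAc')]
```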
As expected, an increase in the number of sugar residues leads to a shorter retention time. It is worth noting that in some cases two retention times are observed for the same glycopeptide, probably resulting from the presence of different isomers.
Searching HCD MS/MS spectra with SpectrumFinder led to the identification of a novel segment of Srr1 subject to glycosylation: the [451-464] peptide, which is located far from the serine-rich region of the N-terminal part and belongs to the IgG domain of Srr1 bearing the fibrinogen (Fg) binding activity (39,52).
Determination of Glycosylation Sites-ETD has been shown to be a valuable tool for the analysis of glycopeptides because the glycosidic bond can be preserved during fragmentation, allowing glycosylation sites to be accurately mapped (10,67). The Srr1 digest was therefore analyzed using both untargeted and targeted approaches with ETD as the fragmentation mode. All acquired ETD spectra were manually analyzed and annotated to locate the modification sites (supplemental Fig. S7). When combined, the different runs enabled the identification of five different peptide sequences bearing glycosylation (Table III).
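Site assignment from ETD data relies on c and z• ions that retain the intact sugar on the modified residue, so each candidate site predicts a different fragment ladder. The sketch below, assuming standard monoisotopic residue masses and singly charged ions, computes both ladders for a candidate site and counts how many predicted ions are observed. The peptide "ASLGTK", the 0.02 Da tolerance, and the scoring function are hypothetical; the sketch also ignores proline, on whose N-terminal side ETD c/z ions are not observed.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, H_ATOM, NH3, H2O = 1.007276, 1.007825, 17.026549, 18.010565
HEXNAC = 203.079373


def c_z_ions(sequence: str, mods: dict[int, float]) -> tuple[list[float], list[float]]:
    """Singly charged c and z-dot ion m/z; `mods` maps 0-based index to added mass.

    The sugar mass stays on every fragment containing the modified residue.
    """
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0) for i, aa in enumerate(sequence)]
    n = len(masses)
    c_ions = [sum(masses[:i]) + NH3 + PROTON for i in range(1, n)]
    z_ions = [sum(masses[-i:]) + H2O + PROTON - NH3 + H_ATOM for i in range(1, n)]
    return c_ions, z_ions


def site_score(sequence: str, site: int, mod_mass: float,
               observed: list[float], tol: float = 0.02) -> int:
    """Number of predicted c/z-dot ions for a candidate site found in `observed`."""
    c_ions, z_ions = c_z_ions(sequence, {site: mod_mass})
    return sum(any(abs(pred - obs) <= tol for obs in observed) for pred in c_ions + z_ions)


# Simulated spectrum: HexNAc on the Ser at index 1 of a made-up peptide.
c_obs, z_obs = c_z_ions("ASLGTK", {1: HEXNAC})
observed = c_obs + z_obs
print(site_score("ASLGTK", 1, HEXNAC, observed), site_score("ASLGTK", 4, HEXNAC, observed))
```

In the simulated case the Ser candidate explains all ten fragments, while the Thr candidate explains only the ions that contain both residues or neither, which is the kind of contrast used to call a site unambiguous.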
In total, six different glycosylation sites could be unambiguously assigned. These results confirm that Srr1 is O-glycosylated and suggest that serines are preferred over threonines. Interestingly, we noted that the four identified glycosylated serines were all followed by a leucine, suggesting that the SerLeu sequon could be a preferred target of GtfAB. However, the threonine modification indicates that site selection is flexible.
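A quick way to see how often the tentative SerLeu motif occurs in a segment is to list every Ser/Thr together with its following residue. This minimal sketch assumes a one-letter sequence and a `first_residue` offset for mapping indices onto Srr1 numbering; the function names and the test sequence are hypothetical.

```python
def ser_thr_sites(sequence: str, first_residue: int = 1) -> list[tuple[int, str, str]]:
    """List (position, residue, following residue) for every Ser/Thr."""
    sites = []
    for i, aa in enumerate(sequence):
        if aa in "ST":
            following = sequence[i + 1] if i + 1 < len(sequence) else "-"
            sites.append((first_residue + i, aa, following))
    return sites


def serleu_positions(sequence: str, first_residue: int = 1) -> list[int]:
    """Positions of Ser residues directly followed by Leu (the SerLeu motif)."""
    return [pos for pos, aa, nxt in ser_thr_sites(sequence, first_residue)
            if aa == "S" and nxt == "L"]


print(serleu_positions("ASLKTLSAS", first_residue=91))  # [92]
```

Comparing the fraction of SerLeu sites that are glycosylated with the fraction of other Ser sites would be needed before treating the motif as more than a trend.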
Influence of GT Encoded in the srr1 Locus on Srr1 Glycosylation-The WGA lectin binds to all bacterial Srr adhesins tested so far, indicating that GlcNAc is the predominant sugar in this family of proteins (Fig. 1B) (35,54). Our data show that all glycopeptides carried at least one HexNAc residue, suggesting that this addition could constitute an initial step in Srr1 glycosylation. We have previously shown that the GtfAB GTs encoded in the Srr1 locus are essential for Srr1 stability, unlike the six GtfC-H GTs (strain H36BSrtA*ΔgtfCH) (38). However, the protein produced in this genetic background displays a lower apparent molecular weight on SDS-PAGE, which could be associated with a glycosylation defect (supplemental Fig. S8). To examine this hypothesis, we compared the abundance of selected Srr1 glycopeptides produced in strains possessing the full set of GT-encoding genes (H36BSrtA*) or lacking the so-called accessory GT-encoding genes (H36BSrtA*ΔgtfCH). The Srr1 protein bands were trypsin digested and analyzed on the LTQ-Orbitrap. The XICs corresponding to the glycoforms of the [192-211] peptide (Table I) were extracted from the two experiments. This peptide was chosen because of the high abundance and diversity of its glycoforms. As shown in Fig. 7, all the Hex-containing glycopeptides are absent in the ΔgtfCH strain, suggesting that one or more of the corresponding GT-encoding genes are involved in the transfer of Hex onto GlcNAc-modified residues.
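The strain comparison can be summarized per glycoform as the mean and standard deviation of replicate XIC areas, with forms flagged when they are detected in H36BSrtA* but not in the ΔgtfCH background. The sketch below assumes at least two replicates per glycoform (required by `statistics.stdev`) and treats "absent" as a mean at or below a detection limit; the limit, helper names, and areas are invented for illustration.

```python
from statistics import mean, stdev


def summarize(replicates: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of replicate XIC areas per glycoform."""
    return {form: (mean(areas), stdev(areas)) for form, areas in replicates.items()}


def absent_in_mutant(wild_type: dict[str, list[float]], mutant: dict[str, list[float]],
                     detection_limit: float = 0.0) -> list[str]:
    """Glycoforms detected in the wild type but not above the limit in the mutant."""
    wt, mt = summarize(wild_type), summarize(mutant)
    return sorted(form for form, (wt_mean, _) in wt.items()
                  if wt_mean > detection_limit
                  and mt.get(form, (0.0, 0.0))[0] <= detection_limit)


# Invented triplicate areas, for illustration only.
wild = {"HexNAc2": [10.0, 12.0, 11.0], "HexNAc4Hex1": [3.0, 4.0, 5.0]}
mut = {"HexNAc2": [11.0, 10.0, 12.0], "HexNAc4Hex1": [0.0, 0.0, 0.0]}
print(absent_in_mutant(wild, mut))  # ['HexNAc4Hex1']
```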

DISCUSSION
Srr proteins constitute a widespread family of bacterial adhesins expressed in several Gram-positive bacterial species of human and veterinary clinical importance. Using MS-based approaches and a dedicated software tool, we demonstrate that Srr1 produced in S. agalactiae displays a wide diversity of glycoforms attached to a protein segment located outside of SR1, the region of highest Ser/Thr density within the N-terminal region (Fig. 8). Some glycopeptides are found with varying numbers of sugars, which highlights the heterogeneity of Srr1 glycosylation. The most striking example is the [192-211] peptide, which can be modified with at least 16 different combinations of HexNAc and Hex. Semi-quantitative analysis of this glycopeptide family shows that glycosylated forms represent more than 98% of the total amount of the peptide (Table II). Among the glycoforms observed, some are not eluted as single, symmetrical peaks (see supplemental Fig. S6), suggesting the presence of different positional isomers that further increase the heterogeneity of the protein.
[Figure caption: Srr1 glycoforms (Table II) from H36BSrtA* (gray bars) and H36BSrtA*ΔgtfCH (white bars) strains were quantified by calculating the area of each parent ion from its XIC, with normalization (see experimental procedures). LC-MS/MS experiments were performed in triplicate, and the standard deviation (S.D.) is reported for each quantified glycoform. All Hex-containing glycoforms were absent from Srr1 produced in the ΔgtfCH background.]
Monosaccharide composition analysis of full-length Srr1 demonstrated that GlcNAc and Glc were the major sugars associated with the protein. Although the present mass spectrometry analysis is restricted to the N-terminal part, which contains only a fraction of the potential O-glycosylation sites of Srr1 (28 of 624 Ser/Thr residues), it is reasonable to assume that the HexNAc and Hex modifications correspond to GlcNAc and Glc residues. A definitive conclusion will require compositional analysis of the N-terminal region. An additional level of complexity arises from the optional and surprising acetylation of GlcNAc. The same chemical modification was recently described on the glycan backbone of L. plantarum peptidoglycan, where it increases resistance to the muramidase activity of lysozyme (64). This modification is performed by OatB, a membrane-embedded acetyltransferase that shares 40% homology with Gbs0052, an ortholog present in S. agalactiae. It is possible that this enzyme also acetylates the GlcNAc residues decorating Srr1. The capacity of an enzyme to act on different surface compounds is not uncommon; in several Gram-negative species, enzymes belonging to the lipopolysaccharide biosynthesis machinery were shown to be required for O-glycosylation of flagellins (68-70). A common pentasaccharide was also found to be used both to O-glycosylate proteins and to synthesize the capsule of the opportunistic pathogen Acinetobacter baumannii (71). Interestingly, a 245 Da component has recently been associated with glycans modifying surface proteins in Campylobacter hominis, the sole Campylobacter species considered a human commensal (4). That the same post-translational modification of surface proteins was found in C. hominis and S. agalactiae, two unrelated species living in the same environment, could reflect a niche-specific adaptation.
An important question is what causes the high heterogeneity of Srr1 glycosylation. The biosynthesis of glycans and their attachment to proteins depend on a variety of biochemical parameters, such as sugar nucleotide donor availability or the activity of glycosylating enzymes (77). Because protein glycosylation is a non-templated process, variation of these parameters during cultivation could affect protein glycosylation, as has been shown for eukaryotic proteins (78). Examining the consequences of varying S. agalactiae cultivation conditions (medium, carbon sources, temperature, etc.) on Srr1 glycosylation characteristics (location, sugars added, and heterogeneity) could provide valuable clues to the mechanisms likely to influence the pathway.
The biological importance of protein glycosylation, and how it can assist Gram-positive species in their commensal lifestyle, is an important issue. Unfortunately, our current knowledge of the consequences of Srr1 glycosylation for its functional properties remains limited. Biochemical analysis of Srr proteins in GT expression mutants has established that GtfAB-dependent glycosylation is essential for Srr production, stability, or export, depending on the species studied (34,38,79,80). When considering the biology of protein modification by GlcNAc, it is important to mention that all major pathways of central metabolism are required for UDP-GlcNAc synthesis (66). Because the sensor role of this modification is well established in eukaryotes (66), it can be hypothesized that O-GlcNAcylation of Srr1 could be linked to nutrient availability.
[Figure caption: The black line displays the serine+threonine content of the sequence, calculated as a floating average over 50-residue sliding windows and expressed as a proportion. The gray lines relate to the sequence numbering and correspond to the region covered by the present mass spectrometry analysis. The red lines relate to the sequence numbering and correspond to the glycosylated regions listed in Table II. The numbering starts at residue 91, corresponding to the secreted form of the protein.]
Our data indicate that Srr1 glycosylation is not dependent on a strict consensus sequence, although we observed that glycosylation of serine was favored when it was followed by a leucine. Additional structural elements of Srr1 are therefore likely to control GT activity. Identifying these features will require the ability to explore the glycosylation of the regions with the highest Ser/Thr density. Achieving this objective could be facilitated by the development of an in vitro glycosylation model similar to that developed for Fap1 (33).
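Locating the regions of highest Ser/Thr density relies on the kind of profile described in the figure legend: the Ser+Thr content as a floating average over 50-residue windows, expressed as a proportion. The sketch below computes such a profile with a running count; indexing each value by its window start (rather than the window center) is our assumption, and the function name is hypothetical.

```python
def ser_thr_profile(sequence: str, window: int = 50) -> list[float]:
    """Proportion of Ser+Thr in each sliding window of `window` residues.

    Value k covers residues k .. k + window - 1 (0-based window start).
    """
    if len(sequence) < window:
        return []
    flags = [1 if aa in "ST" else 0 for aa in sequence.upper()]
    count = sum(flags[:window])
    profile = [count / window]
    for i in range(window, len(flags)):
        count += flags[i] - flags[i - window]  # slide the window by one residue
        profile.append(count / window)
    return profile


# Invented sequence: 50 Ser followed by 50 Ala.
profile = ser_thr_profile("S" * 50 + "A" * 50)
print(len(profile), profile[0], profile[25], profile[-1])  # 51 1.0 0.5 0.0
```

To plot against Srr1 numbering starting at residue 91, one would add 91 (plus half the window for a centered plot) to each index.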
Major efforts are underway to elucidate the mechanisms of bacterial pathogenesis in order to prevent and treat infections. Most bacterial glycoproteins identified so far are localized at the cell surface, where they are involved in a variety of processes such as biofilm formation, adhesion, cell motility, or immune evasion (2,79,81). In Bacteroides fragilis, a Gram-negative human symbiont, the general O-glycosylation system modifies tens of proteins, contributes to the in vitro fitness of the species, and is required for colonization of the mammalian gastrointestinal tract (82,83). These observations support the idea that protein glycosylation systems participate in the interaction of bacteria with their mammalian hosts and contribute to commensalism or pathogenesis (1,84).
Surface proteins represent obvious targets for vaccine development, and Srr proteins of S. agalactiae have been proposed as good candidates (51). From this perspective, it will be important to characterize the glycosylation dynamics of proteins likely to be chosen as antigens, because this mechanism could circumvent antibody recognition.