Identification of Novel Sites of O-N-Acetylglucosamine Modification of Serum Response Factor Using Quadrupole Time-of-flight Mass Spectrometry*

The addition of a single N-acetylglucosamine moiety O-linked to serine and threonine residues of nuclear and cytoplasmic proteins is a widespread post-translational modification. The conventional method for detecting and locating sites of modification is through a multistep radioactivity-based approach. We have recently shown that sites of O-GlcNAc modification can be determined using quadrupole time-of-flight tandem mass spectrometry (Chalkley, R. J., and Burlingame, A. L. (2001) Identification of GlcNAcylation sites of peptides and α-crystallin using Q-TOF mass spectrometry. J. Am. Soc. Mass Spectrom. 12, 1106–1113). In this work utilization of this new approach has revealed previously undetected sites of O-GlcNAc modification of the transcription factor serum response factor.

Since O-GlcNAc modification was first detected (1), a wide variety of proteins have been shown to be post-translationally modified with a single N-acetylglucosamine residue (2,3). The modification is transient, and the enzymes responsible for its addition (4,5) and removal (6,7) have been identified. Although not fully understood, the importance of O-GlcNAcylation is becoming increasingly apparent (8). All known GlcNAc-modified proteins are also potential phosphoproteins. Indeed, in many instances the sites of phosphorylation and GlcNAcylation are localized to the same or neighboring residues (9-11). Hence there often appears to be direct interplay between the two post-translational modifications (3,12).
The major stumbling block in the understanding of O-GlcNAcylation is the difficulty of detecting the modification and then determining its sites. The traditional method for characterization of sites of GlcNAcylation is through the use of radioactivity (1,13). Modified proteins are identified by radiolabeling GlcNAc residues with [3H]galactose in vitro using a galactosyltransferase. Modified proteins are enzymatically digested, and peptides are separated by multiple rounds of HPLC. Radiolabeled peptides are located by screening fractions for radioactivity. Once a fraction containing only the radiolabeled peptide has been isolated, the peptide is identified by Edman sequencing, and the modification site is assigned from the cycle in which the radioactivity is released.
Mass spectrometry has previously been used to indirectly identify sites of GlcNAc modification of synthetic peptides using base-catalyzed β-elimination of GlcNAc (14,15). This approach has recently been adapted to provide an affinity isolation strategy that has allowed for the identification of novel sites of O-GlcNAc modification of Synapsin I and nuclear pore proteins (16). However, the use of this strategy requires care to prevent misassignment between GlcNAc modification and phosphorylation as both modifications are eliminated chemically by the same mechanism. Removal of all phosphorylation with phosphatases prior to derivatization has been used to avoid this problem (16). However, this prevents the characterization of both modifications simultaneously.
We have shown that the new generation of quadrupole time-of-flight (Q-TOF) tandem electrospray mass spectrometers can be used to directly identify sites of modification of synthetic peptides and that this approach was also able to identify a previously known site of O-GlcNAc modification on αA-crystallin (17). Samples are digested and then analyzed by LC-MS-MS, and GlcNAc-modified peptides are identified on the basis of the facile production of the GlcNAc oxonium ion. The masses of modified peptides are then used to create an "inclusion list" of peaks for a second LC-MS-MS analysis of the same sample. In this second run, lower collision energies are used to increase the probability of observing glycosylated fragment ions, and MS-MS spectra of each precursor ion are acquired over a longer time period. We now demonstrate the power of this strategy for studies of another GlcNAc-modified protein, serum response factor (SRF).
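The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the acquisition software used in this work: the spectrum layout (a dictionary with `precursor_mz` and `peaks`), the mass tolerance, the relative intensity threshold, and all fragment peaks and intensities are hypothetical. The oxonium ion m/z of 204.087 is the standard monoisotopic value for a protonated HexNAc oxonium ion.

```python
# Monoisotopic m/z of the GlcNAc (HexNAc) oxonium ion, C8H14NO5+.
GLCNAC_OXONIUM_MZ = 204.0867


def has_oxonium_ion(fragment_peaks, tol=0.02, min_rel_intensity=0.05):
    """Return True if a peak near m/z 204.087 is present above a threshold.

    fragment_peaks is a list of (m/z, intensity) tuples. The tolerance and
    the relative intensity threshold are illustrative assumptions.
    """
    if not fragment_peaks:
        return False
    base_peak = max(intensity for _, intensity in fragment_peaks)
    return any(
        abs(mz - GLCNAC_OXONIUM_MZ) <= tol
        and intensity >= min_rel_intensity * base_peak
        for mz, intensity in fragment_peaks
    )


def build_inclusion_list(spectra):
    """Collect precursor m/z values whose MS-MS spectra show the oxonium ion."""
    return sorted(
        {s["precursor_mz"] for s in spectra if has_oxonium_ion(s["peaks"])}
    )


# Hypothetical survey-run spectra. The precursor m/z values are taken from
# the text; every fragment peak and intensity below is invented.
spectra = [
    {"precursor_mz": 1059.01, "peaks": [(204.09, 800.0), (950.40, 150.0)]},
    {"precursor_mz": 1092.15, "peaks": [(610.30, 400.0), (886.50, 900.0)]},
    {"precursor_mz": 851.60, "peaks": [(204.08, 300.0), (500.30, 500.0)]},
]

inclusion_list = build_inclusion_list(spectra)
print(inclusion_list)
```

In this sketch the two precursors whose spectra contain the oxonium ion are retained for the second, targeted run, whereas the precursor without it is dropped.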
SRF is a ubiquitous transcription factor that binds to the serum response element, a regulatory sequence found upstream of the genes of many proteins transiently expressed upon growth factor stimulation (18). SRF is a member of the MADS box family of transcription factors, which are named after the four original members of the family: MCM1, AG, DEFA, and SRF. Members of this family contain a conserved domain of 56 amino acids. The N-terminal part of this domain defines the DNA binding specificity of the protein, while the C-terminal part effects protein dimerization. Dimers of SRF form ternary complexes on the serum response element along with accessory ternary complex factors such as Elk-1, Sap-1, and Sap-2. SRF can be activated by serum, lysophosphatidic acid, aluminum fluoride, and G proteins via the GTPase RhoA (19,20) and triggers the expression of key proteins involved in cell cycle progression, differentiation, and development (21-23).
SRF is a 50-kDa protein for which a number of phosphorylation sites have been identified. SRF is phosphorylated on serine 103 in response to stimulation by mitogen-activated protein kinase-activated protein kinase 1 (p90RSK) (24), in response to stress by mitogen-activated protein kinase-activated protein-2 (25), and by calcium/calmodulin-dependent kinases II and IV (26). It is also phosphorylated at serine 83 by casein kinase II (27) and serine 435 by DNA-activated protein kinase (28).
SRF is also O-GlcNAc-modified (29), and Reason et al. (30) reported previously the characterization of a number of GlcNAc modification sites. Using recombinant SRF overexpressed in baculovirus (31), they identified serine 283 as a major site of GlcNAc modification. Serine 316 was also found to be glycosylated, and one of either the serine 307 or serine 309 sites appeared to be modified at very low stoichiometry. The sites were identified by first carrying out sequential enzymatic digestion of the protein using a combination of enzymes. Peptides were then separated by HPLC, and fractions were collected. These were analyzed by fast atom bombardment mass spectrometry to detect fractions containing peptides that differed in mass by 203 Da, which could correspond to unmodified and GlcNAc-modified versions of the same peptide. Selected fractions were subjected to in vitro [3H]galactose labeling to tag the GlcNAc residue, and sites of modification were finally determined by Edman degradation analysis by monitoring for the release of the radioactive modified amino acid. This previous work was carried out starting with 5-10 nmol of highly purified protein. In the present study, only 5-10 pmol of gel-purified protein was required to confirm some of the GlcNAc modification sites previously reported (30) and reveal the presence of two novel GlcNAc modification sites and yet another phosphorylation site.
Serum Response Factor: Human SRF was a kind gift from Dr. Rob Nicolas (Cancer Research UK, London, UK). The protein was overexpressed in baculovirus and then purified (31). This was in fact the same sample that was previously used to identify O-GlcNAc modification sites using a combination of mass spectrometry, radioactivity, and Edman sequencing (30). The sample was supplied in 20 mM Hepes, pH 7.9, 300 mM KCl, 0.2 mM EDTA, 0.2 mM EGTA, 0.1% Nonidet P-40, 10% glycerol, 1 mM dithiothreitol. The presence of detergents and salts made the sample unsuitable for direct analysis by mass spectrometry. Hence the protein was purified by one-dimensional SDS-PAGE on a 10% polyacrylamide gel and visualized using Coomassie Brilliant Blue staining. A single band was observed as published previously (31).
In-gel Digestion: Excised gel bands were further destained in 25 mM NH4HCO3, 50% acetonitrile, reduced using dithiothreitol, alkylated using iodoacetic acid, vacuum-centrifuged to dryness, and then allowed to swell in 5 µl of 25 mM NH4HCO3 containing 4 ng/µl modified trypsin for 5 min. Gel pieces were then overlaid with a further 25 µl of 25 mM NH4HCO3 and digested overnight at 37°C. Peptides were extracted using two aliquots of 20 µl of 50% acetonitrile, 5% trifluoroacetic acid, vacuum-centrifuged to dryness, and resuspended in 10 µl of water prior to analysis.
On-line LC-ESI-MS: HPLC was carried out using the Ultimate/Famos/Switchos suite of instruments (LC Packings, Amsterdam, The Netherlands). Samples were loaded onto a guard column (300-µm inner diameter × 5-cm C18 PepMap) in the injection loop and washed using 0.1% formic acid at 40 µl/min for 2 min using the Switchos pump. Peptides were then separated on a nanoflow column (75-µm inner diameter × 15-cm C18 PepMap) at a flow rate of 200 nl/min using a gradient from 5 to 40% buffer B (80% acetonitrile, 0.1% formic acid) over a period of 32 min. Four 2-s MS-MS scans were performed on each precursor ion. For ions of m/z 400-900 two scans each at 28 and 32 V were acquired, whereas ions observed between m/z 900 and 2000 were fragmented with collision energies of 32 and 35 V.
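The collision energy rule for the data-dependent run can be written as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the handling of a precursor at exactly m/z 900 (assigned here to the higher range) is an assumption, since the text does not specify it.

```python
def collision_energies(precursor_mz):
    """Return the two collision energies (V) used for a precursor m/z.

    Ranges follow the text: m/z 400-900 uses 28 and 32 V, m/z 900-2000
    uses 32 and 35 V. Placing exactly 900 in the upper range is an assumption.
    """
    if 400 <= precursor_mz < 900:
        return (28, 32)
    if 900 <= precursor_mz <= 2000:
        return (32, 35)
    raise ValueError(f"precursor m/z {precursor_mz} is outside 400-2000")


def scan_plan(precursor_mz, scans_per_energy=2, scan_time_s=2):
    """List the four MS-MS scans as (collision energy in V, duration in s)."""
    return [
        (energy, scan_time_s)
        for energy in collision_energies(precursor_mz)
        for _ in range(scans_per_energy)
    ]


# Precursor m/z values taken from the Results section.
print(scan_plan(851.60))
print(scan_plan(1059.01))
```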
For subsequent CID-MS-MS analysis of observed GlcNAc-modified peaks, using the automatically acquired CID-MS-MS spectra as a reference, collision energies were adjusted to optimize for the observation of GlcNAc-modified fragment ions. This resulted in reducing the collision energies to 26 and 28 V for the glycosylated peptides spanning residues 303-323 and 396-417. MS-MS was acquired on these precursor ions for the whole duration of their elution (~30 s).
HPLC Fractionation of Chymotryptic Digest: HPLC was carried out using an ABI 140D pump and 785A detector (Applied Biosystems). The digest was loaded onto a guard column (300-µm inner diameter × 5-cm C18 PepMap) in the injection loop and washed using 20 µl of 0.1% formic acid. Peptides were then separated on a microbore column (300-µm inner diameter × 15-cm C18) at a flow rate of 5 µl/min using a gradient from 5 to 40% buffer B (80% acetonitrile) over a period of 32 min. Fractions were collected when peaks were observed on the UV chromatogram. Fractions were vacuum-centrifuged to dryness and resuspended in 4 µl of water for MALDI-MS screening.
Pro-C Subdigestion: 5 µl (2 milliunits) of Pro-C in 25 mM NH4HCO3 was added to the HPLC fraction containing the GlcNAc-modified peptide. Digestion was carried out at 37°C for 1 h.
MALDI-MS: 0.5 µl of each HPLC fraction was mixed on target with 0.5 µl of 2,5-dihydroxybenzoic acid (saturated solution in water). The sample was allowed to dry at room temperature before introduction into the mass spectrometer. Spectra were acquired on the Reflex III (Bruker Daltonics, Bremen, Germany), surveying a range of m/z 600-4000. Spectra were externally calibrated from a nearby spot using a peptide calibration mixture (calibration mixture 2 from Sequazyme kit, Applied Biosystems).

RESULTS
A one-dimensional SDS-PAGE band of 5 pmol of serum response factor was digested with trypsin and analyzed by LC-MS with data-dependent selection of precursor ions for fragmentation analysis.
Characterization of Phosphorylated Ser101-Arg135: A triply protonated peak, [M + 3H]3+, of m/z 1092.15 was observed in the LC-MS run and was automatically selected for CID fragmentation. The resulting spectrum is shown in Fig. 1, and the corresponding fragment ion identities are listed in Table I. This peptide component is a phosphorylated version of the peptide spanning residues 101-135 of SRF. This particular sequence is not a predicted tryptic fragment: although residue 135 is a lysine residue, residue 136 is a proline, and trypsin is not predicted to cleave after lysines that are followed by prolines (20). The peptide contains a known phosphorylation site at residue 103 (24-26). None of the "y" ions are observed in a phosphorylated state, but all "b" ions observed are either phosphorylated or have lost H3PO4 to produce an ion 18 Da smaller than the equivalent unmodified fragment ion. The smallest phosphorylated b ion is the b6 ion at m/z 757.42. This defines the phosphorylation to be on either serine 101 or serine 103. The b1 and b2 ions are not observed, so there is no information in this spectrum that permits the differentiation between the two possible sites. However, serine 103 is more likely because trypsin is unlikely to cleave after arginine 100 if serine 101 is phosphorylated, due to steric hindrance from the bulky phosphate group.
Characterization of Phosphorylated Ala213-Arg235: Two additional triply protonated components were observed in the LC-MS analysis: [M + 3H]3+ = m/z 861.77 and [M + 3H]3+ = m/z 888.39. These correspond to residues 213-235 of SRF containing an acrylamide-modified cysteine residue and a phosphorylated version of this peptide. The crucial region in the MS-MS spectra that permits determination of the site of phosphorylation is shown magnified in Fig. 2. Both the nonphosphorylated and phosphorylated CID spectra contain a doubly protonated (charged) y11 ion at m/z 635.35. However, whereas the doubly charged y12 ion in the spectrum of the nonphosphorylated peptide appears at m/z 678.88, in the phosphorylated spectrum a doubly charged peak at m/z 718.84 is observed instead. Also a doubly charged peak at m/z 669.89 is observed, which corresponds to the loss of H3PO4 from the phosphorylated y12 ion. Hence interpretation of this fragmentation spectrum establishes serine 224 as a new site of phosphorylation of SRF.
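The arithmetic behind this assignment can be checked directly from the m/z values quoted above. The following sketch uses standard monoisotopic masses (proton, HPO3, H3PO4, and the serine residue); the variable names and the tolerances used for comparison are illustrative choices rather than part of the original analysis.

```python
PROTON = 1.00728        # mass of a proton
HPO3 = 79.96633         # mass added by phosphorylation
H3PO4 = 97.97690        # neutral loss from a phosphorylated fragment
SER_RESIDUE = 87.03203  # serine residue mass


def neutral_mass(mz, charge):
    """Convert an observed m/z at a given charge state to a neutral mass."""
    return charge * (mz - PROTON)


# Doubly charged fragment ions quoted in the text.
y11 = neutral_mass(635.35, 2)
y12 = neutral_mass(678.88, 2)
y12_phospho = neutral_mass(718.84, 2)
y12_after_loss = neutral_mass(669.89, 2)

# Triply charged precursors of the unmodified and phosphorylated peptides.
precursor_shift = neutral_mass(888.39, 3) - neutral_mass(861.77, 3)

residue_added_at_y12 = y12 - y11           # residue that extends y11 to y12
phospho_shift = y12_phospho - y12          # shift of y12 upon phosphorylation
neutral_loss = y12_phospho - y12_after_loss

# y12 of a peptide ending at residue 235 begins at residue 235 - 12 + 1.
site = 235 - 12 + 1

print(round(residue_added_at_y12, 2), round(phospho_shift, 2),
      round(neutral_loss, 2), round(precursor_shift, 2), site)
```

The y11 ion is identical in both spectra while y12 shifts by the mass of HPO3, and the residue that separates y11 from y12 has the mass of serine, which is the basis for placing the phosphate on residue 224.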
A number of GlcNAc-modified peptides were also observed in this tryptic digest, and some of these were selected for CID fragmentation. However, none of the resulting automatically acquired MS-MS spectra contained sufficient data to identify a particular site of modification (data not shown). This situation appeared to be due to the presence of a basic C-terminal residue formed by tryptic digestion that directed fragmentation toward C-terminally derived fragment ions, whereas overlapping N-terminal and C-terminal fragment ion series are generally required to identify sites of GlcNAc modification (17). Therefore, a second digest of 5 pmol of SRF was performed using chymotrypsin.
Characterization of Glycosylated Leu303-Leu323: A GlcNAc-modified peptide was observed at [M + 2H]2+ m/z 1059.01 in this digest corresponding to a glycosylated peptide of residues 303-323. A CID-MS spectrum of this glycopeptide is shown in Fig. 3, and the relative peak identities are listed in Table II. The presence of four glycosylated y ions in this spectrum establishes that serines 307, 309, and 311 are not modified. A glycosylated b13 (b13G) ion is also present in this spectrum at m/z 1369.80. The combination of b13G and y13G ions defines the site of glycosylation among the middle four residues, PSAV. Of these, only serine 313 can bear the modification. Thus interpretation of the ions present in this CID spectrum establishes a new site of GlcNAc modification on serine 313.
Characterization of Glycosylated Thr396-Tyr417: Another GlcNAc-modified peptide was observed at [M + 3H]3+ m/z 851.60. This is a GlcNAc-modified peptide of residues 396-417. The CID-MS-MS spectrum of this peptide is shown in Fig. 4, and the fragment ions are identified in Table III. Despite the observation of a large number of fragment ions, the only glycosylated fragment ions present were y18G-y20G. There are also a large number of internal fragment ions whose formation was favored by the presence of multiple proline residues in the sequence, and three of these internal ions are observed in a glycosylated state. These glycosylated ions discount the four most N-terminal residues as sites of modification and restrict the GlcNAc modification site to one of threonine 401, threonine 402, or serine 411.
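The way a glycosylated fragment ion restricts the modification site can be expressed as simple index arithmetic. The sketch below is a hedged illustration: the helper names are hypothetical, and only the partial sequence quoted later in the text for residues 396-410 (TSSVPTTVGGHMMYP) is used, so candidates beyond residue 410, such as serine 411, are not enumerated here.

```python
def window_from_y_ion(pep_end, n):
    """Residues covered by a y_n ion (the n C-terminal residues)."""
    return (pep_end - n + 1, pep_end)


def window_from_b_ion(pep_start, n):
    """Residues covered by a b_n ion (the n N-terminal residues)."""
    return (pep_start, pep_start + n - 1)


def ser_thr_positions(sequence, seq_start, window):
    """Protein-level positions of Ser/Thr residues that fall in a window."""
    low, high = window
    return [
        seq_start + i
        for i, residue in enumerate(sequence)
        if residue in "ST" and low <= seq_start + i <= high
    ]


# Peptide 396-417: the smallest glycosylated y ion observed is y18G.
window = window_from_y_ion(pep_end=417, n=18)
excluded = list(range(396, window[0]))

# Partial sequence for residues 396-410 as given in the text.
candidates = ser_thr_positions("TSSVPTTVGGHMMYP", 396, window)

print(window, excluded, candidates)
```

The y18G ion spans residues 400-417, which removes residues 396-399 from consideration and leaves threonines 401 and 402 as the candidates within the quoted partial sequence.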
To determine which of these residues bears the modification, a second chymotryptic digest of SRF was carried out, this time using 10 pmol of gel-purified protein. This digest was separated by HPLC, and fractions were collected. Each fraction was then screened by MALDI-MS to locate the fraction that contained the GlcNAc-modified peptide corresponding to residues 396-417. The mass spectrum of fraction 9 is shown in Fig. 5. This fraction contains several peptides; the most prominent ions from this region, at m/z 2350.24, m/z 2366.20, m/z 2382.20, and m/z 2398.13, correspond to the non-glycosylated peptide 396-417 itself and to analogs containing one, two, and three oxidized methionine residues, respectively. However, weak ions are observed at m/z 2553.45, m/z 2569.38, and m/z 2585.42, which correspond to the GlcNAc-modified peptide itself and to analogs bearing one and two oxidized methionine residues.
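The relationships among these MALDI peaks can be verified by searching the peak list for characteristic mass differences. This is a minimal sketch using the m/z values quoted above; the function name and the 0.3 Da tolerance are illustrative assumptions, and the mass increments are the standard monoisotopic values for a HexNAc residue and for oxidation.

```python
GLCNAC = 203.0794    # mass added by one O-GlcNAc (HexNAc residue)
OXIDATION = 15.9949  # mass added by oxidation of one methionine

# Singly charged MALDI ions reported for fraction 9.
peaks = [2350.24, 2366.20, 2382.20, 2398.13, 2553.45, 2569.38, 2585.42]


def find_pairs(peak_list, delta, tol=0.3):
    """Return (lighter, heavier) peak pairs separated by delta within tol."""
    return [
        (a, b)
        for a in peak_list
        for b in peak_list
        if abs((b - a) - delta) <= tol
    ]


glcnac_pairs = find_pairs(peaks, GLCNAC)
oxidation_pairs = find_pairs(peaks, OXIDATION)

print(glcnac_pairs)
print(len(oxidation_pairs))
```

Each of the three weak ions pairs with a non-glycosylated ion that is lighter by the mass of one GlcNAc, and the ladder of 16 Da spacings accounts for the methionine oxidation states.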
[Figure caption fragment: The ions at m/z 1369.80 (b13G) and m/z 1392.76 (y13G) determine the modified residue to be serine 313.]
This fraction was then subdigested with Pro-C, which cleaves peptides C-terminal to proline residues, and analyzed by LC-MS. If this enzyme had worked efficiently, it should have digested the GlcNAc-modified peptide present to produce the sequence TTVGGHMMYP. Fig. 6A shows the extracted ion chromatograms of m/z corresponding to doubly charged peaks for this peptide in non-glycosylated and GlcNAc-modified states containing one and two oxidized methionine residues. A peak is observed in the chromatogram of the non-glycosylated, singly oxidized peptide, and a weaker peak is observed for the doubly oxidized version of this peptide. However, there are no peaks for GlcNAc-modified versions of this peptide; the peak after 28.5 min in the chromatogram of m/z 656.8 arises from a singly charged ion at m/z 657.38 (data not shown). Fig. 6B confirms that the two peaks in the ion chromatograms of the unmodified species relate to doubly charged peptides of the correct mass values. Thus, the enzyme had successfully produced unmodified peptides expected from this region, but there was no evidence of any GlcNAc-modified analogs.
Fig. 7 shows extracted ion chromatograms of non-glycosylated and GlcNAc-modified peptides with one and two oxidized methionine residues for the peptide spanning residues 396-410 (TSSVPTTVGGHMMYP), which is the product expected from one missed cleavage. There is no extracted ion chromatogram peak for a singly oxidized non-glycosylated peptide, and the peak in the doubly oxidized ion chromatogram is formed by a singly charged peak at m/z 799.39, so it is not the component of interest (data not shown). However, there are peaks in the chromatograms for GlcNAc-modified singly and doubly methionine-oxidized species. The mass spectra recorded at the corresponding times in the extracted ion chromatograms (Fig. 7B) show these to be doubly charged peptides of the correct mass values. Hence apparently this peptide is present only in a GlcNAc-modified state.
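The m/z values monitored in these extracted ion chromatograms follow from the peptide sequences given above. The sketch below computes them from standard monoisotopic residue masses; the function name and its parameters are hypothetical, and only the residues occurring in these two peptides are tabulated.

```python
# Monoisotopic residue masses for the amino acids in these two peptides.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "M": 131.04049, "H": 137.05891, "Y": 163.06333,
}
WATER = 18.01056
PROTON = 1.00728
GLCNAC = 203.07937    # HexNAc residue
OXIDATION = 15.99491  # one oxidized methionine


def peptide_mz(sequence, charge=2, n_glcnac=0, n_oxidized=0):
    """Monoisotopic m/z of a peptide with optional GlcNAc and Met oxidation."""
    mass = (
        sum(RESIDUE_MASS[residue] for residue in sequence)
        + WATER
        + n_glcnac * GLCNAC
        + n_oxidized * OXIDATION
    )
    return (mass + charge * PROTON) / charge


# Fully cleaved product (residues 401-410), GlcNAc-modified, one oxidized Met.
mz_short_glyco = peptide_mz("TTVGGHMMYP", n_glcnac=1, n_oxidized=1)

# Missed-cleavage product (residues 396-410), unmodified, two oxidized Met.
mz_long_unmod = peptide_mz("TSSVPTTVGGHMMYP", n_oxidized=2)

print(round(mz_short_glyco, 2), round(mz_long_unmod, 2))
```

The first value reproduces the m/z 656.8 channel discussed for Fig. 6A. The second, near m/z 798.86, shows why a singly charged ion at m/z 799.39 could appear in the chromatogram extracted for the doubly oxidized, non-glycosylated 396-410 peptide, assuming an extraction window of roughly 1 m/z unit.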
Together these results suggest that the Pro-C enzyme normally cleaves this peptide after proline 400 to produce the peptide spanning residues 401-410. However, the presence of the GlcNAc moiety appears to provide sufficient steric hindrance to prevent the enzyme from attacking this site, leading to the N-terminally extended peptide spanning residues 396-410. This strongly suggests the site of modification is the immediately neighboring threonine 401. Indeed Reason et al. (30) proposed that a GlcNAc on a neighboring residue would prevent enzymatic cleavage. This previously unknown site is the first modification of threonine identified thus far in SRF; all other known sites are on serine residues.

DISCUSSION
In this work a well characterized phosphorylation site on serine 103 (24-26) was detected and confirmed, and a previously unobserved site of phosphorylation was identified at serine 224. This is immediately contiguous to a region determined to be important for DNA binding (32). Thus, phosphorylation at this site may regulate the ability of the protein to bind to the serum response element.
A GlcNAc modification site in the peptide spanning residues 303-323 was identified as serine 313. This represents a previously undetected site of modification because it was not observed in the work of Reason et al. (30). One of the glycopeptides Reason et al. focused on spans residues 313-324 and was reported to be modified on serine 316. The reason they did not observe GlcNAcylation of serine 313 probably relates to the approach they used to detect the presence of GlcNAc-modified peptides. They used a sequential digestion approach using cyanogen bromide followed by trypsin and then a proline-specific endopeptidase (Pro-C). The resulting peptide mixture was then separated by HPLC, and fractions were collected and screened for pairs of peaks that differed by 203 Da in mass, which is characteristic of an unmodified and GlcNAc-modified peptide. However, if residue 313 was GlcNAc-modified, this would most likely prevent Pro-C from cleaving after proline 312 due to steric effects from the proximal sugar residue (further evidence for this effect is presented above in the analysis of the GlcNAc-modified peptide spanning residues 396-417). Hence this site will be cleaved only when serine 313 is unmodified; if a bulky sugar moiety is attached, the site will be inaccessible, so a glycosylated peptide of residues 306-324 would be the product of this combined enzymatic digestion. Thus, there would not be a pair of peptides differing by 203 Da in mass, so this particular modified peptide would not have been detected by their screening strategy.
It is interesting that this site appears to be heavily modified, but we observed no evidence of the presence of any modification at serine 316 reported previously (30), i.e. there was no sign of a doubly GlcNAc-modified version of this peptide in the extracted ion chromatogram from this digest (data not shown). This suggests that the stoichiometry of modification at serine 313 is considerably higher than any possibly present at serine 316. In fact, none of the previously identified GlcNAc modification sites in the peptide spanning residues 303-324 were detected in these studies, although these experiments use exactly the same protein preparation of SRF (31). This may reflect different selectivities of these two very different approaches to identifying GlcNAc modification sites in a large protein and argues for the use of multiple GlcNAc detection methods to maximize information about the post-translational glycosylation state of the protein. By using radioactivity to locate GlcNAc-modified peptides, sites modified at lower stoichiometry might be detected and identified, although much more total protein was required. The mass spectrometry-based approach used in this work, although extremely sensitive, struggles to find sites that are modified at very low stoichiometry because it is a prerequisite of this approach that GlcNAc-modified peptides must have been detected and selected for MS-MS in the initial "screening" LC-MS run. To find these, selective enrichment of modified peptides could be used to augment the dynamic range accessible and simplify the sample (8,33,34).
A further site of GlcNAc modification was detected during this study. Fragmentation data from a GlcNAc-modified peptide observed in a chymotryptic digest (Fig. 4) narrowed this site of modification to either threonine 401, threonine 402, or serine 411, and then data from a Pro-C subdigest of this peptide provided strong evidence that threonine 401 is indeed the modified residue. It may be possible to confirm unambiguously this particular site using this same subdigestion approach starting with a larger amount of protein such that there would be sufficient modified peptide after the Pro-C digest to allow CID-MS-MS data acquisition on the newly formed glycopeptide. An alternative approach would be to subdigest the purified glycopeptide with a nonspecific enzyme such as pepsin and see whether a glycosylated peptide formed by a cleavage between the two threonine residues is observed. However, this approach may not be successful as the presence of the GlcNAc moiety will probably prevent cleavage between the residues in the same way it prevented cleavage by Pro-C. Also a large amount of starting material would be required to yield a detectable amount of GlcNAc-modified peptide formed by cleavage between the two threonine residues as products from cleavages at all other sites would also be formed.
Of considerable interest, none of the particular sites of O-GlcNAc modification and phosphorylation coincided on the same residue. Hence no direct evidence has been obtained that these two modifications are interacting with each other in a regulatory sense. Indeed the GlcNAc modification sites appear to be concentrated in a region of the protein for which there is no evidence of phosphorylation, which is unusual for the majority of sites identified thus far although not unique (35,36). However, it is clear that sites of phosphorylation and O-GlcNAcylation are present in the same protein. This suggests that the two modifications may not be mutually exclusive at the protein level. Hence these two different modifications probably serve different roles in controlling the function and activity of this protein.

CONCLUSIONS
This work has successfully identified novel sites of O-GlcNAc modification of a protein without the use of radioactivity or derivatization of the modification site. Our results demonstrate that sufficient sensitivity can be achieved that it may no longer be necessary to resort to recombinantly overexpressing a protein to be able to characterize sites of O-GlcNAc modification. This study also represents the first reported instance when both phosphorylation and GlcNAcylation sites have been characterized from a single analytical strategy. This approach will hopefully stimulate increased interest in the study of protein GlcNAcylation that will improve our understanding of the function of this widespread post-translational modification.