Mapping Protein Surface Accessibility via an Electron Transfer Dissociation Selectively Cleavable Hydrazone Probe

A protein's surface influences its role in protein-protein interactions and protein-ligand binding. Mass spectrometry can be used to give low resolution structural information about protein surfaces and conformations when used in combination with derivatization methods that target surface accessible amino acid residues. However, pinpointing the resulting modified peptides upon enzymatic digestion of the surface-modified protein is challenging because of the complexity of the peptide mixture and low abundance of modified peptides. Here a novel hydrazone reagent (NN) is presented that allows facile identification of all modified surface residues through a preferential cleavage upon activation by electron transfer dissociation coupled with a collision activation scan to pinpoint the modified residue in the peptide sequence. Using this approach, the correlation between percent reactivity and surface accessibility is demonstrated for two biologically active proteins, wheat eIF4E and PARP-1 Domain C.

In the past decade the number of protein sequences without solved structures in the Protein Data Bank library has increased dramatically (1). Determination of the correlation between protein structure and function remains a primary objective of biological research, thus motivating the development of advanced analytical tools for unraveling the three-dimensional structures of proteins. The most common techniques used to determine the higher order structure of a protein are nuclear magnetic resonance (NMR) and x-ray crystallography; however, these techniques are not universal because of practical limitations related to protein size, the inability to crystallize certain proteins, or limited sample amounts. Given these restrictions, mass spectrometry-based strategies for protein structural analysis have gained traction because of their speed and sensitivity.
A number of protein labeling techniques have been used in combination with tandem MS analysis to provide low resolution structural information, including hydrogen-deuterium exchange, crosslinking, and covalent chemical modifications prior to proteolytic digestion (2). Hydrogen-deuterium exchange provides the most detailed information about a structure as it probes the entire protein backbone; however, spatial resolution can be greatly limited because of back-exchange of the deuterium to hydrogen before analysis (3) or hydrogen/deuterium scrambling during tandem MS (4). Electron-based dissociation methods have been able to overcome much of the error due to scrambling, but the limitations due to back-exchange remain substantial (5,6). Chemical modifications are used to covalently label specific or nonspecific amino acid side chains, thus eliminating the problem of back-exchange and scrambling (2). This method rests on the premise that amino acids that are exposed to solvent, and are therefore accessible to a chemical labeling agent, will be modified, whereas those that are buried will be modified slowly or not at all. This type of labeling provides information about the identities of the amino acids accessible to the reagent in solution, resulting in a low resolution map of the protein's surface. In particular, the accessibility of amino acid side chains on the protein's surface reflects their potential participation in protein-protein or protein-ligand interactions. The chemical modification methods are frequently combined with a bottom-up approach in which the modified proteins are subsequently enzymatically digested to facilitate MS/MS analysis. Crosslinking, a specific form of chemical modification, provides information about structure by creating new intramolecular or intermolecular bonds between specific amino acids, with distance constraints on the location of the two linked amino acids based on the size of the linker (7). Because new bonds are formed between contact areas of the protein, sequencing these sites through tandem MS can prove challenging. Identification of the crosslinked peptides provides distance parameters that can be used to reconstruct the contact points and conformation of the protein(s). With single covalent modifications of amino acids (not formation of crosslinks), the analysis is often more straightforward as the modification can be treated as a post-translational modification, thus allowing most searching algorithms to be used for the identification of the digested modified peptides. In this case, the identification of the modified residues reveals the exposed regions of the protein relative to the inaccessible regions, therefore reflecting conformational information.
Lysine is one of the most targeted amino acids for chemical labeling because of its intrinsically high reactivity, thus making it amenable to efficient modification. Moreover, the positively charged, polar side chain of lysine under physiological pH conditions means that lysines are more often located on the hydrophilic surface of proteins and consequently more often involved in protein-protein or protein-ligand interactions (2). These interactions can be studied through this covalent modification strategy based on monitoring the differential reactivity of selected residues in the presence/absence of the interacting protein or ligand, indicating their involvement in a binding interaction. Several different Lys-specific chemical modification methods have been reported such as aminoacetylation (8 -13), amidination (14,15), and many using biotin labeled reagents (16 -20). These modifications, coupled with mass spectrometric analysis, have been used to characterize the topology of multiple proteins, protein-ligand complexes and protein-protein interactions (8 -20).
The development and application of new crosslinking and surface accessibility strategies have been impeded by the difficulty of detecting the low abundance crosslinked or modified peptides amid a large array of more abundant unmodified peptides produced upon enzymatic digestion of the proteins or protein complexes. This detection problem has been addressed by efforts to selectively enrich the modified peptides (21-27), incorporation of isotopic labels in the reagents to give the modified peptides distinctive isotopic signatures (28-32), and development of selectively cleavable reagents that yield a traceable neutral loss or reporter ions upon MS/MS analysis (33-40). In the context of surface accessibility studies, Reid et al. have addressed this issue using a "fixed charge" sulfonium ion with specificity for methionine (41), cysteine (42), and lysine (43) targets, where upon collision induced dissociation (CID) the exclusive loss of a dialkylsulfide (e.g. a neutral loss of 62 Da) is indicative of a modified species. Our group has previously found that an N-N hydrazine bond can be selectively cleaved upon electron transfer dissociation (ETD) (38). Bisarylhydrazone crosslinked peptides were created by reacting succinimidyl 4-formylbenzoate modified peptides with succinimidyl 4-hydrazinonicotinate acetone hydrazone modified peptides (38). These crosslinked peptides were then readily identified through the production of two partially modified peptides, one even-electron product ion and one odd-electron product ion, following cleavage of the N-N hydrazine bond (38). Here we describe a new surface accessibility reagent, NN (see Fig. 1), that modifies primary amines (the side chain of Lys and/or the N terminus) by incorporating an N-N hydrazine bond. Upon ETD, the preferential cleavage of the N-N bond leads to a dominant loss of 93 Da from all charge-reduced species, which allows confident differentiation of modified peptides from unmodified ones. The N-N bond cleavage is both highly efficient and very selective, an outcome attributed in part to the unique traits of electron-mediated activation. CID and/or ETD are used to pinpoint the exact locations of the modifications and to identify the modified peptides. This information is used in tandem with molecular modeling to map the surfaces of two biologically active proteins of considerable recent interest, wheat eIF4E and PARP-1 Domain C.

EXPERIMENTAL PROCEDURES
Materials and Reagents-Ubiquitin, proteomics grade trypsin, endoproteinase Arg C, and Glu C were purchased from Sigma Aldrich (St. Louis, MO). PARP 1 Domain C protein was provided by Dr. Hung-wen Liu (Department of Pharmacy, University of Texas at Austin) and eIF4E was provided by Dr. Karen Browning (Department of Chemistry and Biochemistry, University of Texas at Austin). All other chemicals and solvents were purchased from Fisher Scientific (Fairlawn, NJ). The surface accessibility reagent, NN, was synthesized in house as summarized in the supplemental Material (see supplemental Fig. S1).
Mass Spectrometry and Liquid Chromatography-All experiments were undertaken on a ThermoFisher LTQ XL linear ion trap mass spectrometer (San Jose, CA) equipped with an ETD unit. Direct infusion analysis on the LTQ XL was performed using an online nanoESI setup as previously described using a flow rate of 3 µl/min at a concentration of 10 µM in 49.5:49.5:1 MeOH:H2O:acetic acid (44). An ESI voltage of 2 kV and a heated capillary temperature of 180°C were used for experiments on the LTQ XL. Liquid chromatography was performed using a RLSC Dionex UltiMate 3000 system (Sunnyvale, CA). An Agilent ZORBAX 300Extend-C18 column (Santa Clara, CA) (150 × 0.3 mm, 3.5 µm particle size) was used for all separations. Eluent A consisted of 0.1% formic acid in water and eluent B of 0.1% formic acid in acetonitrile. A linear gradient from 5% eluent B to 40% eluent B over 65 min at 0.3 µl/min was used. Injections of approximately one picomole were used for each digested sample. For all liquid chromatography-tandem MS (LC-MS/MS) runs, the first event was the full mass scan (m/z range of 400-2000) followed by five sets of consecutive MS/MS events on the five most abundant ions from the full mass scan. The first MS/MS event in each set was acquisition of an ETD spectrum using an electron transfer reaction time of 100 ms. Following ETD, CID was performed using a q-value of 0.25, a normalized collisional energy of 35%, and a collision activation time of 30 ms. The maximum injection time for all events was set to 100 ms, and each mass spectrum and tandem mass spectrum was the average of five microscans.
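The acquisition cycle described above (one full scan, then an ETD event and a CID event on each of the five most abundant ions) can be sketched in a few lines of Python. This is only an illustration of the scan ordering, not instrument control code; the names `ScanEvent` and `build_cycle` are hypothetical, and the peak list is assumed to be a simple list of (m/z, intensity) pairs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TOP_N = 5                       # five most abundant ions, as stated in the text
MZ_MIN, MZ_MAX = 400.0, 2000.0  # full mass scan range from the text


@dataclass(frozen=True)
class ScanEvent:
    kind: str                             # "MS1", "ETD", or "CID"
    precursor_mz: Optional[float] = None  # None for the full mass scan


def build_cycle(full_scan: List[Tuple[float, float]],
                top_n: int = TOP_N) -> List[ScanEvent]:
    """Return one cycle: a full scan, then ETD followed by CID per precursor."""
    in_range = [(mz, inten) for mz, inten in full_scan
                if MZ_MIN <= mz <= MZ_MAX]
    # Rank by intensity and keep the top_n most abundant ions.
    top = sorted(in_range, key=lambda peak: peak[1], reverse=True)[:top_n]
    events = [ScanEvent("MS1")]
    for mz, _ in top:
        events.append(ScanEvent("ETD", mz))  # ETD spectrum acquired first
        events.append(ScanEvent("CID", mz))  # CID on the same precursor next
    return events
```

With five eligible precursors, one cycle therefore contains eleven scan events.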
Derivatization and Sample Preparation-Each protein (1 mM, 25 µl) was mixed with NN at a variety of molar ratios relative to NN (20 mM) in PBS buffer at pH 7.2-7.4. The protein:NN ratios were varied from 1:2 to 1:10. Higher protein/NN ratios were used to enhance the modification and subsequent detection of some of the less accessible sites that might otherwise be nondetectable under more limiting protein:NN reaction conditions. Upon covalent modification of any protein, slight changes in structure are possible following derivatization. Thus to preserve the structure of the protein as much as possible, the lowest possible protein/NN molar ratios are typically used to limit the number of modifications per protein. All reactions were carried out at room temperature for 30 min and cleaned up using 10 kDa MWCO filters. Derivatized samples were then split into three aliquots for digestion. For tryptic digestion, the derivatized protein (8 nmol) was diluted with 100 mM NH4HCO3 and digested with 1 mg/ml trypsin in 1 mM HCl in a 1:100 w/w ratio of protein to trypsin overnight at 37°C. For GluC digestion, the derivatized protein (8 nmol) was diluted with the GluC Reaction Buffer from BioLabs, consisting of 50 mM Tris-HCl and 0.5 mM Glu-Glu buffer at pH 8.0, and digested with 1 mM GluC in water in a 1:20 w/w ratio overnight at 37°C. For ArgC digestion, the derivatized protein (8 nmol) was diluted in 50 mM NH4HCO3 at a pH of 8.0 with a small addition of 20 mM CaCH3COOH to enhance digestion, and then digested using a 1:16 w/w molar ratio of protein to ArgC at 37°C overnight. The digested samples were diluted to 10 µM before ESI analysis with 49.5/49.5/1 H2O/MeOH/acetic acid. Denatured samples were prepared by diluting the protein in 49.5/49.5/1 H2O/MeOH/acetic acid prior to derivatization.
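As a worked illustration of the reaction stoichiometry, the sketch below computes the volume of NN stock needed for a given protein:NN molar ratio. It assumes the protein volume is in microliters, that the 20 mM value is the concentration of the NN stock, and that a ratio of 1:2 means two moles of NN per mole of protein; the function name is hypothetical.

```python
def reagent_volume_ul(protein_conc_mM: float,
                      protein_vol_ul: float,
                      reagent_conc_mM: float,
                      nn_per_protein: float) -> float:
    """Volume of reagent stock (in microliters) for a 1:nn_per_protein ratio."""
    protein_nmol = protein_conc_mM * protein_vol_ul  # mM x ul = nmol
    reagent_nmol = protein_nmol * nn_per_protein     # moles of NN required
    return reagent_nmol / reagent_conc_mM            # nmol / mM = ul


# 1 mM protein in 25 ul is 25 nmol, which is in line with splitting the
# derivatized sample into three aliquots of about 8 nmol each.
low_ratio_volume = reagent_volume_ul(1.0, 25.0, 20.0, 2)    # 1:2 ratio
high_ratio_volume = reagent_volume_ul(1.0, 25.0, 20.0, 10)  # 1:10 ratio
```

Under these assumptions the 1:2 and 1:10 reactions would call for 2.5 and 12.5 microliters of the NN stock, respectively.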
Determination of Surface Accessibility-Upon identification as a modified peptide from the ETD spectrum based on the characteristic loss of 93 Da, peptides were sequenced manually using the subsequent CID fragmentation pattern of the charge-reduced species. The percent reactivity was calculated based on the sum of the peak areas of all peptides containing a modified residue based on the total ion chromatographic profiles and integrated using QualBrowser, divided by the sum of the area of all peptides containing the unmodified and modified residue as shown in equation 1.

\[
\%\ \text{Reactivity} = \frac{\sum \text{Area of all peptides containing modified residue } n}{\sum \text{Area of all peptides containing residue } n} \qquad \text{(Eq. 1)}
\]

For proteins where a known structure was not available, ITASSER was used to predict a structure based on the primary sequence of the protein (1). Predicted surface accessibilities for all Lys side chains and the N terminus were calculated using GetArea software online (45) with all parameters set as the default values based on known or ITASSER model structures as indicated. All predicted accessibility values for the lysine residues are based on the entire lysine side chain. The GetArea algorithm classifies residues as solvent accessible if the surface accessibility ratio is calculated to exceed 50%, whereas residues are categorized as buried if the calculated value falls below 20% (45,46). Correlation between the percent reactivities associated with the reactions of NN with the proteins and the surface accessibilities calculated from modeling programs such as GetArea is not expected to be quantitative. For our experimental strategy, only residues that reside on the surface of a protein will readily react with the NN reagent, thus establishing the general correlation between the surface accessibility of the residue with its percent reactivity. However, as the GetArea computations consider the entire side chain and not only the reactive amine group, there is not an exact parallel between the two (i.e. surface accessibility from GetArea versus percent reactivities from our experimental method). Moreover, our reported strategy uses a bulkier reagent and monitors its ability to interact with accessible primary amines, whereas the computed surface accessibilities are derived from contact of small solvent molecules within van der Waals distances when rolled along the protein.

Design of NN Surface Accessibility Reagent-NN (Fig. 1) was designed to selectively react with primary amines, such as the ε-amino groups on lysine side chains and the N terminus which have proven consistently to be among the most reactive sites of proteins, via conventional N-hydroxysuccinimide (NHS) coupling. NN contains an N-N hydrazine bond that has previously been shown to preferentially cleave upon ETD (38), thus specifically facilitating the identification of modified peptides based on an easily monitored MS/MS reaction. For NN, the characteristic fragmentation upon ETD results in the neutral loss of 93 Da from the charge-reduced precursor, and the number of consecutive neutral losses is indicative of the number of modified residues. This pathway is the dominant fragmentation pathway of NN-modified peptides upon ETD, thus providing a facile way to differentiate NN-modified peptides from unmodified peptides and allow confident sequencing by ETD or CID (Figs. 2 and 3). The characteristic N-N bond cleavage and neutral loss of 93 Da may also be used to implement a data-dependent scan mode in which the acquisition of a CID spectrum can be triggered by this neutral loss to identify the peptide and locate the modifications.
The use of ETD as the primary activation method for analysis of the peptides produced from the modified protein also offers a strategic advantage. Because the site of modification (side-chain of lysines) is also the cleavage site for trypsin, the most frequently used protease, missed cleavages at the modified lysine sites are common. Therefore, GluC and ArgC digests were used in combination with trypsin to gain a more detailed picture of each protein's structure. ETD proves to be remarkably efficient for activation of the larger, more highly charged peptides generated from the GluC and ArgC digests, thus making ETD a natural fit for this strategy. For the reactions of NN with each protein, the protein/NN ratios were varied over a range of values in order to optimize reaction efficiencies while maintaining the native tertiary protein structure. At lower molar ratios, only the most accessible sites are modified and the structural integrity is most readily maintained. With higher molar ratios, the reaction efficiencies of less accessible modification sites are increased, thus enhancing detection of those sites.
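The effect of lysine modification on tryptic cleavage can be illustrated with a small in silico digest. The sketch below assumes the commonly used simplified trypsin rule (cleave C-terminal to K or R unless the next residue is P), which is not stated in the text, and treats every NN-modified lysine as uncleavable. The function name is hypothetical, and the example sequence consists of the ubiquitin N-terminal peptide MQIFVKTLTGK followed by a made-up tail.

```python
from typing import Iterable, List


def tryptic_digest(sequence: str,
                   modified_lys: Iterable[int] = ()) -> List[str]:
    """Digest `sequence`; `modified_lys` holds 1-based positions of blocked K."""
    blocked = set(modified_lys)
    peptides: List[str] = []
    start = 0
    for i, residue in enumerate(sequence):
        position = i + 1
        is_last = position == len(sequence)
        next_is_pro = (not is_last) and sequence[i + 1] == "P"
        cleavable = residue in "KR" and not next_is_pro
        if residue == "K" and position in blocked:
            cleavable = False  # modified lysine: missed cleavage
        if cleavable and not is_last:
            peptides.append(sequence[start:position])
            start = position
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


toy = "MQIFVKTLTGKTITLEVEPR"               # hypothetical test sequence
native = tryptic_digest(toy)               # no modification
k6_modified = tryptic_digest(toy, {6})     # NN on K6 blocks cleavage there
```

In this toy case, modification at K6 merges the first two peptides into the longer MQIFVKTLTGK, which illustrates why larger, more highly charged peptides are expected from modified proteins.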
Method Verification-Ubiquitin contains eight primary amine sites including seven Lys and the N terminus. The reaction of ubiquitin with the NN reagent is very efficient, as evidenced by the ESI-mass spectrum of the protein after modification (see supplemental Figs. S2A and S2B). For ubiquitin, a 1:2 protein/NN ratio produced predominantly singly modified proteins with minor contributions of doubly and triply modified proteins. Subsequent tryptic digestion and LC-MS/MS analysis resulted in identification of nine unmodified peptides plus 14 modified peptides, the latter based on the characteristic N-N bond cleavage upon ETD followed by CID. The modified peptides were easily distinguished from unmodified peptides using ETD through the preferential cleavage of the N-N bond (Fig. 2). This cleavage leads to a loss of 93 Da from all charge-reduced species. In addition, sequential losses of 93 Da can be used to identify the number of modifications. For example, the N-terminal peptide for ubiquitin, MQIFVKTLTGK, has two possible modification sites, the N-terminal M1 and K6. Upon ETD of the singly modified peptide, a single loss of 93 Da from both the singly and doubly charged reduced species reflects the incorporation of a single NN modification (Fig. 2A), whereas the doubly modified peptide shows two sequential losses of 93 Da, thus indicating two modifications (Fig. 2B). In this example, the sites of modification can be pinpointed directly from the ETD spectra. For the singly modified peptide, the unmodified c2, c3, z3, and z4 ions and the modified z8 and c7 ions localize the first modification on K6; for the doubly modified peptide, the absence of the unmodified c2, c3, z3, z4, z5, singly modified z8, and the doubly modified c7 ions indicates the second modification is located at M1. However, in some cases the location of modification cannot be pinpointed from the ETD spectrum alone because of insufficient fragment ions. 
Thus, a subsequent CID step was included to target the peptide (see supplemental Fig. S3 for an example from ubiquitin), thus ensuring the ability to sequence each modified peptide. A direct MS 3 strategy to sequence the peptides is also possible in which CID is undertaken on the product formed upon the diagnostic neutral loss (-93 Da) in the subsequent ETD, making the strategy amenable to more elegant data-dependent work-flows.
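The diagnostic use of the 93 Da loss can be sketched as a search for consecutive neutral losses from the charge-reduced precursor in an ETD peak list. This is a minimal illustration under several assumptions: the loss is taken as the nominal 93 Da quoted in the text, the electron mass is neglected, and the 0.5 m/z tolerance and the function names are hypothetical choices rather than values from the study.

```python
from typing import Sequence

NN_LOSS_DA = 93.0  # nominal neutral loss reported in the text


def charge_reduced_mz(precursor_mz: float, z: int,
                      electrons: int = 1, losses: int = 0) -> float:
    """m/z of the charge-reduced ion after `losses` neutral losses of 93 Da."""
    if z - electrons < 1:
        raise ValueError("charge-reduced species must keep at least one charge")
    ion_mass = precursor_mz * z  # mass of [M + zH]z+, electron mass neglected
    return (ion_mass - losses * NN_LOSS_DA) / (z - electrons)


def count_nn_losses(peaks_mz: Sequence[float], precursor_mz: float, z: int,
                    electrons: int = 1, tol: float = 0.5,
                    max_losses: int = 4) -> int:
    """Count consecutive 93 Da losses observed from the charge-reduced ion."""
    count = 0
    for k in range(1, max_losses + 1):
        target = charge_reduced_mz(precursor_mz, z, electrons, k)
        if any(abs(mz - target) <= tol for mz in peaks_mz):
            count += 1
        else:
            break  # losses must be sequential to be counted
    return count


# Triply charged precursor at m/z 500: charge-reduced 2+ ion at m/z 750,
# with one and two losses expected at m/z 703.5 and 657.0.
doubly_modified = count_nn_losses([750.0, 703.5, 657.1, 300.0], 500.0, 3)
unmodified = count_nn_losses([750.0, 300.0], 500.0, 3)
```

A nonzero count would flag the precursor as NN-modified and could serve as the trigger for the follow-up CID or MS3 event described above.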
The abundances of the unmodified peptides and NN modified counterparts were quantified through manual integration of peak areas in the TIC profile using QualBrowser software, and the results are summarized in supplemental Table S1 in terms of percent reactivities along with the surface accessibilities estimated from the GetArea algorithm. The surface accessibilities of the sites predicted by the GetArea algorithm based on the previously determined tertiary structure of ubiquitin ranged from 11 to 77% (supplemental Table S1). Ubiquitin was reacted with NN using four protein/NN molar ratios from 1:2 to 1:10, yielding up to three modifications for the low molar ratio of 1:2 and up to five modifications for the higher molar ratios of reagent to intact protein. These results are in general agreement with the expected surface accessibilities of the amine sites given that four Lys residues have surface accessibilities greater than 50% and one Lys is slightly lower at 47% based on the GetArea predictions. These five Lys residues are situated on the surface of the protein and therefore are more accessible to modification whereas the other three primary amines reside on the interior of the tertiary structure, making their side-chains largely inaccessible and unreactive. Three proteases, trypsin, GluC, and ArgC, were used to digest the protein prior to LCMS/MS analysis, thus facilitating identification of the modified peptides in a bottom-up approach. Although trypsin is the most popular protease for conventional proteomics strategies, it proves to be problematic for proteins modified at lysine sites, such as the NN-modified proteins in the present study, in which the modification disrupts proteolytic cleavage after the lysines. However, trypsin does give the most comprehensive sequence coverage of all three digests and thus provided information about some of the more inaccessible residues. 
Note that any differences in ionization efficiencies of the peptides, whether NN-modified or not and regardless of the sizes, charge states, or elution times, were not compensated nor corrected because the percent reactivity values reflect ratios of abundances of modified to sums of modified and unmodified peptides. Thus, the percent reactivities show relative differences in reactivities of various accessible (or inaccessible) amino acids, not absolute values.
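For comparison with percent reactivities, the GetArea categories given in the Experimental Procedures (accessible above 50%, buried below 20%) can be expressed as a simple classifier. The label "intermediate" for values between the two thresholds is an assumption introduced here for illustration, as is the function name.

```python
ACCESSIBLE_ABOVE = 50.0  # percent, threshold stated in the text
BURIED_BELOW = 20.0      # percent, threshold stated in the text


def classify_accessibility(ratio_percent: float) -> str:
    """Map a GetArea surface accessibility ratio (percent) to a category."""
    if ratio_percent > ACCESSIBLE_ABOVE:
        return "accessible"
    if ratio_percent < BURIED_BELOW:
        return "buried"
    return "intermediate"  # hypothetical label for the 20-50% range


# Values quoted for ubiquitin in the text: the range limits (11% and 77%)
# and the Lys predicted at 47%.
categories = [classify_accessibility(v) for v in (11.0, 47.0, 77.0)]
```

Note that the Lys at 47% falls just below the accessible threshold under this scheme even though it was experimentally reactive, consistent with the statement that the correlation is not expected to be quantitative.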
For ubiquitin, K27 and K29 were shown to be the least accessible sites as they displayed no reactivity based on the GluC and ArgC results and very little reactivity for the set of trypsin results as shown in supplemental Table S1. These sites are heavily involved in hydrogen bonding on the interior of the protein. Our general reactivity trend for all three proteases was Lys6 ≈ Lys63 ≈ Lys48 > Met1 > Lys33 > Lys11 > Lys27, Lys29, in which the only discrepancy from previously reported results was for the N-terminal methionine, which was found to be slightly less reactive than Lys6, Lys63, and Lys48. This trend agrees well with the surface accessibility predictions for the tertiary structure of ubiquitin from GetArea as the N terminus had only a 14% surface accessibility whereas Lys6, Lys63, and Lys48 all averaged about 60% reactivities.
Our NN-based accessibility results are in general agreement with other surface accessibility studies that have been conducted for ubiquitin (47,48). Upon amidination of the intact protein, the Reilly group found that the only inaccessible Lys was K27 whereas all other possible modification sites were fully amidinated (47). This Lys residue resides in the bottom of a hydrophobic pocket of ubiquitin's tertiary structure, thus leaving it completely shielded from the solvent. A top-down approach that utilized N-hydroxysuccinimidyl acetate to acetylate primary amines in ubiquitin found the reactivity trend to be Met1 ≈ Lys6 ≈ Lys48 ≈ Lys63 > Lys33 > Lys11 > Lys27, Lys29 (48). The Met1, Lys6, Lys48, and Lys63 sites were found to be the most reactive as they were involved in only weak hydrogen bonds involving other backbone carbonyl groups, whereas Lys11, Lys27, and Lys29 were the least accessible because of their involvement in strong hydrogen bonds to carboxylic acids on other amino acid side chains.
Upon denaturing, the extent of NN reaction significantly changed (supplemental Figs. S2 and S4). These changes in reactivity confirm that the accessibility and thus reactivity of NN is dependent on the conformation of the protein, as is desirable for a chemical probe of tertiary structure. Circular dichroism (CD) was also used to monitor any potential structural changes that occurred upon NN binding. No notable differences were seen between the NN-modified and native proteins, as exemplified by the CD comparison of native PARP to NN-bound PARP in supplemental Fig. S4. The CD results support that the NN reactions do not significantly disrupt the secondary structures of the proteins.
Surface Accessibility of Wheat eIF4E and PARP-1 Domain C-Upon successful demonstration of ETD-selective cleavage of the NN-modified peptides from ubiquitin, the surface maps of two other proteins, eukaryotic translation initiation factor-4E (eIF4E) and Poly(ADP-ribose) polymerase-1 (PARP-1) domain C, were evaluated using a similar strategy. The structure of eIF4E has been extensively studied in mammalian and yeast cells but only recently in wheat (49). Upon crystallization, a dimeric structure has been observed because of the artifact formation of a disulfide bond. The only known monomeric structure was determined for a mutant bound to 7-methyl-GDP that cannot form the disulfide bond (49). Wheat eIF4E has 15 Lys and the N terminus as possible modification sites. At the low molar protein:NN ratios utilized for ubiquitin, minimal modifications were observed for eIF4E; however, at higher protein/NN molar ratios of 1:15 to 1:20, the dominant species was the singly modified protein (Fig. 4A). The relatively high protein/NN ratio required for this protein, as well as the rather low modification rate, suggests that there are fewer highly accessible Lys residues and the native structure is only minimally disrupted after the first modification. The NN-modified protein was then subjected to enzymatic digestion using three proteases (trypsin, GluC, ArgC), followed by LCMS/MS analysis. In each case, spectral acquisition involved collection of full ESI mass spectra, followed by ETD and CID spectra of the five most abundant ions. All peptides were manually identified. The characteristic loss of 93 Da upon ETD was used to pinpoint the NN-modified peptides, and then CID was subsequently used to identify the sequence (see supplemental Fig.  S5 for an example from eIF4E). Unmodified peptides were identified using the ETD spectra for those not exhibiting the loss of 93 Da. 
Percent reactivities were calculated based on Equation 1 in which the peak area for each modified residue was integrated and compared with the peak areas of all peptides containing the residue using QualBrowser software.
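Equation 1 can be written as a short function over integrated peak areas. The sketch assumes the result is expressed as a percentage (the ratio multiplied by 100) and that areas for peptides containing the modified and unmodified forms of residue n are supplied as separate lists; the function name and the example areas are hypothetical.

```python
from typing import Optional, Sequence


def percent_reactivity(modified_areas: Sequence[float],
                       unmodified_areas: Sequence[float]) -> Optional[float]:
    """Eq. 1: modified peak area over total peak area for one residue."""
    modified = sum(modified_areas)
    total = modified + sum(unmodified_areas)
    if total == 0:
        return None  # residue not covered by any identified peptide
    return 100.0 * modified / total


# Hypothetical areas: two modified peptides and one unmodified peptide
# that all contain the same lysine.
example = percent_reactivity([2.0e6, 1.0e6], [1.0e6])
```

Returning `None` when no peptide covers a residue mirrors the cases in which no reactivity value could be determined because the relevant peptides were not identified.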
The artifact dimeric structure of wheat eIF4E previously determined by x-ray and NMR measurements (49) arises from disulfide bond formation and does not give a good representation of the monomer because of the extensive artificial protein-protein interface. A structure for a known mutant of wheat eIF4E has also been characterized previously, but this mutant is formed upon binding to 7-methyl-GDP, which may cause considerable changes in tertiary structure, especially near the binding pocket. Therefore the tertiary structure of monomeric eIF4E was predicted using molecular modeling software, ITASSER (1), prior to calculation of the surface accessibility values for all amino acid side chains via the GetArea program, which were compared to those of the known mutant monomer and dimeric forms (Table I). The known structure for the dimer, the 7-methyl-GDP mutant, and the monomer structure predicted from ITASSER are shown in Fig. 5 with key Lys residues labeled. The surface accessibility values calculated by GetArea for each structure were compared with the experimental percent reactivities determined for each enzymatic digest based on the LC/ETD/CID strategy in Table I. Based on the surface accessibilities calculated for the ITASSER-predicted monomer, three to four Lys residues are expected to exhibit much higher accessibility (above 90%) than all other possible modification sites: K18, K65, K122, and K169. For the ESI-MS/MS results of the tryptic digest of the NN-modified protein, all four of these sites were identified as the primary sites of modification, thus indicating substantial NN reactivities. The N terminus (A1) exhibited low accessibility (2%). Upon increasing the molar ratio of protein/NN from 1:15 to 1:20, the reactivities for K65, K122, and K169 all increased to 100% whereas the reactivity of K18 showed little change and that of the N terminus increased to 26%. 
The latter change may signal a structural change for this part of the protein upon binding of NN to K18, thus opening up the structure so that the N terminus is more readily accessible. For the GluC digest, significant reactivity was observed for A1, K18, K122, and K169 as seen in the tryptic digest. However, based on the GluC digest, no reactivity values could be determined for K63, K65, K80, or K89 as no peptides were identified that contained these residues. We believe the absence of these peptides arises from the sizes of the peptides produced via GluC proteolysis and possibly spontaneous disulfide bonds forming following digestion. For the ArgC digest, again K18, K65, K122, and K169 proved to be the most reactive residues with A1 showing a small amount of reactivity at a 1:15 molar ratio. The lack of reactivity of other Lys sites indicates these Lys resi-  (50,51) has recently been isolated and studied by NMR spectroscopy to determine its solution structure (52). The NMR characterization yielded the "shortened" structure shown in Fig. 6 because three amino acids at the N terminus and 15 amino acids at the C terminus were not detected. No other NMR or x-ray crystallographic structures have been reported for the true full monomeric form of the protein. In its full form, PARP-1 domain C contains 18 Lys residues plus the N terminus as possible modification sites. The "shortened" version determined by NMR maintains all 18 Lys residues but one potential surface accessible reactive site (the true N terminus amine) is excluded in the structural map because the three amino acids at the N terminus are invisible by NMR. Because the tertiary structure of the full sequence version of domain C was unknown, its conformation was predicted by ITASSER (1), resulting in five structures whose surface accessibilities were predicted by the GetArea program.
Upon reaction of PARP-1 Domain C with NN, several residues are modified even at low molar ratios (Fig. 4B). The NN-modified protein was enzymatically digested using trypsin, GluC, or ArgC, and the resulting peptides were analyzed by the LC-MS/MS strategy described above and with MS/MS examples shown in Fig. 3. Percent reactivities and surface accessibilities for all known and predicted structures were calculated and summarized in Table II.

DISCUSSION
This ETD-selectively cleavable surface accessibility reagent allows facile tracking of lysine-reactive sites. Enzymatic digestion of the surface modified proteins results in an array of peptides, some of which contain the hydrazone moiety. As illustrated for the applications involving wheat eIF4E and PARP-C, the hydrazine bond cleaves selectively and with high efficiency upon ETD to yield a unique and characteristic loss of 93 Da from the charge-reduced peptides. CID and/or ETD allows sequencing of the peptides and the locations of the modification sites of the peptides to be pinpointed. eIF4E is a crucial protein for the initiation of protein synthesis (49). In comparing the experimental percent reactivities to the predicted surface accessibilities of the dimer, mutant, and ITASSER-predicted monomer of eIF4E (Table I), a few key variations are noted. In both the dimer and mutant forms, K18 and K122 are much less accessible whereas in the predicted monomer form, these residues have surface accessibilities of 96 and 92% respectively, making them two of the most accessible sites. This notable difference mirrors the percent reactivities obtained from the NN protein modification results, confirming that both of these sites are two of the most reactive in wheat eIF4E. K69 is 100% accessible in the putative dimer but is completely unreactive with NN based on analysis of the various protein digests, indicating it resides in the interior of the monomer. 
K172 for the 7-Me-GDP mutant is expected to be located on the protein surface as a highly accessible residue, yet the experimental percent reactivity value is zero. The site of this amino acid coincides with the binding location of the ligand 7-methyl-GDP in the mutant, thus suggesting that the bound ligand induces a significant structural change that does not occur in the native monomer. In light of these differences in predicted accessibilities and experimentally determined reactivities, the NN reactivity results support the ITASSER predicted structure for the native monomer.
PARP-1 is a multi-modular protein consisting of six domains (domains A-F) that is involved in several key biological processes including DNA repair and cell death (50,51). Domain C has been shown to be vital to these functions and without it, PARP-1 ABDEF exhibits no activity by itself. Neither does domain C bind to DNA, confirming that it is the interactions of the entire multi-domain protein that stimulate biological activity. Because the tertiary structure of the full sequence monomeric version of domain C was unknown, its conformation was predicted by ITASSER, resulting in five structures whose surface accessibilities were predicted by the GetArea program. The surface accessibilities for the shortened domain C structures are shown in Table II. The percent reactivities obtained for the NN-modified peptides do not uniformly parallel the surface accessibilities predicted from the NMR structure, as reflected by the differences in reactivities of a few key residues, including the highly reactive N-terminal G1 and the nonreactive K123, which was predicted to be 100% accessible based on the NMR structure. Of the structures predicted by ITASSER, the fifth model (IT5) showed the best correlation with the experimental surface accessibility results. This structure showed a highly reactive N-terminal portion of the protein with G1, K4, K7, and K10 all having surface accessibility values above 75% whereas K123 showed somewhat lower accessibility at 54%. Upon examination of the structure of IT5, the low reactivity of K123 may be rationalized by steric blocking by the chain of amino acids on the adjacent C terminus (Fig. 6). The percent reactivities and predicted surface accessibilities of the other regions of the protein show good agreement, with K40, K76, K108, and K118 all predicted to be more than 60% accessible, in line with the values from the NN modification reactions. 
The only residue for which the percent reactivity and surface accessibility do not correlate is K7. It was predicted to be one of the more accessible residues but in all cases exhibited low reactivity in comparison to G1 and K4. We hypothesize that this deviation arises from structural effects caused upon modification. Because G1 and K4 are so highly reactive, we suspect that upon their modification by NN, the K7 site is blocked, therefore hindering its reaction with the NN reagent. The modeled structure has not been verified by NMR or crystallographic measurements, thus reinforcing the need for alternative experimental methodologies for probing protein structures.
As reported herein, this selective ETD/CID method has proven to be efficient and robust for several proteins, ultimately providing maps of surface accessibility that are consistent with the predicted structures of the proteins. A decision-tree could be used to automate this method in the future, in which the selective cleavage triggers a subsequent CID scan for only the modified peptides.