Unbiased Selective Isolation of Protein N-terminal Peptides from Complex Proteome Samples Using Phospho Tagging (PTAG) and TiO2-based Depletion*

A positional proteomics strategy for global N-proteome analysis is presented based on phospho tagging (PTAG) of internal peptides followed by depletion by titanium dioxide (TiO2) affinity chromatography. To this end, N-terminal and lysine amino groups are first completely dimethylated with formaldehyde at the protein level, after which the proteins are digested and the newly formed internal peptides are modified with the PTAG reagent glyceraldehyde-3-phosphate in nearly quantitative yields (> 99%). The resulting phosphopeptides are depleted through binding onto TiO2, leaving exclusively a set of N-acetylated and/or N-dimethylated terminal peptides for analysis by liquid chromatography-tandem MS. Analysis of peptides derivatized with differentially labeled isotopic analogs of the PTAG reagent revealed a high depletion efficiency (> 95%). The method enabled identification of 753 unique N-terminal peptides (428 proteins) in N. meningitidis and 928 unique N-terminal peptides (572 proteins) in S. cerevisiae. These included verified neo-N termini from subcellular-relocalized membrane and mitochondrial proteins. The presented PTAG approach is therefore a novel, versatile, and robust method for mass spectrometry-based N-proteome analysis and identification of protease-generated cleavage products.

In shotgun proteomics, proteins are digested into peptides, typically using trypsin as protease, separated by liquid chromatography (LC), and analyzed by online-coupled tandem mass spectrometry (MS/MS). Identifying significant portions of all proteins present in complex samples by LC-MS remains a major challenge, even for advanced proteomics workflows (1). To address these challenges, new concepts in sample preparation have been proposed, aiming at reduction of sample complexity while preserving the proteome fingerprint (2)(3)(4). The most useful methods for this purpose yield a single, positionally defined peptide for each individual protein. McDonald and Beynon argued that the two most obvious positional locations within every protein are the extreme ends, that is, the N-terminal and the C-terminal peptides (positional proteomics) (2). As a result of drastic sample simplification, positional proteomics analysis provides insights into a variety of post-translational modification (PTM) processes and proteolytic processing that proteins may undergo at their N-terminal and C-terminal ends (5,6).
Positional proteomics strategies rely on the ability to differentiate between the N- or C-terminal parts of a protein and the internal counterparts (7,8). Protocols for targeted enrichment of C-terminal peptides have only recently been introduced (9, 10), mainly because of lower chemical reactivity of C-terminal carboxyl groups compared with N-terminal amino groups. Gevaert et al. (4,11) introduced the well-established combined fractional diagonal chromatography (COFRADIC) technology. In their method, N-terminal sequences are distinguished and separated from the internal peptides by differential labeling of protein N-terminal amino groups on the one hand and the α-amino groups of internal proteolytic peptides on the other hand such that the latter obtain a shift in retention on reversed phase chromatography. To prevent the discriminative retention shift for N-terminal amino acid sequences, free amino groups of protein N termini (α-amino) and lysine side chains (ε-amino group) are protected by acetylation prior to digestion. Another negative selection approach, proposed by McDonald et al. (2,3), involves the protective blocking of amino groups at the protein level followed by digestion and subsequent depletion of internal peptides by reaction with an amine-reactive scavenger resin. Kleifeld et al. (12,13) have developed terminal amine isotope labeling of substrates (TAILS) for the negative selection of N-terminal peptides and identification and quantification of proteolytic events. They used a novel water-soluble aldehyde polymer for the selective binding of α-amine-containing internal peptides (14). Positive selection methods employ a reversed approach (15). These protocols are based on the incorporation of an affinity group (e.g. biotin) to the protein N-terminal amino groups, followed by digestion and enrichment of the modified N-terminal peptides (16,17).
Unwanted cross-reaction with the side chain amino group of lysines is prevented by guanidination (lysine to homoarginine conversion) at the protein level. Selective and complete lysine labeling on the protein level can be problematic, hence the group of Wells introduced an enzymatic approach to selectively label the protein N-terminal amino group in a single step (18). A severe drawback of positive selection approaches is the loss of N-terminal peptides for proteins having naturally acetylated or otherwise modified N termini, because these termini do not react with the affinity labeling agents.
Selective enrichment of N-terminal peptides constitutes a major challenge because of the consecutive sample preparation steps (i.e. protective labeling, purification, and enrichment), each prone to side reactivity and sample losses. Moreover, tryptic protein digests contain many more internal peptides than N-terminal peptides, therefore posing high demands on the efficiency of depletion to prevent significant contamination of the final sample fraction with internal peptides (15). For example, Timmer et al. (16) used NHS-activated biotin for the positive selection of protein N termini; however, it was stated that a substantial portion of positive identifications were observed as a result of nonspecific biotinylation (5). In the case of amine-reactive scavenger beads, multiple incubation steps were needed for effective coupling of internal peptides (3,19). Zhang et al. reported specific loss of histidine-containing N-terminal peptides when using NHS-activated Sepharose (19). In addition, histidine- and arginine-containing N-terminal peptides are generally underrepresented in N-terminomics when strong cation exchange (SCX) is used for pre-enrichment prior to depletion of internal peptides (20). N-acetylated N termini, widely present in higher eukaryotes, can be enriched more easily using SCX fractionation (without derivatization chemistry), but unfortunately such approaches are blind to unmodified protein N termini (21)(22)(23). TAILS, however, used a water-soluble aldehyde polymer for effective coupling and depletion of internal peptides in a single step, thereby minimizing possible sample losses (12).
In view of the challenges associated with the enrichment of N-terminal peptides, a novel positional proteomics strategy was developed. The strategy uses a highly selective PTAG-labeling reagent for the modification of internal peptides, after initial protection of protein N termini and lysine side chains. PTAG-derivatized peptides have properties similar to those of naturally phosphorylated peptides in terms of binding to titanium dioxide (TiO2). Hence, the flow-through fraction of TiO2 affinity chromatography is highly enriched in N-terminal peptides and can be directly analyzed by LC-MS/MS. It is demonstrated that PTAG is a straightforward and efficient N-proteome enrichment strategy because of the use of extremely selective derivatization chemistry, both at the protein and peptide level, in combination with robust and relatively easy-to-implement TiO2 technology.

EXPERIMENTAL PROCEDURES
Cell Culture-The Neisseria meningitidis strain used in this study is a recombinant nonencapsulated variant of the group B isolate H44/76, with a nonfunctional porB gene and truncated galE LPS (24). Bacterial cultures were grown at 35°C in a chemically defined medium (150 ml) in disposable, baffled 500-ml Erlenmeyer shake flasks with vented closure (Nalgene, Rochester, NY) by shaking at 200 rpm (24). Cells were harvested by centrifugation at 13,000 × g for 2 min at 4°C and resuspended in TE buffer (0.1 M EDTA, 1 M Tris-HCl pH 8.0, Sigma Aldrich, Zwijndrecht, The Netherlands) containing 0.5 mg/ml lysozyme (Sigma Aldrich) and incubated at 4°C for 3 min in this medium. Next, proteins were extracted with Trizol (Invitrogen, Blijswijk, The Netherlands) according to the manufacturer's protocol and stored at −80°C prior to use.
Outer membrane vesicles (OMV) from N. meningitidis (grown as described above) were released by adding EDTA extraction buffer (0.01 M EDTA, 0.1 M Tris-HCl pH 8.6) and further purified by consecutive centrifugation and ultracentrifugation steps, as described by van de Waterbeemd et al. (NOMV protocol) (24). Concentrated OMV (~100 µl sample, containing 1 mg total protein) were mixed with 1 ml Trizol reagent and proteins were extracted and stored as described above.
Saccharomyces cerevisiae strain BJ5460 (LGC Standards, Almere, The Netherlands) was cultured in 150 ml YPD medium in baffled 500-ml Erlenmeyer shake flasks with vented closure (Nalgene) at 30°C, by shaking at 200 rpm. Cells were harvested from 300 ml culture (OD590 = 1.7) by centrifugation at 2000 × g for 5 min. Cells were washed three times with PBS and resuspended in 200 µl lysis buffer (2 M guanidine hydrochloride, 12 mM EDTA, 50 mM Tris-HCl, pH 7.5, Sigma Aldrich) to which 5 µl protease inhibitor mixture (Sigma Aldrich) was added. The cell suspension was subjected to three freeze-thaw cycles. Next, cleaned glass beads were added and cells were further disrupted by six vortex cycles with intermediate cooling steps (at 4°C). Supernatants after each cycle were pooled and centrifuged at 2000 × g at 4°C for 10 min. The resulting supernatant was incubated overnight with a fourfold excess of acetone at −20°C and proteins were subsequently pelleted at 13,000 × g at 4°C for 10 min. The protein pellet was washed twice with acetone/water 4/1 (v/v), pelleted after each wash step as described before and dried for 5 min by vacuum centrifugation at room temperature.
Dimethylation of Primary Amines-The protocol for dimethylation of primary amines was adapted from Boersema et al. (25). An aliquot corresponding to 100 µg of protein was dissolved in 100 mM KH2PO4 (pH adjusted to 7.5) containing 4 M guanidine hydrochloride. Disulfide bridges were reduced by adding dithiothreitol (Sigma Aldrich) to a final concentration of 20 mM and incubation at 37°C for 30 min. Free thiol groups were alkylated by adding iodoacetamide (Sigma Aldrich) to a final concentration of 100 mM and incubation at room temperature for 30 min in the dark. Excess reagent was quenched by the addition of dithiothreitol to a final concentration of 100 mM (incubated at 37°C for 30 min). The free N-terminal and lysine amino groups were dimethylated by formaldehyde at a final concentration of 180 mM (CH2O, Sigma Aldrich) in the presence of 30 mM freshly prepared sodium cyanoborohydride (NaCNBH3, Sigma Aldrich) at room temperature. Freshly prepared sodium cyanoborohydride at a final concentration of 30 mM was added again after 1 h and after 2 h, and the sample was further incubated overnight. Subsequently, the mixture was diluted 4 times with water to decrease the guanidine hydrochloride concentration to less than 1 M and proteins were extracted by acetone precipitation as described above. Precipitated proteins were reconstituted in 15 µl of 100 mM KH2PO4 (pH 7.5) containing 4 M guanidine hydrochloride. Excess formaldehyde was quenched by the addition of 50 µl of 1 M ammonium hydroxide (Sigma Aldrich) and incubation at room temperature for 1 h. Ammonium hydroxide was removed by vacuum centrifugation at room temperature to dryness.
Protein Digestion-Dimethylated proteins were digested (in parallel) in 50 µl of 100 mM KH2PO4 (pH 7.5) containing less than 1 M guanidine hydrochloride with either chymotrypsin (Roche, Woerden, The Netherlands) in 4 mM calcium chloride (Sigma Aldrich) at 37°C, trypsin (Promega, Leiden, The Netherlands) at 37°C or endoprotease GluC (Roche) at room temperature, all with an enzyme/protein ratio of 1:20 (w/w). After 1 h, digest mixtures were diluted twofold with 100 mM KH2PO4 (pH 7.5) to reduce the guanidine concentration to less than 0.5 M. Fresh enzyme was added at an enzyme/protein ratio of 1:20 (w/w) and the mixture was further incubated overnight under the same temperature conditions as described above.
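The consequence of the three parallel digests for the N-terminal peptide of a protein can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions that are not taken from the protocol: cleavage is modeled as strictly C-terminal to a fixed residue set (Arg only for trypsin, because lysines are dimethylated; F, Y, W, and L for chymotrypsin; Glu only for GluC), missed cleavages and proline effects are ignored, and the example sequence is a hypothetical one that merely starts with the GDVEKGKKIF stretch mentioned elsewhere in the text.

```python
# Simplified cleavage rules (illustrative assumptions, not the search settings used in the study).
CLEAVAGE_RESIDUES = {
    "trypsin_dimethyl": "R",   # Lys is dimethylated, so only Arg is cleaved (ArgC-like pattern)
    "chymotrypsin": "FYWL",
    "gluc": "E",
}

# Hypothetical example sequence used only to exercise the functions below.
ILLUSTRATIVE_SEQ = "GDVEKGKKIFVQKCAQCHTVEKGGKHKTGPNLHGLFGRKTGQ"


def digest(sequence, protease):
    """Cut C-terminal to every cleavage residue and return the peptides in order."""
    residues = CLEAVAGE_RESIDUES[protease]
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def n_terminal_peptide(sequence, protease):
    """Return the first (protein N-terminal) peptide of a non-empty sequence."""
    return digest(sequence, protease)[0]


if __name__ == "__main__":
    for protease in CLEAVAGE_RESIDUES:
        print(protease, n_terminal_peptide(ILLUSTRATIVE_SEQ, protease))
```

Under these assumed rules the three proteases give N-terminal peptides of clearly different lengths for the same protein, which is the property the parallel digestion relies on.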
Removal of Pyroglutamate-N-terminal glutamine was enzymatically cyclized by glutamine cyclotransferase (Qcyclase, Qiagen, Venlo, The Netherlands) and the resulting pyroglutamyl moiety was subsequently cleaved by the aminopeptidase pGAPase (Qiagen). This protocol was adapted from Staes et al. (20), with adjustment of the incubation time to 2 h at 37°C.
Preparation of PTAG Derivatives-The free amino groups of the internal peptides were PTAG derivatized in 100 mM KH2PO4 (pH 7.5) with DL-glyceraldehyde-3-phosphate (Sigma Aldrich) at a final concentration of 90 mM and freshly prepared sodium cyanoborohydride at a final concentration of 30 mM at room temperature. Freshly prepared sodium cyanoborohydride was added again after 1 h and after 2 h, and the reaction mixture was further incubated overnight. Following PTAG derivatization, peptide mixtures were extensively purified by C18 solid phase extraction (SPE) chromatography (column dimensions 5 cm (L) × 200 µm inner diameter (ID), in-house packed with 5 µm Reprosil Pur C18-AQ, Dr Maisch, Ammerbuch-Entringen, Germany).
Depletion of PTAG-peptides-PTAG-peptides were depleted by TiO2 affinity chromatography, essentially as previously described (26,27). Briefly, SPE-purified samples were evaporated to dryness, reconstituted in 0.1 M acetic acid in water (pH 2.7) and loaded onto a short TiO2 column at a flow rate of 5 µl/min (100 µl injection loop). The short TiO2 column consists of 1-mm ID PEEK tubing with an Upchurch (Oak Harbor, U.S.A.) 360-mm ID adapter at the front and end for connection to (fritted) microcapillary tubing and is slurry-packed with a 10-mm (L) bed of 5 µm Titansphere particles (10 mg) (GL Sciences, Tokyo, Japan). Unretained peptides were collected in the void volume. Next, the TiO2 column was extensively washed with a 100-µl plug (three column volumes) of acetonitrile/water/dimethyl sulfoxide in 0.1 M acetic acid (45:45:10, v/v/v) (Sigma Aldrich). The TiO2 flow-through fraction and the wash fraction were pooled, evaporated to dryness by vacuum centrifugation, reconstituted in formic acid/DMSO in water (5:5, v/v), and stored at −20°C until analysis.
LC-MS/MS Analysis-N-terminal peptide-enriched samples (TiO2 flow-through fraction) were prefractionated offline (6-14 fractions) using a mixed bed anion-cation column as described by Motoyama et al. (28) or directly analyzed on an LTQ-Orbitrap XL instrument (Thermo Fisher Scientific, Bremen, Germany) and Agilent 1100 HPLC system (Agilent, Amstelveen, The Netherlands) modified for nanoflow LC separations as described previously by Meiring et al. (29). All columns were packed in house. The trap column was a 100-µm ID fritted microcapillary packed with 20 mm of 5-µm particle size Reprosil Pur C18-AQ particles (Dr. Maisch, Ammerbuch-Entringen, Germany). The analytical column was a 50-µm ID microcapillary packed with 31 cm of 3-µm particle size Reprosil Pur C18-AQ, with an integral-pulled tip and operated at a flow rate of 125 nL/min. ESI voltage, typically 1.8 kV, was applied by liquid junction at the top of the column. Solvent A consisted of 0.1 M acetic acid (Sigma Aldrich) in deionized water (Milli-Q, Millipore, Amsterdam, The Netherlands) and solvent B of 0.1 M acetic acid in acetonitrile (Biosolve, Valkenswaard, The Netherlands). Gradients were as follows: 100% solvent A during sample loading (0-10 min, flow rate 5 µl/min), 7% to 26% solvent B in 160 min followed by an increase to 60% solvent B in 20 min and reconditioning with solvent A for 10 min (total runtime 200 min). The mass spectrometer was set to acquire full MS spectra (m/z 350 to 1500) for mass analysis in the orbitrap at 60,000 resolution (FWHM) followed by data-dependent MS/MS analysis (LTQ) for the top 5 or 7 abundant precursor ions above a threshold value of 10,000 counts. The normalized collision energy was set to 35%, isolation width to 2.0 Da, activation Q to 0.250 and activation time to 30 ms. The maximum ion time (dwell time) for MS scans was set to 200 ms and for MS/MS scans to 2500 ms. Charge state screening and preview mode were enabled.
Precursor ions with unknown and +1 charge states were excluded from subsequent MS/MS analysis. Dynamic exclusion was enabled (exclusion size list 500) with repeat set to 1 and an exclusion duration of 180 s. The background ion at m/z 391.284280 was used as lock mass for internal mass calibration.
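The gradient program described above can be written down as a small timetable, which makes it easy to check that the segments add up to the stated 200-min runtime. The following Python sketch is illustrative only: the segment boundaries are derived from the durations in the text, linear interpolation within each segment is an assumption, and the names `SEGMENTS`, `total_runtime`, and `percent_b` are hypothetical.

```python
# (start_min, end_min, %B at start, %B at end), derived from the gradient description.
SEGMENTS = [
    (0, 10, 0.0, 0.0),      # sample loading at 100% solvent A
    (10, 170, 7.0, 26.0),   # 7% to 26% solvent B in 160 min
    (170, 190, 26.0, 60.0), # increase to 60% solvent B in 20 min
    (190, 200, 0.0, 0.0),   # reconditioning with solvent A for 10 min
]


def total_runtime():
    """Sum of all segment durations in minutes."""
    return sum(end - start for start, end, _, _ in SEGMENTS)


def percent_b(t_min):
    """Solvent B percentage at time t_min, assuming linear ramps within a segment.

    At a shared boundary the earlier segment takes precedence.
    """
    for start, end, b_start, b_end in SEGMENTS:
        if start <= t_min <= end:
            return b_start + (b_end - b_start) * (t_min - start) / (end - start)
    raise ValueError("time outside the gradient program")


if __name__ == "__main__":
    print(total_runtime(), percent_b(90), percent_b(180))
```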
Data Analysis-The analysis of mass spectrometric RAW data was carried out using Proteome Discoverer 1.2 software (Thermo Fisher Scientific) applying standard settings unless otherwise noted. MS/MS scans were searched against the N. meningitidis strain MC58 database (containing 2003 entries, 2010, Uniprot) or the S. cerevisiae SGD database (http://www.yeastgenome.org, 2010, containing 5821 entries) using SEQUEST (Proteome Discoverer 1.2, Thermo Fisher Scientific). Precursor ion and MS/MS tolerances were set to 10 ppm and 0.8 Da, respectively. Decoy database searches were performed with false discovery rate (FDR) tolerances set to 5 and 1% for modest and high confidence filtering settings, respectively. The data were searched separately with either no enzyme, C-terminal trypsin cleavage specificity, C-terminal chymotrypsin cleavage specificity, or C-terminal GluC cleavage specificity, allowing five missed cleavages because lysine cleavage is prevented by dimethyl modification. Cysteine carbamidomethyl, N-terminal dimethylation, and lysine dimethylation were set as fixed modifications whereas asparagine deamidation and methionine oxidation were set as variable modifications. Similar searches were performed for alternative modifications by substituting N-terminal dimethylation modification with acetylation, ammonia loss, no modification, glyceraldehyde-3-phosphate, and monomethylation (supplemental Tables S3-S5). N-terminal dimethylated, N-terminal acetylated, and N-terminal monomethylated proline-starting peptide sequences with high confidence (Xcorr values > 2.2, false discovery rates < 1%), rank No. 1 and linear sequences within the first 70 amino acids were considered for manual data analysis. Peptides possessing charge states of 6+ and higher were excluded. True N-terminal peptides (initiator methionine and methionine cleavage) were kept in the final data set as well as proteins with signal peptide or transit peptide cleavage sites.
Peptidase cleavage sites were verified by prediction software (30) or previous data (31) (supplemental Tables S6, S7). Annotated spectra are provided for proteins with only a single confident peptide identification (supplemental Spectra S1, S2). The raw data files and protocols associated with this manuscript are available to the reader on request.
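The selection criteria for peptides passed on to manual analysis can be summarized as a simple filter. The Python sketch below is a hypothetical restatement of those criteria, not the Proteome Discoverer workflow itself: the `PeptideHit` record and its field names are invented for illustration, the "first 70 amino acids" criterion is interpreted here as the peptide start position in the protein (an assumption), and the FDR threshold is taken to be applied upstream by the search software.

```python
from dataclasses import dataclass

# N-terminal modifications considered for manual analysis, as named in the text.
ALLOWED_NTERM_MODS = {"dimethyl", "acetyl", "monomethyl"}


@dataclass
class PeptideHit:
    """Hypothetical record for one peptide-spectrum match."""
    sequence: str
    nterm_mod: str
    xcorr: float
    rank: int
    charge: int
    start_position: int  # 1-based position of the first residue in the protein


def passes_filter(hit, min_xcorr=2.2, max_start=70, max_charge=5):
    """Return True if a hit meets the stated criteria for manual analysis."""
    if hit.nterm_mod not in ALLOWED_NTERM_MODS:
        return False
    # Monomethylated N termini are only considered for proline-starting sequences.
    if hit.nterm_mod == "monomethyl" and not hit.sequence.startswith("P"):
        return False
    return (
        hit.xcorr > min_xcorr
        and hit.rank == 1
        and hit.charge <= max_charge      # charge states of 6+ and higher are excluded
        and hit.start_position <= max_start
    )


if __name__ == "__main__":
    example = PeptideHit("MKTAYIAK", "dimethyl", 3.1, 1, 2, 1)
    print(passes_filter(example))
```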
Assessment of Binding Efficiency of PTAG-peptides-Synthesis of isotopically deuterium-labeled PTAG reagents was started from either D0-tetrahydrofuran or D8-tetrahydrofuran (Sigma Aldrich). 4-Bromobutyl acetate was obtained by ring-opening of tetrahydrofuran and nucleophilic substitution (into the acetate ester) by incubation with 33% hydrobromic acid in acetic acid (Sigma Aldrich) at room temperature for 1 h. The tetramethylammonium salt of dibenzylphosphate was prepared by drop-wise addition of tetramethylammonium hydroxide to a solution of dibenzylphosphate in methanol/acetone 1:1 (v/v) (Sigma Aldrich) at −10°C. The tetramethylammonium salt of dibenzylphosphate was refluxed with 4-bromobutyl acetate for 5 h in dioxane (Sigma Aldrich). The resulting dibenzyl-4-acetatebutyl phosphate was purified on silica and hydrolyzed into dibenzyl-4-hydroxybutyl phosphate with 0.4 M sodium carbonate (Sigma Aldrich) in 1:1 (v/v) ethanol/water at room temperature for 24 h. The product was extracted from the ethanol/water mixture with dichloromethane and subsequently dried on anhydrous magnesium sulfate (Baker, Deventer, The Netherlands). Dibenzyl-4-hydroxybutyl phosphate was oxidized into dibenzyl-4-oxobutyl phosphate by incubation with a catalytic amount of pyridinium chlorochromate (PCC, Sigma Aldrich) in dichloromethane at room temperature for 1 h. Dibenzyl-4-oxobutyl phosphate was purified on silica gel and the dibenzylphosphate functionality was reduced by a catalytic amount of 10% palladium on carbon (Sigma Aldrich) under a hydrogen atmosphere at room temperature for 1 h. The structures of the final products, D0-4-oxobutyl dihydrogen phosphate and D5-4-oxobutyl dihydrogen phosphate, were verified by NMR (JEOL (400 MHz), Tokyo, Japan). Partial hydrogen/deuterium exchange was observed as a result of enolization of the deuterated carbonyl functionality.
After equilibration of the enolization reaction (storage in water), a ΔM = 5.0 Da between the isotopic variants was obtained.
The TiO2 binding efficiencies of the PTAG-peptides were evaluated using the in-house-synthesized, isotopically labeled PTAG reagents. An aliquot corresponding to 100 µg of protein from the OMV fraction of N. meningitidis was dimethylated as described above. Proteins were digested with trypsin and free α-amines of internal peptides were PTAG derivatized with an equimolar mixture of D0-4-oxobutyl dihydrogen phosphate and D5-4-oxobutyl dihydrogen phosphate and the addition of freshly prepared sodium cyanoborohydride (as described above). Peptide mixtures were subjected to TiO2 affinity chromatography (as described above) and the flow-through fraction was analyzed by LC-MS. Raw data files were converted to the NetCDF file format and imported into the MsXelerator software package (v.2.9, Ms-Metrix, Maarssen, The Netherlands). Co-eluting mass spectral doublets with a ΔM of 5.0306 (PTAG-peptides) were extracted by the search algorithm as previously described by Meiring et al. (32).
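The principle of the doublet search can be illustrated with a short Python sketch. It is not the MsXelerator algorithm: it only pairs m/z values within a single spectrum whose spacing matches ΔM/z for an assumed set of charge states, and it ignores co-elution and the intensity ratio expected for an equimolar label mixture. The function name, the charge range, and the 10 ppm tolerance are assumptions chosen for the example.

```python
DELTA_M = 5.0306  # Da, mass difference between the D0- and D5-labeled PTAG peptides (from the text)


def find_doublets(mz_values, charges=(1, 2, 3, 4), tol_ppm=10.0):
    """Return (light_mz, heavy_mz, charge) for peak pairs spaced by DELTA_M / charge.

    mz_values: m/z values observed in one spectrum.
    charges:   charge states to test (assumed range).
    tol_ppm:   matching tolerance in ppm (assumed value).
    """
    mzs = sorted(mz_values)
    doublets = []
    for i, light in enumerate(mzs):
        for z in charges:
            target = light + DELTA_M / z
            tol = target * tol_ppm * 1e-6
            for heavy in mzs[i + 1:]:
                if heavy > target + tol:
                    break  # list is sorted, so no later peak can match
                if abs(heavy - target) <= tol:
                    doublets.append((light, heavy, z))
    return doublets


if __name__ == "__main__":
    spectrum = [500.0, 500.0 + DELTA_M / 2, 700.0]
    print(find_doublets(spectrum))
```

In this toy spectrum the first two peaks are reported as a doubly charged doublet, whereas the peak at m/z 700.0 has no partner and would be treated as a singlet, which is how N-terminal peptides are expected to appear.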

RESULTS
General Description of N-proteome Enrichment-A schematic overview of phospho tagging (PTAG) for global N-proteome analysis is depicted in Fig. 1. The strategy starts with the denaturation of proteins by reduction and alkylation of cysteines to enhance the accessibility of amino groups for chemical labeling and to prevent erratic depletion of N-terminal peptides through cysteine linkages. Reductive dimethylation of primary amines using formaldehyde and sodium cyanoborohydride simultaneously blocks the free α-amines of protein N termini, except when they are already blocked in vivo by N-acetylation, as well as the ε-amines of the lysine side chains (25). Subsequent trypsinization of the proteome results in an ArgC-like digestion pattern because lysine cleavage is prevented by the dimethyl modification, as previously shown (12). The free N-terminal α-amines of the internal peptides newly generated upon digestion are susceptible to tagging with the PTAG reagent glyceraldehyde-3-phosphate (GAP3) (Fig. 2). PTAG-labeled peptides are subsequently depleted through binding to TiO2, and the flow-through fraction, which is highly enriched in N-acetylated and N-dimethylated peptides, is analyzed by LC-MS/MS (supplemental Fig. S1). Confident protein assignment may be problematic in N-terminal peptide enrichment strategies because identification is inherently based on a single peptide, referred to as "one-hit wonders" (33). For this reason, parallel replicates of each proteome sample were digested with three different proteases (trypsin, endoprotease GluC, and chymotrypsin) to generate overlapping N-terminal peptides with different lengths. In addition, N-proteome coverage is increased by the parallel use of proteases, because trypsin alone may generate N-terminal peptides of inappropriate length or with poor ionization and fragmentation properties for identification by LC-MS/MS (2,22). Despite the multiple digestion strategy, many proteins are expected to be identified based on a single peptide.
To enhance confidence in protein assignment for these single peptide identifications, offline ion exchange prefractionation (typically six fractions) was performed in combination with highly accurate MS analysis using an LTQ-Orbitrap to increase the number of MS/MS identifications of a single peptide from technical replicates, different charge states, and deamidation or oxidation states (12). Also, stringent database search criteria were used (false discovery rates < 1%) to obtain high confidence in sequence assignment, and annotated spectra for proteins with a single peptide identification are provided (supplemental Spectra S1, S2).
Dimethyl Labeling on the Protein Level-High efficiency in reductive dimethylation of N termini and lysine side chains is critical to minimize erratic depletion of N-terminal peptides (13). Protective labeling of these amino groups with formaldehyde was investigated for the standard protein cytochrome C. Progress of the reaction was determined on the six available ε-amine groups of the chymotryptic acetylated N-terminal peptide Ac-GDVEKGKKIF. At 3 hours, the reaction yielded an incorporation of 5.9 methyl groups (98%) (supplemental Fig. S2). The reaction was driven to be essentially quantitative by the addition of freshly prepared cyanoborohydride after the 1-h and 2-h points followed by overnight incubation (data not shown).
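The reported labeling yield can be reproduced with simple arithmetic. The sketch below assumes that the six available positions correspond to two methyl groups on each of the three lysine ε-amines of Ac-GDVEKGKKIF, with the acetylated N terminus contributing none; this reading, as well as the function names, is an assumption made for illustration.

```python
def available_methyl_positions(sequence, n_term_blocked):
    """Number of methyl groups a peptide can accept by reductive dimethylation.

    Each primary amine (lysine side chain, plus the N terminus when it is free)
    is assumed to take up two methyl groups.
    """
    primary_amines = sequence.count("K") + (0 if n_term_blocked else 1)
    return 2 * primary_amines


def labeling_yield(mean_methyl_groups, available_positions):
    """Fraction of available positions that carry a methyl group."""
    return mean_methyl_groups / available_positions


if __name__ == "__main__":
    positions = available_methyl_positions("GDVEKGKKIF", n_term_blocked=True)
    print(positions, round(100 * labeling_yield(5.9, positions)))
```

With 5.9 of 6 positions occupied, the yield is about 98%, in line with the value reported for the 3-h time point.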
Phospho Tagging (PTAG) of Internal Peptides-PTAG-labeling of proteolytically generated internal peptides should reach completion to minimize copurification of unmodified internal peptides in the final sample mixture. Evaluation of the derivatization reaction for the internal peptides generated from chymotryptic digestion of cytochrome C yielded > 99% efficiency without any side reactions. Similar results were found for a more complex (100 µg) trypsinized OMV fraction from N. meningitidis. OMV are nano-sized spherical protein-lipid vesicles which can be used as a vaccine against serogroup B meningitis (34). The main antigen present in OMV is the porin A protein (PorA), which constitutes ~75% of the total protein content. For the proteolytically digested OMV proteome, with abundances spanning several orders of magnitude, PTAG derivatization chemistry was nearly quantitative (> 99%). Despite the high labeling yields, unmodified peptides could still be detected, albeit at greatly reduced abundances (< 1%) compared with their PTAG-labeled analogs (supplemental Fig. S3). Slow and/or incomplete reactivity of proline-starting internal peptides, as reported by others (4,20), was not observed for the PTAG reagent.
Depletion Efficiency of PTAG-peptides-The most critical step in positional proteomics is the effective depletion of internal peptides while maintaining maximum recovery of N-terminal peptides. The depletion efficiency of internal peptides for the PTAG strategy was evaluated for the chymotryptic digest of cytochrome C (supplemental Fig. S4). The naturally acetylated N-terminal peptide of cytochrome C (Ac-GDVEKGKKIF) was detected as the most predominant ion trace in the flow-through fraction.
For the N-proteome analysis of complex proteome samples, a high capacity, offline TiO2 affinity chromatography column was prepared in-house (slurry packed). Stringent washing conditions were included to minimize nonspecific binding (35) (supplemental Table S2). A 100-µg proteolytic digest of OMV was subjected to TiO2 affinity chromatography (10 mg beads) and effective depletion of PTAG peptides was established in less than 20 min at a 1:100 peptide-to-beads ratio (w/w). This is illustrated by the substantial sample simplification of a proteolytic digest of the OMV fraction of N. meningitidis (Fig. 3). After depletion of internal peptides, N termini of several low abundant proteins were recovered as most predominant base peak traces in the chromatogram along with the high abundant N terminus of PorA.
Evaluation of MS/MS data revealed that PTAG-peptides with poor binding affinity were occasionally identified in the final sample (data not shown). However, identification of these peptides by data-dependent MS/MS analysis may not reflect the total fraction because identification is hampered by substantial neutral loss of phosphate during CID fragmentation (36). Therefore, D0- and D5-labeled analogs of the PTAG reagent were synthesized in-house and employed to discriminate PTAG peptides by their unique 5-Da mass difference (doublets) from N-terminal peptides (singlets). In a single LC-MS/MS analysis of an N-terminally enriched OMV sample, ~5% of all peptide ions were assigned as poorly retained PTAG peptides (supplemental Fig. S5), thereby again demonstrating the high efficiency in depletion even for complex and high dynamic range proteome samples.
Peptide Modifications-The result of the PTAG strategy applied to a tryptic whole-cell lysate of N. meningitidis is summarized in Table I. In total, 645 unique N-terminal dimethylated peptides were identified by offline ion exchange prefractionation (6 fractions) of the TiO2 flow-through fraction and subsequent LC-MS/MS analysis. Of these N-terminal-dimethylated peptides, 423 peptides (312 proteins) were annotated as true protein N-terminal peptides with initiator methionine residues, methionine cleavage sites, and proteins with signal peptide cleavage sites. Redundancy was especially observed for high abundant proteins. Multiple unique N-terminal peptides per protein were identified as a result of incomplete initiator methionine or signal peptide processing or unspecific C-terminal cleavage specificity of trypsin. The remaining 222 N-terminal-dimethylated peptides are so-called neo-N termini derived from internal cleavage sites throughout the protein. Neo-cleavage sites may originate from unknown proteolytic activity before (in vivo) or during harvesting (cell lysis), or from protein degradation during sample preparation (37).

FIG. 2. Phospho-tagging reaction scheme. N-terminal α-amines of proteolytically generated peptides react with glyceraldehyde-3-phosphate (GAP3) to form Schiff bases that are rapidly reduced in the presence of cyanoborohydride. Note, the ε-amines of lysine side chains are protected by dimethylation with formaldehyde and cyanoborohydride on the protein level.
The high labeling efficiency of the method is underlined by the identification of only 45 unmodified internal peptides (free α-amine). These peptides lack a PTAG affinity label as a result of incomplete derivatization chemistry, but are detected in dramatically reduced abundance compared with their PTAG-labeled analogs (supplemental Fig. S3). Cyclized N termini are also enriched in the TiO2 flow-through fraction because these peptides lack a free α-amine for PTAG derivatization. N-pyroglutamyl residues in particular were frequently identified during method development (data not shown). Staes et al. (20) introduced a method to generate a free α-amine (for subsequent labeling) by enzymatic cyclization of N-terminal glutamine (Qcyclase) and subsequent cleavage (pGAPase) of the formed N-pyroglutamyl moiety. This method was successfully implemented in the PTAG strategy because only 14 N-pyroglutamyl peptides were identified, of which 12 peptides had a pGAPase-inhibiting proline residue following the pyroglutamyl cleavage site. Furthermore, 35 cyclized peptides with an N-terminal, iodoacetamide-alkylated cysteine residue (pyrocarbamidomethyl cysteine) (20) and 30 peptides where the position 2 asparagine side chain is N-terminally cyclized (X-Asn-X motif) (38) were recovered in the flow-through fraction. For proline-starting proteins (39), the N terminus (initiator methionine cleavage) incorporates only a single methyl group upon protective labeling with formaldehyde (N-monomethylation). Data evaluation revealed that 56 peptides were N-monomethylated. Several peptides could successfully be assigned as true proline-starting protein N termini; however, the vast majority of peptides were identified as internal peptides. It appeared that, because of residual activity of the reagent formaldehyde, proteolytically generated internal peptides were (inefficiently) labeled at the N terminus by a single methyl group, thereby blocking subsequent PTAG derivatization.

TABLE I. The number of unique (a) N-dimethyl- or N-acetyl-modified peptides are listed. Dimethylated N-terminal peptides include (true) protein N-terminal peptides and neo-N-terminal peptides with unknown cleavage specificity from e.g. (in vivo) proteolytic activity. Only N-dimethylated peptides that are assigned as (true) protein N termini, indicated by retained initiator methionine, methionine cleavage, and proteins with signal/transit peptide cleavage, are used for further data evaluation. The TiO2 flow-through fraction was co-purified with N-unmodified (internal) peptides (lacking a PTAG because of incomplete derivatization), N-terminally cyclized peptides (N-pyroglutamate, N-pyrocarbamidomethyl cysteine, and N-asparagine (P2)) and N-monomethylated peptides.
a Peptide deamidation and oxidation states are not considered unique.
b N-terminal glutamine was enzymatically cyclized by Qcyclase to N-pyroglutamate and subsequently cleaved by pGAPase, generating an α-amine for PTAG and depletion. This enzymatic approach is especially inefficient for peptides with a peptidase-inhibiting proline (P2) residue next to N-pyroglutamate (depicted between brackets).
c Cyclization of N-terminal carbamidomethylated (iodoacetamide) cysteine residues.
d N-terminal cyclization of the asparagine side chain is restricted to position 2 (P2) residues.
e N-terminal monomethylation of internal peptides as a consequence of residual activity of the reagent formaldehyde during trypsin digestion.
The peptide identification results for tryptic OMV fractions from N. meningitidis and tryptic S. cerevisiae whole cells are in general similar to those discussed above (Table I). As expected, a considerable number of N-acetylated termini (333) were detected in S. cerevisiae and none in N. meningitidis. Of note, a high number of neo-N termini from internal cleavage sites was found for OMV in comparison to whole-cell lysates. This high number may have resulted from proteolytic activity upon the cell lysis used to stimulate OMV release, or from protein degradation during the OMV purification procedure.

[...] Table S1). These data sets were combined to reduce the number of single-peptide identifications (one-hit wonders) and hence reduce the chance of false positive peptide and protein identifications. In total, 753 unique N-terminal peptides were identified for N. meningitidis (428 proteins), predominantly from tryptic samples (Fig. 4). The majority of the proteins (75%) were assigned by multiple MS/MS spectra: spectra of the same peptide in different charge, deamidation, or oxidation states or, more importantly, spectra of overlapping peptides with different lengths (supplemental Table S1).
N termini were assigned as retained initiator methionine, methionine cleavage, or signal peptide cleavage (Fig. 5A). Analysis of the native protein N termini provides a profile for the in vivo specificity of methionine removal (Fig. 5B). This process depends on the amino acid at position 2 and is preferentially associated with the small and uncharged residues serine, alanine, proline, glycine, and cysteine (39). Furthermore, 123 neo-N-terminal peptides (43 proteins) were identified for proteins from which the signal peptide was removed (Fig. 5A). The secretory signal peptide that targets a protein for translocation across membranes is typically 14-30 amino acids long and is removed by a signal peptidase upon translocation (30). The signal peptide cleavage sites of the 43 identified proteins could be accurately verified by prediction software (30) because cleavage is governed by distinct zones (a basic N terminus, a hydrophobic region, and a C-terminal region) with high sequence consistency around the cleavage site.
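The assignment categories described above can be illustrated with a minimal sketch. It is not the authors' annotation pipeline: the function names, the category labels, and the toy sequences are hypothetical, and signal peptide cleavage is not predicted here (the study used dedicated prediction software for that step), so every start position beyond residue 2 is simply reported as a neo-N terminus.

```python
# Residues named in the text as favoring initiator methionine removal.
SMALL_UNCHARGED_P2 = set("SAPGC")


def classify_n_terminus(protein_seq: str, start: int) -> str:
    """Classify an identified peptide by its 1-based start position in the protein.

    Hypothetical helper: labels mirror the categories in the text, but the
    logic is a simplification and does not verify signal peptide cleavage.
    """
    if start < 1 or start > len(protein_seq):
        raise ValueError("start must lie within the protein sequence")
    if start == 1 and protein_seq.startswith("M"):
        return "retained initiator methionine"
    if start == 2 and protein_seq.startswith("M"):
        return "initiator methionine cleaved"
    return "neo-N terminus"


def p2_favors_met_removal(protein_seq: str) -> bool:
    """Return True if the position 2 residue is small and uncharged."""
    return len(protein_seq) > 1 and protein_seq[1] in SMALL_UNCHARGED_P2


if __name__ == "__main__":
    toy_protein = "MSKTLLAR"  # made-up sequence, for illustration only
    print(classify_n_terminus(toy_protein, 2))
    print(p2_favors_met_removal(toy_protein))
```

A neo-N terminus flagged in this way would still need to be checked against a signal peptide predictor or experimental data before it is reported as a signal peptide cleavage site.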
N-proteome analysis of S. cerevisiae resulted in the identification of 928 unique N-terminal peptides that could be assigned to 572 proteins (Fig. 4). Approximately 45% of the proteins were fully or partially N-acetylated, preferentially at alanine, serine, threonine, and methionine (Figs. 5C-5D). Methionine cleavage specificity is conserved between prokaryotes and eukaryotes and is preferential for small, uncharged residues at position 2 (Fig. 5E) (30). In addition, 125 neo-N-terminal peptides (71 proteins) were identified for mitochondrial proteins from which the transit peptide was cleaved. Hallmarks of transit peptide cleavage are less well described and less predictable than those of signal peptides in prokaryotes (30). The sequence consistency around the cleavage site is low, with arginine in position -3 or -2 relative to the cleavage site as the most common motif. For this reason, peptidase cleavage site specificity was not verified by prediction (30); instead, a comparison was made with a recently published COFRADIC study. Vogtle et al. (31) characterized transit peptide cleavage specificity in an enriched mitochondrial protein fraction of S. cerevisiae. Identical cleavage sites were found by both positional proteomics strategies for the majority of the PTAG-identified mitochondrial proteins (46 out of 71 proteins).
Recent N-proteome enrichment procedures suffered from under-representation of histidine- and arginine-containing N-terminal peptides as a result of charge retention on SCX pre-enrichment or side reactivity of histidine residues with NHS-activated Sepharose beads (19,20). To investigate possible undersampling, the frequency distribution of the amino acids (P2) following the N-terminal methionine was calculated for the experimental data as well as for the theoretical proteome (protein-coding genes). The experimental data correlate reasonably well with the frequency distribution in the complete proteome set for both N. meningitidis and S. cerevisiae, demonstrating that the obtained data are representative of the global N-proteome (Fig. 6).
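The P2 comparison described above could be reproduced along the following lines. This is a sketch under stated assumptions: the text does not specify how the correlation was assessed, so the Pearson coefficient used here is an illustrative choice, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def p2_frequencies(sequences):
    """Relative frequency of the residue following an N-terminal methionine."""
    counts = Counter(s[1] for s in sequences if len(s) > 1 and s[0] == "M")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient (assumed metric, not from the paper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant distribution")
    return cov / (sx * sy)


def compare_p2(theoretical_seqs, experimental_seqs):
    """Correlate P2 frequencies of identified proteins with the full proteome."""
    theo = p2_frequencies(theoretical_seqs)
    expt = p2_frequencies(experimental_seqs)
    return pearson(
        [theo.get(aa, 0.0) for aa in AMINO_ACIDS],
        [expt.get(aa, 0.0) for aa in AMINO_ACIDS],
    )


if __name__ == "__main__":
    # Made-up sequences; in practice these would come from a proteome FASTA
    # file and from the list of proteins with an identified N terminus.
    proteome = ["MSAK", "MSKR", "MAKL", "MKKD", "MHTE"]
    identified = ["MSAK", "MAKL", "MKKD"]
    print(compare_p2(proteome, identified))
```

A residue that is systematically lost during enrichment (for example histidine at P2) would show up as a frequency close to zero in the experimental set despite a nonzero frequency in the theoretical set.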

DISCUSSION
In MS-based proteomics, complexity in protein samples greatly limits proteome coverage and identification of low-abundance proteins (40). Several advanced enrichment protocols have been introduced in recent years to address this problem, including N-terminal positional proteomics strategies (8,41). Enrichment of N-terminal peptides by complete removal of internal peptides constitutes a major challenge (15). PTAG was developed for specific labeling of free α-amines of internal peptides after proteolytic digestion. Subsequently, PTAG-labeled peptides were depleted by TiO2 affinity chromatography (Fig. 1). High labeling efficiency, essential for minimizing the loss of N-terminal peptides, was established both at the protein level and at the peptide level. Dimethylation by formaldehyde was preferred because it is inexpensive and resistant to hydrolysis, and because full labeling of protein N termini and lysine side chains was accomplished (supplemental Fig. S2). In addition, the availability of stable isotopically labeled analogs of formaldehyde and cyanoborohydride enables multiplexed relative quantification in combination with PTAG (12,25).

FIG. 5 (caption, continued). Methionine removal is preferred for small and uncharged residues at the P2 position. For S. cerevisiae, protein N termini were identified as naturally acetylated (45%) and free N termini, which are dimethylated in the PTAG protocol (labeled N termini, 55%). From the pool of acetylated proteins, 79% showed methionine cleavage whereas 21% retained the methionine residue. Frequency distribution of naturally acetylated amino acids at the protein N termini (D). Labeled N termini are annotated as retained initiator methionine, methionine cleavage, and proteins with signal/transit peptide cleavage (C). Methionine cleavage (open bars) is, as in prokaryotes, preferred for small and uncharged residues at the P2 position (E).
Internal peptides were selectively modified by the commercially available PTAG reagent glyceraldehyde-3-phosphate (GAP3). Like formaldehyde, this reagent is resistant to hydrolysis, and it showed high conversion yields (> 99%), thereby reducing possible background contamination by internal peptides (supplemental Fig. S3). Excellent compatibility with reversed-phase C18 ensured easy removal of salts and excess reagents prior to either TiO2 affinity chromatography or direct LC-MS/MS analysis.
For the effective depletion of proteolytically generated internal peptides from a 100-μg proteome sample, a straightforward TiO2-based depletion strategy was developed using in-house slurry-packed columns and stringent washing conditions to minimize nonspecific adsorption (35) (supplemental Table S2). The ability to efficiently reduce sample complexity and identify a large number of proteins (207 proteins) in the OMV fraction of N. meningitidis, with a dynamic range spanning several orders of magnitude, underlines the excellent recovery and selectivity of this strategy (Fig. 3). It should be noted that the final enriched sample fraction of this complex OMV digest is slightly biased by breakthrough of poorly retained PTAG peptides (5%). These peptides were assigned by employing stable isotopically labeled variants of the PTAG reagent (supplemental Figs. S5 and S6).
N-proteome analysis of N. meningitidis and S. cerevisiae by the PTAG strategy resulted in the identification of 753 (428 proteins) and 928 (572 proteins) unique N-terminal peptides, respectively. Characterization of native protein N termini provided a profile for the in vivo specificity of methionine removal. This process is conserved between prokaryotes and eukaryotes and is preferentially associated with small and uncharged residues (Fig. 5) (42). Furthermore, in S. cerevisiae, about 45% of the proteins were fully or partially N-terminally acetylated on alanine, serine, threonine, or methionine (Fig. 5), a percentage similar to that recently found by others (43). The N-proteome data also include neo-N termini from subcellular-relocalized membrane proteins (43 proteins, N. meningitidis) or mitochondrial proteins (71 proteins, S. cerevisiae). Accurate assignment of protease substrate cleavage sites can be problematic because these cleavage sites are typically not annotated in protein sequence databases (44). Moreover, copurification of neo-N termini generated by background proteolytic activity with unknown specificity could result in the false positive identification of cleavage sites (12). For this reason, the signal peptide and transit peptide cleavage specificity of the reported proteins was confirmed either by prediction (30) or by experimental data from a recent COFRADIC study (18).
FIG. 6. N-proteome-wide frequency distribution of the position 2 (P2) amino acid following the initiator methionine. The frequency distribution calculated for the complete set of proteins (protein-coding genes) is depicted in black bars and the frequency distribution for the experimentally identified protein N termini is depicted in open bars. For both N. meningitidis (A) and S. cerevisiae (B), the experimental data correlate with the theoretical data, indicating that the outcome of the PTAG strategy is representative of the global N-proteome.

Confidence in protein assignment may be problematic in N-proteome analysis because the majority of proteins are identified by a single peptide. As in other positional proteomics strategies (3,45), prefractionation of the N-terminally enriched peptide fraction was performed to increase the number of MS/MS identifications per peptide, and was combined with strict database search criteria (FDR < 1%) to minimize the number of false positive peptide identifications. In addition, several proteases were used to generate N-terminal peptides with different lengths, which not only enhances the confidence in protein assignment but also increases proteome coverage (Fig. 3). It was estimated by in silico analysis that 50% of trypsin-generated N-terminal peptides could be analyzed by LC-MS/MS and that parallel use of trypsin and GluC would increase coverage up to 80% (2,15). In practice, the increase in unique protein identifications from the use of GluC in addition to trypsin is ~15% (Fig. 3). This emphasizes one of the major challenges in N-proteomics: the identification of proteins by single N-terminal peptides with suitable LC-MS/MS characteristics, e.g. peptide length, hydrophobicity, ionization efficiency, and fragmentation properties. Alternative fragmentation techniques complementary to collision-induced dissociation (CID) may be particularly beneficial in N-proteomics.
A substantial fraction of the sample consists of long and highly charged N-terminal peptides with poor CID fragmentation behavior. These peptides arise because trypsin does not cleave at dimethylated lysine residues and therefore produces an ArgC-like digestion pattern, while the dimethylated lysine residues remain charged. Electron transfer dissociation (ETD) and higher-energy collisional dissociation (HCD) have been shown to be more effective for such highly charged peptides (46).
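The effect of protease choice on N-terminal peptide length can be illustrated with a small in silico digestion sketch. It is not the analysis used in the cited estimates: the cleavage rules are simplified (ArgC-like cleavage after arginine unless followed by proline, GluC cleavage after glutamate only), the length window of 7 to 30 residues is an assumed proxy for LC-MS/MS suitability, and all names and sequences are hypothetical.

```python
def n_terminal_peptide(seq: str, cleave_after: str, blocked_before: str = "") -> str:
    """Return the N-terminal peptide up to and including the first cleavage site.

    cleave_after: residues after which the protease cleaves.
    blocked_before: residues that block cleavage when they follow the site.
    """
    for i, aa in enumerate(seq):
        is_last = i + 1 == len(seq)
        if aa in cleave_after and (is_last or seq[i + 1] not in blocked_before):
            return seq[: i + 1]
    return seq  # no cleavage site: the whole protein is the "peptide"


# Simplified protease rules (assumptions, see lead-in).
PROTEASES = {
    "trypsin_dimethylated": {"cleave_after": "R", "blocked_before": "P"},
    "gluc": {"cleave_after": "E", "blocked_before": ""},
}


def coverage(sequences, protease_names, min_len=7, max_len=30):
    """Fraction of proteins with at least one N-terminal peptide in the length window."""
    if not sequences:
        return 0.0
    covered = 0
    for seq in sequences:
        lengths = [
            len(n_terminal_peptide(seq, **PROTEASES[name])) for name in protease_names
        ]
        if any(min_len <= n <= max_len for n in lengths):
            covered += 1
    return covered / len(sequences)


if __name__ == "__main__":
    toy_proteome = ["MSTKAERLLK", "MKRAAAAAAAAE"]  # made-up sequences
    print(coverage(toy_proteome, ["trypsin_dimethylated"]))
    print(coverage(toy_proteome, ["trypsin_dimethylated", "gluc"]))
```

Because lysine is excluded as a cleavage site, the first peptide from the dimethylated digest tends to be longer than a conventional tryptic peptide, which is the origin of the long, highly charged species discussed above.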
Recently, a number of large-scale N-proteome studies for prokaryotes and S. cerevisiae were published. Timmer et al. (16,17) characterized the N-proteome of Escherichia coli by positive selection and identified 393 proteins, whereas McDonald et al. (2) identified more than 300 proteins in E. coli using NHS-activated Sepharose beads for depletion of internal peptides. N-proteome analysis of S. cerevisiae by COFRADIC in combination with SCX resulted in the identification of 650 unique N-terminal peptides (47). Helbig et al. (48) characterized protein N-acetylation in S. cerevisiae with SCX and detected 756 N-acetylated protein termini, including acetylated neo-N termini from internal cleavage sites. PTAG enabled the identification and characterization of 753 unique N-terminal peptides (428 proteins) in N. meningitidis and 928 unique N-terminal peptides (572 proteins) in S. cerevisiae, thereby representing one of the largest N-proteome data sets for these organisms so far. A more detailed evaluation of the S. cerevisiae N-proteome data provided by Helsens et al. (47), Helbig et al. (48), and the PTAG analysis revealed relatively small overlaps between the studies (supplemental Fig. S6). The small overlaps are most likely due to undersampling of the full proteome or to differences associated with the method used. For example, the study of Helbig et al. (48) was restricted to N-acetylated peptides, whereas Helsens et al. (47) used SCX pre-enrichment with known undersampling of histidine-containing N-terminal peptides (19,20). For N-proteome analysis by PTAG there is no indication of undersampling of certain amino acid sequences (Fig. 6). Of note, a clear disadvantage of the PTAG strategy is the inability to recover N termini of proteins whose N-terminal domain is naturally phosphorylated. N-terminal phosphorylation has, for example, been demonstrated in the p53 tumor suppressor protein (49) and in several crucial proteins of photosystem II (PSII) in A. thaliana (50).
In conclusion, the PTAG positional proteomics strategy greatly reduces sample complexity while maintaining the N-proteome fingerprint of whole-cell lysates. The use of commercially available and highly reactive PTAG reagents in combination with high-performance TiO2 affinity chromatography provides a robust platform for global N-proteome analysis and MS-based profiling of proteolytic events. PTAG for unbiased selective isolation of protein N termini is therefore a welcome alternative to well-established positional proteomics strategies.