Proteolytic Post-translational Modification of Proteins: Proteomic Tools and Methodology

Proteolytic processing is a ubiquitous and irreversible post-translational modification involving limited and highly specific hydrolysis of peptide and isopeptide bonds of a protein by a protease. Cleavage generates shorter protein chains displaying neo-N and -C termini, often with new or modified biological activities. Within the past decade, degradomics and terminomics have emerged as significant proteomics subfields dedicated to characterizing proteolysis products as well as natural protein N and C termini. Here we provide an overview of contemporary proteomics-based methods, including specific quantitation, data analysis, and curation considerations, and highlight exciting new and emerging applications within these fields enabling in vivo analysis of proteolytic events.

Proteolysis involves the breakdown of proteins into smaller polypeptides or amino acids through the hydrolysis of peptide bonds by a protease. This represents a remarkably significant, but often underappreciated, post-translational modification (PTM) in that it is irreversible yet also ubiquitous. Consequently, the functional sequence of a protein can very rarely be predicted from its transcript, as proteolysis products form new (neo-) N and C termini. These cleavage events, or proteolytic processing events, can result in activation, inactivation, completely altered protein function, and even excision of "neo-proteins" with growth factor activity from an extracellular matrix parent molecule, and they regulate a vast array of biological processes (1). These include DNA replication, cell cycle progression, cell proliferation, and cell death, as well as pathological processes such as inflammation, cancer, arthritis, and cardiovascular disease. For example, in protein synthesis and maturation, precise selective removal of the N-terminal methionine and the signal peptide is essential for correct protein maturation and secretion. In some proteins, scission of the chain forms a molecule with four termini when linked by disulfide bridges. Through the removal of signal, nuclear, and mitochondrial localization sequences and ectodomain shedding, proteases regulate protein localization, and in viral infection, via cleavage of pre- and pro-domains and polyprotein processing, inactive proteins are converted into their active form(s), are inactivated, or change receptor-binding affinity. Thus, proteolysis is involved in much more than the mere degradation and turnover of proteins, important though these processes are in homeostasis.
Proteases exist in all orders of life and constitute one of the largest enzyme families in humans (2), and more than 30 drugs targeting these enzymes are currently approved for clinical use (3). However, in order to fully comprehend the cellular function(s) of a given protease, one must have knowledge of the proteins processed by that protease, as well as the functions of these substrates and specific processing events. This is currently far from the case, as half of all human proteases have no known substrates (4). Degradomics is the application of high-throughput approaches to study proteases, their substrates, and their inhibitors on a system-wide scale (4). More specifically, terminomics is the specific characterization of protein N and C termini and, as such, forms a subfield of degradomics. This review provides an overview of current proteomics-based methods for characterizing protease cleavage events and protein termini. The quantitation, analysis, and curation of proteomics data, as well as exciting new applications within these fields, are also considered.
Methods for Characterizing Active Site Specificity. Several array- and library-based methods have been developed to identify protease active site specificities. These include substrate phage display (5) and bacterial substrate display (6), whereby bacteriophages or bacteria express a chimeric cell surface protein containing a peptide of random sequence and an affinity tag. Proteolysis enables selection based on the affinity tag, and cleavable sequences are determined via DNA sequencing. However, these approaches do not provide the exact cleavage site in the random sequence; for this, a second step is required. Similarly, peptide libraries and microarrays have been used. For microarrays, arrayed peptide libraries are incubated with a test protease and cleavage is detected via methods such as loss of fluorophore binding or the removal of a fluorescent quencher (7-12). Library-based approaches are similar except that peptide mixtures are typically sequenced via Edman degradation or mass spectrometry (MS). One example is mixture-based oriented peptide libraries, which was the first approach used successfully to sequence the prime-side residues of the cleavage site in a library (13). The prime-side cleavage motif (the sequence C-terminal to the cleavage site) is determined by proteolysis of a library of N-terminally acetylated dodecamers, whose cleavage products are sequenced via Edman degradation. Subsequently, a second library containing this predetermined prime-side sequence, a random unblocked N terminus, and a C-terminal biotin tag is generated, and a second incubation with the protease is performed. Undigested peptides and C-terminal fragments are removed by means of avidin capture, and a second round of Edman degradation determines nonprime-side specificity.
In view of the multiple time-consuming steps involved in generating custom second libraries in this otherwise very successful approach, new approaches have been sought to rapidly determine the prime-side and nonprime-side sequences in combination. Proteomic identification of protease cleavage sites (PICS) is one such approach (14). PICS employs a diverse, biologically relevant, and database-searchable peptide library generated from a cellular proteome using trypsin or Glu-C (14,15). Primary amines (N-terminal α-amines and lysine ε-amines) are blocked, and this forms the library. A test protease is added, and the new terminal α-amines generated by proteolysis are selectively biotinylated and affinity purified. Purified peptides are sequenced via liquid-chromatography tandem mass spectrometry (LC-MS/MS) to determine prime-side cleavage motifs, whereas sequences N-terminal to cleavage sites are extracted bioinformatically. This can be done because the peptide library is accessible to conventional proteomics bioinformatics, whereas randomized synthetic peptide libraries are not. Thus, PICS enables the determination of both prime and nonprime cleavage site residues in the same experiment and so has the advantage of being very rapid.
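The bioinformatic recovery of nonprime-side residues can be illustrated with a minimal sketch. It is not the published PICS software: the function name `infer_nonprime`, the toy sequence, and the simplification of the library protease to "trypsin cuts after every K or R" (no proline rule, no missed cleavages) are assumptions made for illustration only.

```python
def infer_nonprime(protein, prime_peptide, library_sites="KR"):
    """Return (nonprime, prime) sequences flanking a test-protease cleavage site.

    protein        -- parent protein sequence from the search database
    prime_peptide  -- identified peptide whose N terminus was generated by the test protease
    library_sites  -- residues after which the library protease cuts (assumed: trypsin, K/R)
    """
    start = protein.find(prime_peptide)  # first occurrence only (simplification)
    if start <= 0:
        return None  # peptide not found, or it starts at the protein N terminus
    # Walk back to the N terminus of the library peptide, i.e. the residue
    # that follows the previous library-protease cleavage site.
    left = start
    while left > 0 and protein[left - 1] not in library_sites:
        left -= 1
    return protein[left:start], prime_peptide


# Toy example: library peptide TAYLGSDEAGVR cleaved between L and G
print(infer_nonprime("MKTAYLGSDEAGVRLLK", "GSDEAGVR"))
```

An empty nonprime sequence in the result would indicate that the peptide N terminus coincides with a library-protease site and therefore cannot be attributed to the test protease.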
These peptide library-based techniques have been used to elucidate the cleavage site specificity of many proteases from all catalytic classes. However, a significant limitation is that they depend solely on amino acid sequences of relatively short peptides. Contributions of exosites and protein folding to cleavage site specificity cannot be observed, and as for all techniques that determine only the active site specificity, relevant in vivo protease substrates cannot be reliably identified solely from a cleavage site.
Proteomics Methods for Identifying Substrates. Several proteomics methods have been developed to identify protease substrates. These include both two-dimensional polyacrylamide gel electrophoresis (PAGE)- and LC-MS/MS-based techniques. Following two-dimensional PAGE, stained spots are excised and identified via MS. Substrates are identified by a reduction in spot intensity of the intact protein and the appearance of spots corresponding to cleavage products (16-18). However, two-dimensional PAGE is restricted in terms of reproducibility and sensitivity, and it cannot be applied to small cleavage fragments or those differing by only a few residues.
LC-MS/MS now provides vast improvements in throughput and proteome coverage. Shotgun proteomics has been used to identify substrates, including those whose localization has been altered by membrane shedding (19-22). This is done by comparing the secretomes of protease-treated cells to those of control cells. However, this approach cannot be used to determine the actual cleavage site. Nonetheless, these early labeling approaches utilizing isotope-coded affinity tags and isobaric tags for relative and absolute quantitation (iTRAQ) were very successful in easily identifying hundreds of biologically relevant substrates in the cellular context. Whereas these approaches are designed to determine substrates from complex proteomes, amino-terminal-oriented mass spectrometry of substrates is designed to identify multiple cleavage sites in proteins in vitro (23,24). Amino-terminal-oriented MS involves incubation of a purified substrate with a protease followed by dimethylation of the original and neo-N termini at the whole protein level. Subsequent trypsin digestion generates dimethylated semi-tryptic peptides containing the original N and C termini, as well as neo-N-terminal peptides representing cleavage sites that are identified by the dimethylated termini and their position in the protein sequence. The dimethylated cleavage sites are readily distinguished from the tryptic peptides, which contain a free primary amine at their N terminus. The protein topography and migration analysis platform uses one-dimensional SDS-PAGE in combination with LC-MS/MS to identify cleavage events by peptide mapping (25). Here, proteins from protease-treated and control samples are resolved via one-dimensional SDS-PAGE, and each lane is cut into a number of gel slices, trypsinized, and analyzed via LC-MS/MS. Peptographs representing protein sequence coverage versus SDS migration identify proteolysis products based on shifts from higher to lower molecular weight species.
This represents a development of an earlier study using gel slice analysis of isotopically labeled samples separated on one-dimensional SDS-PAGE (26), but with the advantage of employing visually useful software for analysis. Like most gel-based approaches, this is very mass spectrometry intensive, and only occasionally is the exact cleavage site also directly identified. Recently, secretome protein enrichment with click sugars was developed (27). This approach involves the metabolic labeling of N- and O-linked glycans, followed by a click reaction resulting in their biotinylation. Secreted proteins and shed extracellular membrane proteins are purified from contaminating serum proteins by means of avidin capture followed by in-gel digestion and LC-MS/MS. Secretome protein enrichment is particularly useful for cell culture experiments in which the cells have fastidious growth requirements and require serum. The serum glycoproteins are not metabolically labeled, and this enables simplification of the proteomic sample before analysis by separating the cell-derived metabolically labeled proteins from the glycoproteins in serum. However, only glycoprotein substrates can be identified.
The above techniques have been useful in substrate identification. However, as the vast majority of identified peptides are internal, except with the amino-terminal-oriented MS procedure, precise cleavage sites are rarely determined, especially for proteins identified with low sequence coverage.

Methods for Enriching Protein N Termini. N-terminal PTMs, including proteolytic processing, can greatly influence the localization and activity of many proteins (28). For example, N-terminal acetylation (N-Ac) plays important roles in protein function, localization, and stability (29), and N-terminal methylation regulates protein-protein interactions (30). Thus, characterizing protein N termini not only identifies protease cleavage sites, but also is important in determining the functional physicochemical properties of a proteome. Methods employing both positive and negative selection of N-terminal peptides were developed (Table I) following the early recognition that "keeping it simple" approaches, which aim to identify rare semi-tryptic terminal peptides within a complex mixture of tryptic peptides without enrichment, will not lead to proteome-wide coverage and so will miss most cleavage sites. This is especially relevant for low-abundance but biologically interesting proteins such as cytokines. Thus, a variety of terminal peptide enrichment strategies have been developed to improve both the coverage and the dynamic range of terminal peptide identifications.
Enrichment of N Termini by Positive Selection. Several methods exist for enriching N-terminal peptides by positive selection. Typically ε-amines are blocked and α-amines are tagged (e.g. with biotin), with tagging followed by a secondary proteolysis step and finally enrichment and elution of N-terminal peptides for analysis via LC-MS/MS. Using these methods, unmodified N-terminal and neo-N-terminal peptides can be effectively purified, but modified N termini cannot. This is quite limiting, as ~85% of soluble proteins are N-terminally acetylated in eukaryotic cells (31). Another challenge is discriminating between α-amines at N termini and ε-amines on lysine residues. This has been termed the lysine problem: when both are blocked, several N termini are excluded from analysis, but failure to block ε-amines results in contamination by the abundant internal peptides in the sample.
Mahrus et al. developed the most useful such approach with an elegant method using an engineered subtiligase to selectively label unblocked α-amines with a biotinylated peptide ester substrate, with labeling followed by trypsinization, avidin capture, and LC-MS/MS (32, 33) (Fig. 1A). The peptide ester substrate contains a virus cleavage site enabling the recovery of enriched peptides. However, as much as 50 to 100 mg of protein is required for each sample analysis, which can be very limiting. In a clever use of Edman chemistry, Xu et al. used chemical labeling of the α-amine of proteins (N-CLAP) using phenyl isothiocyanate to block all primary amines (34). Similar to Edman degradation, treatment with trifluoroacetic acid triggers cyclization of phenyl-isothiocyanate-modified α-amines specifically, resulting in peptide bond cleavage after the first amino acid. The newly exposed α-amines are then biotinylated, with biotinylation followed by trypsinization, avidin capture, elution of N termini via reduction, and LC-MS/MS.
Enrichment of N Termini by Negative Selection. Several methods exist for enriching protein N-terminal peptides by negative selection. Common to each, N-terminal α-amines and ε-amines are blocked at the protein level, and blocking is followed by trypsinization, which exposes α-amines of internal peptides. These unblocked α-amines are used to deplete internal peptides from the sample, enabling enrichment of both modified and unmodified protein N termini. This facilitates higher proteome coverage than positive selection techniques and is particularly useful for intracellular proteomes, where most N termini are acetylated. Purifying the natural N terminus (whether naturally or chemically blocked) has many advantages; in particular, it enables up to 50% of the identified proteins to be identified from two or more peptides (i.e. the original N terminus and the internal cleaved neo-N-terminal peptide).
This greatly increases the confidence scores in protein substrate identification relative to those for proteins identified from just one neo-terminal peptide. However, in contrast to positive selection methods in which tags are used as handles to facilitate purification and concentration, negative selection does not allow for highly selective washing, cleanup, and concentration of terminal peptides during the enrichment step(s), other than by conventional peptide concentration (e.g. precipitation or evaporation), which can result in sample losses and dirtier samples for LC-MS/MS.
The most widely reported negative selection methods are combined fractional diagonal chromatography (COFRADIC) and terminal amine isotope labeling of substrates (TAILS). During COFRADIC, α- and ε-amines are blocked by acetylation, with subsequent proteolysis, pre-enrichment of N-terminal peptides via strong cation exchange chromatography, and fractionation via reverse-phase liquid chromatography (35, 36) (Fig. 1B). The free α-amines of internal peptides are treated with 2,4,6-trinitrobenzenesulfonic acid to form hydrophobic trinitrophenyl-peptides that are separated from N-terminal peptides by an additional round of reverse-phase liquid chromatography. Enzymatic removal of pyroglutamyl peptides has also been employed (36). One advantage of COFRADIC over several other techniques is that all the required materials are commercially available and relatively inexpensive. However, a disadvantage is that it utilizes extensive fractionation steps, providing several opportunities for sample loss and making it very instrument intensive, with up to 100 LC-MS/MS runs per sample.
During TAILS, α- and ε-amines are blocked by dimethylation or iTRAQ labeling, proteins are subjected to trypsinization, and a commercially available water-soluble hyperbranched polyglycerol aldehyde polymer for proteomics is added to covalently bind the internal tryptic peptide α-amines through reductive amination (37,38) (Fig. 1C). The polymer provides a large contact area with peptides in solution, enabling a very efficient reaction and very low nonspecific binding. Its large size (>10 kDa) allows the depletion of internal peptides via filtration. Roche has adopted TAILS and reported a refinement whereby the polymer mixture is applied directly to the precolumn (39). The unbound N-terminal peptides enter the mass spectrometer directly, minimizing handling and consequent losses. Recently, Mommen et al. developed a technique very similar to TAILS, except that instead of the hyperbranched polyglycerol aldehyde polymer, internal α-amines are phosphorylated via treatment with glyceraldehyde-3-phosphate and depleted via titanium dioxide chromatography (40).
Both TAILS and COFRADIC block ε-amines to enable the retention of lysine-containing N-terminal peptides, which has an added advantage of introducing an isotope label for those N-terminally blocked peptides that otherwise would go unlabeled. Zhang et al. developed a method for the specific enrichment of modified N-terminal peptides that does not require this blocking procedure (41,42). Following proteolysis, CNBr-activated Sepharose, which is specific to α-amines at pH 6, is added to deplete internal peptides. However, Sepharose beads retain peptides nonspecifically, and isoelectric point discrimination of the α- and ε-amines is rarely quantitative, so this can reduce purity and N-terminal peptide yields.
Several additional techniques have been developed for the negative selection of N termini (43-50). However, as they have not yet been applied or proven to work in large-scale proteomics workflows, they are not discussed further here.
Methods for Enriching Protein C Termini. As with N termini, PTMs at protein C termini can also regulate protein function (51). Examples include chemokine and hormone processing, as well as modifications such as prenylation that localize proteins to lipid membranes (52,53). Methods for enriching protein C termini have lagged behind those for N termini, largely due to a lack of methods with which to selectively modify carboxyl groups in aqueous solution. Thus, it is likely that many important C-terminal modifications and processing events exist but remain unknown or poorly characterized because of our current inability to enrich and identify them.
Two methods currently exist for proteomic analysis of protein C termini: C-terminal amine-based isotope labeling of substrates (C-TAILS), and COFRADIC combined with strong cation exchange chromatography (54,55) (Table I). During C-TAILS, proteins are dimethylated at α- and ε-amines (Fig. 1D). Carboxyl groups are protected with ethanolamine, and the proteins are then trypsinized to generate free N and C termini on internal peptides. Newly generated α-amines are blocked by a second dimethylation step, and newly generated carboxyl groups are removed by means of covalent coupling to a high-molecular-weight polyallylamine polymer. As in TAILS, original blocked C termini are unbound and recovered via filtration. One advantage of C-TAILS is that the C-terminal label allows for validation of original versus neo-C termini during data analysis. In the COFRADIC method, α- and ε-amines are blocked by acetylation prior to proteolysis (Fig. 1B). Tryptic peptides are passed over a strong cation exchange column at pH 3, where N- and C-terminal peptides are collected in the flowthrough. Free α-amines of C-terminal peptides are butyrylated to increase their hydrophobicity, enabling their separation from N-terminal peptides via reverse-phase liquid chromatography. Sechi and Chait developed an additional method based on the binding of anhydrotrypsin to α-amines (56), but this has not yet been employed on a large scale.
Quantitation. Several quantitative methods have been applied to the techniques described above to discriminate between background proteolysis events that are present in every sample, whether arising in vivo or during post-sampling handling, and those induced by a specific condition (Table I). In techniques without quantification, this cannot be achieved, and so the data reflect proteolytic events of interest as well as the background, which makes data interpretation difficult. Typically, differentially labeled proteomes are used representing protease-treated or -related condition(s), with one proteome serving as a control (e.g. protease-null) (Fig. 2A). Equal amounts of the samples are mixed and ultimately analyzed via LC-MS/MS. Neo-N- or C-terminal peptides generated by proteolytic processing events appear with either high or low ratios, whereas peptides unaltered by the treatment condition(s) appear with ratios centered on 1.0.
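The ratio-based discrimination described above can be sketched in a few lines. This is only an illustration: the function name `classify_ratio`, the example intensities, and the twofold cutoff are assumptions, and in practice cutoffs are usually derived from the observed ratio distribution of each experiment.

```python
import math


def classify_ratio(protease_intensity, control_intensity, fold_cutoff=2.0):
    """Classify a terminal peptide from its protease/control intensity ratio.

    fold_cutoff is a hypothetical threshold; ratios within the cutoff are
    treated as background (centered on 1.0, i.e. log2 ratio near 0).
    """
    if protease_intensity <= 0 or control_intensity <= 0:
        return "unquantified"  # one channel missing: no ratio can be formed
    log2_ratio = math.log2(protease_intensity / control_intensity)
    threshold = math.log2(fold_cutoff)
    if log2_ratio >= threshold:
        return "increased"   # candidate neo-terminus generated by the protease
    if log2_ratio <= -threshold:
        return "decreased"   # terminus lost in the protease-treated sample
    return "unchanged"       # background proteolysis or natural terminus


for pair in [(800.0, 100.0), (110.0, 100.0), (100.0, 400.0)]:
    print(pair, classify_ratio(*pair))
```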
Specific quantitation methods include MS1-based stable-isotope dimethyl labeling and stable isotope labeling by amino acids in cell culture (SILAC), and MS2-based iTRAQ and tandem mass tags (57-60). Specific to N-terminomics, it is important to note that the chemical labeling approaches mentioned above target primary amines. Thus, there are instances when α-amines and ε-amines are otherwise modified (e.g. N-Ac), rendering these peptides unquantifiable (Fig. 2B). Although SILAC offers the advantage of complete quantitation, it is not compatible with all sample types and is impossible to use in human tissues. Label-free approaches have been used to a lesser extent, but they generally suffer from poor accuracy (25,27,61). They are also very instrument intensive when done properly to generate the numbers of spectra needed for reliable and statistically significant quantification. Thus, improperly performed spectral counting can be easy, but the resulting data will be misleading. Furthermore, as in analyses of many other PTMs, including phosphoproteomics, N- and C-terminomics analyses are expected to identify proteins based on a single peptide. MS1-based methods acquire quantitation information from several spectra collected across a chromatographic peak as a measure of technical variance. MS2-based quantitation relies on the generation of higher order scans, and so for single peptide identifications the result is often quantitation based on a single spectrum. Given the possibility of isotope interference if careful MS acquisition methods are not employed (62,63), the opportunity for unreliable quantitation is somewhat higher when using MS2-based approaches.
FIG. 2. Quantitation and terminomics data. A, flow diagram representing quantitation of terminal peptides between a control and protease-treated condition. B, schematic showing quantifiable N-terminal peptides following stable-isotope labeling methods targeting primary amines. X represents any amino acid except lysine; red diamonds represent an amine modification such as acetylation.
Data Acquisition and Analysis. Because each protein is often represented by a single or small number of terminal peptides, terminomics methods reduce sample complexity. However, this also presents challenges to complete proteome coverage, as not all N- and C-terminal peptides are amenable to LC-MS/MS. Many will have suboptimal lengths or poor physicochemical properties for fragmentation and/or ionization. Several groups employ dimethylation or acetylation of α- and ε-amines prior to digestion to increase the lengths of N- and C-terminal peptides (35,37,40,47,49). This results in trypsin cleaving with ArgC specificity. However, if peptide masses between 600 and 4000 Da are considered suitable for MS/MS, only 63.4% and 62.9% of the N- and C-terminal peptides, respectively, from a theoretical ArgC-digested human proteome are identifiable (55). If other properties such as hydrophobicity are considered, these numbers will undoubtedly drop. Also, similar to tryptic peptides with missed cleavages, ArgC-generated peptides are typically long and highly charged. Higher energy collision-induced dissociation and electron-transfer dissociation are more effective fragmentation methods for these peptides (64). Thus, acquisition methods employing collision-induced dissociation on smaller low-charge peptides and electron-transfer dissociation or higher energy collision-induced dissociation on longer high-charge peptides may prove beneficial for terminomics. Also, following dimethylation, the positive charge on α- and ε-amines is retained, whereas during acetylation it is lost. This likely offers advantages in maintaining protein solubility and improving the ionization efficiency of these peptides. To increase the percentage of MS-amenable peptides, several groups employ Glu-C and chymotrypsin digests in parallel to trypsinization (37,41). However, especially in combination with several N-terminal modifications (e.g. N-Ac), these peptides are expected to carry only a single positive charge, rendering them less amenable to conventional acquisition methods that exclude singly charged ions as contaminants. Including singly charged ions for fragmentation has been shown to significantly increase the number of N-Ac peptides identified, particularly in the low mass range (42).
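The mass-window argument can be made concrete with a small sketch that derives the N- and C-terminal peptides of a protein under a simplified ArgC rule and tests whether they fall between 600 and 4000 Da. The function names, the toy sequence, and the rule "cleave after every arginine" are assumptions; masses are for unmodified peptides, so the blocking groups discussed above are ignored.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in seq) + WATER


def argc_terminal_peptides(protein):
    """Return (N-terminal, C-terminal) peptides after cleavage at every R."""
    first_r = protein.find("R")
    # An R at the very C terminus does not create a further peptide
    last_r = protein.rfind("R", 0, len(protein) - 1)
    n_term = protein if first_r == -1 else protein[:first_r + 1]
    c_term = protein if last_r == -1 else protein[last_r + 1:]
    return n_term, c_term


def ms_amenable(seq, low=600.0, high=4000.0):
    """True if the peptide mass lies in the window considered suitable for MS/MS."""
    return low <= peptide_mass(seq) <= high


n_pep, c_pep = argc_terminal_peptides("MSTRGGLLKEEDFVARAK")
for pep in (n_pep, c_pep):
    print(pep, round(peptide_mass(pep), 3), ms_amenable(pep))
```

In this toy case both terminal peptides are too short to fall inside the window, which is the kind of loss the percentages above describe at proteome scale.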
Several software tools can be used to analyze terminomics data. These include search engines such as Mascot and X! Tandem and analysis suites such as MaxQuant, Proteome Discoverer, and the Trans-Proteomic Pipeline (65-68). However, these applications are universally designed to analyze shotgun datasets containing tryptic peptides. N- and C-terminal peptides exhibit semi-enzyme specificities and thus increase the search space considerably. Also, chemical labeling for quantitation is typically achieved at the protein level for terminomics, meaning that α-amines of internal peptides are unlabeled. Current software suites such as MaxQuant and Proteome Discoverer assume labeling at the peptide level, and as a result, peptides with blocked and labeled N termini cannot be quantified within the same analysis. Currently, CLIPPER is the only software application designed specifically for N-terminomics data (69,70). Specific for MS2-based quantitation, CLIPPER is an add-on to the Trans-Proteomic Pipeline and generates quantitation confidence and isoform assignment scores as well as automated annotation of N-terminal peptides to determine their position within a protein sequence.
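The effect of semi-enzyme specificity on the search space can be illustrated by counting candidate peptides for a single sequence. The sketch below is not how any of the named search engines enumerate candidates; the function names, the length limits, the toy sequence, and the simplified trypsin rule (cleave after K or R, any number of missed cleavages) are assumptions.

```python
def tryptic_sites(protein):
    """Positions where a fully specific peptide may start or end."""
    cuts = {0, len(protein)}  # protein termini
    cuts.update(i + 1 for i, aa in enumerate(protein) if aa in "KR")
    return cuts


def count_candidates(protein, min_len=6, max_len=30):
    """Return (fully_specific, semi_specific) candidate peptide counts."""
    cuts = tryptic_sites(protein)
    n = len(protein)
    full = semi = 0
    for start in range(n):
        for end in range(start + min_len, min(n, start + max_len) + 1):
            start_ok, end_ok = start in cuts, end in cuts
            if start_ok and end_ok:
                full += 1   # both ends at enzyme sites
            elif start_ok or end_ok:
                semi += 1   # exactly one end at an enzyme site
    return full, semi


full, semi = count_candidates("AKCDEFGHIKLMNPQR", min_len=3, max_len=20)
print(f"fully specific: {full}, semi-specific: {semi}")
```

Even for this 16-residue toy sequence the semi-specific candidates outnumber the fully specific ones several-fold, and the gap widens with protein length.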
Databases. Several efforts are being made to curate emerging high-throughput datasets into online resources (Table II). Created in 1999, the MEROPS database is the gold standard for protease classification and stores information about proteases from all species, as well as their substrates, cleavage sites, and inhibitors (71). However, MEROPS does not easily support meta-analyses such as comparing substrates or protease specificities. Community-based resources such as CutDB rely on users to input data (72). Each entry contains a proteolytic event relating a protease, a substrate, and a cleavage site and a description of the biological context. CutDB is part of the larger PMAP website, which also contains protease and substrate databases and tools for determining protease specificities and mapping signaling pathways (73). The termini-oriented protein function inferred database (TopFIND) is also open to user contributions (74). TopFIND integrates data from UniProtKB, MEROPS, and experimental terminomics studies from humans, mice, bacteria, and yeast, focusing on both translated and neo protein termini, as well as upstream proteases and PTMs. TopFIND provides information on substrate structure, topology, and interaction networks and contains several filtering tools for data manipulation. One limitation of the above-mentioned repositories is that they do not provide easy access to original data. The Online Protein Processing Resource (TOPPR) houses a database of proteolysis sites from human and mouse, as well as links to Mascot scores and MS/MS spectra (75). It also houses a meta-analysis platform enabling filtering and analyses of individual substrates and protease specificities. However, TOPPR only accepts N- and C-terminal COFRADIC data from the Gevaert lab. A similar database called DegraBase, from the Wells lab (76), houses data generated via the subtiligase-based enrichment method applied to normal and apoptotic cell lines and also provides links to original MS data. Two cell-death-specific databases also exist: CASBAH (77), which contains all reported mammalian caspase substrates with links to UniProtKB, cleavage sites, and references, and Cell Death Proteomics database (CDP) (78), which houses proteomics data from 73 cell death studies in human, mouse, and rat.
TABLE II
TopFIND: Public knowledgebase for protein termini and protease processing; >120,000 N and C termini and 10,000 cleavage sites
TOPPR: High-quality processed events available in an easy and intuitive analysis platform; 2234 substrates, 18 studied treatments or peptidases, and 27,147 cleavage sites (75)
DegraBase: Non-biased description of all possible caspase substrates found in healthy and apoptotic human cells
Applications. Over the past decade, degradomics and terminomics have emerged as significant proteomics subfields within the broader arena of PTM analysis. Thus, it is considerably beyond the scope of this review to cover all applications of the methods described above. Instead, we focus here on two exciting new avenues: application of terminomics-based techniques to tissues, and further characterization of protein N termini.
Until very recently, the vast majority of terminomics data were derived from in vitro and cell-culture-based systems. These have provided valuable insights into cleavage events and other PTMs at protein termini, but whether these observations hold true in a complex tissue environment has remained elusive. In recent studies from auf dem Keller et al. and Tholen et al., protein N termini were isolated from murine skin (79,80). In the former, skin of wild-type (WT) and Mmp2−/− mice was treated with 12-O-tetradecanoylphorbol 13-acetate (TPA) to induce inflammation. 4-plex iTRAQ analysis compared WT and Mmp2−/− mice ± TPA, and peptides extracted from murine skin were analyzed before and after enrichment of N termini. Global analyses showed increased abundance of several inflammatory proteins following TPA treatment and reduced exudation of acute-phase proteins from TPA-treated Mmp2−/− mice. Because protease-protease and protease-inhibitor interactions modify proteolytic activity in vivo, it is extremely difficult to assign substrates detected in vivo directly to a protease, even with a knockout. Here, direct MMP2 targets were discerned in vivo by applying the following criteria: a neo-N terminus had to be increased in TPA-treated versus untreated WT mice and also in TPA-treated WT versus Mmp2−/− mice, while the original N terminus had to be unchanged between TPA-treated WT and Mmp2−/− mice. Unaltered ratios of the intact mature N-terminal peptide reflect unchanged protein abundance. Without this control, altered synthesis or import of proteins in the exudate can appear as an increase in neo-N termini when the change in fact reflects steady-state turnover of a protein whose synthesis or import has increased (or decreased). Using this approach, researchers identified an inactivating MMP2 cleavage site within C1 inhibitor and unveiled novel roles for MMP2 in regulating vascular permeability and complement activation. Reduced MMP2 cleavage of C1 inhibitor led to reduced complement activation and a lessening in the normal increase in vascular permeability due to reduced bradykinin excision and release (78).
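The three criteria used to discern direct MMP2 targets can be written as a simple filter over quantitative ratios. The following is a minimal sketch only, not the authors' analysis pipeline: the field names, the use of log2 ratios, and the cutoff of 0.58 (about 1.5-fold) are assumptions introduced for illustration, as the text above does not state how "increased" and "unchanged" were thresholded.

```python
from dataclasses import dataclass

# Hypothetical cutoff: a |log2 ratio| above this value counts as "changed".
# The review gives no threshold; 0.58 (about 1.5-fold) is illustrative only.
LOG2_CUTOFF = 0.58


@dataclass
class TerminusRatios:
    """log2 ratios for one protein (all field names are illustrative)."""
    neo_wt_tpa_vs_wt_untreated: float  # neo-N terminus: WT +TPA / WT -TPA
    neo_wt_tpa_vs_ko_tpa: float        # neo-N terminus: WT +TPA / Mmp2-/- +TPA
    mature_wt_tpa_vs_ko_tpa: float     # original N terminus: WT +TPA / Mmp2-/- +TPA


def is_candidate_direct_substrate(r: TerminusRatios,
                                  cutoff: float = LOG2_CUTOFF) -> bool:
    """Apply the three criteria described in the text to one protein."""
    # Criterion 1: neo-N terminus increased on TPA treatment in WT skin.
    neo_up_with_tpa = r.neo_wt_tpa_vs_wt_untreated > cutoff
    # Criterion 2: neo-N terminus higher in WT than in the knockout (both +TPA).
    neo_up_vs_knockout = r.neo_wt_tpa_vs_ko_tpa > cutoff
    # Criterion 3: original N terminus unchanged, i.e. protein abundance is
    # comparable between genotypes, so the neo-N change is not an abundance effect.
    abundance_unchanged = abs(r.mature_wt_tpa_vs_ko_tpa) <= cutoff
    return neo_up_with_tpa and neo_up_vs_knockout and abundance_unchanged
```

A protein whose neo-N terminus rises in both comparisons but whose mature N terminus also differs between genotypes is rejected, which mirrors the abundance control described above.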
In another study, 1191 skin proteins from WT versus Ctsb−/− and 1317 proteins from WT versus Ctsl−/− were identified via whole proteome analysis, with 15 and 32 proteins differing significantly in abundance between WT and Ctsb−/− or Ctsl−/− mice, respectively. The authors inferred direct Ctsl and Ctsb substrates by comparing the sequences surrounding the identified cleavage sites to the previously established specificities of these proteases. This revealed that the majority of cleavages stem from the altered activity of proteases other than Ctsl or Ctsb. Interestingly, periostin, which is implicated in skin physiology and cancer, increased in Ctsl−/− but not Ctsb−/− skin, and a Ctsb-dependent cleavage site was identified in dermokine, a known marker for colorectal cancer.
Terminomics data from humans, mice, and bacteria reveal that >30% of all N termini do not originate from classical protein maturation events involving the removal of signal and pro-peptides and the initiator methionine (74,79). Several of these likely represent stable cleavage products, while others arise from alternative translation initiation sites (TIS), which exist for >65% of murine proteins and primarily drive translation of upstream open reading frames (uORFs) (81). While the UniProtKB and Ensembl databases do not contain these sequences, a method termed RIBO-seq involves deep sequencing of ribosome-protected mRNA fragments to determine in vivo translation products (82). Menschaert et al. created a novel database combining 16,570 protein sequences from UniProtKB with 7785 from RIBO-seq data (83). Lysates from mouse embryonic stem cells were analyzed via conventional shotgun proteomics and following enrichment of N termini. Matching shotgun data to the combined database identified 3% and 5% more peptides and proteins, respectively, while matching N termini identified 1835 TIS. Of these, 84% map to canonical TIS, 14% start beyond position two, indicating alternative or wrongly annotated TIS, 16 correspond to N-terminal extensions, and 4 correspond to uORFs. Interestingly, the majority of TIS identified for N-terminal extensions and uORFs contain near-cognate start codons.
According to TopFIND, 12 PTMs occur at N termini (74). By far the most extensive is N-Ac, whereby the α-amine of the initiator methionine (iMet), or of the second residue if iMet is removed, is acetylated. The function of N-Ac is uncertain, but it has been reported to regulate protein function and localization and to be both protein stabilizing and destabilizing (29). Five N-acetyltransferase (NAT) complexes exist in eukaryotes, NatA-NatE, the majority of which have defined sequence specificities. Interestingly, N-Ac is much more prominent in higher eukaryotes, suggesting that it may contribute to their complexity. Van Damme et al. compared N-Ac levels between yeast and humans (84). In total, 648 yeast and 1345 human N-Ac sites were identified within 868 and 1497 N termini, respectively. Several dipeptide sequences were preferentially acetylated in humans, including Met-Lys, which is not known to be acetylated by the existing NATs. A novel NAT, NatF, was identified with close homologs in higher eukaryotes but not in yeast. Library-based cleavage assays identified unique sequence specificity for NatF that includes Met-Lys termini. When NatF was expressed in yeast, N-Ac levels increased significantly, particularly at Met-Lys sites, indicating that it accounts for the increased N-Ac observed in higher eukaryotes.
We compared the amino acid specificity of iMet removal with N-acetylation specificity preferences. Interestingly, the residue at position two that is important in defining the preference for iMet removal matches very closely the in vivo acetylation preference found in published datasets (79). Thus, Met removal is preferred for sequences commencing MA, MG, MS, MT, and MP, and N-terminal A, G, S, and T are the preferred residues for N-Ac. That is, when Met is removed, the exposed residue at position two is also preferred for acetylation. The exception is proline, which is exposed by Met removal but is not among the preferred residues for N-Ac. Acetylation of N termini blocks aminopeptidase activity and so protects the N terminus from ragging; Pro is a special case because it is already resistant to most aminopeptidases. Thus, one could view N-Ac as a sequential system involving Met removal and subsequent acetylation to protect chains from aminopeptidase activity.
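The sequential rule described above can be captured in a few lines. This is a minimal sketch that encodes only the residue preferences stated in the text; the function name and return format are hypothetical, and real N-terminal processing depends on more than the residue at position two. In particular, the sketch does not model acetylation of retained iMet (for example at Met-Lys termini by NatF).

```python
# Position-two residues for which iMet removal is preferred (MA, MG, MS, MT, MP).
IMET_REMOVAL_P2 = frozenset("AGSTP")
# Exposed N-terminal residues preferred for N-acetylation; Pro is the exception.
NAC_PREFERRED = frozenset("AGST")


def predict_n_terminus(sequence: str) -> tuple[bool, str, bool]:
    """Return (iMet removed?, exposed N-terminal residue, N-Ac preferred?).

    Illustrative helper: the third value applies only the position-two rule
    from the text and is False whenever iMet is retained.
    """
    if len(sequence) < 2 or sequence[0] != "M":
        raise ValueError("expected a sequence of at least two residues starting with M")
    second = sequence[1]
    imet_removed = second in IMET_REMOVAL_P2
    exposed = second if imet_removed else "M"
    acetylation_preferred = imet_removed and second in NAC_PREFERRED
    return imet_removed, exposed, acetylation_preferred
```

Under these assumptions a sequence starting MA is predicted to lose iMet and be acetylated at Ala, whereas one starting MP loses iMet but is left unacetylated, consistent with the resistance of N-terminal Pro to most aminopeptidases.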

CONCLUSIONS
Within the past decade, terminomics methods designed to enrich for N- and C-terminal peptides have emerged as the gold standard for identifying protease substrates and cleavage sites and for characterizing protein N and C termini. These methods enable thousands of termini to be identified both in vitro and in vivo, and more recently in complex tissues. However, as these approaches are inherently restricted to single peptide identifications, complete proteome coverage is an extremely challenging, if not impossible, task. Furthermore, several tantalizing questions still await answers from degradomics methods. While the general focus appears to be the large-scale identification of cleavage sites and PTMs, Agard et al. recently employed selected reaction monitoring to assay cleavage kinetics of hundreds of caspase substrates in cell culture (61,85). Absolute quantitation by selected reaction monitoring offers the potential to assay kinetics as well as the stoichiometry of cleavage events between hundreds of tissues and disease states. This includes tracking specific substrates as biomarkers in various disease models and, similarly, utilizing propeptide removal to measure protease activity. Such targeted analyses and challenging in vivo studies herald a bright future for terminomics in elucidating novel insights into new proteolytic pathways in vivo and hence new drug targets for disease.