The DegraBase: A Database of Proteolysis in Healthy and Apoptotic Human Cells

Proteolysis is a critical post-translational modification for regulation of cellular processes. Our lab has previously developed a technique for specifically labeling unmodified protein N termini, the α-aminome, using the engineered enzyme, subtiligase. Here we present a database, called the DegraBase (http://wellslab.ucsf.edu/degrabase/), which compiles 8090 unique N termini from 3206 proteins directly identified in subtiligase-based positive enrichment mass spectrometry experiments in healthy and apoptotic human cell lines. We include both previously published and unpublished data in our analysis, resulting in a total of 2144 unique α-amines identified in healthy cells, and 6990 in cells undergoing apoptosis. The N termini derive from three general categories of proteolysis with respect to cleavage location and functional role: translational N-terminal methionine processing (∼10% of total proteolysis), sites close to the translational N terminus that likely represent removal of transit or signal peptides (∼25% of total), and finally, other endoproteolytic cuts (∼65% of total). Induction of apoptosis causes relatively little change in the first two proteolytic categories, but dramatic changes are seen in endoproteolysis. For example, we observed 1706 putative apoptotic caspase cuts, more than double the total annotated sites in the CASBAH and MEROPS databases. In the endoproteolysis category, there are a total of nearly 3000 noncaspase nontryptic cleavages that are not currently reported in the MEROPS database. These studies significantly increase the annotation for all categories of proteolysis in human cells and allow public access for investigators to explore interesting proteolytic events in healthy and apoptotic human cells.

Annotation of the human α-aminome, the full set of unmodified protein N termini, can provide a wealth of information regarding protein turnover, protein trafficking, and protease activity (1). The vast majority of protein N termini in eukaryotic cells are cotranslationally blocked by acetylation through the action of N-acetyl transferases (2). Free α-amines occur on some proteins that are never N-terminally acetylated, and can also be regenerated by signal or transit peptide removal during protein trafficking, and by endo- or exoproteolysis during protein maturation and signaling. Thus, there has been considerable effort to develop unbiased proteomic methods to characterize the α-aminome in healthy and diseased states (3-8).
We have developed a positive enrichment method in which the α-amines of intracellular (8) or extracellular proteins (9) can be specifically and directly tagged and captured, without pretreatment or protection, using subtiligase, an engineered peptide ligase (Fig. 1A) (10,11). Following purification, tryptic digestion, and LC-MS/MS, the protein sequence and exact site of proteolysis are readily identified. We have applied this approach to study proteolysis by caspases, cysteine-class aspartyl specific proteases, during cellular apoptosis (8,12-14) and inflammatory response (15). These studies, in a variety of cell types and apoptotic inducers, have revealed much about the targets, substrate recognition, timing, logic, and evolution of caspase cleavage events. These efforts have generated a huge amount of data that requires systematic compilation, organization, and normalization so that it can be shared and queried easily by all investigators and compared with other databases describing proteolytic events (16-18).
Here we present the results of both previously published and new experiments that detect α-amines in both untreated and apoptotic human cells. These studies reveal new translational N-terminal processing, signal and transit peptide removal, and other proteolytic events associated with normal protein maturation and function in healthy cells. Comparing these data to the apoptotic dataset reveals that the greatest changes in apoptosis are caused by endoproteolysis, owing to the induction of caspases as well as other proteases. We find a total of 1706 putative caspase sites in nearly 1300 different human proteins. We further find an additional 2900 noncaspase, nontryptic, nontransit, and nonsignal peptide cleavage sites in 1415 proteins.
In addition to the analyses described here, we provide a publicly available database, the DegraBase, that is dynamic, expandable, searchable, and readily accessible (http://wellslab.ucsf.edu/degrabase/). With this database, investigators can query all 8090 unique α-amines detected with high confidence from 26,043 peptide observations in both previously published (8,12,13) and new subtiligase α-aminome labeling experiments. The DegraBase substantially expands annotated intracellular proteolytic events in healthy and apoptotic cells.

EXPERIMENTAL PROCEDURES
Cell Cultures-Jurkat, THP-1, DB, RPMI 8226, MM1-S and U266 human cell lines were acquired from the American Type Culture Collection (ATCC, Manassas, VA) and were cultured under the recommended conditions. When cells reached a density of 1 × 10^6 cells/ml, an apoptotic inducer (doxorubicin, etoposide, bortezomib, FasL, CD95, staurosporine, or TRAIL) was added from 1000× stock (for individual experimental details, see supplemental Table S1A). Cell viability and caspase activity were monitored by CellTiter-Glo, Caspase-Glo (Promega, Madison, WI) and Ac-DEVD-AFC activity assays. Cells were harvested by centrifugation after 0-40 h, washed with phosphate buffered saline solution, pelleted, and stored at -80°C. For untreated experiments, healthy cells cultured under the same conditions were harvested without any inducer added.
N-terminal Labeling-The lysis and N-terminal labeling were performed as described previously (8,12,13,15). For experiments not previously published, the following protocol was used. Cells were lysed in a bicine buffer with Triton or SDS containing the protease inhibitors EDTA, PMSF, E-64, z-VAD-fmk, and AEBSF. Proteins were reduced with 2 mM tris(2-carboxyethyl)phosphine hydrochloride at 90°C for 15 min, cooled, alkylated with 4 mM iodoacetamide in the dark for 1 h, and then quenched with 10 mM dithiothreitol. Labeling was performed with 1 mM of a biotinylated synthesized peptide ester called TEVest and 1 μM subtiligase for at least an hour at room temperature (11,19). Four different TEVest peptide esters were used to facilitate the identification of the labeled products; they were identical except for the small tag left after processing to aid in mass spectrometry recognition: serine-tyrosine (SY), glycine-tyrosine (GY), phenylalanine (Phe), or 2-aminobutyric acid (Abu). The biotinylated proteins were separated by gel filtration or precipitation and captured on NeutrAvidin agarose beads (Pierce, Rockford, IL). The samples were digested with sequencing grade modified trypsin (Promega, Madison, WI) before or after capture. After capture, the labeled peptides were released with recombinant TEV protease and collected. Samples were desalted by chromatography with C18 ZipTip Pipette Tips (Millipore, Billerica, MA) or C18 high-performance liquid chromatography (HPLC) (Waters, Milford, MA). Further offline strong cation exchange fractionation was performed on some samples. For further individual experimental details, see supplemental Table S1A.
Liquid Chromatography Tandem Mass Spectrometry (LC-MS/MS) and Peptide Identification-For all experiments, samples were separated by reverse phase HPLC coupled to a mass spectrometer: QSTAR Pulsar, QSTAR XL, QSTAR Elite (Applied Biosystems, Foster City, CA), LTQ-Orbitrap XL or QExactive (Thermo Fisher Scientific, San Jose, CA). Spectra were converted into peak lists for database searching using the mascot dll in Analyst for QSTAR instruments or using an in-house script based on the Raw_Extract script in Xcalibur v2.4 (Thermo Fisher Scientific). Peptide identification was performed using Protein Prospector version 5.10.0 (20). Search parameter mass allowances were tailored for each instrument: 100 ppm precursor and 0.15 Da fragment for QSTAR instruments, 20 ppm precursor and 0.6 Da fragment for LTQ-Orbitrap XL, and 20 ppm precursor and 0.8 Da fragment for QExactive. All searches were performed with constant modification of the peptide N terminus with the appropriate TEVest tag, variable modifications of carbamidomethylation of cysteines and oxidation of methionine, and allowing for up to three missed tryptic cleavages. All datasets were searched assuming tryptic specificity at the peptide C terminus, but no cleavage specificity at the N terminus. All fractions (including re-analysis of previously published data) were searched against the human SwissProt library release 2012_03 (20,255 entries) to provide consistent accession number annotations for all data. Maximum expectation value scores for protein and peptide of 0.02 were employed as acceptance criteria. Searches against a decoy library of random and reversed protein sequences revealed an average false discovery rate (FDR) across all datasets of 0.55%.
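As a rough illustration of the precursor tolerances and the decoy-based FDR quoted above, the following Python sketch checks whether a precursor mass falls within a ppm window and estimates an FDR as decoy hits divided by target hits. The function names and the FDR formula are assumptions made for illustration; the text does not state how Protein Prospector or the authors computed the reported value.

```python
# Precursor tolerances (ppm) as listed in the text, keyed by instrument family.
PRECURSOR_TOL_PPM = {"QSTAR": 100.0, "LTQ-Orbitrap XL": 20.0, "QExactive": 20.0}


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observation, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def decoy_fdr(n_target_hits: int, n_decoy_hits: int) -> float:
    """Simple FDR estimate (assumed formula): decoy hits / target hits."""
    if n_target_hits <= 0:
        raise ValueError("need at least one target hit")
    return n_decoy_hits / n_target_hits


if __name__ == "__main__":
    tol = PRECURSOR_TOL_PPM["QExactive"]
    print(within_tolerance(500.005, 500.0, tol))  # a 10 ppm error against a 20 ppm window
    print(f"{decoy_fdr(2000, 11):.2%}")           # hypothetical counts, not from the paper
```

The hit counts in the usage lines are invented solely to exercise the function.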
Data Analysis-The DegraBase framework was created using FileMakerPro version 9.0, and houses three types of data: the sample, peptide and N terminus/protein tables (Fig. 1B). Experimental parameters are entered by investigators, mass spectrometry data are imported from files created by Protein Prospector, and protein- and cleavage site-specific annotation data are imported from a number of external databases including UniProtKB (21), the CASBAH (16), and MEROPS (17). Full documentation, including FileMakerPro scripts for data analysis and Perl scripts for processing of UniProtKB data before input, is available as supplemental File S1. The DegraBase also exists as an HTML-based website (http://wellslab.ucsf.edu/degrabase/) to allow for more accessible searching.
Abundance data were taken from PaxDB version 2.1 using the integrated dataset called "Weighted average of 'H. sapiens PeptideAtlas Build May 2010' (weighting 50%), 'H. sapiens PeptideAtlas Build March 2009' (weighting 50%)" available from the downloads tab at www.pax-db.org (22). Sequence logos were made using iceLogo with the whole human SwissProt library as background (23). All logo images were made with the percent difference scoring system, except when stated as "Filled Logos" representing amino acid frequency, not information content. Significance was determined by chi-square analyses. Data for methionine processing, mitochondrial transit peptide removal and signal peptide removal were compared with SwissProt library release 2012_03. Mitochondrial localization was determined based on the MitoCarta database (24).
Gene Ontology (GO) term enrichment was determined using the GO::TermFinder software (25). A list of unique proteins for each dataset was created and uploaded to the database and tested for enrichment against the human SwissProt background using all evidence codes except ND (No biological Data available) and IEA (Inferred from Electronic Annotation). Enriched terms were defined using a corrected p value cutoff of less than 0.01. To compare terms between datasets, a pairwise chi-square test was performed using the Benjamini-Hochberg multiple testing correction procedure.
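The Benjamini-Hochberg correction named above can be sketched in a few lines of Python. This is a generic, minimal implementation of the standard step-up adjustment, not the authors' script; the chi-square tests that would produce the input p values are not shown, and the example p values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Terms would then be called significant by comparing the adjusted values to the chosen cutoff.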

RESULTS
The DegraBase-Given the massive amount of data generated from multiple experiments under different conditions, it was necessary to create a simple and normalized database. The DegraBase is a relational database built to house our α-aminomics data (Fig. 1B). It is available in three formats (see supplemental Information): a FileMakerPro file (supplemental File S2), an Excel document containing worksheets for each of the major tables (supplemental File S3), and a web interface (http://wellslab.ucsf.edu/degrabase/) where users may search by substrate name or accession number. Full documentation of the database is available in Supplemental File S1. There are a total of 8090 unique N terminus identifications from 3206 proteins (supplemental Table S1A). We subdivided our data into three sets: (1) untreated, (2) apoptotic, and (3) apoptotic caspase-cleaved. In a separate study using our labeling method, we have seen that there is cell line- and drug-specific variability in the data, but most differences show up in the detected abundance of cleavage products over time rather than in the presence or absence (reported here) of the specific identified N termini (13). Therefore, we were comfortable pooling our multiple apoptotic experiments together to compare all proteins detected in all untreated cells tested versus those undergoing apoptosis.
The untreated dataset contains all observations from the 11 experiments performed in five different cell lines (supplemental Table S1B). This dataset has 3732 identified N termini corresponding to 2144 unique N terminus start sites from 1239 proteins. The apoptotic dataset consists of all observations from the 33 experiments using seven different chemotherapeutic inducers in five cell lines (supplemental Table S1C). This generated a total of 22311 independent peptide identifications, corresponding to 6990 unique N terminus sites from 3020 different proteins. This reflects the dramatic activation of caspases following the induction of apoptosis of our samples, also observed with caspase activity and cell death assays (data not shown). We defined the third dataset, the apoptotic caspase-cleaved dataset, as a subset of the apoptotic dataset that includes all apoptotic aspartic acid-cleaved N termini (supplemental Table S1D). This dataset includes 1706 unique N termini from 1268 proteins, and in combination with our previous studies, MEROPS and CASBAH, increases the number of published human caspase-cleavage events to over 2200. The apoptotic dataset contains 1706 aspartate-cleaved peptides compared with the 140 seen in the untreated dataset, reflecting a dramatic induction of caspase activity.

Fig. 1. (A), For all experiments, human cells were grown under standard conditions, either with or without treatment with apoptosis inducing agents. Cells are lysed and proteins biotinylated on their free α-amines using subtiligase, followed by purification and identification by LC-MS/MS. N termini identifications from every experiment were entered into the database to create the untreated and apoptotic datasets, and a subset apoptotic caspase-cleaved dataset for apoptotic N termini following aspartic acid cleavage. (B), The DegraBase database is structured around four main tables linking the experimental data to the MS identifications and external database information at both the N terminus and protein level (for more details see Supplemental File S1). (C), Summary statistics of the DegraBase for all experiments in the DegraBase and for both the untreated and apoptotic datasets (more details in Supplemental Table S1A). The blue box highlights the apoptotic caspase-cleaved dataset within the apoptotic dataset.

To estimate to what degree the α-aminome MS data are biased by protein abundance in cells, we compared the datasets to PaxDB (22), a database that provides an independent estimate of relative protein abundance based on MS spectral counting data. All three α-aminome datasets cover more than six orders of magnitude of ppm (supplemental Fig. S1A-S1C). Only for the small set of low abundance proteins did our α-aminome identification tail off, which presumably reflects the limits of detection of the methodology. There is a slight enrichment for higher abundance proteins overall (supplemental Fig. S1D).
At the protein level, there is a large overlap between the untreated and apoptotic datasets; 1053 of the 1239 proteins (85%) from untreated cells were also found in the apoptotic dataset (Fig. 2A). In contrast, we observed a smaller overlap between datasets when considering the particular N termini within each protein (Fig. 2B); only 1328 of the 2144 untreated N termini (62%) were labeled under apoptotic conditions. There is a small set of 361 proteins, but only 129 N termini, that overlap between the untreated and apoptotic caspase-cleaved datasets. The presence of caspase-cleaved products in healthy cells likely reflects the low level of apoptosis that occurs in any healthy cell population, and these products make up a very small portion of the total untreated set. The protein overlap may represent apoptotic caspase substrates that also undergo endoproteolysis in healthy cells by noncaspases. Interestingly, many of the proteolytic substrates in healthy cells are cleaved at different positions upon induction of apoptosis.
To compare the functional properties of the different datasets, we performed Gene Ontology (GO) term enrichment using GO::TermFinder (supplemental Table S2) (25). We looked at the terms unique to each dataset to identify specific process, function or component annotations related to healthy or dying cellular states. The untreated dataset was enriched in terms related to homeostatic functions like metabolic and biosynthetic processes (specifically related to ribosomal, coenzyme, amino acids and fatty acids, NADH dehydrogenase, and isomerase functions), the mitochondrial proton-transporting ATP synthase complex, and organelle envelope lumen (prominently related to the endoplasmic reticulum). In the apoptotic set, we compared the significant terms from caspase substrates to the noncaspase apoptotic terms, and to the terms unique to the apoptotic set only. The caspase substrates are enriched in the regulation of transcription, and there were many terms related to cell morphogenesis, specifically chromosome and microtubule structure, which are known to change and break down during apoptosis. The noncaspase apoptotic enriched terms in process, function and component ontologies relate to chromatin assembly (especially DNA binding, vesicle coating, and targeting), signal transduction involved in DNA damage and cell cycle checkpoints, and nucleotide catabolic processes. We also saw enrichment in the non-caspase apoptotic set for proteins associated with terms for proteolysis and cell death.
We next analyzed the precise sequences surrounding the N termini identified in each dataset. We used iceLogo (23) to visualize the sequence specificity for cleavage events for each dataset using the human SwissProt database to establish background amino acid frequencies (Fig. 3). The cleavage sites are presented in the standard Schechter-Berger form, with the scissile bond between the P1 residue and the P1′ residue (26). All three logos show a strong preference for small amino acids (glycine, serine, or alanine) at the P1′ position, but significant differences at the P1 position. In healthy cells, there is enrichment for cleavage sites following lysine, arginine, and methionine. The methionine cleavages mainly represent N-terminal methionine processing. The large number of cuts following basic residues is consistent with a high activity of trypsin-like enzymes in both healthy and apoptotic cells. In apoptotic cells this tryptic-like activity is overshadowed by the large number of caspase cleavages following aspartic acid residues. The apoptotic caspase-cleaved dataset shows a degenerate specificity with moderate enrichment for aspartic acid-glutamic acid-valine in the P4-P2 positions, matching the classic "DEVD" substrate preference for executioner caspases-3 and -7, the signature proteases of apoptosis (17).
Three Categories of Proteolysis: Translational N Terminus Processing, Signal/Transit Peptide Removal, and Endoproteolysis-In the global analysis presented above, we have discussed all the proteolytic events in healthy and apoptotic cells without distinguishing between the different kinds of proteolytic processing known to occur in cells. We now look more closely at three important areas of proteolytic processing: (1) processing around the methionine at the translational N terminus (N termini labeled at residues 1 and 2), (2) cleavage of possible secretory or transit peptides during organelle trafficking (labeled at residues 3-65), and (3) other endoproteolytic events (labeled at residues 66+) (Fig. 4). We chose to define possible signal or transit peptides within residues 3-65 based on patterns from previously published datasets (27,28) and from our own data (see below). Subdividing each dataset into these different groups, we see that the majority of cuts result from endoproteolysis (55-80%), then putative signal or transit peptide removal (20-35%), and finally processing around the initiator methionine (<10%).
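The position-based split described above can be written as a small Python sketch. The residue boundaries (1-2, 3-65, 66+) come from the text; the function names and the example start positions are hypothetical.

```python
from collections import Counter


def proteolysis_category(start_residue: int) -> str:
    """Assign an N terminus to a category by its 1-based start residue."""
    if start_residue < 1:
        raise ValueError("residue numbering starts at 1")
    if start_residue <= 2:
        return "translational N terminus"
    if start_residue <= 65:
        return "signal/transit peptide"
    return "endoproteolysis"


def category_fractions(start_residues):
    """Fraction of N termini falling into each category."""
    counts = Counter(proteolysis_category(r) for r in start_residues)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical start positions, used only to exercise the functions.
    starts = [1, 2, 30, 66, 100, 400, 70, 90, 20, 500]
    for category, fraction in category_fractions(starts).items():
        print(f"{category}: {fraction:.0%}")
```

A purely positional rule like this will place some endoproteolytic cuts at or before residue 65 in the signal/transit category, which is the contamination the 66+ cutoff is meant to limit.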
Initiator Methionine Processing-Eukaryotic proteins are typically cotranslationally acetylated, rendering the translational N terminus inaccessible to the subtiligase labeling technique (29,30). Recent work suggests that this acetylation is largely irreversible (31). However, there are some proteins, both with and without initiator methionine removal, that do not undergo cotranslational acetylation; these have translational N termini that are accessible to labeling. In the full database (both untreated and apoptotic samples), we observed labeling of the initiator methionine in 154 proteins (12% of identified proteins) (supplemental Table S3A), and labeling of the second residue, indicating methionine removal, in 198 proteins (15% of identified proteins) (supplemental Table S3B). It is noteworthy that the majority of these translational N termini are not yet annotated in SwissProt (supplemental Fig. S2).
The sequence logo for proteins in the untreated dataset retaining the initiator methionine suggests enrichment for a hydrophobic amino acid at the second residue (Fig. 5A), whereas proteins with the initiator methionine removed had enrichment for small amino acids at the second residue (Fig.  5B). This is largely consistent with previous studies showing that methionine removal is most efficient for proteins with small amino acids following the initiator methionine (32). Almost identical patterns were seen for the proteins in the apoptotic set (Figs. 5C-5D). Out of 182 methionine processing events seen in healthy cells, only 35 were not found in the apoptotic set, suggesting there is little change in the translational N-terminal processing events during apoptosis.
Mitochondrial Transit Peptide Removal-Proteins expressed from nuclear genes but destined for the mitochondria generally contain positively charged N-terminal regions that direct them to the mitochondrial import machinery (33). Once inside the mitochondria, these mitochondrial transit peptides (mTPs) are removed by the mitochondrial processing peptidase, and in some cases the truncated proteins are further processed by other proteases that remove one or a few additional residues (27). Although the mitochondrial proteome has been well characterized, data on the precise location of mTP cleavage sites remain minimal. Moreover, sequence specificities of the proteases involved are only partially understood (24,28,34). Considering only the untreated dataset to avoid possible apoptosis-induced cleavages, we identified roughly 250 labeled N termini from the ∼1000 human SwissProt proteins that are in MitoCarta, a highly curated database of mitochondrial proteins (24).
The distribution of N terminus placement found in mitochondrial proteins is quite different from that of nonmitochondrial proteins. We see a significant spike in the range of position 10 to position 65, roughly the location of most known mTP cleavage sites (Fig. 6A) (27). We therefore focused our examination on the 171 N termini seen in this range in MitoCarta proteins. For the purpose of these analyses, in cases where one protein had more than one N terminus in this region, we chose the site closest to the translational N terminus. We found that 58 had no mTP cleavage site annotated in SwissProt, and 67 had an annotated site different from the one we observed (supplemental Fig. S3A and supplemental Table S3C). Additionally, 24 peptide removal start sites are within one residue of their SwissProt annotations. This may reflect the secondary proteolysis known to occur in the mTP removal pathway (27), and may be a part of a mitochondrial version of the N-end rule (35). It is notable that 46 out of the 67 cases (69%) where our data disagree with SwissProt annotation involve cleavage sites that SwissProt describes as "By Similarity," "Potential," or "Probable," indicating a lack of strong evidence for these cuts. In contrast, in only 17 out of the 46 cases (37%) where our data agree with SwissProt does the annotation contain this qualifying language (supplemental Table S3C). We generated an iceLogo for the 16 residues before and four residues after the 171 mTP cleavage sites (Fig. 6B). As expected based on previous studies, the iceLogo shows enrichment for arginine and de-enrichment for acidic residues throughout the transit peptide. The strongest arginine signal is in the P2 and P3 positions, as expected from previous studies (28), and is the most enriched signal in our untreated position 10-65 logo (Fig. 6C).

Fig. 4 (legend, partial). ... cuts after residue 65). The untreated and apoptotic datasets had similar levels of translational N terminus labeling (∼10%), but differed for the latter categories, with the apoptotic datasets having more cleavage events past residue 65. The apoptotic caspase-cleaved set is shifted even more toward endoproteolytic cleavages than the apoptotic set.

Fig. 5 (legend, partial). ... Two proteins labeled at residue 1 but annotated in UniProt as not containing an initiator methionine (Ig lambda chain V-IV region HII (P01717, serine) and 40S ribosomal protein S30 (P62861, lysine)) were removed from the datasets for the iceLogo creation.

Signal Peptide Removal-Proteins destined for the secretory pathway are subjected to proteolytic removal of signal peptides near the N terminus (36). As with mTPs, we looked for these cleavages only in the untreated dataset to avoid any apoptosis-specific events. In this case, we focused on the 63 proteins that were annotated in SwissProt as having a signal peptide removal site between positions 10 and 65 (supplemental Table S3D). Most of these were ER or Golgi resident proteins; our technique does not efficiently capture secreted proteins after they have dissociated from the cell. In 30 of these proteins, the cleavage site we observed matched the one annotated in SwissProt (supplemental Fig. S3B). In this case, the qualifiers "By Similarity" and "Potential" were used by SwissProt to describe at least 60% of their signal peptide removal site annotations both in the set that agrees with our data, and the set that does not. We cannot be sure what accounts for the differences between our data and SwissProt, but we hope that these data will prove useful to researchers who work in this area. As with the mTP sites, we generated an iceLogo for positions P16 to P4′ relative to the cleavage site (Fig. 6D). In this case, we saw a substantial enrichment for leucine residues in the P16-P6 positions, which is consistent with the known requirement for hydrophobicity in the signal peptide to facilitate interaction with the ER membrane (36,37).
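The comparison of observed transit and signal peptide cleavage sites with SwissProt annotations can be sketched as follows. The 10-65 window and the rule of taking the site closest to the translational N terminus come from the text; the function names and category labels are hypothetical, and treating "within one residue" as its own label (rather than as part of the "different" count) is an assumption made for illustration.

```python
def representative_site(starts, lo=10, hi=65):
    """Pick the observed start closest to the translational N terminus
    within the window, or None if no start falls inside it."""
    in_range = [s for s in starts if lo <= s <= hi]
    return min(in_range) if in_range else None


def compare_to_annotation(observed_start, annotated_start):
    """Label an observed start relative to an annotated mature-chain start."""
    if annotated_start is None:
        return "unannotated"
    diff = abs(observed_start - annotated_start)
    if diff == 0:
        return "match"
    if diff == 1:
        return "within one residue"
    return "different"


if __name__ == "__main__":
    # Hypothetical protein with several labeled N termini.
    site = representative_site([5, 34, 28, 80])
    print(site, compare_to_annotation(site, 29))
```

Both arguments are assumed to be 1-based positions of the first residue of the processed protein, so the two values are directly comparable.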
Endoproteolysis Before and After Apoptosis-We next consider the endoproteolytic events that occur after residue 65, which represent the majority of our identified N termini (Fig. 4). Although many of the cleavages that occur before or at residue 65 are endoproteolytic, we chose to focus on the 66+ set to reduce the contamination of this set with possible signal or transit peptide removal sites. Considering the apoptotic set cleaved after residue 65, 28% of them occur with an aspartic acid residue at the P1 position, suggesting a caspase cleavage event (supplemental Fig. S4). To visualize noncaspase protease activity, we removed all cleavage sites with aspartic acid at the P1 position and then used iceLogo to generate filled logos (where letter height represents amino acid frequency) for the P1 and P1′ positions in the untreated and apoptotic datasets (Fig. 7). Both logos show the predominance of basic amino acids (arginine and lysine) at P1 and small amino acids (glycine, serine, or alanine) at P1′. However, there is an overall decrease in the fraction of cleavages following arginine or lysine in the apoptotic dataset. This likely reflects the induction of other noncaspase and nontryptic proteases during apoptosis that are different from proteases in healthy cells.
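The filtering step behind the filled logos (drop sites with aspartic acid at P1, then tabulate residue frequencies) can be illustrated with a short Python sketch. It is not iceLogo and does not apply the residue 66+ restriction; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def p1_and_p1prime(protein_seq: str, start_residue: int):
    """Return (P1, P1') for an N terminus beginning at 1-based start_residue.
    P1' is the first residue of the new N terminus; P1 is the residue before it."""
    if not 2 <= start_residue <= len(protein_seq):
        raise ValueError("start residue must have a preceding residue in the sequence")
    return protein_seq[start_residue - 2], protein_seq[start_residue - 1]


def noncaspase_p1_frequencies(sites):
    """Frequency of each P1 residue after removing sites with Asp at P1.
    `sites` is an iterable of (protein_sequence, start_residue) pairs."""
    p1_residues = [p1_and_p1prime(seq, start)[0] for seq, start in sites]
    kept = [aa for aa in p1_residues if aa != "D"]
    if not kept:
        return {}
    counts = Counter(kept)
    return {aa: n / len(kept) for aa, n in counts.items()}


if __name__ == "__main__":
    toy = "MADEVDGKRSA"  # invented sequence, not a real protein
    sites = [(toy, 7), (toy, 9), (toy, 10), (toy, 9)]
    print(noncaspase_p1_frequencies(sites))
```

The same tally applied to the second element of each pair would give the P1′ frequencies.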
Many human proteins undergo degradation by the proteasome; however, we do not expect to see very many proteasome products using our subtiligase method because the majority are quickly degraded into their amino acid components (38). Those that remain intact (for example, in order to be displayed on an MHC class I complex) are mostly less than 10 amino acids long; the fractionation steps required to separate the small biotin label from the larger labeled protein fragments in the subtiligase N-terminal enrichment technique cause significant loss of short peptides. Furthermore, our mass spectrometry data searches only considered peptides with a tryptic site on the C terminus, and only a subset of proteasome peptide products would fit this criterion.
The N-End Rule Before and After Apoptosis-The P1′ position of a cleavage site is important for the half-life of the resulting protein fragment, as determined by the Arg/N-end rule pathway (35). Many proteases, including caspases (39), prefer the small amino acids glycine, serine, or alanine in the P1′ pocket; in fact, these three residues make up 32% of the P1′ residues of the ∼56,000 protease cleavages (of both native and synthetic substrates) described in the substrate section of the MEROPS database. However, small amino acids are not a requirement, as we saw all twenty possible amino acids in the P1′ position in the untreated, apoptotic and apoptotic caspase-cleaved datasets (Table I). The untreated dataset had a greater proportion of cleavages with charged amino acids (lysine, arginine, aspartic acid, and glutamic acid) at the P1′ position than the apoptotic or apoptotic caspase-cleaved datasets, whereas the apoptotic and apoptotic caspase-cleaved datasets had more cleavages with serine and glycine in the P1′ position.
The Arg/N-end rule pathway degrades proteins and proteolysis products through ubiquitination and targeting to the proteasome. Different N-terminal amino acids may be stabilizing or destabilizing, and thus affect the half-life of the protein (35). To investigate the potential for a biological effect of the proteolysis products, we analyzed the data with respect to the theoretical half-lives for each P1′ amino acid (40). We grouped the amino acids into stabilizing P1′ amino acids (half-life greater than 20 h), destabilizing (half-life less than 1.5 h), and intermediate (half-life between 1.5 and 20 h) (Table I). For all three datasets, the majority (53-60%) of N termini were found in the intermediate group. However, there is a clear difference in the pattern of stabilizing and destabilizing cuts depending on cell condition. Almost 14% of all untreated cleavage events occurred before destabilizing amino acids. In contrast, only 7% of apoptotic cleavages and 4% of apoptotic caspase cleavages occurred before destabilizing amino acids, with most shifting into the intermediate range of predicted half-lives. In general, untreated proteolysis events were more destabilizing and had shorter theoretical half-lives than apoptotic events, and caspase cleavages leave particularly stable and longer lasting predicted N termini.

DISCUSSION
The subtiligase N-terminal labeling method yields high selectivity for α-amines with accurate peptide and protein identifications and a very low false discovery rate (<1%). Overall, we observed 3206 proteins, corresponding to almost 16% of the entire SwissProt human proteome and covering over 6 logs in protein abundance. We see internally consistent labeling between the datasets. For example, the same mitochondrial transit sites and initiator methionine sites were labeled in 35-40 out of 44 untreated and apoptotic experiments. These cleavage events are independent of apoptosis, and therefore show the consistency in our labeling and detection method. Additionally, we believe that little bias originates from subtiligase specificity. For example, all 20 amino acids were labeled at P1′, including proline and valine, which are known to be slow substrates for subtiligase in vitro (10).
We are confident that the distinctions between healthy and apoptotic datasets reflect the biology of the human cell lines, as the apoptotic samples showed decreased cell viability and increased caspase activity (as measured both in cell culture and in observed caspase-cleaved N termini relative to the untreated samples). In fact, 129 of the 140 N termini produced by cleavage after aspartic acid in healthy cells were also in the set of 1706 apoptotic N termini, and likely reflect a small population of apoptotic cells within the healthy cell population. Comparing our apoptotic protein dataset to other published datasets shows that we capture most known apoptosis-related proteins. We have labeled more than 60% of the proteins listed in the ApoptoProteomics database (41). Additionally, 75 of our caspase substrates overlap with the literature-curated CASBAH database (16), and 79 overlap with the caspase-3, -6, or -7 substrates listed in MEROPS (17) (in both of these comparisons, we excluded >200 entries present in these databases but derived from the original subtiligase study (8)). These 1706 apoptotic caspase-cleaved sites, in combination with MEROPS, the CASBAH, and other work from our laboratory (8,12,13), bring the total known human caspase cleavage sites to more than 2200. Importantly, the DegraBase contains a larger number of new noncaspase proteolytic events that have yet to be assigned to a specific protease. Surprisingly, only 45 of the 2900 noncaspase, nontryptic (not cleaved after lysine or arginine) endoproteolytic events (residue 66+) are present in the MEROPS database (release 9.6), and only 10 of these sites are annotated as "physiological" cleavages in MEROPS (supplemental Table S4). Interestingly, there is little exoproteolysis of these intracellular proteins, whereas laddering produced by sequential exoproteolysis was observed in 24% of all proteins identified in a subtiligase-based study of human serum (9).
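The site categories used in these comparisons can be written as a small decision rule. The Python sketch below is illustrative only and is not the DegraBase implementation: the function name and category labels are hypothetical, the residue 66 cutoff for endoproteolysis and the definition of tryptic cuts (after lysine or arginine) come from the text, and treating positions 1-2 as initiator methionine processing and any P1 aspartic acid as caspase-like are simplifying assumptions.

```python
def classify_n_terminus(start_residue: int, p1: str) -> str:
    """Assign an observed N terminus to a proteolysis category.

    start_residue: 1-based position of the first residue of the observed
                   N terminus (the P1' residue) in the full-length protein.
    p1:            one-letter code of the residue just before the cut (P1).
    """
    # Assumption: positions 1-2 correspond to the intact or
    # methionine-processed translational N terminus.
    if start_residue <= 2:
        return "methionine processing"
    # From the text: endoproteolytic events are counted from residue 66 on;
    # earlier sites are treated here as likely signal/transit peptide removal.
    if start_residue < 66:
        return "signal or transit peptide region"
    # Assumption: a cut after aspartic acid is treated as caspase-like.
    if p1 == "D":
        return "endoproteolytic, caspase-like"
    # From the text: tryptic means cleaved after lysine or arginine.
    if p1 in ("K", "R"):
        return "endoproteolytic, tryptic-like"
    return "endoproteolytic, noncaspase nontryptic"
```

For example, under these assumptions an N terminus beginning at residue 120 after a P1 leucine would fall into the noncaspase, nontryptic group that is largely absent from MEROPS.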
It is probable that only a subset of the total identified apoptotic proteolytic substrates needs to be cleaved to complete apoptosis. For example, there are 1706 putative caspase cleavages, but 784 was the largest number of sites labeled in any single cellular experiment. Additionally, about 50% of all apoptotic N termini identified were only seen in one experiment. This may reflect the diversity in drug induction and cell types chosen as well as the expected stochasticity in mass spectrometry and our labeling technology. Some of the apoptotic labeling patterns may also be caused by induced polyspecific proteases, like the caspases, with large and diverse sets of possible substrates. Only 110 sites (87 caspase-cleaved) from 109 proteins were seen consistently in at least 10 apoptotic experiments and only one or zero untreated experiments (supplemental Table S5). Interestingly, these common cleavages have a wide kinetic range. Some are shown to be cleaved by up to three different apoptotic caspases (12), and many have homologs in mouse and fly that are also known caspase substrates (14). These apoptosis-enriched sites may represent important apoptotic nodes, whereas other cleavage sites may be unique to the experimental conditions, or possibly bystander cleavages.
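The criterion for an apoptosis-enriched site (at least 10 apoptotic experiments and at most one untreated experiment) amounts to a simple filter over per-site observation counts. The following Python sketch assumes a hypothetical input format, a mapping from a site identifier to a pair of counts; the site identifiers and counts shown are invented for illustration and are not DegraBase entries.

```python
def apoptosis_enriched(counts, min_apoptotic=10, max_untreated=1):
    """Return the sites that meet the enrichment criterion from the text.

    counts: dict mapping a site identifier to a tuple
            (n_apoptotic_experiments, n_untreated_experiments).
    """
    return sorted(
        site
        for site, (n_apoptotic, n_untreated) in counts.items()
        if n_apoptotic >= min_apoptotic and n_untreated <= max_untreated
    )


# Hypothetical observation counts, for illustration only.
example_counts = {
    "siteA": (12, 0),   # enriched: frequent in apoptotic, absent in untreated
    "siteB": (15, 4),   # also seen in several untreated experiments
    "siteC": (3, 0),    # seen too rarely in apoptotic experiments
    "siteD": (10, 1),   # enriched: exactly at both thresholds
}
enriched_sites = apoptosis_enriched(example_counts)
```

Keeping the two thresholds as parameters makes it easy to check how sensitive the resulting list is to the choice of cutoffs.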
Remarkably, the protease actors in healthy and apoptotic cells appear to target an overlapping set of substrates, but not always at the same cleavage sites (Fig. 2). This could allow for different regulation of these targets depending on the cellular conditions. An important difference between the untreated and apoptotic datasets is the theoretical half-lives of the newly created N termini. We found that the neo-N termini created by caspase cleavages have a higher proportion of stabilizing N-terminal amino acids than those in the untreated and apoptotic datasets (Table I). A similar conclusion, that apoptotic fragments tend to persist during apoptosis, was reached using the PROTOMAP method (42). Stable apoptotic cleavage products, in particular protein fragments of caspase substrates, may function in a different manner from the parent protein. We realize that these half-life values are largely dependent on the assay method and may not be representative of specific in vivo half-lives of a given protein fragment. However, Piatkov et al. have recently shown the extent to which caspases and the Arg/N-end rule pathway interact: proapoptotic protein fragments contain evolutionarily conserved destabilizing N-terminal amino acids, targeting them for quick degradation by the proteasome in healthy cells; caspases cleave and inactivate members of the proteasome pathway, preventing peptide degradation (43). Indeed, we do see caspase cleavages in UBR4 and UBR5, which function as E3 ubiquitin ligases of the Arg/N-end rule pathway.
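The half-life comparison summarized in Table I rests on a three-way grouping of P1′ residues: stabilizing (half-life greater than 20 h), destabilizing (less than 1.5 h), and intermediate (1.5 to 20 h). A minimal Python sketch of that grouping is given below. The thresholds are those stated in the text; the function names are hypothetical, and the half-life table in the example contains placeholder values chosen for illustration, not the published values from ref. 40.

```python
from collections import Counter


def half_life_class(hours):
    """Map a theoretical half-life (in hours) to an N-end rule class."""
    if hours > 20.0:
        return "stabilizing"
    if hours < 1.5:
        return "destabilizing"
    return "intermediate"  # 1.5 h to 20 h, boundaries included


def class_fractions(p1_prime_residues, half_life_hours):
    """Fraction of N termini in each class.

    p1_prime_residues: non-empty list of one-letter P1' residues, one per site.
    half_life_hours:   dict mapping residue -> theoretical half-life in hours.
    """
    counts = Counter(half_life_class(half_life_hours[r]) for r in p1_prime_residues)
    total = sum(counts.values())
    return {
        name: counts[name] / total
        for name in ("stabilizing", "intermediate", "destabilizing")
    }


# Placeholder half-lives (hours); NOT the values used for Table I.
placeholder_half_lives = {"M": 30.0, "S": 5.0, "R": 1.0}
example = class_fractions(["M", "S", "S", "R"], placeholder_half_lives)
```

Applying the same function to the untreated, apoptotic, and apoptotic caspase-cleaved site lists would give the per-class proportions that the datasets are compared on.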
During apoptosis, caspase cleavages may result in loss-of-function, gain-of-function, or no functional effect on the substrate. As many caspase cleavage events occur between protein domains (8), many substrates have the potential for gain-of-function events in which a catalytic domain is relocated or an inhibitor removed. Several such cleavages have been thoroughly studied in kinases, as reviewed by Kurokawa and Kornbluth (44). In a preliminary search through our database, we see enrichment for caspase cleavages between annotated domains. For example, in the kinase family, 52 of the 57 caspase-derived N termini occurred between domains, compared with 51 of 76 noncaspase apoptotic N termini and 15 of 26 untreated N termini (45, 46) (supplemental Table S6). This is consistent with a recent study by Dix et al. that demonstrated crosstalk between caspases and kinases, where phosphorylation can direct caspase cleavages on kinases that may lead to a change in kinase activity (47).
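The interdomain counts quoted for the kinase family are easier to compare as percentages. The short Python calculation below uses only the counts given in the text; the variable and function names are illustrative.

```python
# (between-domain N termini, total N termini) for kinases, from the text.
interdomain_counts = {
    "caspase-derived": (52, 57),
    "noncaspase apoptotic": (51, 76),
    "untreated": (15, 26),
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


# Rounded to the nearest whole percent for comparison.
interdomain_percent = {
    name: round(percent(k, n)) for name, (k, n) in interdomain_counts.items()
}
```

This gives roughly 91% for caspase-derived N termini, against roughly 67% for noncaspase apoptotic and 58% for untreated N termini, which is the enrichment referred to above.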
In sum, we provide an unbiased and global annotation of the human cellular α-aminome. The data are consolidated into a searchable database, the DegraBase, revealing a large amount of homeostatic and apoptotic proteolysis in cells. To our knowledge, these untreated and apoptotic datasets are the most extensive published to date using a single methodology. We confidently identified many new sites related to protein processing, including initiator methionine retention or removal, and the specific cleavage locations for signal or transit peptide removal during protein trafficking. Additionally, our dataset shows the abundance of healthy homeostatic and noncaspase apoptotic endoproteolytic events that occur in cells. We hope that our colleagues across many areas of biology will find the DegraBase to be a useful resource for further understanding and characterization of proteolytic events in cells.