Formalin-Fixed, Paraffin-Embedded Tissues (FFPE) as a Robust Source for the Profiling of Native and Protease-Generated Protein Amino Termini*

Dysregulated proteolysis is a hallmark of numerous diseases. In recent years, an increasing number of studies have examined protein termini in the hope of unveiling the physiological and pathological functions of proteases in clinical research. However, the availability of cryopreserved tissue specimens is often limited. Formalin-fixed, paraffin-embedded (FFPE) tissues offer an invaluable alternative resource for clinical research: pathologically relevant tissues are often stored as FFPE specimens, which represent the most abundant resource of archived human specimens. In this study, we established a robust workflow to investigate native and protease-generated protein N termini from FFPE specimens. We demonstrate comparable N-terminomes of cryopreserved and formalin-fixed tissue, showing that formalin fixation and paraffin embedment do not proteolytically damage proteins. Accordingly, FFPE specimens are fully amenable to N-terminal analysis. Moreover, we demonstrate the feasibility of FFPE degradomics in a quantitative N-terminomic study of FFPE liver specimens from cathepsin L-deficient and wild-type mice. Using a machine learning approach in combination with the previously determined cathepsin L specificity, we identify a number of potential cathepsin L cleavage sites. Our study establishes FFPE specimens as a valuable alternative to cryopreserved tissues for degradomic studies.

Formalin fixation and paraffin embedment are the prevailing methods for preserving tissues for routine clinical diagnostics and archival purposes. As such, formalin-fixed, paraffin-embedded (FFPE) specimens represent a large collection of clinically annotated samples that are stored for long periods at room temperature. While many still consider cryopreserved specimens the gold standard in clinical research, recruiting cryopreserved tissues in sufficient numbers for robust study designs is challenging.
FFPE tissues offer an attractive alternative for the retrospective analysis of pathological processes.
Proteomic analysis of FFPE tissues has attracted increasing interest since it was first presented (1). Studies have demonstrated that FFPE tissues are amenable to all widely applied mass spectrometry (MS) platforms, including reversed-phase liquid chromatography tandem MS, matrix-assisted laser desorption ionization (MALDI) time-of-flight (TOF), and surface-enhanced laser desorption ionization TOF analyses (2), as well as MALDI imaging (3,4). Notably, protein identification numbers and proteome coverage were found to be equivalent for FFPE and cryopreserved tissue, and FFPE tissues can be analyzed to a depth of up to 10,000 proteins per sample (5). Similarly, proteins identified in FFPE and cryopreserved tissues do not differ with regard to localization and function. Studies have also shown that the identified protein subsets overlap substantially (6,7) and that different FFPE processing protocols do not impede proteomic analysis (8).
Formaldehyde is known to fix proteins in tissue by reacting with amino acid side chains (such as those of lysine, asparagine, and glutamine (9)) to form methylol adducts, or by forming imine (Schiff base) adducts between amines and the aldehyde group. Nevertheless, these modifications are rarely detected in FFPE proteomes (10); indeed, few remnants of the formalin fixation process are retained after protein extraction for proteomic analysis. However, a minor shift in the arginine-to-lysine ratio has been observed, indicating the persistence of as yet undefined modifications or cross-links (8,11,12). Analysis of common posttranslational modifications such as phosphorylation and glycosylation has shown comparable preservation in FFPE and cryopreserved tissue specimens (13).
Proteolysis is an irreversible posttranslational modification that often generates stable cleavage products with novel functions or cell-contextual localization (14). Dysregulated proteolytic processing is a hallmark of numerous diseases (14,15). It is therefore not surprising that many researchers have turned to proteomics to elucidate the precise roles of specific proteases and to identify their physiological substrates. At present, most proteomics-based approaches for the system-wide analysis of proteolytic processing rely on the enrichment and subsequent investigation of protein termini, with the most widely used techniques focusing on amino termini (14,15). This is reflected in the number of established strategies developed to investigate protein N termini (14). Typically, terminal and side-chain amino groups of full-length proteins are chemically modified, and the proteins are then digested with trypsin to generate internal peptides that possess free amino termini. This chemical difference ("free" versus "protected" amino termini) between protein N termini and internal peptides is used to specifically enrich native N-terminal peptides for subsequent LC-MS/MS analysis. Commonly used enrichment strategies are based on differential chromatography (combined fractional diagonal chromatography (16) and charge-based fractional diagonal chromatography (17)), charge-reversal enrichment of protein amino termini (18), or a high-molecular-weight, amine-reactive polymer combined with ultrafiltration (terminal amine isotopic labeling of substrates (TAILS) (19)).
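The negative-selection principle described above can be illustrated with a small in-silico sketch. It is not the authors' pipeline and makes simplifying assumptions: all primary amines are blocked before digestion (so trypsin cleaves only after arginine, in an ArgC-like manner, and not before proline), missed cleavages are ignored, and the sequences are invented toy examples. Peptides with a free alpha-amine (internal tryptic peptides) are treated as captured by the polymer and removed, while the blocked N-terminal peptide of each chain passes through.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Peptide:
    sequence: str
    start: int              # 0-based start position in the parent chain
    alpha_amine_free: bool  # True if the N-terminal amine was generated by trypsin


def argc_like_digest(chain: str) -> List[Peptide]:
    """Cleave after R (but not before P), as trypsin does once lysines are dimethylated.

    Assumes every amine present before digestion was blocked, so only the
    chain's own N-terminal peptide (start == 0) carries a blocked alpha-amine.
    """
    peptides: List[Peptide] = []
    start = 0
    for i, residue in enumerate(chain[:-1]):  # a C-terminal R creates no new peptide
        if residue == "R" and chain[i + 1] != "P":
            peptides.append(Peptide(chain[start:i + 1], start, start != 0))
            start = i + 1
    peptides.append(Peptide(chain[start:], start, start != 0))
    return peptides


def negative_selection(peptides: List[Peptide]) -> List[Peptide]:
    """Mimic polymer capture: peptides with free alpha-amines bind and are removed."""
    return [p for p in peptides if not p.alpha_amine_free]


# Hypothetical toy chains: an intact protein and a protease-generated fragment of it.
for chain in ["MSAKRLLRPGTRAQ", "LRPGTRAQ"]:
    kept = negative_selection(argc_like_digest(chain))
    print(chain, "->", [p.sequence for p in kept])
```

Because the fragment's neo-N terminus is blocked together with all other amines before digestion, its N-terminal peptide ("LRPGTR" here) is retained alongside the original protein N terminus ("MSAKR"), which is what makes protease-generated termini visible.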
To date, N-terminomic investigation of FFPE tissues has not been attempted, perhaps owing to reservations about whether FFPE specimens are amenable to degradomic strategies, and to skepticism about their ability to preserve the "proteolytic signature" of biological specimens. In this study, we developed a TAILS-based workflow for the degradomic investigation of FFPE specimens. Using corresponding cryopreserved specimens, we show that FFPE processing does not damage protein amino termini and that the resulting N-terminal peptides retain no carryover from the formalin fixation process after N-terminal enrichment. Furthermore, we demonstrate the feasibility of quantitative degradomic studies by comparing liver FFPE specimens from cathepsin L-deficient and corresponding wild-type mice. Our study thus highlights the amenability of FFPE tissues to terminomic profiling and opens the possibility of harnessing FFPE specimens from clinical archives as a valuable source for the investigation of disease-associated proteolysis.

EXPERIMENTAL PROCEDURES
Experimental Design and Statistical Rationale-Three sample sets of FFPE mouse liver tissues (male, 6-month-old C57BL/6 mice) were analyzed, as described in Results. Each sample set comprised three biological replicates. Experimental controls for each sample set were wild-type tissues (comparison with knockout tissues), cryopreserved tissues (comparison with FFPE tissues), or nonlabeled samples (comparison with ¹³COD₂ formaldehyde-labeled samples). Three biological replicates were investigated in combination with a label switch between ¹²COH₂ formaldehyde and ¹³COD₂ formaldehyde to allow statistical assessment. Statistical analysis was performed using linear models for microarray data (Limma) (20,21), which applies linear models to assess differential expression in the context of multifactor experimental designs. In addition, Limma can analyze complex experiments that compare many peptides simultaneously with small sample sizes.
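In a label-switch design, the heavy label is assigned to the knockout in some replicates and to the wild type in others, so raw heavy/light ratios must be re-oriented before replicates can be compared. The sketch below shows one way to do this; the function name, the intensity values, and which replicate carries which label are hypothetical, and the ±10 cap mirrors the 2⁻¹⁰/2¹⁰ convention given under data analysis. Averaging log2 ratios is only a simple summary; the study itself used Limma.

```python
import math
from statistics import mean
from typing import List, Tuple


def log2_ko_over_wt(light: float, heavy: float, ko_is_heavy: bool, cap: float = 10.0) -> float:
    """Return log2(Ctsl-/- / wild type) for one replicate, undoing the label switch.

    An intensity of 0 means the peptide was not detected in that channel; such
    events are set to -cap (wild type only) or +cap (knockout only).
    """
    ko, wt = (heavy, light) if ko_is_heavy else (light, heavy)
    if ko == 0 and wt == 0:
        raise ValueError("peptide not detected in either channel")
    if ko == 0:
        return -cap
    if wt == 0:
        return cap
    return math.log2(ko / wt)


# Hypothetical intensities (light, heavy, knockout carries the heavy label?) for
# one peptide in three replicates; the label assignment is switched after replicate 1.
replicates: List[Tuple[float, float, bool]] = [
    (1000.0, 500.0, True),
    (400.0, 800.0, False),
    (0.0, 300.0, False),
]
ratios = [log2_ko_over_wt(l, h, ko_heavy) for l, h, ko_heavy in replicates]
print(ratios, mean(ratios))  # [-1.0, -1.0, -10.0] -4.0
```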
Processing of Tissue Specimens-For formalin fixation and paraffin embedment, whole livers were harvested from male, 6-month-old C57BL/6 wild-type mice or male, 6-month-old C57BL/6 mice lacking cathepsin L (Ctsl−/−) and fixed in 4% (v/v) formaldehyde in phosphate-buffered saline for 16 h. After formalin fixation, tissue specimens were processed using a xylene-based STP 120 Spin Tissue Processor (Thermo Scientific, Bremen, Germany) and embedded in standard paraffin blocks. Subsequently, 30 tissue sections of 10 µm thickness were cut from each paraffin block. All FFPE slides were deparaffinized by washing four times in xylene for 5 min each, twice in 100% ethanol for 1 min, once each in 96%, 70%, and 50% ethanol for 1 min, and once in distilled water for 5 min. For cryopreservation, fresh livers were harvested from mice and immediately snap-frozen in liquid nitrogen. Cryopreserved specimens were stored at −80°C.
Protein Extraction and Sample Preparation-Following deparaffinization, FFPE tissue sections were incubated in 100 mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), pH 7.5, 4% (w/v) sodium dodecyl sulfate (SDS), and 50 mM dithiothreitol (DTT) for 1 h at 95°C with gentle agitation. Cryopreserved tissues were homogenized using an Ultra-Turrax T8 homogenizer (IKA-Werke, Wilmington, NC, USA) in 200 mM HEPES, pH 8.0, and 4% (w/v) SDS, followed by heating at 95°C for 30 min with gentle agitation. Lysates from cryopreserved tissues were reduced with 10 mM DTT at 60°C for 30 min. FFPE and cryopreserved protein lysates were cooled and alkylated with 20 mM iodoacetamide for 30 min in the dark, followed by centrifugation at 14,000 × g for 15 min. Extracted proteins in the supernatant were precipitated with nine volumes of ice-cold acetone and one volume of ice-cold methanol at −80°C for 2 h. Precipitated proteins were harvested by centrifugation at 4,500 × g for 2 h at 4°C. The resulting protein pellets were washed four times with ice-cold methanol and then resolubilized in ice-cold 100 mM NaOH by water-bath ultrasonication at 4°C. The solution was brought to pH 7.5-8.0 by the addition of 200 mM HEPES free acid. Protein concentration was determined using a bicinchoninic acid protein assay (Thermo Fisher).
Terminal Amine Isotopic Labeling of Substrates-Enrichment of protein N termini using terminal amine isotopic labeling of substrates (TAILS) was conducted as described previously (19). Briefly, proteins extracted from deparaffinized FFPE tissues and cryopreserved tissues were dimethylated using 40 mM ¹²COH₂ formaldehyde or 40 mM ¹³COD₂ formaldehyde in the presence of 40 mM sodium cyanoborohydride at 37°C for 16 h. Excess formaldehyde was quenched by the addition of 50 mM tris(hydroxymethyl)aminomethane (Tris). Following the amine protection step, proteins were precipitated with nine volumes of ice-cold acetone and one volume of ice-cold methanol at −80°C for 2 h. Precipitated proteins were harvested by centrifugation at 4,500 × g for 2 h at 4°C. Protein pellets were washed four times with ice-cold methanol and then redissolved in ice-cold 100 mM NaOH by water-bath ultrasonication at 4°C. The solution was brought to pH 7.5-8.0 by the addition of 200 mM HEPES free acid, and protein concentration was determined using a bicinchoninic acid protein assay. Proteins were digested with sequencing-grade trypsin (Worthington Biochemical Corp, Lakewood, NJ) at a 100:1 (w/w) ratio at 37°C and pH 7.0 for 16 h. The free N termini of internal peptides generated by tryptic digestion were captured by hyperbranched polyglycerol-aldehyde (HPG-ALD) polymers in the presence of 40 mM sodium cyanoborohydride at 37°C for 16 h. After capture, the remaining aldehyde groups of the HPG-ALD polymers were quenched with 50 mM glycine for 1 h at room temperature, and the polymers were removed by ultrafiltration using 10 kDa MWCO Microcon spin filters (Millipore, Billerica, MA). The flow-through fractions containing N-terminal peptides were desalted using C18 Sep-Pak cartridges (Waters, Milford, MA) and fractionated by strong cation exchange high-performance liquid chromatography (SCX-HPLC) on a column from PolyLC (Columbia, MD).
Buffer A consisted of 5 mM KH₂PO₄ and 25% (v/v) acetonitrile (pH 2.7), and buffer B consisted of 5 mM KH₂PO₄, 1 M KCl, and 25% (v/v) acetonitrile (pH 2.7). Peptides were eluted with a linear gradient of increasing buffer B concentration. The resulting fractions were collected, desalted using self-packed C18 STAGE tips (Empore, St. Paul, MN) (22), and analyzed by LC-MS/MS.
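The labeling steps above determine the mass shifts used later as search parameters. The sketch below recomputes them from standard monoisotopic atomic masses; it assumes reduction by non-deuterated sodium cyanoborohydride, as in the protocol, so each methyl group adds CH₂ (light) or ¹³CD₂ (heavy) net. The dictionary keys are illustrative names, not OpenMS or MS-GF+ modification identifiers.

```python
# Monoisotopic masses (Da) of the atoms involved.
MASS = {
    "H": 1.00782503207,
    "D": 2.0141017778,
    "12C": 12.0,
    "13C": 13.0033548378,
    "N": 14.0030740048,
    "O": 15.9949146196,
}

# Net elemental composition added by each modification (hypothetical key names).
# Reductive methylation replaces one N-H with N-CH3 per methyl group; the extra
# hydrogen on each methyl comes from the non-deuterated cyanoborohydride.
MODIFICATIONS = {
    "dimethyl_light": {"12C": 2, "H": 4},
    "dimethyl_heavy": {"13C": 2, "D": 4},
    "monomethyl_light": {"12C": 1, "H": 2},
    "monomethyl_heavy": {"13C": 1, "D": 2},
    "acetyl": {"12C": 2, "H": 2, "O": 1},
    "carbamidomethyl": {"12C": 2, "H": 3, "N": 1, "O": 1},
}


def mass_shift(name: str) -> float:
    """Monoisotopic mass shift (Da) of a modification from its net composition."""
    return sum(MASS[atom] * count for atom, count in MODIFICATIONS[name].items())


for name in MODIFICATIONS:
    print(f"{name:18s} +{mass_shift(name):.4f} Da")

# Light/heavy spacing contributed by one dimethylated amine in an MS1 doublet
spacing = mass_shift("dimethyl_heavy") - mass_shift("dimethyl_light")
print(f"dimethyl spacing   {spacing:.4f} Da")
```

Rounded to two decimals these reproduce the +28.03, +34.06, +14.02, +17.03, +42.01, and +57.02 Da values listed in the search parameters. A peptide carrying n dimethylated amines (N terminus plus lysines) shows a light/heavy spacing of n times the per-amine value.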

LC-MS/MS and Data Analysis-Samples were analyzed on an Orbitrap XL (Thermo Scientific) or an Orbitrap Q-Exactive Plus (Thermo Scientific) mass spectrometer. The Orbitrap XL was coupled to an Ultimate3000 micro pump (Thermo Scientific). Buffer A was 0.5% (v/v) acetic acid, and buffer B was 0.5% (v/v) acetic acid in 80% acetonitrile (HPLC grade). Peptides were separated at a flow rate of 300 nl/min with a gradient of increasing organic solvent. Column tips of 75 µm inner diameter and 11 cm length were packed with Reprosil-Pur 120 ODS-3 (Dr. Maisch, Ammerbuch-Entringen, Germany). The MS was operated in data-dependent mode, with each MS scan followed by a maximum of five MS/MS scans. The Q-Exactive Plus mass spectrometer was coupled to an Easy nanoLC 1000 (Thermo Scientific) with a flow rate of 300 nl/min. Buffer A was 0.5% (v/v) formic acid, and buffer B was 0.5% (v/v) formic acid in acetonitrile (water and acetonitrile were at least HPLC gradient grade). Peptides were separated using a gradient of increasing organic proportion (5-40% (v/v) acetonitrile in 80 min). The analytical column was an Acclaim PepMap column (Thermo Scientific) with 2 µm particle size, 100 Å pore size, 150 mm length, and 50 µm inner diameter. The mass spectrometer was operated in data-dependent acquisition mode with a top-10 MS/MS method over an m/z range of 300-2000.
MS data were converted to mzML format (23) using ProteoWizard (24). The complete data analysis was performed with a fully automated workflow within the OpenMS framework (25) (Supplemental Fig. 1). Peptide sequences were identified with the MS-GF+ (26) search engine using a target-decoy strategy. A complete mouse proteome sequence file comprising 44,819 protein sequences was downloaded from UniProt (27) on October 16, 2011, and appended with an equal number of randomized sequences derived from the original mouse proteome entries. Searches used semi-Arg-C specificity with a precursor mass tolerance of 20 ppm. Static modifications included cysteine carbamidomethylation (+57.02 Da), lysine and N-terminal dimethylation (¹²COH₂ formaldehyde +28.03 Da or ¹³COD₂ formaldehyde +34.06 Da, if applicable), N-terminal monomethylation (¹²COH₂ formaldehyde +14.02 Da or ¹³COD₂ formaldehyde +17.03 Da, if applicable), and N-terminal acetylation (+42.01 Da). MS-GF+ results were further validated in OpenMS at a confidence level greater than 95%. The relative quantification of each peptide was calculated using the FeatureFinderMultiplex tool (28) (part of OpenMS). For cleavage events in which peptides were present only in the wild-type or only in the Ctsl−/− condition, a ratio of 2⁻¹⁰ or 2¹⁰, respectively, was assigned. A list of potential cathepsin L substrates was predicted on the basis of a dataset of cathepsin L cleavage specificity from the MEROPS peptidase database. The method is based on an efficient string kernel implemented in the Explicit Decomposition with Neighborhood library (DOI: 10.5281/zenodo.27945). It uses k-mers with gaps to enumerate all possible substrings of increasing order (from monomers up to eight-mers), which are used as features in a linear binary classification estimator.
The full computational pipeline, which estimates the likelihood that a given site is a cleavage target, is available through the Galaxy Tool Shed (https://toolshed.g2.bx.psu.edu) of Galaxy, an open, web-based platform for data-intensive biomedical research.
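The idea of feeding gapped k-mers into a linear classifier can be sketched as follows. This is a simplified stand-in, not the EDeN implementation or the authors' Galaxy pipeline: it assumes an eight-residue cleavage window (P4-P4′), represents each window by position-tagged k-mers of length 1 to 8 plus variants with one internal wildcard ('.'), and swaps in a plain perceptron as the linear binary classifier. The four training windows are invented toy examples (aromatic or aliphatic versus acidic residue at P2), not MEROPS data.

```python
from typing import Dict, List, Tuple

Feature = Tuple[int, str]  # (start position in the window, pattern with '.' as gap)


def gapped_kmer_features(window: str, max_k: int = 8) -> Dict[Feature, float]:
    """Position-tagged k-mers (k = 1..max_k) plus variants with one internal gap."""
    feats: Dict[Feature, float] = {}
    for k in range(1, max_k + 1):
        for start in range(len(window) - k + 1):
            kmer = window[start:start + k]
            feats[(start, kmer)] = 1.0
            for gap in range(1, k - 1):  # wildcard at each internal position
                feats[(start, kmer[:gap] + "." + kmer[gap + 1:])] = 1.0
    return feats


def score(weights: Dict[Feature, float], bias: float, window: str, max_k: int = 8) -> float:
    """Linear decision function: bias plus the weights of all active features."""
    feats = gapped_kmer_features(window, max_k)
    return bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(windows: List[str], labels: List[int], epochs: int = 10, max_k: int = 8):
    """Classic perceptron updates; labels are +1 (cleaved) or -1 (not cleaved)."""
    weights: Dict[Feature, float] = {}
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for window, y in zip(windows, labels):
            if y * score(weights, bias, window, max_k) <= 0:
                mistakes += 1
                for f, v in gapped_kmer_features(window, max_k).items():
                    weights[f] = weights.get(f, 0.0) + y * v
                bias += y
        if mistakes == 0:  # training data separated; stop early
            break
    return weights, bias


# Invented P4-P4' windows: +1 with F or L at P2 (index 2), -1 with D or E at P2.
train = ["GGFRSGGG", "GGLRSGGG", "GGDRSGGG", "GGERSGGG"]
labels = [1, 1, -1, -1]
w, b = train_perceptron(train, labels)
print([1 if score(w, b, s) > 0 else -1 for s in train])
```

An eight-residue window yields 92 distinct features here (36 contiguous k-mers and 56 gapped variants); the explicit feature map keeps the example transparent, whereas a string-kernel library would compute such features far more efficiently.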
Data Availability-The mass spectrometry data have been deposited to the ProteomeXchange Consortium (29) PRoteomics IDEntifications (PRIDE) partner repository with dataset identifier PXD002847 (reviewer account details-username: reviewer38683@ebi.ac.uk password: mIR5jHGS). Search results (pepXML format), along with .raw and mzML files, have been deposited. Annotated spectra are provided via MS-Viewer (30) with URLs being listed in Supplemental Table 1.

RESULTS
Enrichment of N-terminal Peptides from FFPE Specimens-This study aims to establish a protocol for the enrichment of N termini from formalin-fixed, paraffin-embedded tissues for mass spectrometry analysis (Fig. 1). Proteins were extracted from deparaffinized tissues using an extraction buffer containing HEPES as buffering agent, SDS as denaturing agent, and DTT as reducing agent, combined with heating at 95°C for an extended period to reverse the chemical modifications formed during formalin fixation. After extraction, N termini were enriched for mass spectrometry analysis according to the original TAILS workflow (19). The technique depletes internal peptides after tryptic digestion, thereby enriching naturally occurring N termini. Before the enrichment step, mass spectrometry analysis identified the majority of N termini as unmodified, while only a few dimethylated and acetylated N termini were detected (Fig. 2A). In contrast, when the same sample was subjected to N-terminal enrichment using TAILS, unmodified N termini were completely depleted (Fig. 2A). The identified dimethylated peptides were mainly derived from the first 20% of the full-length protein chain (Fig. 2B), consistent with the canonical positional profile of N termini previously reported for fresh or cryopreserved tissue and cultured cells (31).
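A positional profile such as the one in Fig. 2B (how many N termini start within the first 20% of the chain) can be computed by binning each peptide's start position relative to protein length. The sketch below assumes a simplified procedure (five equal-width bins, first exact match of the peptide in a single protein sequence) and uses a made-up sequence; it is not the authors' implementation.

```python
from collections import Counter
from typing import List


def position_bin(protein: str, peptide: str, n_bins: int = 5) -> int:
    """Return the bin (0 = first 1/n_bins of the chain) in which the peptide starts.

    Uses the first exact match of the peptide; integer arithmetic avoids
    floating-point edge cases at bin boundaries.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError(f"{peptide} not found in protein")
    return start * n_bins // len(protein)


def positional_profile(protein: str, peptides: List[str], n_bins: int = 5) -> List[int]:
    """Count identified N-terminal peptides per relative-position bin."""
    counts = Counter(position_bin(protein, p, n_bins) for p in peptides)
    return [counts.get(i, 0) for i in range(n_bins)]


# Hypothetical 20-residue protein and four identified N-terminal peptides
protein = "MSAKRLLRPGTRAQWYVDEF"
peptides = ["MSAKR", "AKRLL", "PGTR", "WYVD"]
print(positional_profile(protein, peptides))  # [2, 0, 1, 1, 0]
```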
N-terminal Coverage from Cryopreserved and FFPE Specimens-While cryopreserved tissues are snap-frozen immediately upon harvest, formalin fixation and paraffin embedment involve extended workflows from tissue harvest to paraffin block embedment and histoprocessing. Moreover, FFPE specimens are often stored at room temperature for extended periods. To gain insight into the status of N termini in cryopreserved and formalin-fixed tissues, we applied the N-terminal enrichment procedure to liver tissue from C57BL/6 wild-type mice preserved by either method.
LC-MS/MS analysis yielded comparable numbers of N-termini identifications (unmodified, acetylated, and dimethylated) among the three biological replicates for each of the two preservation methods (Supplemental Tables 2-7). Between 3,000 and 3,800 N termini were identified in cryopreserved specimens, and between 2,000 and 2,300 in formalin-fixed tissues. Incomplete overlap is an intrinsic characteristic when comparing proteomic experiments across biological replicates (32). In this study, 987 N termini were identified in all three biological replicates of FFPE tissue, and 1,199 in all three cryopreserved counterparts. Of these, 486 N termini were shared between both preservation methods, that is, identified in all replicates of both cryopreserved and FFPE tissues. Previous studies indicate that proteins extracted from FFPE tissues are susceptible to a +12 Da addition at N termini and at lysine, tryptophan, tyrosine, serine, and threonine residues, as well as a +30 Da addition at cysteine, histidine, lysine, and arginine residues (20,33,34). In both cryopreserved and FFPE samples, the fraction of peptides displaying these modifications remained below the 5% false discovery rate (data not shown).
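The overlap figures above rest on two set operations: identification in every replicate (intersection) and, between preservation methods, the overlap of those consensus sets. The minimal sketch below uses invented identifiers; the "protein:start" key format is an assumption, not the study's identifier scheme.

```python
from typing import Iterable, Set


def consensus(replicates: Iterable[Set[str]]) -> Set[str]:
    """N termini identified in every replicate (intersection)."""
    sets = list(replicates)
    return set.intersection(*sets) if sets else set()


def identified_anywhere(replicates: Iterable[Set[str]]) -> Set[str]:
    """N termini identified in at least one replicate (union)."""
    return set().union(*replicates)


# Hypothetical identifiers of the form "<protein>:<start position>"
ffpe = [{"P1:1", "P2:24", "P3:57"}, {"P1:1", "P2:24", "P4:8"}, {"P1:1", "P2:24", "P3:57", "P5:2"}]
cryo = [{"P1:1", "P3:57", "P6:31"}, {"P1:1", "P3:57"}, {"P1:1", "P3:57", "P2:24"}]

ffpe_core, cryo_core = consensus(ffpe), consensus(cryo)
print(sorted(ffpe_core), sorted(cryo_core), sorted(ffpe_core & cryo_core))
```

The distinction matters when reading totals: a count "across replicates" (union) is always at least as large as any single replicate, whereas a count "in all replicates" (intersection) is at most as large as the smallest one.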
Acetylation is the most prevalent native N-terminal modification and occurs co- or posttranslationally, whereas N-terminal dimethylation is introduced during the TAILS procedure to protect free N termini, which are often generated by endogenous proteolysis (14,35). The ratio of acetylated to dimethylated N termini differs to a limited extent between cryopreserved and FFPE specimens: in cryopreserved samples, 58.7 ± 4.3% of N termini were chemically dimethylated, compared with 42.1 ± 1.4% in FFPE specimens (Fig. 3A). In both cryopreserved and FFPE specimens, dimethylated N termini mainly map to the first 20% of the full-length protein chain (Fig. 3B), in good agreement with the canonical positional profile of N termini (31). The higher proportion of dimethylated N termini in cryopreserved samples may indicate that formalin fixation and paraffin embedment prevent chemical dimethylation of a limited number of protein termini. Alternatively, it may indicate increased proteolysis during cryopreservation. Notably, the elevated number of dimethylated N termini in cryopreserved samples coincides with an increased fraction of termini that map to internal positions (Fig. 3B), indicating aberrant cleavage within the protein chain. In another recent degradomic study of mouse embryonic kidney, we also observed a reduced level of dimethylated peptides compared with acetylated peptides (18). Using a novel gel-based enrichment of N termini, we observed a significantly reduced fraction of peptides mapping to internal positions (18). For this reason, we consider it likely that the higher proportion of dimethylated N termini in cryopreserved samples reflects increased proteolysis during cryopreservation rather than limited reactivity of N termini in FFPE samples. Further studies have also indicated that proteins are not proteolytically damaged during formalin fixation and histoprocessing (8).
Nevertheless, the N-terminal peptides (acetylated and dimethylated) identified in cryopreserved and FFPE tissues show a high degree of similarity. For acetylated N termini, the residue in P1′ is predominantly methionine in peptides from both cryopreserved and FFPE tissues, with a consistent alanine fingerprint at positions P2′-P6′ (Fig. 4A). For N-terminally dimethylated peptides, residues such as serine, glycine, valine, alanine, and threonine are observed equally often at position P1′ in both sample types (Fig. 4B). N termini from both FFPE and cryopreserved tissues map to proteins representing similar cellular components and molecular functions (Figs. 4B and 4C). The congruence between the proteome analyses of FFPE and cryopreserved samples is in line with an earlier study (8).
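Position labels such as P1′ and P2′-P6′ follow the Schechter-Berger nomenclature, with the scissile bond between P1 and P1′; for an identified N-terminal peptide, P1′ is its first residue. The hypothetical helper below extracts the P6-P6′ window around that bond, the kind of aligned window used to build sequence logos. Prime positions are written with an ASCII apostrophe (P1'), ends are padded with "-", and the sequence is made up.

```python
from typing import Dict


def cleavage_window(protein: str, start: int, n_nonprime: int = 6, n_prime: int = 6,
                    pad: str = "-") -> Dict[str, str]:
    """Schechter-Berger positions around the bond preceding protein[start].

    P1' is the first residue of the identified N-terminal peptide and P1 the
    residue before it; positions beyond the sequence ends are padded.
    """
    window: Dict[str, str] = {}
    for i in range(n_nonprime, 0, -1):  # P6 ... P1
        idx = start - i
        window[f"P{i}"] = protein[idx] if idx >= 0 else pad
    for i in range(1, n_prime + 1):  # P1' ... P6'
        idx = start + i - 1
        window[f"P{i}'"] = protein[idx] if idx < len(protein) else pad
    return window


protein = "MSAKRLLRPGTRAQ"  # hypothetical sequence
print("".join(cleavage_window(protein, 5).values()))  # -MSAKRLLRPGT
print(cleavage_window(protein, 0)["P1'"])             # M (protein N terminus)
```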
To specifically probe for putative formalin carryover, we performed TAILS N-terminomic analysis of FFPE specimens using only "heavy" ¹³COD₂ formaldehyde. This setup distinguishes methyl groups introduced by the heavy labeling reagent from those that could derive from light formaldehyde carried over from fixation. We detected almost exclusively N termini labeled with heavy formaldehyde (Fig. 5A, Supplemental Table 8); the large population of heavy dimethylated peptides strongly indicates the absence of significant formalin carryover. Monomethylated N termini were also detected, which we attribute to peptide sequences starting with proline, whose secondary amine can accept only one methyl group (Fig. 5B), as reported previously (36,37). Overall, the very low numbers of detected ¹²COH₂ formaldehyde ("light") counterparts remain within the 5% false discovery rate margin that we employed for peptide identification. Thus, our technique is not biased by carryover of light formaldehyde from the formalin fixation of tissues, and we conclude that FFPE specimens are readily amenable to N-terminal degradomic profiling.
Altered N-Terminal Processing in the Liver of Cathepsin L-Deficient Mice-A strength of the TAILS procedure is its suitability for comparative degradomic studies, as different formaldehyde isotopes are incorporated as an integral part of the procedure. To assess the compatibility of FFPE specimens with quantitative comparative N-terminal profiling, we chose FFPE liver samples from cathepsin L-deficient and corresponding wild-type mice. We previously showed that cathepsin L deletion in mice (Ctsl−/−) results in a fundamentally perturbed protease network with a large number of downstream and secondary effects (38).
Stable isotopic formaldehyde labeling and TAILS were applied for the quantitative N-terminomic comparison of formalin-fixed liver tissues from wild-type and cathepsin L-deficient mice. Formalin-fixed liver tissues from three cathepsin L-deficient and three wild-type mice were compared using a label-switch strategy. A total of 8,061 nonredundant N termini (monomethylated, dimethylated, and acetylated) were quantified across the three biological replicates, with 5,926, 6,306, and 6,352 N-terminal peptides identified in the individual replicates, respectively (Fig. 6A; Supplemental Tables 9, 10, and 11). The fold-change distributions of all three replicates are near normal, with the majority of N-terminal peptides being equally abundant in wild-type and Ctsl−/− liver tissues. Characterization of N-α-acetylation specificity in these FFPE tissues showed that acetylated N-terminal peptides prefer alanine, serine, and glutamate residues in P1′ and, to a lesser extent, in P2′ (Fig. 6B). These results agree with the prototypical profile of N-terminal acetylation and with previous studies of N-terminal acetylation in murine skin (35,48).
The identified monomethylated, dimethylated, and acetylated peptides were mainly attributed to the first 20% of the full-length protein chain (Fig. 6C). A total of 1,720 mono- and dimethylated N termini (matched to 1,713 mouse proteins) were consistently identified in all three biological replicates. Gene ontology annotation for molecular function classified the corresponding proteins as predominantly involved in binding and catalytic activities (Fig. 6D), while cellular component annotation showed that they are mostly localized within intracellular compartments (Fig. 6E).
FIG. 5. N-terminal peptides from ¹³COD₂ formaldehyde (heavy)-labeled proteins from FFPE liver tissue of a C57BL/6 wild-type mouse. (A) Composition of light and heavy acetylated, chemically dimethylated (naturally unmodified protein N termini), and monomethylated N termini. (B) Sequence logo of identified monomethylated N termini, which predominantly start with a proline residue. The sequence logo was generated using iceLogo (60).
We are primarily interested in cleavage events that depend on the presence of cathepsin L. Statistical analysis of mono- and dimethylated N termini using a moderated t test based on linear models for microarray data (Limma) (20,21), combined with the Benjamini-Hochberg procedure at a 5% false discovery rate (n = 3), indicated that 205 peptides were significantly reduced in abundance in Ctsl−/− compared with wild-type liver tissues (Supplemental Table 12). Among these peptides, ten N-terminal peptides corresponded to removal of the initiator methionine, ten to removal of a signal peptide, and five to removal of a transit peptide, while the remaining 187 peptides stemmed from aberrant cleavage within the protein chain (Fig. 6D). Cathepsin L-mediated cleavage is primarily guided by a strong preference for aromatic and aliphatic residues in P2, with limited contributions from prime-site specificity (33). Previous studies using Ctsl−/− mice have shown the involvement of this protease in cardiac homeostasis (49-51), the immune system (52,53), hormone processing (54,55), and tumorigenesis (56-59). We employed an artificial neural network (machine learning) approach to identify, among the significantly downregulated cleavage sites, those that adhere to the annotated cathepsin L specificity, using cathepsin L cleavage sites from MEROPS as training data. This approach yielded a list of 23 potential cathepsin L substrates in the FFPE liver tissues of Ctsl−/− mice (Table I).
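The statistical test described above combines a moderated t statistic with Benjamini-Hochberg control of the false discovery rate. The sketch below reimplements both ideas in plain Python for illustration and is not limma: the prior hyperparameters d0 and s0_sq are supplied by hand (limma estimates them from all peptides by empirical Bayes), p-values from the t distribution are not computed, and all input numbers are invented.

```python
import math
from statistics import mean, variance
from typing import List


def moderated_t(log_ratios: List[float], d0: float, s0_sq: float) -> float:
    """One-sample moderated t statistic in the style of limma's empirical Bayes.

    The peptide's sample variance s^2 (d = n - 1 degrees of freedom) is shrunk
    toward a prior variance s0_sq carrying d0 prior degrees of freedom:
        s_tilde^2 = (d0 * s0_sq + d * s^2) / (d0 + d)
        t = mean / sqrt(s_tilde^2 / n)
    A p-value would come from a t distribution with d0 + d degrees of freedom.
    """
    n = len(log_ratios)
    d = n - 1
    s_tilde_sq = (d0 * s0_sq + d * variance(log_ratios)) / (d0 + d)
    return mean(log_ratios) / math.sqrt(s_tilde_sq / n)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, kept monotone)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical values for illustration only
t = moderated_t([-1.0, -1.2, -0.8], d0=4.0, s0_sq=0.05)
adj = benjamini_hochberg([0.2, 0.001, 0.8, 0.012, 0.008])
significant = [a <= 0.05 for a in adj]
print(round(t, 2), adj, significant)
```

With d0 = 0 the moderated statistic reduces to the ordinary one-sample t statistic; a positive d0 stabilizes the variance estimate, which is what makes the approach useful with only three replicates.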
Cathepsin L-dependent proteolytic processing of some of these proteins, namely protein disulfide-isomerase, ATP synthase subunit beta, alpha-enolase, and cytoplasmic actin 1, was previously observed in murine skin (35). While these substrate candidates are predominantly localized in the cytoplasm, cathepsin L is a lysosomal protease. We consider it likely that autophagic processes deliver these substrate candidates to the endolysosomal system.
Taken together, these data show that quantitative degradomic investigation of FFPE specimens is feasible. Moreover, innovative strategies such as machine learning enable the rapid classification of affected cleavage sites according to protease specificity patterns.
CONCLUSION
Proteolysis, as a pivotal posttranslational modification, plays a fundamental role in physiological regulation and in numerous diseases. While novel "terminomic" approaches have recently enabled the system-wide investigation of native proteolytic processing in various kinds of biological materials, such strategies had not yet been applied to FFPE specimens. Given that FFPE specimens are the most abundant resource for clinical and biomedical research, the present study reports, for the first time, the use of FFPE specimens for protease research, thereby opening novel avenues to study the role of proteolysis in a clinical setting.