Simple, scalable and ultra-sensitive tip-based identification of protease substrates

Proteases are at the center of many diseases; consequently, proteases and their substrates are important drug targets, accounting for an estimated 5-10% of all drugs under development. Mass spectrometry has been an indispensable tool for the discovery of novel protease substrates, particularly through the proteome-scale enrichment of so-called N-terminal peptides representing endogenous protein N-termini. Methods such as combined fractional diagonal chromatography (COFRADIC) and, later, terminal amine isotopic labeling of substrates (TAILS) have revealed numerous insights into protease substrates and consensus motifs. We present an alternative and simple protocol for N-terminal peptide enrichment, based on charge-based fractional diagonal chromatography (ChaFRADIC) and requiring only well-established protein chemistry and a pipette tip. Using iTRAQ-8-plex, we quantified on average 2,073 ± 52 unique N-terminal peptides from only 4.3 μg per sample/channel, allowing the identification of proteolytic targets and consensus motifs. This high sensitivity may even allow working with clinical samples such as needle biopsies in the future. We applied our method to study the dynamics of staurosporine-induced apoptosis. Our data demonstrate an orchestrated regulation of specific pathways after 1.5 h, 3 h, and 6 h of treatment, with many important players of homeostasis already targeted after 1.5 h. We additionally observed an early multilevel modulation of the splicing machinery by both proteolysis and phosphorylation. This may reflect the known role of alternative splicing variants for a variety of apoptotic genes, which seems to be a driving force of staurosporine-induced apoptosis.


Introduction
Proteolysis plays a crucial role in maintaining cellular homeostasis by modulating protein function and activity, and its dysregulation underlies many diseases such as cancer and Alzheimer's disease (1)(2)(3). Through proteolytic cleavage, novel protein N-termini are generated. The identification of these so-called neo N-termini is an important step towards understanding which proteins are substrates of a specific protease and revealing regulatory proteolytic networks in health and disease. Moreover, it also allows the identification of protease cleavage motifs, which is important for developing protease inhibitors or chemical proteomics tools.
As protein N-termini and, more importantly, neo N-termini are significantly underrepresented in the proteome, specific methods have been developed for the enrichment of N-terminal peptides followed by mass spectrometry (N-terminomics) to enable the system-wide identification of protease substrates and cleavage patterns (4). Several methods, namely COmbined FRActional DIagonal Chromatography (COFRADIC) (5), subtiligase N-terminal labeling and enrichment (6) and, later, Terminal Amine Isotopic Labeling of Substrates (TAILS) (7), pioneered the field of N-terminomics. COFRADIC and TAILS utilize the specific labeling of protein N-termini (and Lys residues) as an initial step of the enrichment procedure. After proteolytic digestion as part of the common bottom-up proteomics strategy, this labeling allows N-terminal peptides to be quantified and also distinguished from internal peptides with free N-termini generated during in vitro digestion. Both methods have been used in numerous studies providing novel insights into proteases and their substrates (1)(8)(9)(10). Nevertheless, likely due to technical challenges and (particularly in the past) challenges in data analysis, the number of labs worldwide applying N-terminomics methods to screen for protease substrates is still limited.
We recently introduced an alternative HPLC-based strategy for N-terminal peptide enrichment, Charge-based FRActional DIagonal Chromatography (ChaFRADIC) (11). The method depends on the separation of peptides into distinct charge state fractions at pH 2.7, where a peptide's net charge in solution is mainly defined by the number of positively (Arg, Lys, His residues and free N-termini) and negatively (e.g. phosphorylation) charged groups (Fig.1 a) (12). Though providing a high sensitivity for N-terminomics studies, the protocol requires a dedicated, highly reproducible HPLC system with automatic fractionation, which is associated with high acquisition and maintenance costs. We therefore advanced the method into [...]

[...] 51 μL of the resolubilized peptides were separated at a flow rate of 80 μL/min. Peptides were separated with a gradient optimized to efficiently resolve different charge states, and fractions were automatically collected using the U3000 fractionation option.
The gradient was as follows: 100% A for 10 min followed by a linear increase from 0% to 15% B in 9.3 min. Afterwards, B was kept at 15% for 8.7 min and then linearly increased from 15% to 30% in 8 min. Then B was kept constant at 30% for 11 min and linearly increased to 100% in 5 min. After 5 min at 100% B, C was increased to 100% in 1 min and kept constant for 5 min. Finally, A was increased to 100% in 1 min and the column was re-equilibrated at 100% A for 20 min. Per replicate, four fractions corresponding to charge states +1, +2, +3 and +4 were collected, as indicated in supplemental figure 1.
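As a bookkeeping aid, the gradient steps listed above can be written down as a segment table and checked programmatically. The following is only a minimal sketch: the names `GRADIENT`, `total_runtime` and `composition_at` are illustrative, and it assumes that during each ramp the remaining solvent fraction is made up by the buffer being replaced (A while B rises, B while C rises, C while A rises), which the text does not state explicitly. Under these assumptions the segments add up to a run time of 84 min.

```python
from typing import List, Tuple

Composition = Tuple[float, float, float]  # (%A, %B, %C)
Segment = Tuple[float, Composition, Composition]  # (duration in min, start, end)

# Gradient segments transcribed from the text; the complementary solvent
# percentages during ramps are an assumption, not stated in the source.
GRADIENT: List[Segment] = [
    (10.0, (100, 0, 0), (100, 0, 0)),  # 100% A
    (9.3, (100, 0, 0), (85, 15, 0)),   # 0% -> 15% B
    (8.7, (85, 15, 0), (85, 15, 0)),   # hold 15% B
    (8.0, (85, 15, 0), (70, 30, 0)),   # 15% -> 30% B
    (11.0, (70, 30, 0), (70, 30, 0)),  # hold 30% B
    (5.0, (70, 30, 0), (0, 100, 0)),   # 30% -> 100% B
    (5.0, (0, 100, 0), (0, 100, 0)),   # hold 100% B
    (1.0, (0, 100, 0), (0, 0, 100)),   # switch to 100% C
    (5.0, (0, 0, 100), (0, 0, 100)),   # hold 100% C
    (1.0, (0, 0, 100), (100, 0, 0)),   # back to 100% A
    (20.0, (100, 0, 0), (100, 0, 0)),  # re-equilibration
]


def total_runtime(gradient: List[Segment]) -> float:
    """Sum of all segment durations in minutes."""
    return sum(duration for duration, _, _ in gradient)


def composition_at(t: float, gradient: List[Segment] = GRADIENT) -> Composition:
    """Linearly interpolated (%A, %B, %C) at time t (min) after injection."""
    if t < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in gradient:
        if t <= elapsed + duration:
            frac = (t - elapsed) / duration
            a, b, c = (s + frac * (e - s) for s, e in zip(start, end))
            return (a, b, c)
        elapsed += duration
    raise ValueError("time lies beyond the end of the gradient")


if __name__ == "__main__":
    print(f"total run time: {total_runtime(GRADIENT):.1f} min")
    print("composition at 30 min:", composition_at(30.0))
```

Such a table makes it easy to verify the programmed method against the written description before a run.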

Deposition of data
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (31) partner repository with the dataset identifiers PXD005954, PXD006594 and PXD006595.

Results
We systematically optimized the SCX tip conditions, namely (i) the preparation of the tips, including frit type/design, and (ii) the SCX beads-to-peptide ratio, as well as the individual elution and washing steps of the protocol: (iii) buffer compositions and combinations, (iv) volumes, and (v) centrifugation conditions. The final protocol allowed us to reproducibly separate unlabeled (Fig.1 b) and iTRAQ-labeled (Fig.1 c) tryptic peptides. The obtained fractions were on average 85-90% enriched in the distinct charge states +1, +2, +3 and +4. The fractionation is scalable (Fig.1 b), can be applied to complex cell lysates and purified proteins, and allowed us to completely transfer the originally HPLC-based N-terminal ChaFRADIC (11) enrichment to the simplified ChaFRAtip setup (Fig.1 d). This substantially reduces time and costs, as dedicated and reproducible HPLC instrumentation is no longer required and samples can be processed in parallel.
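The charge states that define these fractions follow from the counting rule given in the Introduction: at pH 2.7, a peptide's net charge is mainly set by its basic residues (Arg, Lys, His), a free N-terminus and negatively charged groups such as phosphorylation. The sketch below illustrates this rule only; the function name `expected_charge` and the example sequence are hypothetical, and the counting deliberately ignores chemical labels such as iTRAQ as well as any residual charge on acidic groups.

```python
def expected_charge(sequence: str, free_n_terminus: bool = True,
                    n_phospho: int = 0) -> int:
    """Approximate net charge of a peptide at pH 2.7.

    Simplified counting rule (an assumption for illustration):
    +1 per Arg/Lys/His, +1 for a free N-terminus, -1 per phosphate group.
    """
    sequence = sequence.upper()
    basic = sum(sequence.count(residue) for residue in "RKH")
    n_term = 1 if free_n_terminus else 0
    return basic + n_term - n_phospho


if __name__ == "__main__":
    peptide = "LSDGEAVR"  # hypothetical tryptic peptide with one Arg
    print(expected_charge(peptide))                         # free N-terminus
    print(expected_charge(peptide, free_n_terminus=False))  # blocked N-terminus
    print(expected_charge(peptide, n_phospho=1))            # one phosphate group
```

Under this rule, a typical internal tryptic peptide with a free N-terminus and a single C-terminal Arg or Lys carries a charge of +2, whereas the same peptide with a blocked N-terminus carries +1, which is why charge-based fractionation can distinguish the two populations.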
We wondered whether ChaFRAtip can provide results comparable to the original HPLC-based setup, but at a fraction of the cost and effort. Therefore, we cultivated SH-SY5Y cells in biological triplicates and treated each replicate with either vehicle or staurosporine to induce apoptosis. The resulting six samples (100 μg of protein each, based on BCA) were individually labeled with iTRAQ-8plex reagents on the protein level, multiplexed, digested with trypsin and split into six aliquots of 100 μg peptide each. Three aliquots each were used for HPLC-based ChaFRADIC and ChaFRAtip, followed by nano-LC-MS/MS of two thirds of each enriched fraction (Fig.2 a). Indeed, we achieved a similar level of technical reproducibility for both methods (Fig.2 b) and quantified 975±24 (HPLC) and 783±71 (tips) unique N-terminal peptides at 1% FDR. iTRAQ-based quantification was highly reproducible across technical and biological replicates as well as across both methods (supplemental table 1). Significantly differential neo N-terminal peptides clearly point to the expected caspase activity (Fig.2 c), and the identified substrates are involved e.g. in splicing, protein folding, translation and transcription (Fig.2 d). Thus, ChaFRAtip enables highly sensitive N-terminomics with performance similar to that of the much more costly HPLC-based workflow.
We decided to illustrate the power of the ChaFRAtip method by studying the dynamics of staurosporine- [...] replicates. We furthermore determined the inter (between all six technical replicates) and intra (between averages of biological replicates) relative standard deviations (RSD) from our quantitative data presented in supplemental table 2 (sheet "i -summary raw") and obtained average RSDs of 5% (inter) and 9% (intra).
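For readers who wish to reproduce this kind of reproducibility check on their own data, the two RSD values can be computed as sketched below. This is a minimal illustration, not the exact procedure used here: the intensities are invented, the layout of three biological replicates with two technical replicates each is an assumption, and the sample standard deviation (n - 1 denominator) is assumed. The terms "inter" and "intra" are used as defined in the text above.

```python
from statistics import mean, stdev
from typing import Dict, List


def rsd_percent(values: List[float]) -> float:
    """Relative standard deviation in percent (sample SD divided by mean)."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined for a mean of zero")
    return 100.0 * stdev(values) / m


def inter_intra_rsd(replicates: Dict[str, List[float]]) -> Dict[str, float]:
    """RSD over all technical replicates ('inter', as named in the text) and
    over the per-biological-replicate averages ('intra')."""
    all_values = [v for tech in replicates.values() for v in tech]
    averages = [mean(tech) for tech in replicates.values()]
    return {"inter": rsd_percent(all_values), "intra": rsd_percent(averages)}


if __name__ == "__main__":
    # Hypothetical reporter intensities for one N-terminal peptide:
    # three biological replicates with two technical replicates each.
    peptide_intensities = {
        "bio1": [980.0, 1010.0],
        "bio2": [1100.0, 1060.0],
        "bio3": [940.0, 970.0],
    }
    result = inter_intra_rsd(peptide_intensities)
    print(f"inter RSD: {result['inter']:.1f}%  intra RSD: {result['intra']:.1f}%")
```

Applied per peptide and then averaged over all quantified peptides, this yields summary values of the kind reported above.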
After 1.5 h of staurosporine treatment, 81 protease substrates were identified, followed by 109 and 30 additional substrates after 3 h and 6 h, respectively (supplemental tables 2 and 3). These substrates clearly confirm the expected caspase activity and furthermore indicate an orchestrated regulation of specific pathways (Fig.3 b). [...] variants for a variety of apoptotic genes (32,33), which seems to be a driving force of staurosporine-induced apoptosis.

Discussion
Our novel ChaFRAtip approach allows the ultra-sensitive identification of proteolytic targets and consensus motifs on a proteome-wide scale. The protocol is straightforward, low cost, and requires only minimal equipment that is standard in proteomics laboratories. Therefore, the protocol should be easily transferable to other laboratories around the globe and may facilitate the generation of novel and the expansion of existing knowledge on proteases, their targets and dynamics. As a tip-based procedure, ChaFRAtip further has potential for automation using liquid handling systems for the purpose of parallelized high-throughput label-free quantitative N-terminomics. As demonstrated, the protocol can be adapted to different amounts of starting material and also works with further chemical labeling strategies such as dimethyl or TMT labeling (data not shown). Moreover, it can be further adapted for straightforward and sensitive protease specificity profiling using peptide libraries (34). [...] (35). Nevertheless, given state-of-the-art LC-MS equipment and sampling protocols, the presented workflow might even work with as little as 5 μg of starting material, although lower numbers of quantified N-terminal peptides and higher technical variation may be expected, partially due to inevitable losses during lysis and the subsequent sample preparation steps. Combined with optimal sampling strategies and standardization, the sensitivity presented here may even allow N-terminomics of needle biopsies.
We also analyzed our data for potential physicochemical biases that may arise from the enrichment, as such biases are not uncommon for PTM-enrichment procedures (36). We therefore compared the 3,086 unique N-terminal peptides identified in this study to N-terminal peptides in the PeptideAtlas (October 2017, 1,222,862 peptide entries in total), as described in the Supplemental Methods. We found only slight differences in peptide length and net charge state distributions, which can be attributed to the use of iTRAQ labeling (37) rather than to our ChaFRAtip enrichment (see Supplemental Figures 3-6).
The combined analysis of proteome, phosphoproteome and N-terminome is particularly interesting, as the complex interplay between multiple post-translational modifications (38)(39)(40), but also between different biomolecular species (41), becomes increasingly evident. It has been shown that proteases are regulated by phosphorylation (42), whereas kinases can be regulated by proteolytic cleavage (43). So-called phosphodegrons directly regulate the degradation of protease substrates (44). Highly sensitive approaches allowing the multi-level study of patient samples will be an important step towards precision medicine, as they may allow the identification of novel disease mechanisms (25), drug targets and biomarkers for therapy control.