Expanding Proteome Coverage with Orthogonal-specificity α-Lytic Proteases*

Bottom-up proteomics studies traditionally involve proteome digestion with a single protease, trypsin. However, trypsin alone does not generate peptides that encompass the entire proteome. Alternative proteases have been explored, but most have specificity for charged amino acid side chains. Therefore, additional proteases that cleave at sequences complementary to those cleaved by trypsin may increase proteome coverage. We demonstrate the novel application of two proteases for bottom-up proteomics: wild-type α-lytic protease (WaLP) and an active site mutant of WaLP, M190A α-lytic protease (MaLP). We assess several relevant factors, including MS/MS fragmentation, peptide length, peptide yield, and protease specificity. When data from separate digestions with trypsin, LysC, WaLP, and MaLP were combined, proteome coverage was increased by 101% relative to that achieved with trypsin digestion alone. To demonstrate how the gained sequence coverage can yield additional post-translational modification information, we show the identification of a number of novel phosphorylation sites in the Schizosaccharomyces pombe proteome and include an illustrative example from the protein MPD2 wherein two novel sites are identified, one in a tryptic peptide too short to identify and the other in a sequence devoid of tryptic sites. The specificity of WaLP and MaLP for aliphatic amino acid side chains was particularly valuable for coverage of membrane protein sequences, which increased 350% when the data from trypsin, LysC, WaLP, and MaLP were combined.

The most powerful technique for system-scale protein measurement, or proteomics, is mass-spectrometry-based proteomics (1). Although great progress has enabled the quantification of nearly all proteins expressed in yeast (2,3), sequence coverage is often dismal, with some proteins being identified by a single peptide sequence. Complete amino acid coverage is valuable for comprehensive profiling of post-translational modifications (e.g. phosphorylation) and for quantification of splice variants. Low observed proteome coverage can be caused by several factors, including the wide dynamic range of protein concentrations in biological samples, splice variants, and unanticipated or unconsidered post-translational modifications (PTMs). Improvements to every step of the bottom-up proteomics workflow continue to increase the observable proteome.
Because of length constraints that limit observable peptides, proteome coverage is ultimately limited by proteome digestion. Typically, identifiable peptides are between 7 and 35 amino acids in length, with the lower limit being determined by sequence uniqueness and the upper limit being determined by the instrument's resolving power (4). In silico proteome digestions predict that nearly one-quarter of peptides generated from tryptic digestion of the Saccharomyces cerevisiae proteome will be only a single amino acid long. Sequences lost due to length overall result in a theoretical upper proteome coverage limit of 68.8% according to in silico predictions (supplemental Fig. S1).
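The length argument above can be illustrated with a minimal in silico digestion sketch. It is not the script used for supplemental Fig. S1; the cleavage rule (after K or R, not before P, no missed cleavages) and the function names `tryptic_digest` and `length_limited_coverage` are assumptions made for illustration only.

```python
import re

def tryptic_digest(protein):
    """Split a protein after K or R, except before P (assumed simplified rule).

    No missed cleavages are modeled in this sketch.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

def length_limited_coverage(proteins, min_len=7, max_len=35):
    """Fraction of residues that fall in peptides of identifiable length."""
    covered = 0
    total = 0
    for seq in proteins:
        total += len(seq)
        covered += sum(
            len(p) for p in tryptic_digest(seq) if min_len <= len(p) <= max_len
        )
    return covered / total if total else 0.0

def single_residue_fraction(proteins):
    """Fraction of digestion products that are a single amino acid."""
    peptides = [p for seq in proteins for p in tryptic_digest(seq)]
    return sum(len(p) == 1 for p in peptides) / len(peptides) if peptides else 0.0
```

Applied to a full proteome FASTA, a calculation of this kind gives the theoretical ceiling on coverage for one protease; the exact value depends on the cleavage rule and length window chosen.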
Recently, several groups have demonstrated that combining data from separate protease digestions improves proteome coverage (4-7). Improved peptide yield was also shown, allowing proteome analysis of small-quantity samples from laser-capture microdissection (8,9). Swaney et al. used trypsin, Lys-C, Arg-C, Glu-C, and Asp-N to double the observed S. cerevisiae nonredundant amino acid coverage from 11.9% to 25.5% (4).
Other proteases that are used in proteomics to complement trypsin mainly cleave at ionic amino acid side chains, and it would be useful to have proteases with additional, complementary specificities. Here we demonstrate the application of wild-type α-lytic protease (WaLP) (10) and an active site mutant of WaLP, M190A α-lytic protease (MaLP) (11), to proteome digestion for shotgun proteomics. Both were reported to have specificity for cleaving after aliphatic side chains, which are more common amino acids. WaLP is a serine protease secreted from the soil bacterium Lysobacter enzymogenes (10,12) and has been studied extensively via mutagenesis and biophysical methods (11). WaLP has been found to exhibit remarkable stability (13,14).
Non-tryptic peptides are more difficult to identify than tryptic peptides, especially when lacking defined termini (i.e. from semi-specific protease digestion or endogenous peptides) due to increased database search space and less predictable ionization and fragmentation. A lack of defined termini drastically increases database search space because more possible peptides fall within the precursor tolerance and drive up false positive rates (15). The majority of tryptic peptides have one positive charge localized at each terminus in a +2 precursor charge state upon electrospray ionization, which results in well-characterized fragmentation by collision-induced dissociation (CID) (16,17). Non-tryptic peptides, in contrast, may lack positively charged side chains (i.e. Arg, Lys, His) altogether, making it unlikely that multiple charges will be obtained upon electrospray ionization. Those that do contain positive charges away from the C terminus produce less predictable fragmentation upon CID. Recently, additional peptide fragmentation methods have become accessible, such as electron-transfer dissociation (ETD) (18), which produces fragment ion series that are less dependent on peptide sequence, and higher-energy collisional dissociation (HCD) (19). An in-depth comparison of activation methods for nontryptic peptide identification has been published recently by Smith's lab. In that report the authors evaluated FT-CID, FT-ETD, and FT-HCD for sequencing peptides isolated from blood plasma (20).
To enable application of the α-lytic proteases with specificity for aliphatic amino acid side chains to shotgun proteomics, we address the above issues by comparing multiple fragmentation modes in combination with the peptide identification algorithm MS-GFDB, which easily learns scoring parameters from an initial set of annotated peptide-spectrum matches for arbitrary fragmentation methods and proteases (21). We analyzed standard protein mixtures and complex Schizosaccharomyces pombe proteomes digested with trypsin, LysC, WaLP, and MaLP. Specifically, we assessed ion activation methods, observed peptide character, and biological gains due to additional digestions. The results present the pros and cons of using orthogonal proteases in proteomics.
Protease Expression and Purification-WaLP was expressed from Lysobacter enzymogenes type 495 using Bachovichin's media supplemented with minimum Eagle's medium vitamins and 60 g/l sucrose. L. enzymogenes was grown at 30°C with shaking at 100 rpm for 3 days. MaLP was expressed as described previously (23) in D1210 E. coli using the pALP12-ΔM190A plasmid, which was the generous gift of Dr. Dave Agard. Both proteases were purified from the culture supernatant as described previously (24). Briefly, the protease was captured from the supernatant by means of batch binding on SP-Sepharose, which was washed extensively and then eluted with high-pH glycine buffer. After buffer exchange to pH 7.2, the enzyme was loaded by superloop onto an FPLC monoS column and eluted using a gradient of 10 mM NaHPO4, pH 7.2, to the same buffer containing 250 mM NaOAc over 1 h.
In-gel Digestion-To test the suitability of WaLP and MaLP for in-gel digests, we obtained a sample of glucose transporter-5 (UniProt accession number P22732) that was expressed in Pichia pastoris and then deglycosylated with PNGase F. After SDS-PAGE, the band was excised and subjected to in-gel digestion separately with trypsin, WaLP, or MaLP according to established protocols. The resulting peptides were analyzed with a 5600 TripleTof (AB Sciex, Framingham, MA) interfaced with a NanoAcquity UPLC (Waters, Inc., Milford, MA). Peptides were separated with a 1-h linear gradient from 5% to 80% mobile phase B at a flow rate of 250 l/min using a charged-surface hybrid C18 column (75-µm inner diameter by 20-cm length, 2.5-µm particles; Waters). Mobile phase A was 98% water, 2% ACN, 0.1% FA, and 0.005% TFA, and mobile phase B was 100% ACN, 0.1% FA, and 0.005% TFA. Precursor spectra (400-1250 m/z) were collected for 0.25 s, and then MS/MS (50-2000 m/z) of up to 50 of the most intense charge +2, +3, and +4 precursors was conducted for 2.4 s. The minimum intensity for MS/MS selection was 150 counts. Precursors were dynamically excluded for 4 s. The data were analyzed with Protein Prospector as described below.

α-Lytic Proteases Improve Proteome Coverage
Proteome Preparation and Digestion-S. pombe cell lysates were a generous gift from Dr. Paul Russell. S. pombe cells were lysed using a bead mill in 50 mM Tris-HCl pH 8.0, 150 mM NaCl, 5 mM EDTA, 10% glycerol, 50 mM NaF, 0.1 mM Na3VO4, 0.2% Nonidet P-40. Lysates were clarified at 15,700 × g for 10 min and the supernatant was removed. Insoluble material from the lysate was re-extracted according to a non-SDS compatible protocol, combined with the soluble material, and precipitated via chloroform/methanol extraction as described previously (25). Protein precipitates were resuspended in 100 mM Tris pH 7.2 containing 1.0% SDC, reduced with 5 mM tri-carboxyethyl phosphine at 60°C for 30 min, and alkylated with 10 mM NEM at room temperature for 1 h. Tri-carboxyethyl phosphine and NEM were then removed via ultrafiltration with a 10-kDa-cutoff Amicon-4 (EMD Millipore, Billerica, MA) with three 10-fold buffer exchanges into 100 mM Tris pH 7.2 containing 0.1% SDC. The alkylated S. pombe proteome concentration was determined using a BCA assay (Pierce Chemicals, from Thermo Scientific). Samples (150 µg) of S. pombe proteome were separately digested with trypsin, LysC, WaLP, or MaLP at a ratio of 1:100 for 24 h at a total protein concentration of 0.5 mg/ml. SDC was then removed by acidification with 5% FA and extraction with ethyl acetate, and the peptides were purified by SepPak C18 (Waters) as described previously (26,27).
MS Activation Comparisons-A series of analyses of mixtures of known proteins and of unseparated proteome digests was performed in order to determine the best activation parameters for the MS/MS runs. For these experiments, 0.65-µg samples of S. pombe proteome digest were resuspended in 5 µl of 0.1% FA and injected onto a trap column (Waters Symmetry, 180-µm inner diameter by 20-mm length, 5-µm C18 particles) equilibrated in 0.2% TFA using a Waters NanoAcquity autosampler and binary solvent manager. A 100-µm inner diameter by 15-cm column (packed in-house) containing 3-µm Magic C18 AQ particles was used for peptide separation using a 2.5-h gradient of 2% to 30% B (0.2% TFA in 90% ACN) at a flow rate of 0.6 µl/min. The total run time was 1.5 h for the standard protein mix and 3 h for the S. pombe digests, including column flush and re-equilibration. Eluting peptides were electrosprayed at 2.7 kV using a Proxeon Nanospray Flex Ion Source interfaced to an LTQ-Orbitrap Velos hybrid mass spectrometer (Thermo Fisher, Waltham, MA) using a precursor scan from 350-1400 m/z and a target resolution of 30,000 in profile mode. Unassigned and +1 precursor charge states were excluded, and dynamic exclusion was enabled for 45 s allowing one repeat and using sequential activation of the top five precursors using CID, then ETD, then HCD with the FT mass analyzer. The scan rate for this experiment was 1.2 spectra per second. Additional experiments were performed in which the top 10 precursors were sequentially targeted with CID and then ETD using the ion trap as the mass analyzer and in which the top 10 precursors were targeted using a data-dependent decision tree (ddDT) approach (28) to activate all +2 precursor charge states with CID and all +3 or greater precursor charge states with ETD. As expected, the faster scan rate of the ion trap yielded more peptide identifications than data from the higher resolution FT mass analyzer.
The results from these experiments demonstrated the utility of the ddDT approach, which was then used to analyze the fully separated S. pombe proteome digests.
High-pH Fractionation of Proteome Samples-Peptide fractionation via high-pH reverse-phase (HPRP) was performed as described previously (29). Briefly, lyophilized peptides were resuspended in 1.15 ml of 20 mM ammonium formate (NH4HCO2), pH 10 (HPRP mobile phase A). HPRP buffer B was 80% ACN with 20% 20 mM NH4HCO2, pH 10. Peptides were separated over a 100 × 2.1 mm Waters C18 BEH column (5-µm particles) maintained at 40°C. Samples (1.05 ml) were loaded at a flow rate of 0.5 ml/min over 7 min in 98% A, and peptides were eluted with a gradient from 2% to 100% B over 27 min. Fifty-four 0.5-ml fractions were collected into 100 µl of 10% FA, and fractions were pooled according to the method of Smith's lab to yield 18 final pooled fractions that were lyophilized and stored at −80°C until nano-LC-MS/MS analysis (29).
Nano-LC Electrospray Ionization MS/MS of HPRP-fractionated Digests-Each pooled HPRP fraction was resuspended in 75 µl of 0.1% FA. Five microliters (~0.5 µg/fraction) were injected into the LTQ-Orbitrap Velos hybrid mass spectrometer as described above, but with a 60-min gradient from 2% to 30% B followed by column re-equilibration, for a total of 90 min per run. For these experiments, a ddDT (28) was used to activate all +2 precursor charge states with CID and all +3 or greater precursor charge states with ETD. The total nano-LC-MS/MS acquisition time was 27 h per protease, or 4.5 days total. Supplemental Table S1 contains a list of all of the experiments.
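The acquisition times quoted above follow directly from the number of pooled fractions, the run length, and the number of proteases. The short check below only restates that arithmetic; the variable names are illustrative.

```python
# Values taken from the text: 18 pooled HPRP fractions per protease,
# 90 min per nano-LC-MS/MS run, four proteases (trypsin, LysC, WaLP, MaLP).
fractions_per_protease = 18
minutes_per_run = 90
proteases = 4

hours_per_protease = fractions_per_protease * minutes_per_run / 60
total_days = hours_per_protease * proteases / 24

print(f"{hours_per_protease:.0f} h per protease, {total_days:.1f} days total")
```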
Database Searches-Files (.RAW) were converted to .mzXML files using the default parameters in msconvert.exe except for the option to centroid all spectra (version 3.0.4323, February 5, 2013) within Trans-Proteomic Pipeline (version 4.6.2) (30,31). The standard protein mix data (CID/HCD/ETD triples, high resolution) were searched with Protein Prospector against the E. coli subset of Swiss-Prot (March 21, 2012 version) with the sequences for each standard mix protein and protease added because the number of spectra was insufficient to properly train MS-GFDB. The database contained a total of 22,934 real and 22,934 randomized sequences comprising all E. coli strain sequences (45,868 total protein sequences) to allow estimation of the false discovery rate (FDR). Data from the unseparated S. pombe digests (CID/HCD/ETD triples, high resolution) were searched with Protein Prospector against the S. pombe subset of Swiss-Prot (March 21, 2012 version) with accessions for each protease added (4990 real, 4990 randomized, 10,980 total). An initial search was carried out with a 10-ppm precursor tolerance and 15-ppm fragment-ion tolerance to calibrate the precursor masses, and this was followed by another search with a 5-ppm precursor tolerance and 15-ppm fragment-ion tolerance. Searches with trypsin and Lys-C data allowed up to three missed cleavages and one non-enzymatic terminus. Searches of WaLP and MaLP data used "no enzyme" specificity. Default variable modifications were used. Searches required the fixed modification of cysteine with NEM. The data on unseparated S. pombe proteome collected as CID/HCD/ETD triples were also searched with MS-GFDB version 7780 (21) against common contaminants and the S. pombe complete proteome containing a total of 5099 real and 5099 reversed sequences (downloaded from UniProt on June 20, 2012) using the merge search for comparison of the amount of internal ions.
The comparison between Protein Prospector and MS-GFDB searches revealed that for WaLP and MaLP, Protein Prospector gave similar numbers of unique peptides (supplemental Table S2).
Data from the fully separated proteome analyses were converted to .mzXML and merged using mzXMLmerge to make database searching and downstream analysis more manageable. The merged .mzXML files were searched with MSGFplus.jar version 9352 (released on February 4, 2013). MS-GFDB is a database search engine that reports rigorous p values (spectral probabilities) for spectral interpretations based on all possible peptide match scores (21). The key advantages of the MS-GF algorithm are that it is highly effective in utilizing spectral evidence, the spectral interpretations are rigorously scored, and the scoring algorithm can be re-trained using large datasets of annotated spectra (32). MS-GFDB extends MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.). This aspect was particularly important for efficient spectral interpretation of data from nontryptic digests. MS-GF+ is a successor of MS-GFDB that additionally allows input of mzML data and produces mzIdentML output files. Database searches used default parameters, except the number of tolerable enzymatic termini was set to 1 and searches of MaLP and WaLP used "no enzyme" specificity. Searches required fixed modification of cysteine by NEM and variable modification at peptide N-terminal Q to pyro-glutamate, protein N-terminal methionine loss plus acetylation, and methionine oxidation. Precursor masses containing between zero and two 13C atoms were considered. For all MS-GFDB searches and all MS-GF+ searches, the precursor mass tolerance was set to 5 ppm. After initial searches of each activation method alone, the scoring parameters were trained and the data were re-searched with the new scoring model. Only the MS-GFDB search engine was used for the large datasets because it was faster than Protein Prospector.
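Two quantities recur in the search settings above: a mass tolerance expressed in ppm and an FDR estimated from decoy (randomized or reversed) sequences. The sketch below shows one common way to compute each. It is a generic illustration, not the internal logic of Protein Prospector, MS-GFDB, or MS-GF+; the function names and the simple decoys/targets estimator are assumptions.

```python
def within_ppm(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the observed m/z lies within tol_ppm of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6 <= tol_ppm

def decoy_fdr(psms, threshold):
    """Estimate FDR at a score threshold from a target-decoy search.

    psms: list of (score, is_decoy) tuples, higher score meaning a better match.
    Uses the simple estimator decoys / targets (an assumed convention).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

# Hypothetical scores, for illustration only.
example_psms = [(10, False), (9, False), (8, True), (7, False), (6, True)]
print(decoy_fdr(example_psms, threshold=7))
```

Because "no enzyme" searches consider many more candidate peptides per precursor, more decoys pass any given score threshold, which is why the nontryptic data benefit from a retrained scoring model.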
In order to quantitatively compare internal ions produced from peptide activation by HCD, we first used sequences identified by merged searches with MS-GFDB of S. pombe CID/ETD/HCD triples. The merged searches afforded HCD spectra that were insufficient in themselves for peptide identification. To identify internal ions in the HCD spectra, all possible internal ions from the identified peptide sequence were predicted using an in-house program created in [R]. The raw HCD spectrum corresponding to the matched peptide was then searched for the presence of each internal ion. A similar analysis was done on the ddDT spectra to determine the presence of internal ion peaks in the CID spectra from this larger dataset. All peptide-spectrum matches from ddDT spectra were analyzed for the intensity and presence of b-, y-, c-, z-, and internal ions. The b-ion count included b-H2O and b-NH3 if the peptide sequence contained serine/threonine or asparagine/glutamine, and similarly losses were included in the y-ion and internal ion counts. Intact precursor ions and neutral losses from the precursors were removed from ETD spectra using the msconvert.exe ETD filter before computing c- and z-ions; c-1 and z+1 ions were included in the c- and z-ion values. The ions were quantified both as a percentage of the total ion current (TIC) and as a percentage of all MS/MS peaks in the spectrum. The fraction of peptide backbone breaks, defined by the presence of a b or y fragment ion corresponding to a break in the peptide backbone, was calculated according to Ref. 33.
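The internal-ion analysis described above can be sketched as follows. This is not the authors' in-house [R] program; it is a minimal Python illustration that assumes singly charged b-type internal fragments, unmodified residues (the NEM modification of cysteine and neutral losses are ignored), and a 15-ppm matching tolerance. The names `internal_ions` and `internal_ion_tic_fraction` are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276

def internal_ions(peptide, min_len=2):
    """Map each internal fragment sequence to its singly charged m/z.

    Internal fragments exclude both the N-terminal and C-terminal residues.
    """
    ions = {}
    n = len(peptide)
    for start in range(1, n - 1):
        for end in range(start + min_len, n):  # end is exclusive, at most n - 1
            frag = peptide[start:end]
            ions[frag] = sum(MONO[aa] for aa in frag) + PROTON
    return ions

def internal_ion_tic_fraction(peptide, peaks, tol_ppm=15.0):
    """Fraction of the TIC in peaks matching any predicted internal ion.

    peaks: list of (m/z, intensity) tuples from one MS/MS spectrum.
    """
    predicted = list(internal_ions(peptide).values())
    tic = sum(intensity for _, intensity in peaks)
    matched = sum(
        intensity
        for mz, intensity in peaks
        if any(abs(mz - p) / p * 1e6 <= tol_ppm for p in predicted)
    )
    return matched / tic if tic else 0.0
```

Counting matched peaks instead of summing their intensities gives the second metric used in the text, the percentage of MS/MS peaks attributable to internal ions.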
Data Analysis-MS-GFDB search output from the activation comparison experiments was filtered to <1% peptide-level FDR. Proteome coverage was calculated using the Proteome Coverage Summarizer from PNNL using only peptides with a <1% peptide-level FDR as calculated by PeptideProphet (34). Protein identifications were made by ProteinProphet with the default parameters (35). Euler diagrams were generated using eulerAPI. Additional analyses were carried out using in-house scripts written in [R] (36); these have been made available online at GitHub as "PepsuM." Protease specificity heatmaps were generated using only unique peptide sequences from PeptideProphet output. Transmembrane proteins were predicted from all identified proteins using TMHMM (37). The peptide sequences were analyzed using iceLogos (38).
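To make the coverage calculation concrete, the following sketch computes nonredundant amino acid coverage for one protein and shows how pooling peptides from several digests raises it. It is an illustration of the idea only, not the Proteome Coverage Summarizer; the function names and the example sequence are hypothetical, and peptides are mapped by exact substring matching.

```python
def covered_positions(protein, peptides):
    """Return the set of residue indices covered by any occurrence of any peptide."""
    covered = set()
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            covered.update(range(start, start + len(pep)))
            start = protein.find(pep, start + 1)
    return covered

def percent_coverage(protein, peptide_sets):
    """Percent of residues covered by the union of several peptide lists."""
    covered = set()
    for peptides in peptide_sets:
        covered |= covered_positions(protein, peptides)
    return 100.0 * len(covered) / len(protein) if protein else 0.0

# Hypothetical protein and peptides, for illustration only.
protein = "MKTAYIAKQRQISFVK"
trypsin_peptides = ["TAYIAK"]
walp_peptides = ["QRQISFV"]

print(percent_coverage(protein, [trypsin_peptides]))                 # trypsin alone
print(percent_coverage(protein, [trypsin_peptides, walp_peptides]))  # combined
```

Because covered positions are held in a set, residues seen in more than one digest are counted once, which is what "nonredundant" coverage requires.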

RESULTS
Protease Activity in SDC, SDS, and Guanidine Hydrochloride-Previous studies on WaLP indicated that it possessed remarkable stability (39). As this property may provide advantages for the digestion of proteome samples under various solution conditions, we performed protease activity assays in various proteomic digestion conditions to assess the versatility of WaLP and MaLP compared with trypsin, LysC, and chymotrypsin. In every condition, the activity of WaLP and MaLP was similar to or greater than that of trypsin; however, chymotrypsin showed higher activity than WaLP in urea and guanidine (Table IA). Strikingly, however, chymotrypsin activity decreased markedly over time under typical proteomic digestion conditions, whereas the activity of WaLP and MaLP remained high (Table IB). These results suggest that for the digestion of complex proteomes requiring several hours of digestion, WaLP and MaLP may be superior to chymotrypsin. The loss of chymotrypsin activity may also explain our inability to find reports of complex proteome digestions utilizing chymotrypsin.
Coverage of Standard Protein Mixture-A standard protein mixture digested by various proteases was analyzed via FT-CID/ETD/HCD to determine proteome coverage for comparisons to recently published results (22). Digestion of these simple standard protein mixtures gave relatively high protein sequence coverage regardless of the protease (trypsin, LysC, chymotrypsin, elastase, WaLP, or MaLP) (Table II). Relative to all others, chymotrypsin yielded slightly longer peptides on average, whereas elastase yielded slightly shorter peptides. The average lengths of peptides generated by WaLP and MaLP were similar to those generated by trypsin. Similarly high protein sequence coverage was obtained when WaLP and MaLP digest data were combined with trypsin and Lys-C as compared with a recent report using combined data from trypsin, Lys-C, Glu-C, Asp-N, chymotrypsin, and Arg-C digests of a similar mixture (Table III).
Comparison of Tryptic and Nontryptic Peptide Identification Using CID, ETD, and HCD-Peptide fragment ion series depend on the peptide amino acid sequence and the activation method used to induce fragmentation (40). Tryptic peptides, which bear at least one positive charge at each terminus, produce strong b- and y-ion series upon activation with CID or HCD. Because peptides from WaLP and MaLP digestion lack such defined charge character, we used the versatile fragmentation ability of the LTQ-Orbitrap Velos (41) equipped with ETD to assess the identification efficiency of nontryptic peptides by CID, ETD, and HCD. First, we analyzed data from total S. pombe digests in which peptides identified in MS1 were sequentially activated by CID, ETD, and HCD to compare the results for α-lytic protease digests to those for trypsin digests. This analysis revealed some challenges related to the fact that WaLP and MaLP generate nontryptic peptides and cleave after several different amino acid residues. Out of the three FT-measured MS/MS activations, FT-HCD was most efficient for the identification of tryptic peptides, and the overlap between peptides identified by all three activation methods was high (73%) (Fig. 1). FT-CID and FT-HCD performed similarly for the identification of nontryptic peptides from WaLP digestion, and the overlap was considerably lower (65%). The greatest overlap of unique identifications was for peptides from Lys-C, with 85% of unique sequences identified by all three activations. Supplemental Fig. S2 shows CID, ETD, and HCD spectra for the same peptide from the WaLP digestion. The CID spectrum contains a significant number of peaks due to losses of water and ammonia, and the HCD spectrum contains internal fragment ions, both of which are known to increase spectral complexity (42,43), resulting in lower peptide-spectrum match scores for peptides that do not have well-defined terminal residues.
ETD resulted in abundant charge-reduced precursors along with a low-intensity series of c- and z-sequence ions (supplemental Fig. S2C). ETD contributed a greater percentage of nonoverlapping peptide identifications for WaLP and MaLP than for trypsin or LysC, consistent with our previous study comparing CID and ETD for nontryptic peptides from elastase and pepsin (44).
We re-searched these spectra using a merged search protocol in MS-GFDB. This approach resulted in more peptide identifications within a 1% FDR. Merged searching resulted in only marginal improvements in the number of identifications of peptides from the trypsin and LysC digests (5% increases), but greater improvements (supplemental Table S2) were obtained for the samples from MaLP and WaLP digests. This is not surprising, as these searches were run without enzyme specificity and thus had the most to gain when capitalizing on the CID/ETD/HCD complementarity in searches over a larger search space. The merged search data also allowed us to analyze the HCD spectra that were not sufficient for peptide identification in the absence of additional information from ETD and/or CID (56% of the triples). We first analyzed all of the HCD spectra for identified peptides to determine the percentage of the TIC contributed by internal ions (7.2% for trypsin and 7.4% for LysC versus 9.2% for WaLP and 9.1% for MaLP). A significantly greater percentage of the TIC was contributed by internal ions from WaLP and MaLP digests than from trypsin digestion (Student's t test, p values < 10⁻¹⁰). We next analyzed the fraction of MS/MS peaks attributable to internal ions. In this analysis, only peptides from the WaLP digest yielded a statistically significant increase in the fraction of MS/MS peaks attributable to internal ions (10.3%), whereas the relative number of internal ion peaks from the MaLP digestion (8.4%) was similar to that from trypsin digestion (8.7%), and peptides from LysC produced slightly fewer internal ion peaks (7.9%). Using all peptides identified from the four separate digestions, we examined the cross-correlation of internal ion abundance with the presence of each of the 20 amino acids. The abundance and presence of internal ions was positively correlated with the presence of residues A, D, G, I, L, P, and V.
Interestingly, only arginine was found to negatively correlate to internal ions.
We next used data from the CID/ETD activation comparison to determine branch points for a ddDT that targeted precursors for CID or ETD based on precursor charge state and m/z. A ddDT targeting +2 charge-state precursors with CID and ≥+3 charge-state precursors with ETD was implemented similarly to the manner described in previous reports (28). The total run time of the ddDT method was only 1.5 h, half that of the other activation comparison runs. Use of this ddDT afforded more unique peptide identifications from WaLP digestion than even the best 3-h activation experiment (i.e. 2544 from 1.5-h ddDT versus 2358 from merged search of ion trap CID/ETD). Use of the ddDT for tryptic peptides resulted in nearly as many peptides as ion trap CID in half the acquisition time (i.e. 4195 from 1.5-h ion trap ddDT and 4576 from 3-h ion trap CID).
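The decision rule implemented in the ddDT can be written as a small function. The sketch below encodes only the charge-state rule stated in the text (+2 to CID, +3 or greater to ETD, with unassigned and +1 precursors excluded); it is not instrument control code, it does not model any m/z-dependent branch points, and the function name is hypothetical.

```python
from typing import Optional

def choose_activation(charge: Optional[int]) -> Optional[str]:
    """Pick an activation method from the precursor charge state.

    Returns None for precursors that are skipped (unassigned or +1 charge).
    """
    if charge is None or charge < 2:
        return None
    return "CID" if charge == 2 else "ETD"

for z in (None, 1, 2, 3, 4):
    print(z, choose_activation(z))
```

Because each precursor is fragmented once rather than two or three times, a rule of this kind spends the same instrument time on more distinct precursors, consistent with the shorter run time reported above.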
Characterization of the MS/MS Data from WaLP and MaLP Digests-S. pombe proteome samples digested separately by trypsin, LysC, WaLP, or MaLP were separated off-line using HPRP (29), and each fraction was analyzed using a 90-min nano-LC run with the ddDT method. Over 200,000 MS/MS spectra were collected for each sample and searched with MS-GF+ followed by training and re-searching. At a peptide-level FDR of less than 1%, similar numbers of peptides were identified from the trypsin, LysC, WaLP, and MaLP digests; 17,810 and 26,747 peptides were identified from the WaLP and MaLP digests, respectively (Table IV). Even though very similar numbers of spectra were collected for each digest, the number of peptides identified from the WaLP digest was somewhat lower. Several factors might have contributed to this, one being the nontryptic C termini generated by this enzyme.

Table footnote a: We analyzed the same trypsin digestion twice to ascertain how much additional coverage would be obtained by combining the datasets. The gain in proteome coverage by combining data sets is given in the last column in parentheses. The gain from combining two separate trypsin digestions was 10%.
To better understand the consequences of nontryptic C termini for peptide identification, we analyzed the number of various fragment ions observed in the MS/MS spectra from trypsin, LysC, WaLP, or MaLP digests (Table V). The percentage of the TIC attributable to y-ions was higher for trypsin and LysC than for WaLP and MaLP, and conversely the percentage of the TIC attributable to b-ions was higher for WaLP and MaLP (Fig. 2). The y-ion directing capabilities of C-terminal positive charges have been discussed previously (45), but the impact on large proteome analyses can be appreciated from the results presented here. Indeed, many search engines give higher scores for y-ions than for other ions, which might be part of the reason more peptides were identified from the tryptic digestion than from the WaLP digestion. Another possible reason for lower numbers of peptide identifications from the WaLP digest could be the production of internal ions upon MS/MS. Training the MS-GF+ scoring function for peptides from WaLP or MaLP allowed reasonable identification of nontryptic peptides despite the lower percentage of the TIC attributable to y-ions. In fact, the MaLP-digested sample resulted in the greatest number of unique peptides identified (Table IV).

Substrate Specificity of WaLP and MaLP-Previous studies
of WaLP specificity based on chromogenic activity assays revealed high activity toward P1 residues A, V, and M (11).

␣-Lytic Proteases Improve Proteome Coverage
MaLP, which has an active site Met replaced by Ala, was reported to have broadened activity for M, L, and F but similar activity against A and V (11). To characterize the substrate specificity more fully, all unique peptide sequences from PeptideProphet were combined to determine the specificity of WaLP and MaLP. The observed specificity of WaLP and MaLP was visualized in heat maps of cleavage position and observed amino acid frequency (Fig. 3). For comparison, the same mapping was done for peptides from trypsin and LysC (supplemental Fig. S3). WaLP cleaved most frequently after T (36%), but also with significant frequency after V (30%), A (27%), S (26%), and M (16%). As reported previously, MaLP had specificity for slightly larger aliphatic amino acids, cleaving most frequently after M (32%), L (26%), F (26%), Y (14%), T (13%), and V (13%). These results show that WaLP and MaLP were somewhat more specific than elastase, which cleaved after A (43.5%), V (36.5%), I (34.7%), T (30.3%), S (21.4%), L (19.5%), and M (15.7%) (5). Interestingly, MaLP appeared to be able to differentiate between L and I (26% of leucines were found at the P1 position, versus only 8% of isoleucines), two residues that cannot be resolved by mass alone. To follow up this potentially very interesting finding, we measured the ability of MaLP to cleave succinyl-A-A-P-L-pNa versus succinyl-A-A-P-I-pNa. Whereas activity toward succinyl-A-A-P-L-pNa was high (specific activity of 3.4 × 10⁻³ U/mg, compared with 2.8 × 10⁻² U/mg for the substrate of choice, succinyl-A-A-P-F-pNa), the activity of MaLP toward succinyl-A-A-P-I-pNa was not observable under identical assay conditions.
FIG. 3. [...] Table IV (over 17,000 peptides from the WaLP digest and over 26,000 peptides from the MaLP digest). A, WaLP peptides; raw counts of frequency of each amino acid at each position. B, WaLP peptides; counts normalized for the occurrence of each amino acid at each position. C, iceLogo depicting the enrichment and depletion of specific amino acids (relative to the whole proteome) at each position in the WaLP peptides with residues colored according to property (acidic, red; basic, blue; hydrophobic, black; small/neutral, green). WaLP yields peptides with the following P1 (C-terminal) residues: A, 20%; V, 20%; S, 16%; T, 16%; G, 8%; L, 6%. D, MaLP peptides; raw counts of frequency of each amino acid at each position. E, MaLP peptides; counts normalized for the occurrence of each amino acid at each position. F, iceLogo depicting the enrichment and depletion of specific amino acids (relative to the whole proteome) at each position in the MaLP peptides. MaLP yields peptides with the following P1 (C-terminal) residues: L, 24%; F, 13%; V, 11%; A, 7%; T, 7%; I, 6%. The cleavage site is marked by a vertical line in each plot.
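The occurrence-normalized P1 frequencies quoted in the text (for example, 26% of leucines found at P1) can be illustrated with a minimal sketch. It assumes that peptides are available as (protein, start, end) coordinates and that the denominator is every occurrence of a residue in the reference sequences. The function name, the input layout, and the choice of denominator are assumptions made for illustration and are not taken from the authors' pipeline.

```python
from collections import Counter


def p1_cleavage_fraction(proteins, peptides):
    """Fraction of each residue's occurrences that sit at P1 of an observed cleavage.

    proteins: dict mapping protein name -> amino acid sequence (hypothetical layout)
    peptides: iterable of (protein_name, start, end), 0-based, end exclusive
    """
    # Total occurrences of every residue in the reference sequences.
    residue_totals = Counter()
    for seq in proteins.values():
        residue_totals.update(seq)

    # Collect distinct cleavage sites; P1 is the residue just before the cut.
    p1_sites = set()
    for name, start, end in peptides:
        seq = proteins[name]
        if start > 0:            # skip the protein N-terminus (not a cleavage)
            p1_sites.add((name, start - 1))
        if end < len(seq):       # skip the protein C-terminus (not a cleavage)
            p1_sites.add((name, end - 1))

    p1_counts = Counter(proteins[name][i] for name, i in p1_sites)
    return {aa: p1_counts[aa] / total for aa, total in residue_totals.items()}


if __name__ == "__main__":
    toy_proteins = {"p": "MKALVTAL"}
    toy_peptides = [("p", 0, 3), ("p", 3, 6)]
    print(p1_cleavage_fraction(toy_proteins, toy_peptides))
```

Counting each cleavage site once, rather than once per peptide, keeps overlapping identifications of the same site from inflating the frequency; whether the published heat maps were built this way is not stated in the text.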
Length of Peptides from WaLP and MaLP Digestions-Because of their apparently promiscuous activity, complete digestion with WaLP or MaLP could result in many single amino acids and short peptides. Remarkably, WaLP digestion produces peptides with nearly the same average length as those resulting from trypsin digestion (11.8 ± 3.1 amino acids versus 12.2 ± 4.3 from WaLP and trypsin, respectively) (Fig. 4). MaLP digestion produced slightly longer peptides (12.6 ± 3.7 amino acids). In addition, if nonspecific digestion were producing more single amino acids, one would expect a lower yield of amino acids in the peptides retained on C18 during solid phase extraction, but this was not the case. Amino acid analysis of peptides from each digest revealed similar total peptide yields from digestion by trypsin, WaLP, and MaLP (Fig. 5). Interestingly, these results suggest that amino acids corresponding to the P1 specificity are depleted. For example, the peptides isolated from trypsin digests contain less R and K than the whole proteome, and the peptides isolated from WaLP digests contain less A, S, T, and V than the whole proteome. This would make sense if pairs of these residues were cleaved into individual amino acids and not retained as peptides in the experiment. Indeed, in silico trypsin cleavage yields digestion products of which nearly one-quarter would be only a single amino acid (presumably K or R).
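The in silico observation above can be reproduced in outline with a short sketch. It assumes the simplest digestion rule: cleavage after every K or R, with no missed cleavages and no exception for a following proline. The text does not state which rule was used, so the rule and the function names here are illustrative assumptions.

```python
def tryptic_digest(sequence, cleave_after="KR"):
    """Split a sequence after every residue in `cleave_after` (assumed rule)."""
    products = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            products.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing piece after the last cleavage site.
        products.append(sequence[start:])
    return products


def single_residue_fraction(sequences):
    """Fraction of all digestion products that are a single amino acid."""
    products = [p for seq in sequences for p in tryptic_digest(seq)]
    if not products:
        return 0.0
    return sum(len(p) == 1 for p in products) / len(products)


if __name__ == "__main__":
    print(tryptic_digest("AKRGLKKMA"))
    print(single_residue_fraction(["AKRGLKKMA"]))
```

Running `single_residue_fraction` over a full proteome FASTA would give the proteome-wide figure; adjacent K/R pairs such as the `KR` and `KK` in the toy sequence are what generate the single-residue products.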
FIG. 4. [...] Table IV. Trypsin digestion generated a broader distribution with a higher frequency of shorter peptides. The size distributions of the peptides from the WaLP and MaLP digests were narrower than those for either trypsin or LysC. Colored vertical lines mark the average observed peptide lengths. WaLP digestion produced the shortest mean peptide length of 11.8 ± 3.1 amino acids. Trypsin digestion produced peptides with an average length of 12.2 ± 4.2 amino acids. MaLP and LysC produced slightly longer average peptides with lengths of 12.6 ± 3.7 and 13.5 ± 4.7 amino acids, respectively. The average lengths of the observed peptides were all remarkably similar.
FIG. 5. Quantitative amino acid analysis of undigested (black), trypsin-digested (blue), WaLP-digested (orange), and MaLP-digested (purple) S. pombe proteome after purification on C18 to remove single amino acids and undigested proteins. The results show that each protease yielded similar total amounts of peptides with similar amino acid compositions, but some interesting differences in amino acid content were also discovered (discussed further in the text).
Quantitation of Peptide Overlap-Another possible limitation of proteome digestion by semi-specific proteases is the production of largely redundant sequences with different terminal truncations ("shredding"). We quantified the redundancy in amino acid coverage according to the following relationship: [equation not recovered]. The numerator includes redundancy from chemical modification (e.g. oxidized methionine), overlapping peptides, and identification of multiple charge states. This relationship can be applied to any proteomics experiment to assess the efficiency of converting peptide identifications to covering proteome sequences. Peptides from trypsin digestion were the least redundant, and peptides from MaLP digestion were the most redundant. Redundancy values for trypsin, LysC, WaLP, and MaLP were 1.3, 1.6, 1.7, and 2.0, respectively. The redundancy for combined data was 2.7. Such high redundancy is expected to be useful for high ion coverage that would facilitate site localization of PTMs (46).
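The redundancy relationship itself is not reproduced in the text. One plausible reading, offered only as an assumption, is the total number of amino acids across all peptide identifications divided by the number of distinct proteome positions those identifications cover. The sketch below implements that reading; the function name and the (protein, start, end) input layout are hypothetical.

```python
def coverage_redundancy(identifications):
    """Ratio of identified amino acids to distinct covered positions.

    identifications: iterable of (protein_name, start, end), 0-based, end
    exclusive, with ONE entry per identification, so the same peptide seen
    in two charge states or modification states appears twice (assumption).
    """
    total_residues = 0
    covered_positions = set()
    for protein, start, end in identifications:
        total_residues += end - start
        covered_positions.update((protein, i) for i in range(start, end))
    if not covered_positions:
        raise ValueError("no identifications supplied")
    return total_residues / len(covered_positions)


if __name__ == "__main__":
    ids = [
        ("p", 0, 10),   # peptide, charge state 2+
        ("p", 0, 10),   # same peptide, charge state 3+
        ("p", 5, 15),   # overlapping peptide
    ]
    print(coverage_redundancy(ids))
```

Under this reading a value of 1.0 means every covered residue was identified exactly once, and larger values reflect overlapping peptides, modified forms, and multiple charge states.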
Biological Gains from WaLP and MaLP Digestions-The central aim of this study was to improve proteome coverage. Compared with data from only trypsin, the combined proteases increased protein identifications by 24% and proteome coverage by 101% (Table IV). Such gains were significantly greater than those afforded from re-injection of tryptic peptides, which increased proteome coverage by only 10%.
One possible gain from the increased proteome coverage would be in PTM identification. Although the samples were not enriched for phosphorylation, we re-searched the fully separated S. pombe proteome data allowing for variable phosphorylation of S and T to look for these PTMs. Indeed, the complementary amino acids covered from WaLP digestion allowed the observation of 95 serine and threonine phosphorylations, 63 of which had not been previously reported in UniProt. Similarly, 77 S/T phosphorylations were identified from the MaLP digest, 57 of which had not been previously reported (the assignments were made at the peptide level, not the site level). A particularly illustrative example of the improved coverage of phosphorylation sites was observed for the protein MPD2. The WaLP digest contained three phosphorylated peptides from MPD2, one corresponding to the previously reported phosphorylation of S175 and two novel sites at S223 and S750 (Fig. 6). It is clear from the sequence of the protein that S175 is located between two basic residues and would result in a tryptic peptide 17 amino acids long. The tryptic peptide covering S223 would be 75 amino acids in length, and the tryptic peptide covering S750 would be only 5 amino acids in length.
FIG. 6. A, sequence of MPD2 showing S175 (yellow), a previously reported phosphorylation site, and S223 and S750 (red), phosphorylation sites that had not been reported before. B, annotated spectrum from the +3 charge state precursor of the peptide TGTApSPKLGSPFNHINRPV fragmented by ETD. C, annotated spectrum from the +2 charge state precursor of the peptide TLQQPQRAGpSDTFPDLNTS fragmented by CID. D, annotated spectrum from the +3 charge state precursor of the peptide ALKpSPLIKKNIQQA fragmented by ETD. The peptide mass information is given in supplemental Table S3, and the complete tables of assigned ions for each peptide are given in supplemental Fig. S4.
We also wondered whether proteases that cleave aliphatic residues might increase the coverage of membrane protein sequences. Out of all 3555 protein groups identified, 244 (6.9%) were predicted to have three or more transmembrane helices. Sequences from these proteins were preferentially enriched in the gained coverage, with increases of up to 350% for very hydrophobic sequences (Fig. 7). Peptides from MaLP digestion were the greatest individual contributor to these gains, as can be seen by the observation that the percent proteome coverage did not decrease with transmembrane helix content nearly as dramatically as for the other proteases (Fig. 7A). Because in-gel digestion is sometimes used to digest membrane proteins, we tested the suitability of WaLP and MaLP for in-gel digestion on glucose transporter 5 (UniProt accession number P22732). Trypsin digestion yielded 36% coverage, WaLP yielded 84% coverage, and MaLP yielded 50% coverage. The combination of data covered 88% of the target protein sequence. A plot of sequence coverage versus hydrophobicity shows that peptides from WaLP and MaLP digests are almost solely responsible for coverage of the transmembrane segments (Fig. 7C).
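The per-protease and combined coverage figures for a single protein, such as those reported above for glucose transporter 5, follow from taking the union of residues covered by each digest. The sketch below shows that calculation under the assumption that identified peptides are mapped to the protein by exact substring matching; the function names and toy sequences are illustrative and do not come from the study.

```python
def covered_positions(protein_seq, peptides):
    """Set of 0-based positions in protein_seq covered by any peptide."""
    covered = set()
    for peptide in peptides:
        # Mark every exact occurrence of the peptide in the protein.
        start = protein_seq.find(peptide)
        while start != -1:
            covered.update(range(start, start + len(peptide)))
            start = protein_seq.find(peptide, start + 1)
    return covered


def percent_coverage(protein_seq, *peptide_sets):
    """Percent of residues covered by the union of one or more digests."""
    covered = set()
    for peptides in peptide_sets:
        covered |= covered_positions(protein_seq, peptides)
    return 100.0 * len(covered) / len(protein_seq)


if __name__ == "__main__":
    protein = "ACDEFGHIKL"          # toy sequence, not a real protein
    digest_one = ["ACDE"]           # stand-in for one protease's peptides
    digest_two = ["EFGH"]           # stand-in for a second protease's peptides
    print(percent_coverage(protein, digest_one))
    print(percent_coverage(protein, digest_two))
    print(percent_coverage(protein, digest_one, digest_two))
```

Because coverage is a union, the combined value is at most the sum of the individual values and falls below it wherever the digests overlap, which is why 36%, 84%, and 50% can combine to 88%.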

DISCUSSION
The use of alternative proteases has the potential to expand proteome coverage, affording gains in PTM coverage as well as the identification of splice variants. In this work, we explored the utility of two proteases that have not been used for proteomics before, WaLP and MaLP. These proteases retain activity in harsh denaturing conditions. They improve the coverage of an in-gel digested protein. Combining data from WaLP, MaLP, trypsin, and LysC results in nearly 100% coverage of protein sequences in standard mixtures. Thus, WaLP and MaLP digestion will likely prove to be useful for increasing the coverage of protein sequences in proteomics, particularly when increased coverage is required for a targeted experiment or when appropriate tryptic cleavage sites are not present.
One possible advantage of WaLP and MaLP is that they cleave at aliphatic residues (A, V, T, and S for WaLP; L, F, and V for MaLP). Chymotrypsin, which cleaves after aromatic residues (F, Y, and W), has also been used to expand protein sequence coverage, but we could not find examples of chymotrypsin being used in studies of complex proteomes. WaLP and MaLP retain activity throughout long digestion times, whereas chymotrypsin does not, potentially making WaLP and MaLP better for improving coverage of proteins in complex proteome mixtures.
FIG. 7. [...] and MaLP was evaluated for the amount of protein sequence that was covered in relation to how many transmembrane helices were predicted to be in each protein. The data show that MaLP covered a greater amount of sequence that is predicted to be from proteins containing transmembrane helices. B, the fold gain of proteome coverage was evaluated for the trypsin dataset combined with LysC, WaLP, MaLP, and all four datasets. For the four datasets combined, proteome coverage for proteins with at least four predicted transmembrane helices was increased more than 3-fold. C, plot of the hydrophobicity versus coverage of the glucose transporter-5 sequence when various combinations of proteases are used for the digestion. WaLP and MaLP cover the more hydrophobic regions, whereas trypsin does not.
The fact that WaLP and MaLP cleave at nonpolar residues, however, presented some challenges when they were used in global proteomics experiments. The first challenge was their semi-specific substrate specificity. WaLP and MaLP were shown to cleave after several common nonpolar residues. Compared with termini from elastase digestion reported previously (V (36.5%), I (34.7%), T (30.3%), S (21.4%), L (19.5%), M (15.7%), and even H (9.1%)) (5), WaLP and MaLP (see "Results") were more specific. This semi-specificity would be expected to generate many single amino acids and short peptides, which would not be useful in unique sequence determination for proteomics. Surprisingly, the average length of the peptides identified from digests with WaLP was the same as that of peptides from trypsin, and peptides from digestion by MaLP were slightly longer than those obtained with trypsin. Fig. 3 suggests that the substrate recognition preference of WaLP and MaLP extends beyond the P1 position, both before and after the position of cleavage, which is consistent with previous work showing that WaLP recognizes at least four amino acids past the position of cleavage (47). Thus, WaLP and MaLP target more residues for cleavage but apparently recognize a longer sequence motif.
Another challenge of the nonpolar substrate specificity of WaLP and MaLP is the yield of peptide fragment ions that are useful for sequence determination. WaLP and MaLP peptides yield a significantly lower abundance of y-ions (often scored the highest by database search algorithms). Whereas some 20,000 more MS/MS spectra were obtained from the WaLP digest in our ddDT experiment than with trypsin, some 2600 fewer peptides were matched to those spectra. This might partly be due to the need to search databases with "no enzyme" specificity. Indeed, searching tryptic digests with no enzyme for specificity results in 18,520 unique peptide identifications, compared with 21,035. The use of the merged spectra search capability in the MS-GFDB search engine did improve the number of identifications. However, it remains a puzzle why the number of peptides identified from the WaLP digest was lower. It is very encouraging that the number of peptides identified from the MaLP digest was significantly higher despite the lower percentage of y-ions. Because WaLP and MaLP do not cleave at K and R, the resultant peptides contain a higher percentage of these positively charged residues. Although this does not seem to have helped in the identification of WaLP peptides, the combination of the higher content of charged residues with the longer average length of the MaLP peptides might have improved the MS2 spectra enough to aid subsequent unambiguous peptide identification.
One striking feature of MaLP specificity is its ability to differentiate I and L, with MaLP preferring to cleave after L. This observation increases the utility of MaLP digestion, because differences between I and L cannot be resolved by mass alone. Another interesting result was that WaLP and MaLP digests avoided the residue-specific depletion of R and K from trypsin digestion (Fig. 5). Thus, WaLP and MaLP are likely to be extremely useful for proteomics analyses of K- and/or R-rich sequences.
Sequences identified from WaLP and MaLP digestion are highly complementary to sequences identified from trypsin and LysC digestions. In comparison with the results of Swaney et al., who doubled proteome coverage relative to trypsin using five separate digestions (4), we achieved double the sequence coverage from only four digestions. The additional coverage is, as expected, beneficial for more comprehensive PTM mapping studies. We show one such example in which two new serine phosphorylation sites were identified in MPD2, neither of which is on a peptide that would have been identified from trypsin digestion. Finally, the nonpolar substrate specificity of WaLP and, particularly, MaLP resulted in a dramatic increase (up to 350%) in the proteome coverage of proteins with transmembrane regions. The increase in proteome coverage from all four proteases relative to only trypsin was found to positively correlate with the minimum number of predicted transmembrane helices. Thus, we expect digestion with WaLP and MaLP will find use in comprehensive PTM mapping studies and especially deeper proteomic analysis of membrane proteins.