MSFragger-Labile: A Flexible Method to Improve Labile PTM Analysis in Proteomics

Posttranslational modifications of proteins play essential roles in defining and regulating the functions of the proteins they decorate, making identification of these modifications critical to understanding biology and disease. Methods for enriching and analyzing a wide variety of biological and chemical modifications of proteins have been developed using mass spectrometry–based proteomics, largely relying on traditional database search methods to identify the resulting mass spectra of modified peptides. These database search methods treat modifications as static attachments of a mass to particular position in the peptide sequence, but many modifications undergo fragmentation in tandem mass spectrometry experiments alongside, or instead of, the peptide backbone. While this fragmentation can confound traditional search methods, it also offers unique opportunities for improved searches that incorporate modification-specific fragment ions. Here, we present a new labile mode in the MSFragger search engine that provides the flexibility to tailor modification-centric searches to the fragmentation observed. We show that labile mode can dramatically improve spectrum identification rates of phosphopeptides, RNA-crosslinked peptides, and ADP-ribosylated peptides. Each of these modifications presents distinct fragmentation characteristics, showcasing the flexibility of MSFragger labile mode to improve search for a wide variety of biological and chemical modifications.


In Brief
Many peptide modifications are labile or fragment during tandem mass spectrometry experiments for proteomics. Database search engines often struggle to identify peptides bearing labile modifications, as the expected masses of fragment ions (with intact modification(s)) do not match the observed spectra. We have developed MSFragger Labile, a flexible set of parameters that incorporate modification fragmentation information into MSFragger search. We show that labile mode can greatly increase the number of modified peptides identified by database search.

MSFragger-Labile: A Flexible Method to Improve Labile PTM Analysis in Proteomics
Daniel A. Polasky 1,* , Daniel J. Geiszler 2 , Fengchao Yu 1 , Kai Li 2 , Guo Ci Teo 1 , and Alexey I. Nesvizhskii 1,2,* Posttranslational modifications of proteins play essential roles in defining and regulating the functions of the proteins they decorate, making identification of these modifications critical to understanding biology and disease. Methods for enriching and analyzing a wide variety of biological and chemical modifications of proteins have been developed using mass spectrometry-based proteomics, largely relying on traditional database search methods to identify the resulting mass spectra of modified peptides. These database search methods treat modifications as static attachments of a mass to particular position in the peptide sequence, but many modifications undergo fragmentation in tandem mass spectrometry experiments alongside, or instead of, the peptide backbone. While this fragmentation can confound traditional search methods, it also offers unique opportunities for improved searches that incorporate modification-specific fragment ions. Here, we present a new labile mode in the MSFragger search engine that provides the flexibility to tailor modification-centric searches to the fragmentation observed. We show that labile mode can dramatically improve spectrum identification rates of phosphopeptides, RNA-crosslinked peptides, and ADP-ribosylated peptides. Each of these modifications presents distinct fragmentation characteristics, showcasing the flexibility of MSFragger labile mode to improve search for a wide variety of biological and chemical modifications.
Posttranslational modifications (PTMs) of proteins play essential roles in defining and regulating protein functions (1). Liquid chromatography-tandem mass spectrometry (LC-MS/ MS) methods have become the preferred method for largescale identification of proteins from biological samples, becoming known as mass spectrometry (MS)-based proteomics (2). The ability of MS to detect and characterize posttranslational and chemical modifications of proteins at wholeproteome scales is currently unmatched and provides crucial insights into the function and regulation of proteins in biological systems. Analysis of PTMs is typically accomplished by broadly similar MS methods as those used for unmodified peptide analysis in proteomics, often with an added enrichment step to concentrate modified peptides of interest prior to LC-MS/MS analysis. Database search methods to identify the resulting tandem mass spectra are also similar to those used for unmodified peptides, as indeed, chemical artifacts and common modifications are nearly always included in such standard proteomics searches. The predominant search strategy for modified peptides assumes that modifications will remain intact on the peptide, simply shifting the mass of the amino acid to which they are connected by a fixed value (3). Fragment ions matching the combined amino acid and modification masses are used to detect modified peptides and localize the modification within the peptide. However, many modifications experience their own fragmentation during tandem MS, causing a mismatch between the expected and observed fragment ions. Some peptides can still be identified, by matching only ions that do not contain the modification site, but many spectra of modified peptides cannot be identified, contributing to the "dark matter" of proteomics (4,5).
Labile modifications, or those that fragment instead of or in addition to peptide backbone fragmentation in tandem MS, are extremely common. The most abundant PTMs, phosphorylation (6)(7)(8) and glycosylation, are labile, along with many less abundant PTMs like sulfation (9) and ADP-ribosylation (10), and many other peptide modifications of interest, including chemical modifications used for chemoproteomics methods (11), ubiquitinylated and SUMOylated peptides retaining a piece of a second peptide after digestion, RNAcrosslinked peptides (12), and many more. As many modifications of interest are labile, several approaches have been devised to identify and localize peptides bearing labile modifications. Electron-based activation methods like electron transfer dissociation (ETD) and electron capture dissociation typically (13), though not always (14), preserve modifications intact even if they are labile in collision activation methods, allowing database search methods to confidently identify modified peptides (15,16). However, these methods come at the cost of reduced acquisition speed (13), leading many groups to continue utilizing collisional activation for PTM searches. Particularly for modifications that are only partially labile, such as phosphorylation, collisional activation remains the most common analysis method. Many traditional search engines allow specified neutral losses from a modification, such as loss of phosphoric acid from phosphorylated peptides (17)(18)(19), which allows for improved search of modified peptides in some cases but lacks the flexibility to handle more complex modifications, such as glycosylation. ProteinProspector additionally allows searching for any modification that is lost entirely during fragmentation (18). For the most part, however, search methods for PTMs either utilize built-in common neutral losses or require bespoke search methods tailored to a particular modification, as is common for glycosylation. In either case, search methods for new or less common modifications are not well supported, as the fragment ions specific to these modifications are not encoded in common search engines. Many search engines have been developed or adapted to search specific modifications, particularly for glycosylation, including modification-specific fragment ions. These PTM-specific search engines and modes are extremely useful, but generally are designed with a particular modification, or set of modifications, in mind, lacking the flexibility to examine new modifications or fragment ions without alterations to the code.
Recently, an alternative approach has emerged in the "open" search method developed for searching for unknown or unexpected peptide modifications (4,5). Open searches, and the similar mass offset or multinotch searches (20), allow searching of peptides with a "mass offset," or a difference between the peptide sequence mass and the observed precursor mass from MS 1 . This allows peptides with unknown (open search) or known (offset search) modifications to be identified. Crucially for searching labile modifications, the peptide sequence is often initially searched without modifications in these methods, as a consequence how fragment-ion indexing approaches are implemented to improve the speed of such searches. As this reduces the sensitivity of the search for nonlabile modifications, a "localization-aware" open search method was recently implemented in MSFragger to allow searching for fragment ions retaining intact modification (21). However, the default open/offset search is a natural fit for identifying labile modifications that are lost completely from the peptide during MS 2 , as searching for peptide fragment ions without the modification is the optimal method in these cases. The MSFragger Glyco search method took advantage of this for searching glycopeptides, dramatically boosting sensitivity by searching for just peptide backbone fragment ions without glycosylation, or with a single predefined monosaccharide remainder (22). Most search engines that support glycoproteomics searches incorporate some or all of these fragments as well, such as Byonic (23), ProteinProspector (18), and MetaMorpheus (20). Like Nglycans, many modifications are not lost cleanly, either leaving behind a piece of the modification, or taking a piece of the peptide with them when leaving, such as the net loss of water from phospho-Ser and Thr residues during neutral loss of phosphoric acid. Modification loss can also produce signature, or diagnostic, fragment ions in the mass spectrum that indicate the presence of a particular modification (7,8,10,(24)(25)(26)(27).
To support a wide range of labile modification searches, we have implemented several new features in MSFragger, referred to collectively as "labile mode" search. Labile mode searches allow filtering of spectra for diagnostic ions when considering a labile modification, as well as specifying peptide or fragment remainder masses resulting from modification fragmentation to improve spectrum identification and localization. While there are existing search engines that use some or all of these ion types hard coded for specific modifications, MSFragger Labile offers these capabilities for any modification by making the fragment ions a set of flexible and customizable parameters. Here, we demonstrate the utility of labile mode searches to increase the spectral assignment rate for phosphorylation, peptide-RNA crosslinking, and ADP-ribosylation datasets. We used our recently described diagnostic ion mining module in PTM-Shepherd to determine the fragment ions to use in labile search, allowing optimal settings to be chosen even for modifications without well-characterized fragmentation pathways (24). Labile search has been incorporated into MSFragger starting with version 3.5 and FragPipe version 18.0, and workflow templates for the labile phosphorylation and ADPribosylation searches described here are available from the workflows menu in FragPipe.

EXPERIMENTAL PROCEDURES
Datasets "RNA crosslinking" data were downloaded from PXD023401, from the report of Bae et al. (12). Briefly, crosslinked RNA-peptide complexes were prepared by photoactivable ribonucleoside labeling and UV crosslinking and analyzed by LC-MS/MS on an Orbitrap Fusion Lumos instrument using higher-energy C-trap dissociation (HCD) activation at 30% normalized collision energy (NCE). Data corresponding to the 4-thiouridine (4SU) crosslinked RNA-peptide complexes were analyzed here. Clinical Proteomic Tumor Analysis Consortium ("CPTAC") phosphoproteomics data from the clear cell renal cell carcinoma cohort were downloaded from the CPTAC data portal. Tandem mass tag (TMT)-labeled phosphopeptides were enriched from tumor and adjacent normal tissue samples, fractionated, and analyzed on an Orbitrap Fusion Lumos mass spectrometer using HCD activation at 37 NCE (28). "Multienergy" phosphoproteomics data were downloaded from PXD004415 and refer to the report of Tran et al., who analyzed unlabeled, enriched phosphopeptides from Jurkat T cells via HCD at two collision energies, 25 and 35 NCE, on an Orbitrap Q Exactive mass spectrometer (29). "AIETD" ADP-ribosylation data were downloaded from PXD017417, in which ADP-ribosylated peptides were enriched using Af1521 macrodomain affinity from HeLa cells treated with H 2 O 2 and analyzed on an Orbitrap Fusion Lumos instrument modified with a CO 2 laser for AIETD experiments (30). "HCD" ADP-ribosylation data were downloaded from PXD004245, where ADP-ribosylated peptides from H 2 O 2treated HeLa cells and mouse liver tissues were enriched using Af1521 macrodomain and analyzed by HCD activation on an Orbitrap Q Exactive instrument at 28 NCE (31).

General Search Settings and Validation
For all searches, raw files were converted to mzML format and centroided using MSConvert, version 3.0.22068, from ProteoWizard (32,33). All searches were performed using MSFragger, version 3.6, Philosopher, version 4.7.0, and FragPipe, version 19.0. All searches used MSFragger's built-in mass calibration, fully enzymatic protein cleavage except for variable N-terminal Met clipping, deisotoping, and default neutral loss removal. Several options are available for validation of labile search results. In all searches reported here, labile modifications from mass offset searches were written to the MSFragger output in the same manner as variable modifications from conventional (nonlabile) MSFragger searches using the "Report mass shift as a variable mod" option, and downstream validation tools were used as in conventional searches. For open searches and searches with many modifications, labile modifications can instead be left as mass offsets (reported as "delta masses" by MSFragger) and the extended mass model in PeptideProphet (34) can be used to model probabilities for each mass offset, as is done in MSFragger Glyco searches (22). For result scoring, Percolator (35) or Peptide-Prophet can be used in FragPipe. Percolator is now supported for open searches from MSFragger following implementation of a delta mass score in FragPipe; however, PeptideProphet is still recommended as the extended mass model provides more rigorous treatment of delta masses. For mass offset labile searches, either Percolator or PeptideProphet can be used. For localization of modifications, PTMProphet (36) is available in FragPipe in addition to the fragment remainder-based localization of MSFragger and can be used instead of or in addition to MSFragger localization. All searches performed here used ProteinProphet (37) for protein inference and performed false discovery rate filtering to 1% peptide-spectrum match (PSM), ion, peptide, and protein levels in Philosopher using the sequential method (38).

RNA-Protein Crosslink Search
4SU data were searched against a reviewed human proteome with decoys and common contaminants added in Philosopher, downloaded 2022/06/13 with 20,420 total entries (including contaminants). Fully enzymatic MSFragger search was performed using strict-trypsin settings (i.e., allowing cleavage before Pro) and allowing two missed cleavages, precursor, and fragment tolerances of 20 and 10 ppm, respectively, mass calibration, fixed Cys carbamidomethylation, and variable modifications of Met oxidation and protein N-terminal acetylation. 4SU crosslinking was searched as both the intact nucleoside (+226.0594 Da) and the base only (+94.0168 Da), as frequent in-source fragmentation of nucleoside was noted by Bae et al. and which we confirmed using the PTM-Shepherd diagnostic mining module (24). For labile search, +94 and +226 were searched as mass offsets on any amino acid, with labile mode active, no diagnostic or peptide remainder ions, and with or without a fragment remainder ion of +94. For the equivalent conventional search, variable modifications of +94 and +226 were specified on all amino acids (each given a max of one per peptide). Additional conventional searches considering (i) only +226 on all amino acids (max one per peptide) and (ii) both +226 and +94 only on the five most commonly modified residues (C, H, W, F, Y) were considered as well but identified fewer RNA-crosslinked peptides than the conventional search described above. PeptideProphet was used for validation with default closed search settings.

CPTAC Phosphoproteomics Search
The clear cell renal cell carcinoma cohort phosphoproteomics data were searched against the same human database as used in the RNA-protein crosslink (RNA-XL) search. MSFragger search was performed with fully enzymatic strict-trypsin search allowing two missed cleavages, precursor and fragment tolerances of 20 and 10 ppm respectively, mass calibration, fixed modifications of Cys carbamidomethylation and TMT-labeling of Lys and peptide N termini, and variable modifications of Met oxidation, Protein N-terminal acetylation, and deamidation (Asn, Gln). A maximum of three total variable modifications was allowed per peptide (including phosphorylation). Up to three phosphorylation events on Ser, Thr, or Tyr were allowed on a peptide, specified as variable modifications (nonlabile search) and/or mass offsets (labile search), with the number of nonlabile and labile phosphorylations always adding up to 3. For labile searches, a fragment remainder ion of −18.01056 was specified, corresponding to neutral loss of phosphoric acid but no diagnostic or peptide remainder ions. Note that phospho-Tyr is either maintained intact during collisional activation or undergoes loss of HPO 3 rather than phosphoric acid (H 3 PO 4 ) (8), despite the fragment remainder ion for phosphoric acid loss being applied to all possible phosphorylation sites in our search. However, Tyr-phosphorylation is usually underrepresented among phosphopeptides, unless the enrichment specifically targets the Tyr-modified sequences. In those cases, this fragment remainder ion should not be specified. Neutral loss of water can occur on other amino acids (not from phosphorylation), potentially resulting in increased scores for phosphopeptides when using this fragment remainder ion. The neutral loss removal option was enabled in MSFragger for all searches and can help reduce this issue as neutral loss peaks that are less intense than the corresponding nonloss peak are removed prior to search. Percolator was used for validation with default closed search settings. PTMProphet was used for final localization of all modifications, using a static 15 ppm fragment tolerance, em model 1, a minimum Pepti-deProphet probability of 0.5, and the "lability" mode enabled.

Multienergy Phosphoproteomics Search
All data from Tran et al. at both HCD settings (25,35) were searched against the same database as CPTAC search, with identical MSFragger and validation settings except for no TMT labeling.

ADP-Ribosylation Searches
Searches of HeLa data used the same human database as the RNA-XL and phosphoproteomics searches. Searches of mouse tissue data used a reviewed mouse proteome with decoys and common contaminants added in Philosopher, downloaded 2022/06/27 with 17,230 total entries. MSFragger searches were fully enzymatic, using strict trypsin settings with the number of missed cleavages aligned to the searches performed in the original analyses: up to four missed cleavages (39) for data from Buch-Larsen et al. or three missed cleavages for data from Martello et al. Precursor and fragment tolerances were 20 and 10 ppm, respectively, and fixed modification of Cys carbamidomethylation and variable modifications of oxidation (Met) and protein N-terminal acetylation were used. A maximum of three total variable modifications was allowed per peptide (including ADP-ribosylation for conventional searches). ADP-ribose (+541.06111 Da) was specified as a mass offset (labile) and/or variable modification (conventional), with a max of one per peptide. For the hybrid search, a total of two ADP-ribosylations were allowed on a peptide (one as a mass offset and one as a variable modification). Amino acids allowed for ADP-ribosylation were aligned to the original analyses: ADP-ribosylation was allowed on Ser, Arg, and Lys for mouse liver tissue data from Martello

Experimental Design and Statistical Rationale
No experiments were performed for this study; all results are derived from reanalysis of previously published datasets (for details, see "Datasets").

RESULTS AND DISCUSSION
Many peptide modifications undergo fragmentation in MS/ MS experiments, changing the expected mass of peptide backbone fragment ions and resulting in poor performance for search methods that do not take these fragments into account. The MSFragger labile search mode provides a flexible approach to use the ions resulting from modification fragmentation in search, allowing spectra to be filtered for diagnostic ions and searched for partial or complete fragmentation of a modification with peptide and fragment remainder ions. In contrast to many existing search engines with modification-specific fragments encoded only for certain modifications, MSFragger labile incorporates fragments of several types as general parameters that can be adjusted to accommodate any modification. To demonstrate the utility of the labile search method, we re-analyzed published data corresponding to three PTMs, each with a distinct fragmentation pattern that is supported by the labile search mode.
An overview of the ion types and methods included in labile mode is shown in Figure 1. MSFragger labile mode search is an extension of the MSFragger search engine's (5) open and mass offset search modes for labile modifications and consists of three main features corresponding to the three types of fragment ions generated by fragmentation of labile modifications (Fig. 1A). First, "diagnostic ions," or peaks MSFragger labile mode workflows. A, modification fragment types available for labile mode searches, including diagnostic ions (partial modification only), peptide remainder ions (intact peptide with partial modification), and fragment remainder ions (partial peptide with partial modification). ADP-ribosylation is the modification depicted in cartoon form (rib = ribose, A = adenine, Ad = adenosine). B, MSFragger search modes. In nonlabile mode, modifications are specified as variable modifications (as in conventional MSFragger searches). In labile mode, modifications are specified as mass offsets and fragments lacking the modification (orange) are searched. If fragment remainder ions (red) are specified, candidates with modification loss ions and fragment remainder ions are compared, with the higher scoring candidate chosen. In hybrid mode, a modification is specified as both a variable modification and mass offset, producing labile and nonlabile candidates. As in the labile mode, the highest scoring candidate is chosen as the search result.
corresponding to a portion of the modification observed on its own after dissociation from a modified peptide, can be specified along with a minimum intensity threshold (Fig. 1A,  left). Diagnostic ions indicate spectra containing a modified peptide and are used as a filter for labile mode searches, in which only spectra where the sum of intensities of the diagnostic ions exceeds a user-specified threshold are searched for the labile modification(s) (i.e., mass offsets). All other spectra not containing sufficient diagnostic ion signal are searched only for peptides with no mass offset (i.e., unmodified peptides). Second, "peptide remainder ions," or the intact peptide bearing a partially fragmented modification, can be specified (Fig. 1A, right). Peptide remainder ions are common for larger modifications containing multiple labile bonds, such as the ADP-ribosylation depicted in Figure 1. In labile searches, these ions are added to the peptide score, providing a boost for modified peptides that produce them. Finally, "fragment remainder ions" (Fig. 1A, center) refer to peptide backbone fragment ions that retain a part of the modification. Fragment remainder ions are added to the peptide score like peptide remainder ions, boosting sensitivity for modified peptides. Unlike peptide remainder ions, fragment remainder ions are also used to localize the modification within the peptide sequence, since only fragments that include the modification site can generate a modified ion. Negative fragment remainders can also be specified, referring to loss of part of the amino acid side chain at the modification site during loss of modification, such as loss of phosphoric acid from phospho-Ser or Thr residues.
MSFragger searches can now be performed entirely in the conventional (nonlabile) mode, entirely in the new labile mode, or in a hybrid mode (Fig. 1B). In the hybrid mode, modifications that incompletely dissociate can be specified as both a variable modification (nonlabile) and as a mass offset (labile). Spectra will then be searched against a theoretical peptide bearing the intact modification (from the variable modification search) and against a theoretical peptide that has lost or fragmented the modification (from the labile mass offset search), with the highest scoring result being chosen as the assignment for each spectrum. The hybrid search will also consider peptides bearing both a variable modification and a mass offset, for a total of two modifications on the peptide. Localization of modifications found in labile or hybrid searches can be accomplished in one of two ways. If fragment remainder ions are present, MSFragger will use them to attempt to localize the modification, analogously to how nonlabile modifications are localized in localization-aware open search using the complete modification mass (21). This is a simple localization method, which places the modification at each allowed site and determines which site obtains the best hyperscore, using the fragment remainder ions in place of the complete modification mass for labile searches. Alternatively, FragPipe contains PTMProphet (36), which uses a Bayesian model to localize modifications and provide an estimate of the localization confidence. PTMProphet localizations supersede those of MSFragger if PTMProphet is run in a FragPipe workflow, for both labile and conventional searches. Finally, labile search results can be examined using the integrated FragPipe visualization tool FP-PDV (40). Support for viewing custom remainder fragment ions of peptides has been added, allowing examination and comparison of different fragmentation pathways directly within the graphical interface.
A recent report of a method for photoactivatable ribonucleosides for finding sites of RNA-XL makes an excellent example of the importance of labile mode search for highly labile modifications. In this method, a single ribonucleoside is crosslinked to peptides prior to tandem MS analysis ( Fig. 2A). The ribonucleoside is highly labile, exhibiting both neutral loss of ribose at low collision energy, leaving the base behind, and complete loss of the entire ribonucleoside at moderate to high collision energies (12). As a result, typical nonlabile search methods struggle to identify RNA-crosslinked peptides because fragment ions including the modification site, expected to contain the full ribonucleoside, are not found (Fig. 2B, top spectrum). To identify their RNA-crosslinked peptides in the original manuscript describing these data, the authors employed a spectrum duplication strategy that is essentially equivalent to a mass offset search, allowing peptides that have lost the entire modification to be considered. With MSFragger's labile search, in addition to searching for complete modification loss, fragment remainder ions (retaining the base only, in this case) provide a confident identification by matching many more ions in the spectrum and allowing for localization of the modification site (Fig. 2B, bottom spectrum). We compared a conventional search in MSFragger (the ribonucleoside specified as a variable modification) to a two labile mode searches, one assuming complete loss of the ribonucleoside (labile mode with no fragment remainder ions) and one including a +94 Da fragment remainder corresponding to retention of the base. The labile mode searches dramatically outperformed the conventional search, identifying 48% and 108% more RNA-XL spectra for the noremainder and with-remainder searches, respectively (Fig. 2C). Workaround methods have been used to search modification losses in conventional search engines, such as the method of spectrum duplication and manual precursor mass adjustment used by Bae et al. to search these data (12). In contrast, MSFragger labile mode provides greater capabilities, such as the remainder fragment ions that dramatically boost the number of crosslinked spectra identified, as well as the flexibility to combine multiple such features in a fully automated fashion. Ultimately, this improved spectrum assignment translated to confident identification of 60% more unique RNA-crosslinked peptides from 25% more proteins compared to a conventional (nonlabile) search performed with MSFragger (Fig. 2D). For highly labile modifications like these RNA-crosslinked peptides, and as previously demonstrated for glycopeptides (22), labile search thus greatly improves our capability to confidently identify modified peptides due to the near complete absence of ions bearing the intact modification that a nonlabile search requires. Fragment remainder ions are particularly advantageous because they enable localization of the modification within the peptide, analogously to how intact modifications are localized in nonlabile searches.
Not all modifications are as labile as these RNA-crosslinks, but labile search can still provide substantial benefits for intermediately labile modifications through hybrid searches. Phosphorylation is one of the most widely studied PTMs, with phosphoproteomics comprising an entire subfield of proteomics. It is well known that phosphorylation, particularly of Ser and Thr, can lose phosphoric acid in tandem MS, especially under conditions of low proton mobility and at high collision energies (7,8). Despite this, many phosphopeptide searches are performed with conventional (nonlabile) methods, though many search engines specifically encode the neutral loss of phosphoric acid due to the prevalence of phosphopeptide searches. As peptides are often observed at multiple charge states and can vary widely in their ability to sequester protons, a mix of phosphate retention and neutral loss is typically observed in phosphoproteomics data, with the proportion of peptides in each category depending on the collision energy, charge state, and other factors affecting proton mobility. This mix of modification retention and loss is common to many intermediately labile modifications and presents a challenge to search engines, as there will be some peptides for which the observed fragmentation does not match the fragment ions expected by nonlabile searches and some not well matched to labile searches. MSFragger labile search provides a flexible solution to this problem, allowing for competition between the nonlabile and labile alternatives to obtain the best scoring match for each spectrum. This is accomplished by specifying both a variable modification (nonlabile) and a mass offset (labile) of the same mass and amino acid specificity. If the peptide in a given spectrum largely retains the modification, the nonlabile variable modification version of the peptide will obtain a better score, whereas if the peptide largely loses the modification, the labile mass offset version will be chosen instead, allowing both possibilities to be considered in a single search (Fig. 1B, hybrid mode). For phosphopeptide searches, peptides that contain multiple phosphates can be searched by setting more than one variable modification and/or a mass offset corresponding to the mass of multiple phosphate modifications. Mass offset searches are restricted to considering a single modification site for nonlabile modifications, but multiple labile modifications can be successfully identified (without localization) thanks to dissociation of the modifications. In the phospho searches described here, we allowed a maximum of three phosphates per peptide and tested each combination of variable modifications and mass offsets yielding a maximum of three modifications to evaluate the performance of labile, conventional, and hybrid searches without interference from searching different numbers of phosphates. Localization was subsequently performed using PTMProphet in FragPipe, even for the labile MSFragger searches that did not localize multiple phosphosites directly during the search step. PTMProphet allows incorporation of modification-specific neutral losses; however, we chose not to include the neutral loss of phosphoric acid in the PTMProphet localization in this case. This allows the neutral loss of phosphoric acid fragment remainder ion to be considered in the MSFragger search to improve the spectrum identification rate without allowing other water losses from the peptide to confuse the phosphate localization.
We demonstrate the value of this flexible labile search method for phosphoproteomics data from two sources, one label-free with fragmentation compared at low and high collision energies, and one TMT-labeled and fragmented at high energy. For the label-free phosphoproteomics data of Tran et al., MSFragger labile search shows a substantial improvement in phosphoPSMs identified over a conventional search in MSFragger, which in turn identifies many more PSMs than the original search of Tran et al., performed with Mascot (Fig. 3A).
Dividing the search results by the collision energy at which the data were acquired, the performance of labile search shows a clear dependence on the collision energy (29). At an NCE of 25, the labile and hybrid searches offer some improvement over the conventional search, with hybrid and fully labile searches having very similar performance (Fig. 3B and supplemental Table S1). At this low collision energy, some peptides exhibit phosphate loss while others do not, so hybrid searches that consider both cases offer the best performance, with no improvement from a fully labile search versus hybrid. At the higher NCE setting of 35, however, the fully labile search assigns over 30% more spectra than the conventional search, as the majority of phosphates are dissociated at this high energy ( Fig. 3B and supplemental Table S1). At this high energy, the fully labile search also outperforms the hybrid searches by 13 to 17%, indicating that the optimal search strategy depends on the degree to which modifications are dissociated in a given experiment. TMT labeling has been shown to increase phosphate neutral loss by reducing proton mobility (8), leading us to also examine a TMT-labeled phosphoproteomics dataset from the CPTAC consortium (see Methods for details). This large dataset, comprising 23 fractionated TMT-11 experiments, was searched with the same set of nonlabile (conventional), hybrid, and fully labile phospho searches. Between the high collision energy (NCE 37) and TMT labeling, it is no surprise that the fully labile search offered the best performance, identifying roughly 500,000 more phosphopeptide spectra, or an increase of 30%, compared to the conventional search (Fig. 3C). The hybrid mode searches also provided substantively improved performance compared to conventional search, possibly as a result of the variation in proton mobility from peptides with different numbers of TMT labels. The hybrid and fully labile searches ultimately identified roughly 4000 to 6000 more phosphopeptides and up to 190 more phosphoproteins than the nonlabile search (Table 1), demonstrating the value of considering phosphate loss in search, particularly for TMT-labeled phosphopeptides fragmented at high energy.
Finally, we re-analyzed several datasets containing ADPribosylated peptides to assess the performance of MSFragger labile search on a modification with different degrees of lability and from hybrid activation methods. Analysis of ADPribosylation is a growing field of study, and it has been established that ADP-ribose loss from Ser residues is extremely common (10,41), leading to many current analyses employing hybrid activation methods like EThcD to localize ADP-ribosylation sites (42,43). Previous studies have established that ADP-ribose forms a series of fragment ions when subjected to collisional activation, resulting in all three categories of labile fragments (diagnostic ions, peptide remainder ions, and fragment remainder ions) (see Fig. 1) (10). A recent study employed AIETD and compared a range of laser powers for activation, concluding that a medium laser power setting was best for the nonlabile search employed in the study (30). MaxQuant, both conventional and hybrid mode MSFragger searches offer substantial improvements in the number of ADP-ribosylated PSMs identified but with little difference between the conventional and hybrid modes (Fig. 4A). Fully labile search identified fewer PSMs, as expected given that ETD, which leaves modifications intact, is the primary activation method. Comparing the results across each laser power setting, we observe a strong positive correlation between laser power and the performance of the hybrid mode search compared to conventional search, with hybrid mode search providing improved performance at all but the lowest two laser powers (Fig. 4B). The similar performance between conventional and hybrid MSFragger searches in the aggregated data is thus the combination of an increase in PSMs with hybrid search at high laser powers and a decrease at lower laser powers. Given that the primary fragmentation is expected to come from electron transfer, preserving modifications intact, smaller improvements in performance than in HCD analyses with labile searches are expected; however, labile search methods can clearly still provide benefits in hybrid activation methods when supplemental activation is sufficiently highenergy or modifications are sufficiently labile. We also examined HCD activation ADP-ribosylation data from HeLa cells (primarily containing Ser-linked ADP-R) and mouse liver tissue (containing a mix of Arg-and Ser-linked ADP-R). In both cases, labile search offered improved performance versus conventional MSFragger search, with increases of 3% in HeLa data and 5% in mouse data, with hybrid search offering similar improvements in HeLa (3%) and slightly lower in mouse (3%) (Fig. 4, C and D). We also tested the impact of including diagnostic and peptide remainder ions in this dataset and found their inclusion in the labile and hybrid search parameters resulted in only modest improvements with ADPribosylation data (supplemental Figs. S9 and S10). In contrast, Arg-linked ADP-R uniquely generates a fragment remainder ion with a mass of −42 Da, corresponding to loss of CN 2 H 2 from the Arg side chain (10). Adding this remainder fragment to the search resulted in a 10% increase in PSMs identified in mouse liver tissue but no change in HeLa lysate, as expected given that there was almost no Arg-linked ADP-R in the HeLa dataset (Fig. 4E, center). The presence of this fragment remainder allowed for confident localization of the majority of Arg-linked ADP-R sites in the mouse liver data (Fig. 4E, blue), whereas Ser-linked sites left no fragment remainder and generally could not be localized from HCD data in either mouse liver or HeLa lysate (Fig. 4E, red). Thus, while Ser-linked ADP-R indeed requires hybrid activation for localization, Arg-linked ADP-R can be successfully localized by the faster and more sensitive HCD activation using MSFragger labile mode.

CONCLUSIONS
Labile mode search in MSFragger provides a powerful and flexible set of tools to identify spectra of peptides bearing labile modifications. While the degree of improvement in labile search compared to conventional, nonlabile search scales with the activation energy used to fragment the peptide and the lability of the modification, the hybrid mode search option improves the spectrum assignment rate in nearly all cases by searching for peptides both with and without modification fragmentation. With user-specified options for diagnostic ions, peptide remainder masses, and fragment remainder masses, a wide variety of labile modifications can be easily searched with MSFragger. The labile mode search can leverage the fragment ions discovered by our recently described fragment ion discovery module in PTM-Shepherd (24), allowing labile mode to be used without requiring manual annotation of fragmentation pathways for new modifications. Localization of labile modifications has long been a significant obstacle to PTM analyses and remains so in cases where PTMs dissociate without forming any fragment remainder masses. However, in many cases, we can now localize modifications using fragment remainder ions, as demonstrated for RNA photocrosslinking and Arg-linked ADP-ribosylation here. PTMProphet localization is also available in FragPipe and has the ability to localize using neutral loss masses, allowing for confirmatory localization of labile modifications following MSFragger search. Labile search mode is implemented in MSFragger, version 3.5, and FragPipe, version 18.0, along with visualization support in the integrated FP-PDV visualization software. Included with FragPipe are prebuilt workflow for labile searches of phosphorylation and ADP-ribosylation, which can also easily be modified to allow user-friendly implementation of labile searches for any modification of interest. Hybrid and labile mode searches identified~4000-6000 more phosphopeptides and 100-200 more phosphoproteins from the same raw data in the CPTAC TMT phosphoproteomics dataset.

DATA AVAILABILITY
Raw data reanalyzed here can be found in the public repositories listed in the Experimental Methods section (see "Datasets"). Processed tables containing results (PSMs, peptides, and proteins found) for all searches described here have been deposited at https://doi.org/10.5281/zenodo.76612 99. FragPipe workflow files containing all parameters used for each search are also provided in this repository. A guide explaining the format of the results tables can be found at https://fragpipe.nesvilab.org/docs/tutorial_fragpipe_outputs. html.
Supplemental data -This article contains supplemental data.
Funding and additional information -This work was funded in part by NIH grants R01GM094231, 5R01GM135504, and U24CA271037. The content is solely the responsibility of the    ). B, percentage increase in hybrid mode searches (relative to nonlabile searches) of AI-ETD ADP-ribosylation data, showing improved performance for hybrid searches with increasing laser power. C, ADP-ribosylated PSMs from conventional, labile, and hybrid MSFragger searches of HCD-fragmented HeLa lysate. D, ADP-ribosylated PSMs from mouse liver tissue. E, comparison of labile searches with −42 fragment remainder ion from HeLa lysate and mouse liver tissue, showing improved performance in mouse tissue but not HeLa. Pie charts show the majority of mouse tissue ADP-ribosylation sites are at Arg residues (blue) versus majority unlocalized (red) in HeLa, likely corresponding to Ser residue sites that cannot be localized due to complete modification loss from Ser. AI-ETD, activation ion-electron transfer dissociation; HCD, higher-energy C-trap dissociation; PSM, peptide-spectrum match.