Peptide-Centric Proteome Analysis: An Alternative Strategy for the Analysis of Tandem Mass Spectrometry Data

In mass spectrometry-based bottom-up proteomics, data-independent acquisition is an emerging technique because of its comprehensive and unbiased sampling of precursor ions. However, current data-independent acquisition methods use wide precursor isolation windows, resulting in cofragmentation and complex mixture spectra. Thus, conventional database searching tools that identify peptides by interpreting individual tandem MS spectra are inherently limited in analyzing data-independent acquisition data. Here we discuss an alternative approach, peptide-centric analysis, which tests directly for the presence and absence of query peptides. We discuss how peptide-centric analysis resolves some limitations of traditional spectrum-centric analysis, and we outline the unique characteristics of peptide-centric analysis in general.

Tandem mass spectrometry has become the technology of choice for proteome characterization. In a typical bottom-up proteomic experiment, a mixture of proteins is proteolytically digested into peptides, separated by liquid chromatography, and analyzed using tandem mass spectrometry. The ultimate goal is to identify and quantify proteins by detecting and quantifying individual peptides, thereby shedding light on the underlying cellular mechanisms or phenotype. Several modes of data acquisition have been developed for bottom-up proteomics. The most commonly applied mode uses data-dependent acquisition (DDA), in which tandem MS (MS/MS) spectra are acquired from the dissociation of precursor ions selected from an MS survey spectrum. Constrained by the speed of instrumentation, DDA can sample only a subset of precursor ions for MS/MS characterization, generally targeting the top-N most abundant ions detected in the most recent survey spectrum. In addition, DDA is typically coupled with a method referred to as "dynamic exclusion" (1) that attempts to prevent reselection of the same m/z for some specified period of time.
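The top-N selection and dynamic exclusion described above can be illustrated with a minimal sketch. It is not the acquisition logic of any particular instrument; the class and function names, the 30 s exclusion duration, and the 0.01 m/z matching tolerance are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    """Remembers recently fragmented m/z values for a fixed duration."""
    duration_s: float = 30.0     # hypothetical exclusion duration
    tolerance_mz: float = 0.01   # hypothetical m/z matching tolerance
    _entries: list = field(default_factory=list, repr=False)  # (mz, expiry)

    def is_excluded(self, mz, now):
        # Drop expired entries, then look for a match within tolerance.
        self._entries = [(m, t) for m, t in self._entries if t > now]
        return any(abs(m - mz) <= self.tolerance_mz for m, _ in self._entries)

    def add(self, mz, now):
        self._entries.append((mz, now + self.duration_s))


def select_top_n(survey_peaks, n, exclusion, now):
    """Pick up to n of the most intense, non-excluded precursors.

    survey_peaks: iterable of (mz, intensity) pairs from one survey scan.
    """
    selected = []
    ranked = sorted(survey_peaks, key=lambda p: p[1], reverse=True)
    for mz, _intensity in ranked:
        if len(selected) == n:
            break
        if exclusion.is_excluded(mz, now):
            continue
        selected.append(mz)
        exclusion.add(mz, now)  # prevent reselection for duration_s
    return selected
```

Calling `select_top_n` repeatedly on the same survey peaks with the same `DynamicExclusion` object shows how exclusion pushes sampling toward less intense precursors until the earlier entries expire.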
These acquisition strategies greatly increase proteome coverage and extend the dynamic range of detection for shotgun proteomics. The resulting MS/MS spectra are typically analyzed using sequence database searching software such as SEQUEST, Mascot, X!Tandem, MaxQuant, Comet, MS-GF+, or OMSSA (2-8). Because these algorithms identify peptides by first associating each individual spectrum with a matching peptide sequence and then aggregating the thus matched spectra into a list of identified peptides, we refer to them as "spectrum-centric analyses." In spectrum-centric analysis, spectra are most commonly interpreted using database searching, but can also be interpreted using de novo sequencing (9-11), or by searching against a spectrum library (12-14). For the past two decades, spectrum-centric analysis has been an essential driving force for the development of large-scale shotgun proteomics using DDA.
DDA is a powerful and well-established technique for LC-MS/MS data acquisition. By targeting precursor ions observed in MS survey scans with highly selective MS/MS scans, DDA generates a large set of high quality MS/MS spectra, which can be automatically interpreted by database searching to identify thousands of proteins in a complex sample. When DDA was introduced, instrumentation was not fast enough to sample every observed precursor in the survey scan; thus, high-intensity precursors were preferentially targeted because they tend to generate higher quality MS/MS spectra that lead to peptide identification. Although this stochastic sampling approach results in a large number of peptide identifications in a single sample run, it comes at the cost of reproducibility of MS/MS acquisition between sample analyses and an inherent bias against low abundance analytes that are less likely to be sampled (15). On modern instrumentation, the speed of MS/MS acquisition has dramatically improved to the point where the majority of MS precursors that are not already in the dynamic exclusion list can be sampled for MS/MS analysis. However, even if every precursor observed in each survey MS scan is sampled, DDA is still biased against low abundance analytes that fall below the limit of detection in the MS analysis and will never be sampled. This bias is a practical limitation in the analysis of a complex mixture with high dynamic range in which many analytes will be below the limit of detection in MS analysis, but remain detectable by the more selective and more sensitive MS/MS analysis (16).
DDA remains a powerful method for identifying a large number of proteins in a sample. However, because of the incomplete sampling, when a peptide is not identified in a conventional shotgun experiment using DDA, it is incorrect to conclude that the peptide is missing from the sample, or even below the limit of detection of MS/MS, because the peptide ions may have never been sampled for MS/MS analysis. To overcome such limitations, targeted acquisition approaches such as selected reaction monitoring (SRM, also commonly referred to as multiple reaction monitoring) are often the methods of choice. In targeted acquisition approaches, a set of predetermined precursor ions is systematically subjected to MS/MS characterization throughout the LC time domain. The collision energy for each targeted ion can be optimized for fragmentation efficiency. The resulting data are typically analyzed using "targeted analysis" (17-19), in which the software looks for co-eluting patterns from a group of predetermined pairs of precursor-product ions (called transitions). With systematic MS/MS sampling and the combined specificity of chromatographic retention time, precursor ion mass, and the distribution of product ions, targeted acquisition allows highly sensitive and reproducible detection of the targeted analytes within a complex mixture.
Modern targeted acquisition approaches are the gold standard for sensitively and reproducibly measuring hundreds of peptides in a single LC-MS/MS run (20-22). However, data acquired in this manner is only informative for the set of peptides targeted for analysis. Because of this narrow focus, iterative testing of different hypotheses (i.e. a different set of target peptides) also requires iterative acquisition of additional data. Moreover, assay development often requires retention time scheduling and/or refinement steps to find the optimal peptides and transitions for testing a particular hypothesis.
With the existence of two complementary but distinct approaches (DDA for broad sample characterization and targeted acquisition for interrogation of a specific hypothesis), the natural question is whether the benefits of both techniques can be combined in a single technique. A potential solution is an alternative mode of bottom-up proteomics referred to as data-independent acquisition (DIA), which has been described and realized with various implementations (16, 23-32). In DIA, the instrument acquires MS/MS spectra systematically and independently from the content of MS survey spectra. These DIA approaches differ from DDA methods, targeted acquisition methods, and from each other in MS/MS isolation window width, total range of precursor m/z covered, the time needed to complete one cycle of the isolation scheme (called the cycle time), single or multiple isolation windows per MS/MS analysis, and instrument platform. Because of the benefits of systematic sampling of the precursor m/z range by MS/MS, data from a single DIA experiment can be useful for both peptide detection and quantification in a complex mixture. Similar to DDA approaches, DIA data is broadly informative because the MS/MS characterization is not specific to a predefined set of peptide targets. Similar to targeted approaches, MS/MS information of peptides across the entire LC time domain can be extracted from DIA data to test a particular hypothesis. As the acquisition speed of modern instrumentation continues to increase, DIA has become more popular because of its comprehensive and unbiased sampling.
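The relationship between isolation window width, precursor m/z range, and cycle time can be made concrete with a small sketch. The function names and the numbers in the usage note (a 400 to 1200 m/z range, 25 m/z windows, 0.1 s per MS/MS scan) are illustrative assumptions, not the parameters of any specific published method, and the sketch assumes one isolation window per MS/MS scan with no overlap.

```python
def dia_isolation_windows(mz_start, mz_end, width):
    """Return contiguous (low, high) isolation windows covering the range."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


def cycle_time_s(n_windows, msms_scan_time_s, survey_scan_time_s=0.0):
    """Time to visit every isolation window once, plus an optional survey scan."""
    return survey_scan_time_s + n_windows * msms_scan_time_s


def points_per_peak(peak_width_s, cycle_s):
    """Approximate number of times one analyte is sampled across its peak."""
    return peak_width_s / cycle_s
```

For example, `dia_isolation_windows(400, 1200, 25)` yields 32 windows. At an assumed 0.1 s per MS/MS scan, the cycle time is about 3.2 s, so a 30 s chromatographic peak would be sampled roughly nine times. Narrower windows improve precursor selectivity but lengthen the cycle unless the covered range shrinks or the instrument scans faster.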
Although DIA resolves the problem of biased or incomplete MS/MS sampling, current DIA methods come with compromises (33), where the most common compromise is precursor selectivity. Constrained by the speed and accuracy of instrumentation, DIA methods typically use five- to 10-fold wider isolation windows compared with DDA to achieve the same breadth and depth of a single LC-MS/MS run. Because of this reduction in precursor selectivity, MS/MS spectra from DIA are noisier than DDA spectra. In particular, DIA by design generates mixture spectra, each containing product ions from multiple analytes with various abundance and different charge states. Fragmenting multiple analytes together also precludes DIA from tailoring collision energy for every analyte, a standard optimization in DDA and targeted acquisition.
The low precursor selectivity and resulting complexity of DIA spectra severely challenge the performance of traditional spectrum-centric analysis, which generally assumes that the detected product ions were derived from a single, isolated precursor. The major challenges in interpreting mixture spectra lie in allowing for multiple contributing precursor ions, assessing the dynamic range of mixture peptides, distributing intensities of product ions shared by contributing peptides, and adjusting statistical confidence estimates. Because almost every spectrum in DIA data is mixed, such data are poorly suited for analysis by classic spectrum-centric approaches initially designed for DDA data. Some sophisticated spectrum-centric approaches (34-39) address these challenges by deconvolving mixture spectra into pseudo spectra or by matching mixture spectra to combinations of product ions from multiple candidate peptides. However, identification of low abundance analytes by interpreting mixture spectra is inherently difficult because the MS/MS signals from low abundance analytes are naturally overwhelmed by the signals from high abundance ones.
Recently, Gillet et al. demonstrated an alternative approach that analyzes DIA data in a targeted fashion (24), opening a new door for the investigation of tandem mass spectrometry data. Much like targeted analysis of transitions used in targeted acquisition methods, Gillet et al. use extracted ion chromatograms to detect and quantify query peptides. Similarly, Weisbrod et al. identify peptides by searching peptide fragmentation patterns against DIA data (25). Instead of interpreting individual spectra in a spectrum-centric fashion, these alternative approaches take each peptide of interest and ask: "Is this peptide detected in the data?" We refer to this approach as "peptide-centric analysis" in contrast with "spectrum-centric analysis." In peptide-centric analysis, each peptide is detected by searching the MS and MS/MS data for signals selective for the query peptide. Peptide-centric analysis covers all methods that use peptides as an independent query unit, including but not limited to targeted analysis. Peptide-centric analysis is intrinsically very different from spectrum-centric analysis (Table I, Fig. 1) and better suited for addressing many biological problems. This perspective discusses the analytical advantages of peptide-centric analysis and how they could translate to improvements in protein inference and in the analysis of DIA data.
Unique Characteristics of Peptide-Centric Analysis
I. Direct Statistical Measurements of Query Peptides
A drawback of spectrum-centric analysis is that the confidence estimates for peptides are indirect. In spectrum-centric analysis, each MS/MS spectrum is first assigned at least one peptide identity, yielding a large set of peptide-spectrum matches (PSMs). These PSMs are classified into accepted or not accepted by methods (40-43) that assign to each PSM statistical confidence estimates, indicating the confidence of either a set of PSMs being correct (e.g. FDR) or an individual PSM being correct (e.g. p values and E-values). Subsequently, peptide-level confidence estimates can be assigned by aggregating the best PSM per peptide in a postprocessing step (43,44). Because the query unit for spectrum-centric analysis is an MS/MS spectrum, only the peptides that are matched to at least one spectrum are subject to the peptide level statistical tests. As a result, only this subset of peptides is assigned statistical confidence estimates, and the remaining peptides are implicitly considered missing.
Peptide-centric analysis, on the other hand, tests every peptide queried, providing direct and complete statistical measurements. The goal of peptide-centric analysis is to ascertain whether a query peptide was detected in an experiment. Thus, in a given data set, all of the query peptides can be separated into those with or without evidence of detection (i.e. detected or not detected). An empirical null can be estimated by generating decoy query peptides with shuffled sequences, measuring the null score distribution, and calculating p values and q-values for every query peptide using common statistical methods (40,43,45). With peptide-centric analysis, direct peptide-level testing makes answering biological questions more straightforward, and the completeness of statistical measurements makes subsequent comparison and quantification much easier.
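As a minimal sketch of the decoy-based procedure described above, the following code estimates an empirical p value for every query peptide from a decoy score distribution and then converts the p values to q-values. The text refers only to "common statistical methods"; the Benjamini-Hochberg procedure used here is one such choice and is an assumption, as are the function names and the add-one correction in the p value estimate.

```python
import bisect


def empirical_p_values(target_scores, decoy_scores):
    """Estimate p values from an empirical null of decoy scores.

    Higher scores mean stronger evidence of detection. The add-one
    correction keeps every p value strictly greater than zero.
    """
    decoys = sorted(decoy_scores)
    n = len(decoys)
    p_values = []
    for score in target_scores:
        # Number of decoys scoring at least as well as this query peptide.
        n_ge = n - bisect.bisect_left(decoys, score)
        p_values.append((n_ge + 1) / (n + 1))
    return p_values


def benjamini_hochberg(p_values):
    """Convert p values to q-values (Benjamini-Hochberg procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values
```

Note that every query peptide receives a p value and a q-value, including peptides with weak or no evidence of detection. This is the completeness property discussed above.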
Peptide-centric analysis could be very useful when considering the protein inference problem, which involves estimating the set of detected proteins from the set of detected peptides (46). Protein inference is heavily affected by the observed peptides. The value of peptide-centric analysis is that each peptide in a database can be directly assigned a confidence estimate of being detected/not detected because each peptide is directly investigated. In contrast, spectrum-centric analysis implicitly assigns all "missing" peptides equal, very low confidence estimates. These imputed confidence estimates could lead to biases in the inferred set of detected proteins. The "missing" peptides can include peptides that distinguish splice isoforms or paralogs. Therefore, when comparing the result from a peptide-centric analysis to the detectability of such a peptide (47,48), it is possible to begin to probabilistically evaluate the presence/absence of a protein isoform. With directly tested peptide probabilities, peptide-centric analysis makes the input of protein inference more straightforward and transparent.
II. Considerations for Mixture Spectra
When investigating a complex proteome with shotgun proteomics, mixture spectra are a common occurrence. Although conventional DDA uses narrow isolation windows (typically ~2 m/z wide) targeting single precursor ion species for fragmentation, mixture spectra are still frequently observed (35,39,49). The frequency and impact of mixture spectra in a DDA experiment vary with the sample complexity, LC separation, acquisition parameters, and instrumentation. Some studies used isolation windows as narrow as 0.7 m/z wide to minimize co-isolation and cofragmentation of unwanted precursor ions (15,50). In the context of DIA, all spectra are essentially mixture spectra because DIA isolates and fragments all precursor ions within a wide m/z range. As discussed previously, identification of multiple components in a mixture spectrum is challenging: most spectrum-centric software is designed to identify a single component from each spectrum. Peptide-centric analysis excels in handling mixture spectra because it does not interpret individual spectra. Rather than deconvolving each individual spectrum, peptide-centric analysis searches for evidence of detection for individual peptides, explicitly tolerating cofragmentation. Although spectrum-centric analysis struggles to identify multiple components with wide dynamic range from each mixture spectrum, peptide-centric analysis queries each peptide independently from other peptides. This subtle but significant change of query unit (Table I) shifts the problem from "peptides competing with each other to explain the mixture spectrum" to "spectra competing with each other to represent the query peptide." With peptide-centric analysis, a single spectrum can be the top-scoring evidence of detection for multiple distinct peptides, as expected in the case of mixture spectra. In addition, peptide-centric analysis readily benefits from the systematic sampling of DIA, in which each analyte is sampled multiple times across its chromatographic peak. Consequently, even if the product ions of the query peptide comprise the minority of the mixture spectra, peptide detection can still be achieved using peptide-centric analysis.
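The change of query unit can be sketched in a few lines of code. In this deliberately simplified illustration, each query peptide is scored against every MS/MS spectrum whose isolation window contains its precursor m/z, and the best-scoring spectrum is kept as its evidence of detection. The score (a count of matched fragment ions), the 0.02 m/z tolerance, and the data layout are all assumptions; practical tools use richer scores that also draw on intensities, retention time, and elution profiles.

```python
def count_matched_ions(spectrum_mzs, fragment_mzs, tol=0.02):
    """Count expected fragment m/z values found in the spectrum within tol."""
    return sum(
        any(abs(peak - frag) <= tol for peak in spectrum_mzs)
        for frag in fragment_mzs
    )


def best_evidence(query_peptides, spectra, tol=0.02):
    """Return {peptide: (index of best-scoring spectrum, score)}.

    query_peptides: dict mapping a peptide name to
        (precursor m/z, list of expected fragment m/z values).
    spectra: list of dicts with keys 'window' (low, high) and 'mzs'.
    A peptide with no matching signal gets (None, 0).
    """
    results = {}
    for name, (precursor_mz, fragments) in query_peptides.items():
        best = (None, 0)
        for idx, spec in enumerate(spectra):
            low, high = spec["window"]
            if not (low <= precursor_mz < high):
                continue  # this window could not have isolated the precursor
            score = count_matched_ions(spec["mzs"], fragments, tol)
            if score > best[1]:
                best = (idx, score)
        results[name] = best
    return results
```

Because each peptide is queried independently, nothing prevents two peptides from sharing the same best-scoring spectrum, which is the expected outcome when they are cofragmented. The product ions of other peptides in that spectrum are simply ignored rather than treated as unexplained signal.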
FIG. 1. Spectrum-centric analysis and peptide-centric analysis. In spectrum-centric analysis, each MS/MS spectrum from either a DDA or DIA experiment is queried against a protein sequence database. The peptides that yield the best scoring N statistically significant PSMs are assigned to the corresponding MS/MS spectrum. Typically N is one for a DDA spectrum and multiple for a DIA spectrum (showing N = 4 here). In peptide-centric analysis, every peptide of interest is queried against the acquired MS/MS data. The bottom-middle panel shows the extracted MS/MS signal of the query peptide over time in which the signal is extracted from any MS/MS spectrum generated from isolating the query precursor m/z. The extraction window width corresponds to the acquisition method, showing here 2 m/z for DDA and 10 m/z for DIA. The precursor m/z of the query peptide is sampled stochastically and sparsely in DDA but systematically in DIA. The MS/MS signal that provides the best scoring evidence of detection is assigned to the query peptide (indicated by the arrows).
III. Roles of Precursor Ion Signals
Precursor information is a powerful component of MS/MS data analysis. Inherently designed to identify DDA spectra, spectrum-centric approaches typically use precursor information as a "filter" to constrain peptide candidates for PSMs (2-8). These approaches assign precursor ion(s) to each spectrum in various ways spanning from using the unprocessed precursor ion target, considering multiple monoisotopic ions in the isolation window, to detecting peptide features in the MS space. With high mass measurement accuracy and high resolution instruments, spectrum-centric searches could allow for only ±10 ppm of monoisotopic mass tolerance, thus greatly reducing the number of peptide candidates for PSMs and reducing the false discovery rate.
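To make the precursor "filter" concrete, the following sketch keeps only the candidate peptides whose theoretical monoisotopic mass lies within a parts-per-million tolerance of the observed precursor mass. The function names and the example masses are hypothetical; only the 10 ppm default echoes the tolerance mentioned above.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def candidates_within_tolerance(observed_mass, candidate_masses, tol_ppm=10.0):
    """Return names of candidates whose mass matches within tol_ppm.

    candidate_masses: dict mapping a peptide name to its theoretical
    monoisotopic mass, in the same units as observed_mass.
    """
    return [
        name
        for name, mass in candidate_masses.items()
        if abs(ppm_error(observed_mass, mass)) <= tol_ppm
    ]
```

At a mass of 1000 Da, a tolerance of 10 ppm corresponds to only 0.01 Da in either direction, which is why an accurate precursor mass removes most candidate peptides before any fragment ions are compared.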
In the context of analyzing DIA data there is no clear consensus on how to use precursor information. Recent DIA methods emphasize the systematic measurement of both precursor and product ions, allowing for the detection of precursor and product ions that covary over elution time and likely are derived from the same analyte (26). This concept of detecting covarying precursor-product ion groups has been used to generate deconvolved spectra from DIA spectra. Each deconvolved spectrum contains precursor and product ions ostensibly derived from a single analyte and is thus more compatible with spectrum-centric analysis (51,52).
Peptide-centric approaches could also use precursor information as evidence of detection. Rardin et al. recently demonstrated improved quantification from DIA data using Skyline with precursor ion filtering and transition filtering by correlation analysis (18,53). Although filtering with precursor ions and precursor-product groups improves selectivity and specificity, the detection process could reduce sensitivity because analytes may have no MS signal, or an MS signal with substantial chemical noise, despite having an MS/MS signal amenable to quantification. One way to incorporate precursor information without reducing sensitivity is to use it as a scoring feature rather than a filter, as is done in some peptide-centric approaches such as the algorithms used in Skyline (18). When analyzing complex mixtures, incorporating precursor information without filtering may provide greater confidence in peptide detection for analytes with a signal in MS spectra, without sacrificing sensitivity for analytes that have an MS/MS signal but no detectable MS signal.
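The difference between using precursor information as a filter and as a scoring feature can be shown with a toy example. This is a hypothetical sketch, not the scoring model of Skyline or of any other tool: the score scales, the linear combination, the `weight`, and the thresholds are all assumptions, and `precursor_score` is assumed to lie between 0 (no MS signal) and 1 (strong MS signal).

```python
def detect_with_filter(fragment_score, precursor_score, threshold,
                       min_precursor=0.5):
    """Hard filter: weak precursor evidence vetoes the detection."""
    if precursor_score < min_precursor:
        return False
    return fragment_score >= threshold


def detect_with_feature(fragment_score, precursor_score, threshold,
                        weight=0.5):
    """Scoring feature: precursor evidence adds to the score, never vetoes."""
    combined = fragment_score + weight * precursor_score
    return combined >= threshold
```

Under these assumptions, a peptide with a strong MS/MS signal but no detectable MS signal (for example `fragment_score=3.2`, `precursor_score=0.0`, `threshold=3.0`) is rejected by the filter but still detected when the precursor signal is only a feature. A peptide with borderline MS/MS evidence gains confidence from a good precursor signal.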
Applications of Peptide-Centric Analysis
Peptide-centric analysis is particularly suited for DIA experiments given its advantages in handling mixture spectra. In addition, peptide-centric analysis can easily incorporate valuable properties from DIA data, such as retention time and elution profile, that are commonly ignored by spectrum-centric analysis. For example, Gillet et al. demonstrated peptide detection and quantification by extracting peptide-specific product ion chromatograms, or extracted ion chromatograms, from DIA data using 26-m/z SWATH acquisition (24). Weisbrod et al. demonstrated peptide detection and quantification by searching theoretical or empirical peptide fragmentation patterns against the DIA data acquired using high mass accuracy Fourier transform-all reaction monitoring (FT-ARM) of 100-m/z wide isolation windows (25). With low precursor selectivity and high intrascan dynamic range in both cases, correctly interpreting the spectra using spectrum-centric analysis is extremely challenging.
Peptide-centric analysis can also be applied to DDA data. For example, Liebler et al. used a pattern recognition algorithm (SALSA) to search for peptide-specific ion series against the DDA MS/MS spectra (54). Because of the stochastic nature of the DDA data, the evidence for peptide detection appears sparse and scattered compared with analyzing DIA data (Fig. 1). Nonetheless, peptide-centric analysis provides a statistical measure for every query peptide regardless of whether the data is sparse or dense. In addition, given that many DDA spectra are mixed, peptide-centric analysis retains its benefits in handling mixture spectra when analyzing DDA data.
Extensible Framework for Mass Spectrometry
This concept of defining peptides as analytes and directly searching for their evidence of detection generalizes into a broader paradigm, which we call "analyte-centric analysis." Analyte-centric analysis comprises any method that uses the analyte as the query unit to ask whether the analyte is detected or not. It includes traditional targeted data analysis, but is not limited to methods that score based on transitions or extracted ion chromatograms. The analyte of interest can be naturally extended from peptides to include small molecules, peptides with modifications, intact proteins, lipids, and metabolites. In this analyte-centric paradigm, any property of an analyte can be naturally incorporated into the score that summarizes the evidence supporting an analyte being detected. For example, one can ask, "Does the discovered fragmentation evidence coincide with chromatographic expectations?" Also, as mass spectrometer resolution continues to improve toward fine-scale isotope resolution (55), the analyte-centric approach can discriminate an isotopic profile based on the elemental composition of the analyte.
One of the subtle but significant benefits of analyte-centric analysis is the change in the query unit and null hypothesis. In the spectrum-centric approach, validation programs that modeled a false distribution of decoy hits were in reality posing the null hypothesis as, "This spectrum is made up of a random analyte." For analyte-centric analysis, the null hypothesis is, "The analyte is not detected in the data." This more direct hypothesis is better suited for answering most biological problems.

CONCLUSIONS
In this perspective, we discuss the analytically unique characteristics of peptide-centric analysis compared with traditional spectrum-centric analysis in analyzing shotgun proteomics data. Specifically, peptide-centric analysis provides direct statistical measurements for every peptide, and could improve the analysis of mixture spectra common in DIA data. We also discuss how peptide-centric approaches could use precursor signals as essential or supporting evidence of detection. As mass spectrometry instruments continue to improve in acquisition speed, DDA will be able to sample deeper for lower abundance analytes and DIA will be able to systematically acquire MS/MS spectra with improved precursor selectivity or a shorter cycle time. Analysis of the resulting large collections of data could benefit from the alternative peptide-centric approaches. Specifically, changing the perspective from identifying as many spectra as possible to confidently detecting peptides from an experiment greatly benefits protein inference and quantitative comparison. The fact that the same peptide is fragmented multiple times in DIA data sets, generating a chromatographic elution profile for each product ion, further increases the achievable quantitative accuracy. Furthermore, a peptide-centric perspective can be naturally extended to other analytes such as intact proteins, lipids, and metabolites. We hope the analytical advantages of analyte-centric analysis over spectrum-centric analysis will incite the field to further advance bioinformatics and statistical solutions for analyzing mass spectrometry data.