Analysis of Major Histocompatibility Complex (MHC) Immunopeptidomes Using Mass Spectrometry*

The myriad of peptides presented at the cell surface by class I and class II major histocompatibility complex (MHC) molecules are referred to as the immunopeptidome and are of great importance for basic and translational science. For basic science, the immunopeptidome is a critical component for understanding the immune system; for translational science, exact knowledge of the immunopeptidome can directly fuel and guide the development of next-generation vaccines and immunotherapies against autoimmunity, infectious diseases, and cancers. In this mini-review, we summarize established isolation techniques as well as emerging mass spectrometry-based platforms (i.e. SWATH-MS) to identify and quantify MHC-associated peptides. We also highlight selected biological applications and discuss important current technical limitations that need to be solved to accelerate the development of this field.

The immunopeptidome refers to the collection of peptides associated with and presented by major histocompatibility complex (MHC) molecules (1-11). MHC-associated peptides are recognized by T lymphocytes, which are in turn activated to eliminate abnormal cells such as pathogen-infected and cancer cells. These immune peptides are divided into two classes, MHC class I and class II peptides, which are distinguishable by (1) their structure, (2) the intracellular pathways by which they are generated, and (3) the type of T lymphocytes that recognize them, as reviewed in (12,13). In brief, MHC class I peptides are predominantly 9-12 amino acids in length or slightly longer (14-17). Class I peptides are generated mainly following degradation of intracellular proteins by the ubiquitin-proteasome system and are recognized by cytotoxic CD8+ T cells (18). MHC class II peptides are 10-25 amino acids in length, derived mainly from protease-mediated degradation of endocytosed proteins of extracellular origin, and are recognized by helper CD4+ T cells.
The tissue/cell type distribution also differs for class I and class II peptides: Class I peptides are presented on virtually any nucleated cell, whereas peptides displayed by class II molecules are found on a subset of specialized immune cells such as dendritic cells, macrophages, and B lymphocytes. In recent years, however, there has been a rapid increase in the number of nonhematopoietic cell types suggested to present peptides on MHC class II molecules (19).
In the human population, the complexity of the MHC immunopeptidome is amplified by the very large genetic pool coding for structurally different class I and class II MHC molecules, termed human leukocyte antigen (HLA) molecules (20). In fact, the HLA genes constitute the most polymorphic gene cluster in the human genome. The allelic diversity often alters the structure and specificity of the peptide-binding sites of the HLA molecules (21,22). Consequently, each HLA allotype associates with a specific set of peptides bearing conserved amino acids known as "anchor" residues or HLA binding motif (23,24). More than 10,000 different HLA allelic forms have been catalogued in the human population (http://www.ebi.ac.uk/imgt/hla/stats.html; April 2015), and each person expresses up to six different classical class I allotypes and typically eight different class II allotypes, resulting in a huge HLA peptidomic complexity at the population level (25).
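The notion of a binding motif can be illustrated with a short sketch that tabulates per-position amino acid frequencies across same-length peptides and reports positions dominated by a single residue. The peptide list, the 80% threshold, and the function names are hypothetical and chosen only for illustration; published motif analyses rely on far larger ligand sets and more careful statistics.

```python
from collections import Counter

def position_frequencies(peptides):
    """Return one dict per position mapping amino acid -> fraction of peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs

def anchor_positions(peptides, threshold=0.8):
    """Report (1-based position, residue, fraction) where one residue dominates."""
    anchors = []
    for pos, freq in enumerate(position_frequencies(peptides), start=1):
        residue, fraction = max(freq.items(), key=lambda kv: kv[1])
        if fraction >= threshold:
            anchors.append((pos, residue, fraction))
    return anchors

# Toy 9-mers invented for this example (not real eluted ligands).
toy_ligands = ["ALAKDVGEL", "SLRTPWQNL", "GLHMCEYFL", "TLQNSKDIL", "KLWVERTAL"]
toy_anchors = anchor_positions(toy_ligands)
print(toy_anchors)
```

In this toy set only positions 2 and 9 are conserved, which mimics the pattern of two dominant anchor positions often described for class I ligands.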
Pioneered by Donald Hunt in the early 1990s, analyses of MHC-associated peptides by data-dependent acquisition (DDA) mass spectrometry (MS) have yielded groundbreaking knowledge about the peptide binding motifs of MHC molecules (26). Thanks to the astonishing progress in MS-based technologies over the last decade, hundreds to thousands of MHC-associated peptides can now be identified in a single measurement using optimal biological model systems. More recently, targeted MS techniques have emerged as robust approaches to accurately and reproducibly quantify the dynamics of antigen presentation (27). As a result of such emerging technologies, a better understanding of our immune system as well as clinical applications are expected. In this mini-review, we aim at (1) describing key technical considerations in the selection of appropriate model systems for the exploration of immunopeptidomes, (2) summarizing established methods for the isolation of MHC-associated peptides for mass spectrometric analysis, (3) providing an up-to-date description of standard and emerging MS techniques, and (4) discussing future directions that, if explored, will advance the field.
Considerations in the Selection of Model Systems-At the genomics level, any living organism can be investigated following robust and efficient extraction of DNA. In contrast, not every biological model system is compatible with the analysis of the immunopeptidome. In principle, class I peptides are expected to be detectable on most cell and tissue types in mammals, as genes coding for MHC class I molecules are expressed in virtually any nucleated cell in jawed vertebrates. However, many technological limitations, as described below, have yet to be overcome (e.g. isolation of peptides, dynamic range of mass spectrometers, software tools) to reach robust and comprehensive analysis of MHC I immunopeptidomes from any mammalian cell type. New methods for the investigation of class I peptides would also be beneficial for the analysis of class II immunopeptidomes as both classes of peptides share generally similar technical limitations.
Currently, suitable model cell lines or tissues for immunopeptidome analysis have to express high levels of endogenous MHC molecules. Determining the absolute number of cell surface MHC molecules by flow cytometry and/or mass spectrometry is therefore an important initial step when establishing a new model system (28,29). On average, we noted from pertinent literature reports that the usage of at least ~5 × 10^8 cells expressing ~2 × 10^5 MHC molecules per cell was a minimum requirement for the exploration of cellular immunopeptidomes (3,4). Cell lines expressing low levels of endogenous class I molecules (e.g. C1R cells) but high levels of transfected MHC molecules, either soluble or membrane bound, have also been used in many immunopeptidomics studies (30-34). This property makes these cells particularly attractive for the analysis of peptides presented by individual class I allotypes as the overexpression provides the flexibility required for the exploration of the allotypes (35, 36) (Fig. 1, upper panel).
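To put the quoted minimum in perspective, the cell number and per-cell expression level can be converted into a molar amount of MHC molecules. The sketch below does only this arithmetic; the function name is hypothetical, and the result is an upper bound on the peptide material available before any isolation losses.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def total_mhc_pmol(n_cells, mhc_per_cell):
    """Total MHC molecules in the sample, expressed in picomoles."""
    return n_cells * mhc_per_cell / AVOGADRO * 1e12

# Figures quoted in the text: ~5 x 10^8 cells, ~2 x 10^5 MHC molecules per cell.
starting_pmol = total_mhc_pmol(5e8, 2e5)
print(f"{starting_pmol:.0f} pmol of MHC molecules in total")
```

Because this total is spread over thousands of distinct peptide sequences, the amount of any individual peptide species is far smaller than the sum suggests.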
Analyses of MHC class I and II peptide ligands from cells isolated from mouse primary tissue have been reported with limited success given the high number of mice needed to perform an experiment (37-40). In terms of translational potential, primary human tissues are highly attractive (Fig. 1, upper panel). In general, ~1 g of tissue is required to detect hundreds to thousands of MHC class I peptides (41). In fact, the amount of starting material needed for the detection of high numbers of peptides is inversely proportional to the expression levels of MHC molecules. For instance, if specific tumor specimens express high levels of MHC molecules, much less material is usually needed to detect high numbers of peptide sequences. However, specific and rigorous assays determining the exact amount and quality of the tissue required for such analyses have yet to be documented to translate the approach more effectively to the clinic.
Isolation of MHC Class I and Class II Peptides-In the early days, papain proteolytic digestion was moderately successful for the isolation of MHC-associated peptides (42-44). However, this technique has not been widely used because of the large quantities of starting materials required. In the early 1990s, three main techniques to extract MHC-associated peptides were developed: (1) strong acid elution of MHC class I and II peptides from whole-cell lysate using trifluoroacetic acid (45-47), (2) mild acid elution (MAE) of class I peptides (not class II peptides) from the cell surface (48), and (3) immunoaffinity purification (IP) of the class I and II MHC peptide complexes from detergent solubilized cell lysates followed by the release of ligands from the isolated complexes (49) (Fig. 1, middle panel).
Nowadays, MAE and IP are the best established and most widely used methods, both still requiring large amounts of starting material (typically 2 × 10^8 to 1 × 10^10 cells). Until recently, MAE was used to isolate MHC class I peptides from various cell lines, bone-marrow-derived dendritic cells, and primary thymocytes (5,38,39,48,50). The MAE approach is cell-surface-specific and can be repeated over time from the same cell population, e.g. to isolate newly generated MHC class I peptides. A significant proportion of peptides is, however, nonspecific for the MHC class I molecule (38,39). In fact, a comparison of the MAE-extracted peptides from wild-type and β2-microglobulin knockout cells (the latter expressing virtually no detectable level of cell surface MHC class I molecules) showed that about 40-60% of the identified peptides were contaminants, i.e. non-MHC class I-derived.
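The wild-type versus knockout comparison can be thought of as a set operation. The sketch below is a deliberate simplification that treats any peptide also eluted from β2-microglobulin knockout cells as a contaminant; the sequences and function name are hypothetical, and the cited studies may have applied quantitative rather than presence/absence criteria.

```python
def contaminant_fraction(wild_type_peptides, knockout_peptides):
    """Return (fraction of wild-type peptides also seen in knockout, MHC-specific set)."""
    wild_type = set(wild_type_peptides)
    if not wild_type:
        raise ValueError("no wild-type peptides supplied")
    shared = wild_type & set(knockout_peptides)
    return len(shared) / len(wild_type), wild_type - shared

# Toy identifications invented for this example.
wt_eluate = ["SIINFEKL", "AAAPAAAPL", "KVPRNQDWL", "GGGSGGGSG", "TTTPTTTPT"]
ko_eluate = ["GGGSGGGSG", "TTTPTTTPT"]
fraction, specific = contaminant_fraction(wt_eluate, ko_eluate)
print(f"{fraction:.0%} contaminants, {len(specific)} MHC-specific peptides")
```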
Two main advantages can be attributed to the IP strategy: (1) the high specificity of the extraction process for peptides associated with MHC molecules and (2) its flexibility (Fig. 1, middle panel). In fact, this method can be used for the isolation of both class I and class II MHC peptides from a range of biological sources such as cell lines, primary tissues and plasma (36, 51-53). Moreover, up to ~90% of immunopurified peptides were reported to be specific for the MHC molecules (51,54).
Using the standard IP protocol, cells are first treated and lysed with a nondenaturing detergent (53). MHC peptide complexes are then precipitated by applying the complex lysate to an affinity column coupled with monoclonal antibody (mAb) specific for a certain MHC class or allotype (53,55). Both cell surface and intracellular MHC molecules are precipitated. HPLC fractionation or filtering is generally used following the IP (and MAE) preparation to separate peptides from larger complex components. Due to high consumption of affinity matrix, this isolation technique typically requires in-house production of mAb from hybridoma cell lines. Of note, papain digestion has also been used in combination with the IP method to cleave and subsequently isolate cell surface MHC-peptide complexes (50,56).
The IP method is highly specific but presumably comes at the cost of a very low yield, i.e. ~0.5-3% as reported by Hassan et al. (57). This study used a state-of-the-art, quantitatively accurate MS technique, termed selected or multiple reaction monitoring (SRM or MRM, respectively), to quantify all losses of HLA class I peptides during sample processing. The authors showed, for the first time, that the immunopurification step was at the origin of significant losses during sample handling. Whether the yield of the IP approach is highly variable between different laboratories has yet to be reported. Nevertheless, this study stressed the need for the development of new and more efficient methodologies for the isolation of MHC-associated peptides and for the accurate quantification of stepwise yields of the procedure.
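Stepwise yields combine multiplicatively, which is why a single inefficient step dominates the overall recovery. The sketch below assumes that a heavy-labeled standard of known amount is measured before and after each step; the step names and amounts are hypothetical and are not values reported by Hassan et al.

```python
from math import prod

def recovery(measured_fmol, spiked_fmol):
    """Fraction of a spiked standard recovered after one processing step."""
    if spiked_fmol <= 0:
        raise ValueError("spiked amount must be positive")
    return measured_fmol / spiked_fmol

def overall_yield(step_recoveries):
    """Overall yield is the product of the individual step recoveries."""
    return prod(step_recoveries)

# Hypothetical recoveries for three processing steps.
steps = {
    "immunopurification": recovery(5.0, 100.0),
    "elution and filtration": recovery(80.0, 100.0),
    "desalting": recovery(50.0, 100.0),
}
total_yield = overall_yield(steps.values())
print(f"overall yield: {total_yield:.1%}")
```

With these illustrative numbers the overall yield falls within the ~0.5-3% range quoted above, almost entirely because of the first step.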
Thus, current isolation techniques are suitable for the analysis of immunopeptidomes from various biological model systems. However, the field would greatly benefit from additional technical reports as well as standardized protocols leading to comparable results between different laboratories.
MS-Based Platforms in Immunopeptidomics-Over the last decade, our ability to identify and quantify MHC-associated peptides using MS has greatly benefitted from progress made in the field of MS-based proteomics (58,59). Three types of MS data acquisition methods are now available for the measurement of MHC-associated peptides: (1) DDA, (2) targeted data acquisition, and (3) data-independent acquisition (DIA) (60). DDA is a well-established and widely used method for large-scale identification of MHC-associated peptides (61-63). In contrast, the targeted and the DIA approaches are still emerging, and both techniques are expected to offer unprecedented advantages in terms of reproducibility and quantitative accuracy for the measurement of MHC peptide ligands across multiple samples, as recently demonstrated in proteomics (64-66). The following section describes the principles of these three data acquisition methods with a particular focus on the measurement of class I MHC peptides.
Mapping Immunopeptidomes in DDA Mode-Over the last decade, most MS-based peptidomic studies were carried out using DDA (also known as discovery-based or shotgun MS) to maximize the amount of information acquired within an experiment. In fact, the latest generation of MS instruments capable of high-resolution and accurate mass measurements (e.g. Q-Exactive or quadrupole TOF) (67, 68) enables the identification of hundreds to thousands of MHC class I or class II peptides within several hours (62). In DDA mode, ionized MHC peptide ligands (precursor ions) are first detected in a survey scan (MS1 scan) (Fig. 2, upper panel). The most abundant precursor ions detected in the precursor ion scan (TopN) are then selected for fragmentation using collision-induced dissociation (CID), beam-type higher-energy CID (HCD) or electron-transfer dissociation. The resulting fragment ions (product ions) are finally detected and recorded as fragment ion spectra (MS2 spectra). For sequence identification, the recorded MS2 spectra are searched against a protein database using commercial and/or open-source automated search engines (69,70). For MHC class I-associated peptides, ~10% of the acquired MS2 spectra can be confidently assigned (false discovery rate <1%) to a peptide sequence (54). The exact proportion of confidently assigned MS2 spectra might be, however, largely laboratory dependent and might depend on the acquisition parameters of the instrument used (e.g. length of measurement, threshold settings, and amount of sample). The relatively low identification success rate of ~10% (as opposed to ~50% for peptides generated by tryptic digestion of intact proteins) can be attributed to the shortness and the nontryptic nature of MHC class I peptides. In fact, many class I peptides are inefficiently ionized due to the lack of basic amino acid residues, and standard fragmentation methods generate poorly predictable, information-poor MS2 spectra (71).
Thus, only a small fraction of the acquired MS2 spectra contain sufficient information for the correct assignment of the peptide sequence. To boost the identification of class I peptides, new ionization and fragmentation techniques, as well as novel software tools optimized for MHC peptide ligands, would need to be developed.
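The false discovery rate threshold mentioned above is commonly estimated with a target-decoy strategy. The sketch below uses the simple decoys-over-targets estimator on a list of scored peptide-spectrum matches; this estimator, the score values, and the function names are assumptions for illustration and do not reproduce the exact procedure of any search engine or cited study.

```python
def fdr_at_threshold(psms, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) at or above a score threshold.

    psms is a list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0

def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR does not exceed max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, threshold) <= max_fdr:
            best = threshold  # keep scanning: the estimate is not monotonic
    return best

# Toy matches: (search engine score, matched a decoy sequence?)
toy_psms = [(10, False), (9, False), (8, False), (7, True), (6, False)]
strict_cutoff = score_cutoff(toy_psms, max_fdr=0.01)
lenient_cutoff = score_cutoff(toy_psms, max_fdr=0.25)
```

With so few matches the estimate is very coarse; real datasets contain thousands of spectra, which is what makes a 1% threshold meaningful.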
Along these lines, Ternette et al. recently found that 5% dimethyl sulfoxide in liquid chromatography solvents enhanced the electrospray ionization of HLA class I peptides, improving the total ion count by approximately twofold (personal communication) (72,73). However, the effect of enhancing electrospray ionization and sensitivity might depend on the type of emitter source as well as specific parameters such as voltage, temperature, and gas flow. Interestingly, a gain in HLA class I peptide identification was also found by introducing a novel fragmentation method termed "electron-transfer/higher-energy collision dissociation" (54). With electron-transfer/higher-energy collision dissociation, Mommen et al. found that 39% of the acquired MS2 spectra led to high-confidence HLA class I peptide assignments (54). This method generates the fragment ions induced by electron-transfer dissociation (c/z) and HCD (b/y) and combines them in a single spectrum (74,75). The authors directly attributed the better performance of electron-transfer/higher-energy collision dissociation for the identification of HLA class I peptides (i.e. approximately threefold increase), in comparison to CID, HCD, and electron-transfer dissociation, to the more extensive backbone fragmentation [...] (76), performed approximately twofold better than the standard target-decoy approach (77), irrespective of the fragmentation method used.
The abbreviations used are: CID, collision-induced dissociation; CV, coefficient of variation; DDA, data-dependent acquisition; DIA, data-independent acquisition; HCD, beam-type higher-energy CID; HIV, human immunodeficiency virus; HLA, human leukocyte antigen; HRM, hyperreaction monitoring; IP, immunoaffinity purification; mAb, monoclonal antibody; MHC, major histocompatibility complex; MRM, multiple reaction monitoring; MS, mass spectrometry; PRM, parallel reaction monitoring; SRM, selected reaction monitoring.
Although MS-based immunopeptidomics has greatly benefited from enormous advances in high-resolution instrumentation in recent years, it is now apparent that mass spectrometers operated in DDA mode are less well suited for solving problems that require the comparison of comprehensive, quantitative, and reproducible datasets across many samples or conditions. In fact, Michalski et al. reported that while 100,000 peptide features were recorded from a complex digest of cell lysate, only 16% were selected for fragmentation (referred to as undersampling) even though the experiment was performed on a fast scanning LTQ-Orbitrap Velos mass spectrometer (78). Moreover, the selection of the precursor ions in DDA mode follows a simple intensity-dependent heuristic, leading to stochastic isolation of precursor ions for fragmentation and therefore irreproducible peptide identification when the same sample is repeatedly analyzed (79-81). In fact, ~20% of the selected HLA class I peptides were shown to vary between replicate analyses of the same sample (82). Targeted MS strategies can alleviate this limitation. Their principles are described below with a particular focus on S/MRM and parallel reaction monitoring (PRM).
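The intensity-dependent selection rule can be sketched in a few lines. The example below picks the N most intense precursors from one survey scan, optionally skipping m/z values placed on an exclusion list; the data, the exclusion mechanism, and the function name are simplified assumptions and do not model any particular instrument's acquisition software.

```python
def select_top_n(survey_scan, n, excluded=frozenset()):
    """Return the m/z values of the n most intense precursors not on the exclusion list.

    survey_scan is a list of (mz, intensity) tuples from one MS1 scan.
    """
    candidates = [(mz, intensity) for mz, intensity in survey_scan if mz not in excluded]
    candidates.sort(key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in candidates[:n]]

# Toy survey scan: (precursor m/z, intensity)
toy_scan = [(450.2, 1e5), (500.3, 5e6), (610.4, 2e4), (720.5, 8e5)]
first_cycle = select_top_n(toy_scan, n=2)
second_cycle = select_top_n(toy_scan, n=2, excluded={500.3})
sampled_fraction = len(first_cycle) / len(toy_scan)
```

Small fluctuations in measured intensity between runs change which precursors make the cut, which is one way to picture the run-to-run variability described above; low-intensity features such as the one at m/z 610.4 are never fragmented in either cycle.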

Identification and Quantification of MHC-Associated Peptides in Targeted Data Acquisition Mode-Targeted quantitative proteomics was acknowledged as the method of the year in 2012 by Nature Methods (64,83). In contrast to DDA, targeted data acquisition methods offer high specificity, sensitivity, reproducibility, quantitative accuracy, and a wide dynamic range (approaching five orders of magnitude) for the measurement of peptides (84-86) (Fig. 1, lower panel). State-of-the-art targeted methods are mainly performed in S/MRM and PRM mode on a low-resolution triple quadrupole and a high-resolution quadrupole Orbitrap mass spectrometer, respectively (Fig. 2, middle panel) (84,86). Such MS-based targeted platforms require a priori knowledge of the molecular targets. These techniques are routinely used within the proteomics community but were only recently applied for the sensitive detection and the robust quantification of MHC class I peptides (27,37,57,87,88).
S/MRM is considered the gold standard quantification method for predefined sets of target peptides (84). The method exploits the capability of the first and the third quadrupole in a triple quadrupole mass spectrometer to act as mass filters for the iterative isolation of a precursor ion and a fragment ion derived from the targeted precursor; such a precursor/fragment pair is known as a transition (Fig. 2, middle panel). In a typical S/MRM experiment, the signal of three to six transitions per targeted MHC peptide ligand is recorded over the chromatographic elution profile of the targeted peptide. This type of targeted data acquisition results in a peak group (coeluting fragment ion traces) that is subsequently analyzed using software tools such as Skyline (89), mProphet, and mQuest (90,91). Importantly, selection of optimal transitions has to be carried out before an S/MRM experiment, typically involving the use of synthetic peptides that are optionally heavy isotope labeled (92-94). Therefore, optimization of S/MRM transitions requires significant efforts. On a quadrupole Orbitrap mass spectrometer operated in PRM mode, this rather time-consuming preacquisition selection process is not required since all fragment ions of the targeted peptide and potentially coselected contaminating peptides are recorded for each precursor-ion charge state (Fig. 2, middle panel); the selection of optimal fragment ion traces is therefore carried out postacquisition for peptide identification and quantification (95). Synthetic peptides are used in proteomics to optimize the collision energy of individual transitions. In this regard, previous studies showed that the sensitivity gain resulting from optimizing the collision energy for each transition was about twofold compared with the signals obtained from more generic collision energies computed according to the mass of the targeted precursor (92, 94, 96-98).
This optimization process could therefore be particularly beneficial for the detection of low-abundance MHC-bound peptides (27,88).
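A transition is simply a precursor m/z paired with a fragment m/z. The sketch below computes theoretical values for the model peptide SIINFEKL from standard monoisotopic residue masses and lists a few y-ion transitions. Picking the largest y ions is an arbitrary choice made here for illustration; in practice transitions are selected empirically, as described above, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def precursor_mz(sequence, charge):
    """m/z of the protonated, unmodified peptide at the given charge state."""
    neutral = sum(MONO[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge

def y_ions(sequence):
    """Singly charged y-ion m/z values, ordered y1, y2, ..."""
    return [sum(MONO[aa] for aa in sequence[-k:]) + WATER + PROTON
            for k in range(1, len(sequence))]

def transitions(sequence, charge=2, n=4):
    """Return n (precursor m/z, fragment m/z, label) tuples using the largest y ions."""
    q1 = precursor_mz(sequence, charge)
    labelled = list(enumerate(y_ions(sequence), start=1))[-n:]
    return [(round(q1, 4), round(mz, 4), f"y{k}") for k, mz in labelled]

siinfekl_transitions = transitions("SIINFEKL")
for q1, q3, label in siinfekl_transitions:
    print(f"{label}: Q1 {q1} -> Q3 {q3}")
```

In S/MRM each such pair would be monitored by the first and third quadrupoles, whereas in PRM only the precursor value needs to be specified before acquisition.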
Fig. 2 (caption, partially recovered): [...] accurate mass spectrometer (e.g. Q-Exactive). All MS2 product ions derived from predefined peptides (MS1 precursors) are recorded over time to generate the ion chromatographic traces. (Lower panel) HRM and SWATH MS are two fundamentally similar data-independent acquisition methods employing an Orbitrap-type and a quadrupole TOF high-resolution accurate mass spectrometer, respectively. In DIA mode, multiplexed ion traces are acquired by repeatedly cycling through predefined consecutive precursor isolation windows (originally 32 × 25 Th) and by monitoring all fragment ions.
For the reproducible and accurate quantification of tryptic peptides in complex proteome digests, the S/MRM technique has been shown to reach a high dynamic range of about five orders of magnitude, as well as excellent interlaboratory reproducibility, with a median coefficient of variation (CV) of ~5% (99). Similar analytical performances were also observed using the PRM approach (100-102). Nevertheless, such targeted techniques are still emerging in immunopeptidomics, and only a few studies have used S/MRM or PRM for the relative and absolute quantification of MHC peptides (8,27,36,37,57,87). For instance, the group of Purcell demonstrated in a technical note that the limit of detection for the model peptide SIINFEKL was in the attomol range using the S/MRM technique (37). In addition, S/MRM and PRM were recently applied, in combination with MHC-monomers loaded with heavy-labeled peptides, to quantify the yield of the IP method, as mentioned above (57). Importantly, intra- and interlaboratory studies measuring the limit of quantification, limit of detection, CV, and chromatographic reproducibility have yet to be conducted from a large set of MHC peptide ligands.
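The coefficient of variation used as a reproducibility metric is the standard deviation of replicate measurements divided by their mean. A minimal sketch, with invented peak areas, is shown here; it uses the sample standard deviation, which is an assumption since studies differ in the estimator they report.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean), in percent."""
    if len(values) < 2:
        raise ValueError("at least two replicate values are required")
    return 100.0 * stdev(values) / mean(values)

# Hypothetical peak areas for one peptide measured in three replicate injections.
replicate_areas = [95.0, 100.0, 105.0]
peptide_cv = cv_percent(replicate_areas)
print(f"CV = {peptide_cv:.1f}%")
```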
S/MRM and PRM are two fundamentally similar approaches considered to be highly robust for the sensitive, reproducible, and accurate quantification of peptides. Nevertheless, both techniques are limited to the detection of hundreds of peptides per sample injection and thus are not ideally suited to comprehensively quantify immunopeptidomes. To overcome this limitation, new MS methods, collectively known as DIA, have recently emerged and should expedite the deciphering of immunopeptidomes. Below, we focus on the sequential window acquisition of all theoretical mass spectra (SWATH-MS) technology (103), one of the DIA strategies that was very recently applied to analyze immunopeptidomes from a range of biological sources (82).
Digital Mapping of Immunopeptidomes in DIA Mode-DIA is an unbiased MS technique that combines the advantages of DDA and S/MRM (104). Since the introduction of its concept in 2004, several DIA strategies have been described and were reviewed in detail (66, 105-107). In essence, this acquisition mode converts all peptides in a physical sample into a permanent digital map composed of multiplexed MS2 spectra derived from the fragmentation of all precursor ions. Data collection is therefore comprehensive and quantitative information can be mined retrospectively (Fig. 1, lower panel). More recently, SWATH-MS was introduced as a new DIA method for the consistent, reproducible, and quantitatively accurate measurement of proteins across multiple samples (103, 108-116). The SWATH-MS and the SWATH-like hyperreaction monitoring (HRM) techniques provide S/MRM-like performance in terms of reproducibility and quantitative accuracy and were implemented in a fast-scanning, high-resolution quadrupole time-of-flight and a quadrupole Orbitrap mass spectrometer, respectively (Fig. 2, lower panel) (68,117). In SWATH or HRM mode, the instrument cycles repeatedly through fixed or variable adjacent precursor isolation windows (typically 32 windows of 25 m/z covering 400-1200 m/z) over the course of chromatographic elution, thus fragmenting all coeluting precursor ions in each window and recording multiplexed fragment ion spectra of all peptides in a user-defined retention time versus mass-to-charge window.
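The fixed-window scheme described above can be written down directly. The sketch below generates adjacent, equal-width isolation windows and looks up the window into which a given precursor falls. It ignores the small window overlaps used on real instruments and variable-width schemes; the function names are hypothetical.

```python
def swath_windows(start=400.0, end=1200.0, width=25.0):
    """Adjacent fixed-width precursor isolation windows as (lower, upper) m/z pairs."""
    n_windows = round((end - start) / width)
    return [(start + i * width, start + (i + 1) * width) for i in range(n_windows)]

def window_index(mz, windows):
    """Index of the window containing mz (lower bound inclusive), or None if outside."""
    for i, (lower, upper) in enumerate(windows):
        if lower <= mz < upper:
            return i
    return None

default_windows = swath_windows()            # 25 m/z windows over 400-1200 m/z
narrow_windows = swath_windows(width=10.0)   # narrower windows over the same range
idx = window_index(482.28, default_windows)  # doubly charged SIINFEKL precursor
print(len(default_windows), len(narrow_windows), default_windows[idx])
```

Narrowing the windows reduces the number of coeluting precursors fragmented together but increases the number of windows per cycle, so fewer cycles fit across each chromatographic peak at a fixed scan speed.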
To extract quantitative information from such digital SWATH-MS data, high-quality assay libraries are required (109,111). Such libraries contain retention-time and fragmentation information of the peptides to be quantified. Assay libraries are generated from native and/or synthetic peptides using a SWATH-compatible mass spectrometer operated in DDA mode. Very recently, assay libraries were successfully employed for the high-throughput measurement of HLA-associated peptides by SWATH-MS (82). Caron et al. demonstrated, for the first time, the feasibility of an international effort to build standardized HLA allele-specific peptide spectral and assay libraries, which were used to extract quantitative information from digital SWATH maps acquired in different laboratories. Importantly, the authors demonstrated that the SWATH method clearly outperformed the DDA approach for the reproducible identification of HLA class I peptides across several technical replicates. In addition, Caron et al. showed that (1) ~81% of HLA class I peptides present in an assay library could be reliably extracted from a quantitative SWATH map in a cell-type-independent manner, (2) narrowing the size of the windows by 2.5-fold (i.e. from 25 Da to 10 Da width each) resulted in a ~13% increase in the identification of class I peptides, and (3) the dynamic range of peptides quantified by SWATH-MS in different cell types, based on their signal intensity, was about 3-4 orders of magnitude. Moreover, HLA peptide ligand spectral and assay libraries were stored by class and allele in the public SWATH Atlas database (http://www.swathatlas.org/, also covering libraries for proteome analysis), providing an initial transparent framework to collect, organize, and share immunopeptidomic data.
Thus, the workflow and the computational and data resources presented in that study were a first step toward highly consistent, reproducible, and quantitatively accurate measurements of immunopeptidomes across multiple samples.
Although the development of DIA/SWATH-based technologies is an important advance in the field, the detection and quantification of low-abundance MHC-associated peptides still remain a major challenge. In this regard, the limit of detection and limit of quantification of SWATH-MS for MHC peptide samples have not been reported yet but were recently determined to be in the low-femtomole to mid-attomole range for high complexity samples across multiple laboratories (Collins et al. manuscript in preparation) (103). The capability of current software tools (e.g. OpenSWATH, Peakview, Spectronaut, Skyline) to reliably discriminate real signals from interfering peaks in complex DIA windows is also considered an important limitation. Another limitation is the nonconventional fragmentation of MHC peptides, resulting in information-poor MS2 spectra. This is a challenge for the creation of high-quality HLA allele-specific peptide assay libraries in which both the quality and quantity of fragment ions are important. Computational analysis of fragmentation patterns for MHC peptides using publicly available immunopeptidomics datasets might help in this regard (118). Over the next ten years, both the selectivity and sensitivity of new DIA/SWATH mass spectrometers are expected to improve quite significantly. For instance, next-generation mass spectrometers might cycle at extreme speed through high numbers of very small precursor isolation windows (e.g. 1 m/z) over the course of chromatographic elution. Such performance in SWATH-MS would dramatically reduce the presence of interfering peaks and would facilitate the reliable identification and quantification of low-abundance peptides. By continuously expanding HLA-allele-specific peptide assay libraries and by improving the performance of computational frameworks, it can also be expected that more HLA-associated peptides will be confidently extracted from DIA/SWATH data in the future.
Moreover, robust untargeted analysis of DIA immunopeptidomics data will enable reproducible analysis of MHC immunopeptidomes without the need for spectral and assay libraries, as recently shown for DIA proteomics data (119).
Application of DDA-, Targeted-, and DIA-Based Immunopeptidomics in Basic and Translational Studies-Over the last decade, the methods described above were applied in various biological contexts for the analysis of class I and class II MHC peptides. MS-based approaches, together with MHC peptide ligand prediction algorithms (e.g. SYFPEITHI, NetMHC, smm) (120-122), were particularly helpful to better understand the molecular mechanisms that process cellular proteins into the immunopeptidome as well as for the detection of disease-related peptides and phosphopeptides that could be used for the rational design of immunotherapeutic interventions (123-125). Since a comprehensive survey of the literature is not within the scope of this mini-review, we focus below on selected landmark papers and applications of outstanding interest.
Large-scale identification of MHC peptide ligands by MS took off in 2004 with the analysis of more than 200 naturally presented HLA-B*1801 peptides (31). In this seminal paper, Hickman et al. sequenced peptide ligands of secreted HLA molecules from a transfected human B cell line using DDA MS. This study indicated, for the first time, that HLA class I peptide ligands were encoded by any gene, suggesting that any protein from any cellular compartment could potentially contribute to the composition of the MHC class I immunopeptidome. This basic notion was then further supported by additional DDA-based studies, collectively indicating that the MHC class I immunopeptidome conveys to the cell surface an integrative view of gene regulation (30, 35, 39, 51, 123, 126-136).
The journal Science highlighted cancer immunotherapy as the "2013 Breakthrough of the Year" (137). In fact, compelling clinical results have shown that antibody-based checkpoint blockade therapy can restore the function of T lymphocytes to eradicate tumor cells (138). By integrating exome sequencing, MHC peptide ligand prediction algorithms and targeted MS, Gubin et al. recently identified tumor-specific mutant MHC class I peptides as targets of this form of immunotherapy (88). Proteogenomics approaches (139,140) using DDA MS were also recently used for the identification of mutant as well as polymorphic MHC class I peptides (5, 51, 141-144). Notably, the detection of tumor-specific mutant MHC peptides has not been achieved yet using DIA MS. If tested and validated, DIA MS could, nevertheless, represent a robust approach in the clinic for the development of next-generation T-cell-based cancer vaccines as well as for the stratification of patients who might best benefit from checkpoint blockade immunotherapy (138,145,146).
Rapid, robust, and inexpensive detection of tumor-specific and pathogen-derived HLA peptides from low amounts of starting material is expected to have a strong impact on the development of vaccines against cancers as well as deadly infectious diseases such as tuberculosis, HIV, and malaria (147, 148). MS-based technologies have matured enough to facilitate the analysis of self-immunopeptidomes but might still fall short of deciphering the full repertoire of HLA peptide ligands encoded by the genes of a pathogen or tumor-specific alleles. In fact, state-of-the-art mass spectrometers enable, at best, the detection of a few dozen pathogen-derived HLA peptides (i.e. for vaccinia virus, HIV, hepatitis C virus, human papillomavirus, and human respiratory syncytial virus) per experiment using optimal in vitro model systems (27, 72, 132, 147, 149-151). Interestingly, the development of a high-throughput cytotoxic T cell-based platform for epitope discovery was recently reported (152). The results generated by this technology indicated that the repertoire of self-HLA peptides has been clearly underestimated by MS studies. Thus, benchmarking studies should be conducted to clarify whether or not the emerging MS methods described herein already provide the required sensitivity for the measurement of the most relevant immunogenic T-cell epitopes.
Data Sharing and Future Directions. The field of MS-based proteomics has progressed at an exceptionally fast pace over the last decade, and this technical progress has also been highly beneficial for the MS-based measurement of MHC-associated peptides. As a consequence of new technologies, hundreds of thousands of different HLA class I and class II peptides are expected to be sequenced in the future given the extreme diversity of HLA molecules in the human population. To capitalize on such "Big Data," the development of computational methods as well as new data repositories will be essential in order to automate the annotation, storage, and sharing of large MHC peptidomic datasets. Currently, sequences of eluted peptides can be uploaded and shared through the public Immune Epitope Database (122, 153). The original raw MS files, on the other hand, are stored in separate repositories (e.g. PeptideAtlas, PRoteomics IDEntifications (PRIDE), CHORUS) and, more recently, the SWATHAtlas database (82, 140). Ideally, this information should be integrated and centralized within the same database. Genomic information should also be included, as it is now essential for the identification of tumor-specific mutant MHC class I peptides and, eventually, for the detection of strain-specific or drug-resistant-specific MHC peptides encoded by pathogen genomes. Such resources would also help to further improve the computational predictability and annotation of MHC peptides, class II in particular. The emerging proteogenomics community could contribute importantly to leveraging the existing procedures toward the creation of this comprehensive resource.
The latest MS technique, DIA mass spectrometry exemplified by SWATH-MS, has the power to digitize the immunopeptidomic content of physical samples (82). In the future, the aim should be to integrate both immunopeptidomics and T cell-based "Big Data" in order to predict key immunogenic epitopes in cancer and infectious diseases, e.g. with the help of cloud-based machine learning supercomputers (154-157). Although such advanced systems-level approaches could prove to be highly powerful in the future, one has to keep in mind the paramount importance of applying robust protocols for the isolation of MHC-associated peptides, as the isolation step crucially influences the specificity and sensitivity of downstream MS analysis (114). In fact, since the early 1990s, little new work has been documented on the isolation of MHC peptide ligands. Moreover, the yield of the widely used IP approach was only recently assessed using state-of-the-art MS techniques and was reported to be about 0.5-3%, indicating extreme losses and/or biases during sample preparation (57). As recently shown in the field of proteomics, sample preparation protocols should ideally enable all sample processing steps to be carried out in a single tube to minimize sample losses, thereby enhancing the sensitivity, throughput, and scalability of peptidomics analyses (158, 159). The development of rapid and efficient sample processing techniques is therefore crucial for the robust analysis of immunopeptidomes and will be necessary to scale up the process and advance the field effectively into routine clinical application.