Increased Diversity of the HLA-B40 Ligandome by the Presentation of Peptides Phosphorylated at Their Main Anchor Residue

Human leukocyte antigen (HLA) class I molecules bind peptides derived from the intracellular degradation of endogenous proteins and present them to cytotoxic T lymphocytes, allowing the immune system to detect transformed or virally infected cells. It is known that HLA class I–associated peptides may harbor posttranslational modifications. In particular, phosphorylated ligands have raised much interest as potential targets for cancer immunotherapy. By combining affinity purification with high-resolution mass spectrometry, we identified more than 2000 unique ligands bound to HLA-B40. Sequence analysis revealed two major anchor motifs: aspartic or glutamic acid at peptide position 2 (P2) and methionine, phenylalanine, or aliphatic residues at the C terminus. The use of immobilized metal ion and TiO2 affinity chromatography allowed the characterization of 85 phosphorylated ligands. We further confirmed every sequence belonging to this subset by comparing its experimental MS2 spectrum with that obtained upon fragmentation of the corresponding synthetic peptide. Remarkably, three phospholigands lacked a canonical anchor residue at P2, containing phosphoserine instead. Binding assays showed that these peptides bound to HLA-B40 with high affinity. Together, our data demonstrate that the peptidome of a given HLA allotype can be broadened by the presentation of peptides with posttranslational modifications at major anchor positions. We suggest that ligands with phosphorylated residues at P2 might be optimal targets for T-cell-based cancer immunotherapy.

Major histocompatibility complex (MHC) class I molecules are cell surface glycoproteins that are expressed on almost every nucleated cell in vertebrates. They result from the noncovalent interaction of a polymorphic heavy chain, a constant light chain (β-2-microglobulin (β2m)), and a peptide ligand (1). The extracellular region of the heavy chain encompasses three domains, α1, α2, and α3, with α1 and α2 forming a groove that accommodates a peptide ligand of, typically, 8 to 11 amino acid residues. The binding of the ligand to the groove is governed by the interaction of the side chains of certain peptide residues, called anchor positions, with several pockets of the heavy chain named A to F (1,2). The size and chemical nature of these pockets impose restrictions on the peptide repertoire that can be associated with a particular class I antigen. It is estimated that the ligandome of a given class I allotype may comprise up to 10,000 different peptides (3), although recent reports suggest that this number may be underestimated (4).
Peptides displayed by MHC class I molecules derive from the intracellular degradation of endogenous proteins in the nucleus and cytosol and reach the lumen of the endoplasmic reticulum by means of the transporter associated with antigen processing. Inside the endoplasmic reticulum, peptides bind to the heavy chain and β2m in a multistep process involving several chaperones. Finally, if the bound peptide confers enough stability to the complex, the MHC class I molecule migrates via the Golgi network to the cell surface (5). MHC class I molecules facilitate immunological surveillance by presenting peptide ligands to CD8+ T lymphocytes. When these T cells detect tumor-specific peptides or peptides derived from intracellular pathogens, they exert their cytotoxic effects on the antigen-presenting cell, promoting tumor suppression or eradication of the infection.
The MHC, known as the human leukocyte antigen (HLA) system in humans, is the most polymorphic region in the entire genome (6). In particular, the IMGT/HLA database (7) currently contains about 7000 allele sequences that encode more than 5000 different human class I antigens. Most of these polymorphisms are located within the α1 and α2 domains of the heavy chain and modulate the peptide binding preferences of each allotype (8). It is thought that the great diversity of HLA class I allotypes, and of their associated ligandomes, is an adaptation to guarantee immunity against intracellular pathogens (6). In this regard, the existence of a large number of different class I molecules capable of presenting diverse peptidomes hampers immune evasion by means of viral genetic mutation.
It has long been known that HLA class I molecules display posttranslationally modified peptides at the cell surface (9). Among other modifications, N-terminal acetylation (10), phosphorylation (11–13), methylation (14), and glycosylation (15) have been described in MHC class I-bound peptidomes. In this context, phosphorylated ligands have raised much interest owing to their potential as targets in T-cell-based cancer immunotherapy (12,13), given that aberrant phosphorylation is a hallmark of malignant transformation (16,17) and phosphorylated epitopes can be specifically recognized by CTLs (11). Therefore, the characterization of the phosphopeptidome associated with MHC class I molecules and the identification of tumor-derived phosphopeptides are necessary in order for such immunotherapeutic approaches to be implemented.
Nevertheless, the identification of HLA class I-bound phosphopeptides is difficult because of several analytical limitations. Phosphopeptides constitute only a small fraction of the peptide repertoire of a given HLA allotype. Additionally, MS analysis is hindered by the low ionization efficiency of phosphorylated species relative to their nonphosphorylated counterparts (18), which makes them more difficult to detect. Moreover, the fragmentation of phosphorylated peptides by collision-induced dissociation usually results in minimally informative MS2 spectra (19). As a consequence of the lability of the phosphate group, which is readily dissociated during fragmentation, a prominent signal corresponding to the neutral loss of phosphoric acid is often observed, in contrast with poor b- and y-type ion signals. This phenomenon is especially pronounced in the case of phosphoserine (19), which is involved in about 90% of the phosphorylation events in the human proteome (18). Thus, the unambiguous identification of phosphopeptides is usually a challenging task.
To overcome these limitations, a wide variety of proteomic approaches have been developed. Several such strategies rely on the enrichment of the phosphorylated species prior to LC-MS analysis, typically by means of IMAC or TiO2 affinity chromatography (20). These techniques have also been successfully applied to the characterization of the phosphopeptidomes of several class I molecules (11–13).
In this study, we employed IMAC and TiO2 chromatography in combination with LC-MS to enrich and identify phosphorylated peptides associated with HLA-B*40:02. This strategy allowed the identification of a large number of endogenous ligands and the fine mapping of the B*40:02 binding motif. It was also effective for the characterization of the phosphopeptidome displayed by this allotype. Among the identified phospholigands, we found some sequences lacking the canonical binding motif at peptide position 2 (P2) and carrying a phosphoserine residue instead. Thus, the presentation of ligands harboring posttranslationally modified residues at major anchor positions contributes to the increased diversity of HLA class I peptidomes. We suggest that these sorts of epitopes might be valuable in T-cell-based cancer immunotherapy.

EXPERIMENTAL PROCEDURES
Cell Lines and Monoclonal Antibodies-HMy2.C1R (C1R) is a human lymphoid cell line with low expression of its endogenous HLA class I molecules. C1R cells show reduced levels of HLA-B*35:03 and normal expression of HLA-C*04:01 (21). A full-length cDNA clone of B*40:02 was obtained from the LCL line 143.2 (22) and cloned into the RSV5neo vector. C1R-B*40:02 transfectants were generated via electroporation of 10⁷ C1R cells at 250 mV and 960 μF. To select stable transfectants, we grew the cells in the presence of 1 mg/ml geneticin (Invitrogen), and surface expression of HLA-B*40:02 was confirmed by flow cytometry. Cells were cultured in DMEM supplemented with 7.5% FCS (both from Sigma). The mAb W6/32 (IgG2a specific for a monomorphic HLA class I determinant) has been described elsewhere (23).
Phosphopeptide Enrichment-Phosphopeptide enrichment entailed the combination of two previously described approaches, namely, IMAC with Fe3+ as ligand (25) and TiO2 chromatography (26). The peptide pool, reconstituted in 200 μl of loading solution (50% acetonitrile, 0.3% TFA, pH 1.5), was incubated at room temperature with 15 μl of Phos-Select iron affinity gel (Sigma) that was subsequently packed in a homemade tip column. This column was connected to a second one packed with Oligo R3 resin (Applied Biosystems, Foster City, CA). The flow-through was recovered and both columns were washed extensively, first with loading solution, then with transfer solution (1% phosphoric acid), and finally with washing solution (2% acetonitrile, 0.1% TFA). Elution was carried out using two solutions sequentially: (i) 50% acetonitrile, 0.1% TFA; and (ii) 30% acetonitrile, 0.06 mM NH4OH. Both eluates were mixed, dried completely, and redissolved in 0.1% formic acid (this sample is referred to as the IMAC fraction throughout the article).
The flow-through of the first chromatography was dried and reconstituted in 1 M glycolic acid, 80% acetonitrile, 5% TFA and applied to a microcolumn packed with Titansphere TiO2 resin (GL Sciences, Tokyo, Japan). The unbound fraction was recovered, acidified with 10 μl of 0.1% formic acid, dried, and reconstituted in 0.1% TFA before desalting with a C18 ZipTip (Eppendorf, Hamburg, Germany). Afterward, the sample was dried and reconstituted in 0.1% formic acid (this sample is referred to as the flow-through fraction throughout the article). The TiO2 column was washed with 80% acetonitrile, 5% TFA and the peptides were eluted with two solutions: (i) 0.3 M NH4OH and (ii) 0.3 M NH4OH, 30% acetonitrile. Both eluates were mixed and the medium was acidified by the addition of formic acid. Then, the samples were desalted using a C18 ZipTip (Eppendorf), dried in a Speed-Vac, and redissolved in 0.1% formic acid (this sample is referred to as the TiO2 fraction throughout the article).
LC-MS Analysis-Two technical replicates of each fraction (flow-through, IMAC, and TiO2) of the B*40:02-bound peptide pool were analyzed in a nano-LC Ultra HPLC (Eksigent, Framingham, MA) coupled online with a 5600 triple TOF mass spectrometer (AB Sciex, Framingham, MA) through a nanospray III source (AB Sciex). Chromatography was performed using a C18 chromXP trapping column (350 μm × 0.5 mm, 3-μm particle diameter, 120-Å pore size; Eksigent) and a C18 chromXP column (75 μm × 150 mm, 3-μm particle diameter, 120-Å pore size; Eksigent). Solvent A was 0.1% formic acid in water, and solvent B was 0.1% formic acid in acetonitrile. The loading pump was operated at isocratic conditions with solvent A at a flow rate of 2 μl/min for 10 min. The nanopump worked at 300 nl/min under gradient elution conditions as follows: 2% B for 1 min, a linear increase to 30% B in 109 min, a linear increase to 40% B in 10 min, a linear increase to 90% B in 5 min, and 90% B for 5 min. HPLC was controlled with the Eksigent Control software (version 3.12, Eksigent).
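The nanopump program described above can be read as a piecewise-linear function of time. The following is a minimal sketch for illustration only: the function name `percent_b` and the breakpoint table are assumptions derived from the segment durations given in the text, not the instrument method file.

```python
# Gradient breakpoints as (time in min, %B), accumulated from the segment
# durations in the text: 1 + 109 + 10 + 5 + 5 = 130 min in total.
GRADIENT = [(0, 2.0), (1, 2.0), (110, 30.0), (120, 40.0), (125, 90.0), (130, 90.0)]


def percent_b(t_min):
    """Return %B at time t_min by linear interpolation between breakpoints."""
    if t_min < 0 or t_min > GRADIENT[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

Under these assumptions, the shallow 2% to 30% B segment occupies 109 of the 130 min, which is where most peptides would be expected to elute.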
The nanospray source was equipped with a fused silica PicoTip emitter (10 μm × 12 cm, New Objective, Woburn, MA). The ion source was operated in positive ionization mode at 150°C with a potential difference of 2800 V. Each acquisition cycle consisted of a survey scan of 250 ms between 350 and 1250 m/z units and a maximum of 50 MS2 spectra scanning between 100 and 1500 m/z units. Ions showing the highest intensities in the MS spectrum were selected for fragmentation. Singly charged ions were excluded to avoid the fragmentation of non-peptide contaminants. A dynamic exclusion window of 20 s was applied to each fragmented ion. The total cycle time was 2.8 s. The mass spectrometer was controlled with Analyst TF software (version 1.5, AB Sciex).
Synthetic peptides were dissolved in 0.5% formic acid and 20% acetonitrile at an estimated concentration of 250 fmol/μl. Peptides were directly infused at a flow rate of 3 μl/min into the 5600 triple TOF mass spectrometer. For retention time comparison of natural and synthetic phosphopeptides, a pool of 85 synthetic phosphopeptides, 100 fmol each, was prepared in 0.1% formic acid and analyzed via LC-MS/MS using the same chromatographic conditions described above. The extracted ion chromatogram of each phosphopeptide was used to determine its retention time.
MS/MS Ion Search and Peptide Identification-Raw data were processed with PeakView software (version 1.1, AB Sciex) to generate an MGF file that was used as input for a Mascot (version 2.4) MS/MS ion search. A concatenated target-decoy protein database containing 40,478 sequences was generated by combining the UniProt Homo sapiens complete proteome set (downloaded on May 23, 2011) with its corresponding reversed database generated with the DBToolKit software (version 4.1.4). Search parameters were set as follows: no enzyme; peptide tolerance, 15 ppm; MS/MS tolerance, 25 mDa; and electrospray ionization quadrupole TOF as instrument. Variable modifications included phosphorylation of serine, threonine, and tyrosine; oxidation of methionine; and pyroglutamic acid formation from N-terminal glutamine. Peptide sequences that matched the HLA-C*04:01 binding motif, Phe or Tyr at P2 and Asp at P3 (27), were discarded. Estimation of the false discovery rate (FDR) was carried out by decoy hit counting as previously described (28,29), and only those matches with an FDR < 5% at the peptide level were considered. All the information related to the MS analysis, MS/MS ion search, and peptide identification, including raw data, MS metadata, MGF and mzIdentML files, and the corresponding MIAPE MS and MSI documents, was deposited in ProteomeXchange (PRIDE accession number 31118, ProteomeXchange accession number PXD000450). This process was aided by the MIAPE extractor tool (version 3.7.0).
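The decoy hit counting step can be illustrated with a short sketch. It assumes that each peptide match is reduced to a score and a decoy flag, and that the FDR at a given cutoff is estimated as the ratio of decoy to target hits above it. The exact estimator used in references 28 and 29 is not reproduced in this section, so this ratio, like the function name `score_threshold_at_fdr`, is an assumption made for illustration.

```python
def score_threshold_at_fdr(matches, max_fdr=0.05):
    """Find the lowest score cutoff whose accepted set stays within max_fdr.

    matches: iterable of (score, is_decoy) pairs from a concatenated
    target-decoy search. The FDR is estimated as decoys / targets among
    all matches scoring at or above the cutoff (assumed estimator).
    Tied scores are processed in arbitrary order in this sketch.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = score
    return best
```

Matches scoring at or above the returned cutoff would then be retained, and decoy hits among them discarded before any downstream analysis.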
For the identification of phosphorylated ligands, every match with a Mascot score greater than 25 was considered. Then, MS2 spectra were manually inspected for signals that could correspond to the neutral loss of the phosphate group (−98, −49, and −32.7 m/z units for singly, doubly, and triply charged ions, respectively). Putative phosphopeptides were validated by means of retention time comparison and fragmentation of the corresponding synthetic peptide (supplemental data S2).
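The neutral loss offsets quoted above follow from dividing the monoisotopic mass of phosphoric acid (H3PO4, about 97.977 Da) by the charge of the precursor ion. The sketch below shows that arithmetic and a simple peak check. The helper names and the reuse of the 25 mDa MS/MS tolerance as the matching window are assumptions for illustration; in this study the spectra were inspected manually.

```python
H3PO4_MONO = 97.9769  # monoisotopic mass of H3PO4 in Da


def neutral_loss_mz(precursor_mz, charge):
    """m/z expected for the precursor after losing one H3PO4."""
    return precursor_mz - H3PO4_MONO / charge


def has_neutral_loss_peak(peaks_mz, precursor_mz, charge, tol=0.025):
    """True if any fragment m/z lies within tol of the neutral loss m/z."""
    target = neutral_loss_mz(precursor_mz, charge)
    return any(abs(mz - target) <= tol for mz in peaks_mz)


# Offsets for charge states 1 to 3, rounded as in the text.
offsets = {z: round(H3PO4_MONO / z, 1) for z in (1, 2, 3)}
```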
Statistical Analysis-To assess preferences in residue usage, we grouped the identified peptides according to their length. The frequency of each residue at each peptide position (f_obs) was compared with the frequency of the same amino acid in the database (f_exp) under the null hypothesis that f_obs ≤ f_exp. Preliminary p values for each residue and position were calculated assuming a binomial distribution with p = f_exp. Definitive P values were obtained by subjecting preliminary p values to multiple testing correction as follows: where k is the number of residues of each peptide in the set tested (i.e. k = 9 in the nonamer set). P values less than 0.05 were considered statistically significant.
Peptide Synthesis-The stepwise solid-phase peptide synthesis was performed on an automated Multipep peptide synthesizer (Intavis, Koeln, Germany) using standard Fmoc (N-(9-fluorenyl)methoxycarbonyl) chemistry. Peptides were purified via reversed-phase chromatography either in a Smartline HPLC instrument (Knauer, Berlin, Germany) equipped with a 218TP52 C18 column (Vydac, Deerfield, IL) or using an Oligo R3 (Applied Biosystems) microcolumn. Peptides intended for binding assays were quantified by means of amino acid analysis in a Biochrom 30 amino acid analyzer (Biochrom, Cambridge, UK). The peptide GEFGGCGSV was labeled after synthesis with 5-iodoacetamidofluorescein (Thermo) following the manufacturer's instructions. Afterward, it was purified by means of reversed-phase HPLC and quantified based on absorbance at 491 nm as described elsewhere (30).
Peptide Binding Assays-Binding assays were performed essentially as described elsewhere (30,31), with minor variations. In brief, C1R-B*40:02 transfectants were washed twice with PBS and acid stripped via incubation in 0.263 M citric acid, 0.123 M Na2HPO4, 1% BSA, pH 3, for 2 min at 4°C. After two washes with ice-cold DMEM, the cells were incubated in DMEM supplemented with 5% FCS, 2 μg/ml of human β2m (Calbiochem), 400 nM fluorescent reference peptide GEFGGXGSV (where X represents fluorescein-labeled cysteine), and different concentrations of the test peptides ranging from 50 μM to 23 nM in 3-fold dilutions. After overnight incubation at 4°C, fluorescence was measured in an Epics XL-MCL flow cytometer (Beckman Coulter). Inhibition of reference peptide binding was plotted versus the test peptide concentration, and the IC50 (the concentration of test peptide that gives 50% inhibition) was estimated after the experimental results had been fitted to a sigmoid curve, as previously described (30).
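The IC50 estimation can be sketched as follows. The exact sigmoid model of reference 30 is not given in this section, so the code assumes a two-parameter logistic curve running from 0% to 100% inhibition and replaces a proper nonlinear fit with a coarse least-squares grid search. All function names are hypothetical, and the example data in the usage note are synthetic.

```python
def dilution_series(top, factor, n):
    """Concentrations starting at `top`, divided by `factor` at each step."""
    return [top / factor ** i for i in range(n)]


def inhibition(conc, ic50, slope=1.0):
    """Assumed logistic model: % inhibition of reference peptide binding."""
    return 100.0 / (1.0 + (ic50 / conc) ** slope)


def fit_ic50(concs, inhibitions):
    """Coarse least-squares grid search over log10(IC50) and slope."""
    best = None
    for i in range(-300, 301):        # log10(IC50) from -3 to 3, step 0.01
        ic50 = 10 ** (i / 100)
        for j in range(5, 31):        # slope from 0.5 to 3.0, step 0.1
            slope = j / 10
            sse = sum((inhibition(c, ic50, slope) - y) ** 2
                      for c, y in zip(concs, inhibitions))
            if best is None or sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]
```

Eight 3-fold dilutions starting at 50 μM, `dilution_series(50.0, 3, 8)`, end at about 23 nM, which matches the range stated above. Feeding `fit_ic50` with synthetic inhibition values generated by `inhibition(c, 1.8)` recovers an IC50 close to 1.8 μM, within the resolution of the grid.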

RESULTS
Size Distribution of the B*40:02 Peptidome-HLA-B*40:02 was affinity purified and its constitutive peptidome was acid extracted and isolated via centrifugal filtration. The phosphorylated species in the peptide pool were enriched by means of IMAC and TiO2 chromatography. This strategy yielded three fractions (namely, IMAC eluate, TiO2 eluate, and flow-through) that were then analyzed via LC-MS.
Fine Mapping of the B*40:02 Binding Motif-To determine the B*40:02 binding motif, peptides were grouped according to their length. Only octamers to undecamers were considered because the number of sequences was large enough for proper statistical analysis only in these sets. We assumed that in peptide positions not subjected to structural constraints, residue usage should mirror the frequencies of each amino acid in the proteome. Conversely, the overrepresentation of one or more particular residues would reflect the binding preferences of HLA-B40. The frequency of each residue at each peptide position (f_obs) was compared with the frequency of that particular amino acid in the database (f_exp). A residue was considered to be favored at a given position if the difference between f_obs and f_exp was statistically significant (p < 0.05) after multiple testing correction.
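The overrepresentation test can be sketched as a one-sided binomial test on the count of a residue at one peptide position. The upper-tail probability below follows the stated null hypothesis (observed frequency not greater than the database frequency). The multiple testing step is a loudly labeled placeholder: the correction formula is not reproduced in this text, so a Bonferroni adjustment over 20 residues × k positions is assumed purely for illustration and should not be read as the authors' formula.

```python
import math


def binom_upper_tail(k_obs, n, p):
    """P(X >= k_obs) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space so that large peptide sets do not
    overflow the binomial coefficient.
    """
    if k_obs <= 0:
        return 1.0
    total = 0.0
    for i in range(k_obs, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def corrected_p(p_value, k, n_residues=20):
    """ASSUMED Bonferroni correction over n_residues * k tests."""
    return min(1.0, p_value * n_residues * k)
```

For a nonamer set, `corrected_p(binom_upper_tail(count, n_peptides, f_exp), k=9)` would give the adjusted value for one residue at one position, where `count`, `n_peptides`, and `f_exp` stand for the observed count, the size of the set, and the database frequency.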
Besides these two main anchors, restrictions in residue usage were observed in most peptide positions, although much more subtle than those affecting P2 and PΩ (Fig. 4).
Characterization To bypass the limitations related to the identification of phosphopeptides via MS, every peptide match with a score greater than 25 in the Mascot search was considered. Then, MS2 spectra were manually inspected for signals that could derive from the neutral loss of phosphoric acid (see "Experimental Procedures"). Finally, putative phosphorylated sequences were further confirmed through comparison of the retention times and the MS2 spectra of the endogenous and the synthetic peptides (Fig. 5 and supplemental data S2).
A total of 85 unique phosphopeptides were sequenced using this approach (Table I and supplemental data S2). Of them, 69 (81%) were identified only in the IMAC eluate and 16 (19%) were found in both the IMAC and the TiO2 samples. No single peptide belonged exclusively to the TiO2 fraction. In our dataset, phosphorylation occurred exclusively at serine (77 sequences, 91%) and threonine residues (8 sequences, 9%), and peptides containing phosphotyrosine could not be identified. Notably, in 60 ligands (71%) phosphorylation involved SP or TP sites (i.e. phosphorylation occurred before a proline residue). Finally, 48 out of the 85 phosphorylated positions described (56%) had been previously annotated in either the HPRD or the UniProt database (Table I).
Three phospholigands lacked a canonical anchor residue at P2 and contained phosphoserine instead (Table I and supplemental data S2). It is worth noting that pSer and Glu share structural similarities. Both residues hold side chains of comparable length with a net negative charge. Consequently, we reasoned that the interaction of pSer with the B pocket could confer enough stability to the complex to allow B*40:02 to display ligands with phosphorylated residues at P2.
To test this hypothesis, the natural ligands S[pS]YGNIRAV and G[pS]FSRFYSL and the related mutant peptides SEYGNIRAV and GEFSRFYSL were tested for binding to B*40:02. In this assay, C1R-B*40:02 cells were acid stripped to dissociate surface HLA class I complexes. Then, a fluorescent reference peptide that bound specifically to HLA-B40 (Fig. 6A) was added to the cells together with human β2m and different concentrations of the test peptides. The amount of fluorescent peptide bound to B*40:02 was determined via flow cytometry, and the binding affinity of the test peptides was inferred from the concentration-dependent inhibition of the binding of the reference peptide.
Both phosphopeptides bound to B*40:02 with high affinity. The IC50 value, defined as the concentration of test peptide that yielded 50% inhibition, was 1.8 and 0.3 μM for S[pS]YGNIRAV and G[pS]FSRFYSL, respectively (Fig. 6). Likewise, the mutant peptides SEYGNIRAV and GEFSRFYSL showed affinity similar to or slightly higher than that of their phosphorylated counterparts (IC50 = 1.3 and 0.2 μM, respectively). This further confirmed that S[pS]YGNIRAV and G[pS]FSRFYSL are true B*40:02 ligands and indicated that the substitution of Glu by pSer has little or no effect on binding affinity. The peptide GRIDKPILK, a known HLA-B*27:05 ligand, was included as a negative control, as it lacks proper motifs at P2 and PΩ to fit the B*40:02 groove. As seen in Fig. 6, this ligand failed to inhibit the binding of the reference peptide, demonstrating that the rest of the inhibition curves actually reflect specific binding of the test peptides.

DISCUSSION
peptides and proteins on a routine basis. The same techniques devised to identify proteins have been applied to the characterization of class I-bound peptide repertoires (4, 32–34). The identification of HLA class I-bound peptidomes, however, is usually a more complex task because of the relatively low amount of sample typically available. We estimate that after affinity purification, only about 2 to 4 μg of peptides are obtained from 10¹⁰ cells transfected with the allotype of interest (data not shown). Despite this drawback, the high-throughput identification of HLA ligands is now feasible, and some authors have even proposed the staging of a Human Immunopeptidome Project, analogous to the Human Proteome Project (35), to systematically characterize the ligandomes of HLA antigens (36).
In this work, we focused on the characterization of the peptidome and phosphopeptidome presented by HLA-B*40:02, a member of the B44 supertype (8). B*40:02 has been reported to predispose to adult T-cell leukemia, a non-Hodgkin's lymphoma caused by human T-lymphotropic virus type 1 (37). Apparently, this association is explained by the limited capability of HLA-B*40:02 to present epitopes derived from human T-lymphotropic virus type 1 and to trigger a strong CTL response. If tumor-specific epitopes were described, it is conceivable that B40+ adult T-cell leukemia patients could benefit from T-cell-based immunotherapy.
The HLA-B40-associated peptide pool was affinity purified and its constitutive peptide repertoire was acid extracted. Afterward, phosphorylated ligands were enriched sequentially by means of IMAC and TiO2 affinity chromatography, yielding three different fractions, IMAC, TiO2, and flow-through, that were then analyzed via LC-MS. An MS/MS ion search allowed the identification of more than 2000 B*40:02 ligands. Most of them were identified in the flow-through, as expected, but both the IMAC and the TiO2 fractions contributed significantly to the number of detected ligands, providing more than 300 additional sequences. The size distribution of this peptide pool was that expected for a class I ligandome, with nonamers being by far the most abundant species and accounting for 50% of the peptide repertoire. Additionally, a relatively high frequency of octamers (15%) was detected. Although not common, this feature is shared by other HLA class I antigens such as B37 or B18 (38).
For fine mapping of the B40 binding motif, peptides were grouped according to their lengths. This grouping is required because class I ligands are anchored through their N and C termini (1) to the heavy chain. Consequently, short peptides are bound in an extended conformation, whereas the central region of longer ligands protrudes from the groove (39–42). For this reason, the proper alignment of peptides of different lengths is not straightforward, especially regarding peptide positions P4 to PΩ-2.
The major constraint for binding to B40 was found at P2, where the overwhelming majority of ligands contained acidic residues. Indeed, only 4.7% of the identified peptides showed alternative amino acids at this position, which is consistent with the estimated FDR of the whole set (<5%). This indicates that the presence of Glu2 or Asp2 is mandatory for binding to B*40:02 and suggests that the sequences without this motif are probably random matches. It is also possible that contaminating peptides bound to HLA-B*35:03 were present in our dataset because, in contrast to C*04:01, B35 ligands were not filtered out during data analysis. However, only 8 out of the 2246 reported sequences (0.36%) matched the HLA-B*35:03 binding motif (43). This is consistent with the very low expression level of HLA-B35 in C1R cells caused by a point mutation in its translation initiation codon (21).
Table I. The sequence, accession number (AN), and putative parental protein are indicated. + and − symbols specify whether the peptide was sequenced from the IMAC or the TiO2 fraction. If the described phosphorylation was annotated in the HPRD and/or the UniProt database, a reference is given. Peptides phosphorylated at P2 (n = 57, 65, and 81) are shown in bold letters. Phosphoserine and phosphothreonine are represented as [pS] and [pT], respectively.
Although the three-dimensional structure of HLA-B*40:02 has not yet been elucidated, other members of the B44 supertype with an identical B pocket, such as B41, show a similar restriction at P2 (39). Analysis of the crystal structures of HLA-B*41:03 and B*41:04 reveals hydrogen bonds between the residue at P2 and both Tyr99 and Glu63, van der Waals interactions with Tyr7, and potential salt bridges with His9 and Lys45 (39). As a general rule, position 45 is critical for the specificity of the B pocket. In this regard, allotypes with Lys45 such as B41 (39) or B44 (44) bind peptides with acidic residues at P2, whereas other molecules with Glu45, such as B27, show an almost absolute preference for Arg at this position (33,45,46).
Intriguingly, a size-dependent modulation of residue usage at P2 was found. Whereas B40 bound octamers with Glu and Asp at this position (68% and 26% of the sequences, respectively), among nonamers, ligands with Asp2 accounted for only 6% of the peptide set. Furthermore, no decamers or undecamers containing Asp2 were identified. At present, we have no structural explanation for this finding, and the determination of the three-dimensional structure of B40 will probably be required to shed light on this issue.
As in other HLA class I antigens, the second most influential position for binding to B*40:02 was found at the peptide C terminus. Residues with hydrophobic side chains at PΩ were found in most cases (Fig. 3 and supplemental data S1). By far, Leu was the preferred C-terminal residue and was present in about 65% of the identified ligands. Phe, Val, Ile, and Met were statistically overrepresented, at least in the nonamer set. Finally, Ala was also found in a number of ligands, although its frequency, lower than expected by chance, suggests that it is a suboptimal anchor motif. The molecular basis for this preference can be inferred from the crystal structure of B41, which shares with B40 most of the residues that shape the F pocket, including the key amino acids Leu95, Tyr116, Tyr123, and Trp147. In B*41:03 and B*41:04, the side chain of the residue at PΩ is deeply buried in a hydrophobic F pocket in contact with the abovementioned residues (39). Regarding the main anchor positions, a similar binding motif has been described for the closely related allotype HLA-B*40:01 (38). However, some differences in residue usage exist. B*40:01 is more restrictive both at P2, where only Glu is found, and at PΩ, where Phe is particularly disfavored. The structural basis for this discrepancy is not clear, as both allotypes share the same B pocket and a nearly identical F pocket. Perhaps the limited number of B*40:01 ligands identified to date (56 in the abovementioned study (38)) or indirect effects involving secondary anchor residues could explain this divergence. Finally, most peptide positions showed some bias in residue usage, with the exception of P5 and P6 in decamers and P5, P6, and P7 in undecamers (Fig. 4). This lack of selection is probably a consequence of the bulged conformation that long peptides adopt to fit the binding groove (39–42). As a result, the central region of the peptide establishes no contact with the heavy chain, and thus there are no structural constraints that drive the selection of particular motifs.
One of the main goals of this work was the characterization of the phosphopeptidome displayed by HLA-B40. The identification of phosphorylated species associated with HLA class I molecules has gained considerable attention since they were proposed as putative targets for cancer immunotherapy (12,13). Nevertheless, the identification of phosphopeptides via mass spectrometry is challenging because of several analytical limitations, such as their low stoichiometry, their inefficient ionization, or the poorly informative MS2 spectra obtained upon their fragmentation by collision-induced dissociation (18,19). To circumvent these pitfalls, two strategies were adopted: (i) perform phosphopeptide enrichment prior to LC-MS analysis, and (ii) validate every single identification by comparing both the retention times and the MS2 spectra of the natural and the synthetic peptides.
Enrichment of phosphopeptides is mandatory for the mapping of phosphorylation events in classical bottom-up workflows (18). In the same way, identification of HLA class I-associated phospholigands has benefited from the implementation of these approaches (11-13). We combined IMAC and TiO2 affinity chromatography before LC-MS analysis, resulting in the identification of 85 B40-bound phosphopeptides. To our knowledge, this is the largest set of MHC class I phosphorylated ligands reported to date. All of them were found in the IMAC fraction, and 16 (19%) were also observed in the TiO2 sample. No single identification belonged exclusively to the TiO2 set, indicating that TiO2 affinity chromatography after IMAC added nothing to the identification of phosphopeptides. However, as stated above, about 200 nonphosphorylated endogenous ligands were sequenced from this fraction exclusively, meaning that fractionating the B40-bound peptide pool using TiO2 columns had a positive effect on the sensitivity of the LC-MS analysis.
To overcome the low quality of phosphopeptide MS2 spectra, all the phosphorylated sequences with Mascot scores greater than 25 were manually analyzed, and those showing signals compatible with a neutral phosphate loss were compared with the fragmentation spectra of the equivalent synthetic peptide to remove false positives. In addition, the retention times of the natural and the synthetic phosphopeptides were found to correlate closely. This approach guaranteed that the set of phosphorylated sequences presented in this study was highly curated.
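The two selection criteria described above (a Mascot score greater than 25 and a signal compatible with neutral phosphate loss) can be illustrated with a minimal sketch. It is not the procedure used in this study, in which spectra were inspected manually. The `Psm` record, the function names, and the 0.02 m/z tolerance are assumptions introduced here for illustration only. The sketch also assumes that the neutral loss is H3PO4 (monoisotopic mass 97.9769 Da) from the precursor ion, the loss commonly seen for pSer and pThr under collision-induced dissociation.

```python
from dataclasses import dataclass
from typing import List

# Monoisotopic mass of H3PO4 (Da); assumed here to be the neutral loss of interest.
H3PO4_MASS = 97.97690


@dataclass
class Psm:
    """Hypothetical record for one peptide-spectrum match."""
    sequence: str
    mascot_score: float
    precursor_mz: float
    charge: int
    fragment_mzs: List[float]


def neutral_loss_mz(precursor_mz: float, charge: int) -> float:
    """Expected m/z of the precursor after losing one H3PO4 molecule."""
    return precursor_mz - H3PO4_MASS / charge


def has_neutral_loss(psm: Psm, tol: float = 0.02) -> bool:
    """True if any fragment lies within `tol` m/z of the neutral-loss ion."""
    target = neutral_loss_mz(psm.precursor_mz, psm.charge)
    return any(abs(mz - target) <= tol for mz in psm.fragment_mzs)


def candidates_for_synthesis(psms: List[Psm], min_score: float = 25.0) -> List[Psm]:
    """Keep matches that pass both criteria; these would then be checked
    against the spectrum of the corresponding synthetic peptide."""
    return [p for p in psms if p.mascot_score > min_score and has_neutral_loss(p)]
```

A pre-screen of this kind only narrows the list of candidates. The decisive step remains the comparison of retention times and MS2 spectra with those of the synthetic peptide.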
In the 85 sequences reported, phosphorylation occurred in serine (91%) and threonine (9%) but not in tyrosine residues. This distribution parallels the relative frequencies of these posttranslational modifications in the proteome, namely, 90%, 10%, and <0.05% for phosphoserine, phosphothreonine, and phosphotyrosine, respectively (18). In a relatively high number of cases, phosphorylation occurred before a proline residue, probably reflecting the substrate specificity of proline-directed serine/threonine kinases, such as the MAP or the cyclin-dependent protein kinase families, which recognize and phosphorylate SP or TP sites (47). About half of the phosphorylation events described in this study had been previously reported, though not in the context of HLA class I peptide repertoires, supporting the accuracy of the identifications. Thirty-seven sequences revealed novel phosphorylation sites described here for the first time. This shows that HLA peptidomics may also contribute to the characterization and annotation of posttranslational modifications in proteins.
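The residue distribution and the proportion of proline-directed sites discussed above can be tallied with a few lines of code. The following is a minimal sketch, assuming that phosphosites are written in the bracket notation used in this article (e.g. `[pS]`). The function name and the third example peptide (`AE[pT]PLKTL`) are invented for illustration and are not part of the reported data set.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches a phosphoresidue such as [pS], optionally followed by Pro (SP/TP site).
PHOSPHOSITE = re.compile(r"\[p([STY])\](P?)")


def phospho_summary(peptides: Iterable[str]) -> Tuple[Counter, int]:
    """Count phosphorylated residues by type and the number of sites
    immediately followed by a proline residue."""
    residues: Counter = Counter()
    proline_directed = 0
    for peptide in peptides:
        for match in PHOSPHOSITE.finditer(peptide):
            residues[match.group(1)] += 1
            if match.group(2) == "P":
                proline_directed += 1
    return residues, proline_directed


# Two sequences from this study plus one invented proline-directed example.
example = ["S[pS]YGNIRAV", "G[pS]FSRFYSL", "AE[pT]PLKTL"]
counts, n_proline_directed = phospho_summary(example)
```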
The main finding of this work was the identification of three ligands that lacked the canonical B40 binding motif at P2. These sequences were not false positives, as the MS2 spectra of the corresponding synthetic peptides were identical to the experimental ones. One of them derives from residues 721 to 729 of the regulatory-associated protein of mTOR (Raptor), a component of the mammalian target of rapamycin complex I, which regulates cell growth and autophagy in response to starvation (48,49). Phosphorylation of Raptor at Ser722 has been previously described (50,51). The other two sequences correspond to novel phosphorylation sites of cytochrome b-c1 complex subunit 2, a member of respiratory chain complex III, and runt-related transcription factor 3.
The three sequences harbored a pSer residue at the main anchor position instead of Asp or Glu. Furthermore, S[pS]YGNIRAV and G[pS]FSRFYSL were shown to bind to B40 with high affinity, confirming that they were bona fide ligands. Given the structural similarities between pSer and Glu in terms of size and charge distribution, we hypothesized that pSer and acidic residues would interact with the B pocket in a similar way, leading to the formation of stable complexes. Indeed, these two ligands behaved similarly, in terms of binding affinities, to the related mutant sequences SEYGNIRAV and GEFSRFYSL.
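To make the anchor-position argument concrete, the sketch below classifies a ligand by the residue occupying P2. It is a minimal illustration, assuming the bracket notation used above for phosphoresidues. The function names and the class labels ("canonical", "phosphoserine", "other") are hypothetical and are introduced only for this example.

```python
import re
from typing import List

# One token per residue: a bracketed phosphoresidue or a one-letter amino acid.
RESIDUE = re.compile(r"\[p[STY]\]|[A-Z]")


def residues(peptide: str) -> List[str]:
    """Split a peptide string into residues, keeping [pS]/[pT]/[pY] intact."""
    return RESIDUE.findall(peptide)


def p2_anchor_class(peptide: str) -> str:
    """Classify the residue at peptide position 2 (P2)."""
    tokens = residues(peptide)
    if len(tokens) < 2:
        raise ValueError("peptide must contain at least two residues")
    p2 = tokens[1]
    if p2 in ("D", "E"):
        return "canonical"        # Asp or Glu, the B40 anchor motif at P2
    if p2 == "[pS]":
        return "phosphoserine"    # the noncanonical anchor described here
    return "other"


for peptide in ("S[pS]YGNIRAV", "SEYGNIRAV", "G[pS]FSRFYSL", "GEFSRFYSL"):
    print(peptide, len(residues(peptide)), p2_anchor_class(peptide))
```

Treating the bracketed phosphoresidue as a single token keeps the position numbering correct, so that both the phosphorylated ligands and their Glu-substituted counterparts are counted as nonamers.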
The identification of HLA class I ligands harboring phosphorylated residues at their major anchor position might be relevant in the design of immunotherapeutic approaches for cancer treatment. Given that abnormal phosphorylation is frequently observed in transformed cells, tumor-specific phosphopeptides are obvious candidates for T-cell-based therapies. However, besides a target epitope, a specific CTL response is required in order to eradicate transformed cells. When phosphorylation occurs at nonanchor residues, both the phosphorylated and the nonphosphorylated species will likely be displayed. If this is the case, a T-cell might recognize both species due to cross-reactivity. Although specific CTLs can be raised against phosphopeptides (11), some cytotoxic activity against their nonphosphorylated counterparts may be present (52). Thus, even if a CTL recognizing a phosphorylated ligand could escape negative selection in the thymus, cross-reactivity with the nonphosphorylated epitope would hamper its use as a therapeutic agent. In contrast, peptides phosphorylated at P2 can bypass this limitation, as the posttranslational modification of the residue is essential for binding to the class I molecule. In this scenario, cross-reactivity with the unmodified epitope is not possible, guaranteeing the specificity of the CTL response. Therefore, an important goal of future work will be the identification of HLA class I ligands phosphorylated at P2 in tumor samples.