MHC class I loaded ligands from breast cancer cell lines: A potential HLA-I-typed antigen collection.

To build a catalog of peptides presented by breast cancer cells, we undertook systematic MHC class I immunoprecipitation followed by elution of MHC class I-loaded peptides from breast cancer cells. We determined the sequences of 3196 MHC class I ligands representing 1921 proteins from a panel of 20 breast cancer cell lines. After removing duplicate peptides, i.e., the same peptide eluted from more than one cell line, the total number of unique peptides was 2740. Of the unique peptides eluted, more than 1750 had been previously identified, and of these, sixteen have been shown to be immunogenic. Importantly, half of these immunogenic peptides were shared between different breast cancer cell lines. Predicted MHC class I binding probabilities were used to plot the distribution of the eluted peptides by binding score for each breast cancer cell line. We also determined that the tested breast cancer cells presented 89 peptides that either contained mutations or were derived from aberrantly translated genes, 7 of which were shared between two or four different cell lines. Overall, high-throughput identification of MHC class I-loaded peptides is an effective strategy for the systematic characterization of cancer peptides and could be employed in the design of multi-peptide anticancer vaccines.


SIGNIFICANCE
By employing proteomic analyses of peptides eluted from breast cancer cells, the current study has built an initial HLA-I-typed antigen collection for breast cancer research. It was also determined that immunogenic epitopes can be identified using established cell lines and that shared immunogenic peptides can be found in different cancer types such as breast cancer and leukemia. Importantly, of the 3196 eluted peptides (including duplicate peptides eluted from different cell lines), 89 either contained a mutation in their sequence or were derived from aberrant translation, suggesting that mutation-containing epitopes make up roughly 2-3% of the peptides presented by breast cancer cells. Finally, our results suggest that interfering with MHC class I function is one of the mechanisms by which tumor cells escape immune attack.


Introduction
Breast cancer is the most frequently occurring cancer in women in all racial and ethnic groups [1]. Despite the positive outcomes for most breast cancer patients, the side effects of current treatment are substantial [2][3][4], and lower toxicity treatments are needed. Further, for a significant minority of breast cancer patients, current treatments are inadequate. Substantial advances in the field of immunotherapeutics have resulted in approvals of both vaccines [5] and immune checkpoint inhibitors [6].
It is now recognized that invasive ductal carcinoma of the breast is a heterogeneous disease consisting of several major molecularly defined subtypes, including Luminal A, Luminal B, HER2, Basal, triple-negative, and the claudin-low subset [7,8]. These subtypes have distinct clinical, genomic, and proteomic features, and it is becoming clear that breast cancer subtypes differ in their response to specific therapeutic agents [9,10]. Luminal tumors, comprising the luminal A and B subtypes, resemble the inner (luminal) cells lining the mammary ducts. Luminal A tumors have the best prognosis, with a high survival rate and a low recurrence rate. Luminal B tumors have a poorer prognosis than luminal A tumors and also tend to be estrogen receptor (ER)-positive. Triple-negative breast cancer consists of several subsets, one of which is basal-like. Basal-like tumor cells resemble the outer (basal) cells surrounding the mammary ducts. Most triple-negative tumors are also basal-like and, conversely, most basal-like tumors are also triple-negative. Triple-negative/basal-like tumors are often aggressive and have a poorer prognosis than luminal A and B tumors. Claudin-low tumors represent a less common molecular subtype of breast cancer. Claudin-low cells have more enriched cancer stem cell-like features and higher activities of ER and progesterone receptor (PR) pathways than basal tumor cells. Claudin-low cells are also the least differentiated of all breast cancer subtypes and preferentially display a triple-negative phenotype. Claudin-low tumors are associated with poor survival. Human epidermal growth factor receptor 2 (HER2) subtype breast tumors can be HER2-positive (70%) or HER2-negative (30%). HER2 tumors tend to be both ER- and PR-negative and have a poor prognosis [11].
A powerful and safe approach to breast cancer therapy is to amplify the patient's anti-tumor immune response by stimulating autologous T cells with antigen-presenting cells (APCs) loaded with breast cancer-specific antigens. T cell-based immunotherapeutic approaches require knowledge of cancer-specific antigens and the capacity to determine which of these antigens are presented by the cancer cells of the individual patient. During the past two decades, antigens from various tumor cells have been identified, and their MHC class I-restricted epitopes have been predicted and confirmed in T cell-based assays [12][13][14][15]. Current strategies to identify immunogenic tumor-specific MHC class I-restricted epitopes rely either on expression analysis of tumor-associated antigens (TAA) followed by synthesis of predicted peptides and T cell activation assays [16,17], or on elution of MHC class I-loaded peptides directly from tumors followed by mass spectrometry analysis, gene expression profiling, and in vitro T cell assays [18]. The first strategy depends largely on the accuracy of the prediction tool, which is often compromised by its inability to take into account the many factors involved, such as proteasomal cleavage specificity, additional trimming at the N-terminus of the peptide, and transport of peptides into the endoplasmic reticulum [19]. Proteasomal cleavage is especially important in peptide prediction because tumor cells differ from normal cells in the regulation of gene expression and in the cleavage preferences of their proteasomes [20,21]. The second strategy, while less subject to error, is limited because tumor cells extracted from cancer patients are often unavailable or available only in small quantities. To address these shortcomings, we used well-established breast cancer cell lines to elute MHC class I-loaded peptides and identified their sequences by mass spectrometry.
We describe systematic MHC class I immunoprecipitation followed by elution of MHC class I-loaded peptides from 20 breast cancer cell lines belonging to the four major subtypes. We determined the sequences of more than 2,700 unique non-mutated MHC class I-loaded peptides from 1,921 proteins, as well as the sequences of 85 mutation-containing peptides and 4 peptides derived from aberrantly translated genes. Predicted MHC class I binding probabilities of the eluted peptides were also used to plot the distribution of non-mutated MHC class I allele-specific peptides by binding score. Finally, by analyzing the available peptide data set, we determined that more than 1,750 peptides have been identified in previous studies and that, of these, sixteen have been shown to be immunogenic.

Immunoprecipitation and elution of MHC class I-bound peptides and mass spectrometry (MS) analysis of eluted peptides
MHC class I-bound peptides were isolated according to published protocols [23][24][25] using mouse anti-human HLA (clone W6/32)-specific antibodies followed by acid treatment and concentration by vacuum centrifugation. Briefly, 10^8 cells were pelleted and lysed in 40 ml of PBS containing 50 mM n-octyl-β-d-glucopyranoside, 1 mM CaCl2, 1 mM MgCl2, 1 mM phenylmethylsulfonyl fluoride, 1 µg/ml leupeptin, 1 µg/ml pepstatin, and 1 µg/ml aprotinin for 1 h at 4°C. Insoluble material was removed by centrifugation. Next, supernatants were incubated overnight at 4°C with 200 µg of mouse anti-human HLA antibodies (clone W6/32) and 100 µl of a 50% Protein G plus agarose slurry (Pierce). Following washes, the beads were treated with 0.1% trifluoroacetic acid (TFA) for 30 min, and eluted peptides were passed through 10 kDa cut-off filter units (Millipore). Flow-through solutions were dried by vacuum centrifugation and dissolved in 5% formic acid. Peptides were separated using liquid chromatography with a nanoAcquity UPLC system (Waters), then delivered to an LTQ Velos linear ion trap mass spectrometer (Thermo Fisher Scientific) using electrospray ionization with a Captive Spray Source (Michrom Biosciences). Samples were applied at 15 µl/min to a Symmetry C18 trap cartridge (Waters) for 10 min, then switched onto a 75 µm × 250 mm nanoAcquity BEH 130 C18 column with 1.7 µm particles (Waters) using mobile phases water (A) and acetonitrile (B) containing 0.1% formic acid, a 7-30% acetonitrile gradient over 106 min, and a 300 nl/min flow rate. Data-dependent collection of MS/MS spectra used the dynamic exclusion feature of the instrument's control software (repeat count equal to 1, exclusion list size of 500, exclusion duration of 30 sec, and exclusion mass width of −1.0 to +4.0) to obtain MS/MS spectra of the ten most abundant parent ions (minimum signal of 5,000) following each survey scan from m/z 400 to 1,400.
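The data-dependent acquisition settings above can be illustrated with a simplified model. The following Python sketch assumes that the exclusion window is applied relative to each previously selected precursor m/z (−1.0 to +4.0) and that the oldest entry is evicted when the 500-entry list is full; the class and function names are hypothetical, and the sketch does not reproduce the instrument control software.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicExclusion:
    """Simplified dynamic exclusion list (repeat count 1); hypothetical model."""
    max_size: int = 500        # exclusion list size
    duration_s: float = 30.0   # exclusion duration (seconds)
    low_width: float = 1.0     # excluded window below a listed m/z (assumed semantics)
    high_width: float = 4.0    # excluded window above a listed m/z (assumed semantics)
    entries: list = field(default_factory=list)  # (m/z, expiry time) pairs

    def purge(self, now: float) -> None:
        # Drop entries whose exclusion period has elapsed.
        self.entries = [(mz, t) for mz, t in self.entries if t > now]

    def is_excluded(self, mz: float) -> bool:
        return any(ref - self.low_width <= mz <= ref + self.high_width
                   for ref, _ in self.entries)

    def add(self, mz: float, now: float) -> None:
        if len(self.entries) >= self.max_size:
            self.entries.pop(0)  # evict the oldest entry when the list is full
        self.entries.append((mz, now + self.duration_s))


def select_precursors(survey, now, exclusion, top_n=10, min_signal=5000.0,
                      mz_range=(400.0, 1400.0)):
    """Return up to top_n non-excluded parent ions from one survey scan.

    survey is a list of (m/z, intensity) pairs.
    """
    exclusion.purge(now)
    candidates = [(mz, inten) for mz, inten in survey
                  if mz_range[0] <= mz <= mz_range[1] and inten >= min_signal]
    candidates.sort(key=lambda p: p[1], reverse=True)  # most abundant first
    chosen = []
    for mz, _ in candidates:
        if len(chosen) == top_n:
            break
        if exclusion.is_excluded(mz):
            continue
        chosen.append(mz)
        exclusion.add(mz, now)  # repeat count 1: exclude after the first MS/MS
    return chosen


scan = [(500.0, 1e6), (502.0, 9e5), (800.0, 4000.0), (1500.0, 1e7), (650.0, 2e5)]
ex = DynamicExclusion()
print(select_precursors(scan, now=0.0, exclusion=ex))   # [500.0, 650.0]
print(select_precursors(scan, now=10.0, exclusion=ex))  # []
```

In this toy scan, the ion at m/z 502.0 is skipped because it falls inside the +4.0 window of the already selected m/z 500.0, the m/z 800.0 ion is below the 5,000 signal minimum, and the m/z 1500.0 ion lies outside the survey range; 10 s later all three selected regions are still excluded.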

Mass spectrometry (MS) data analysis
Several releases of the stable human Swiss-Prot canonical protein database were used during the course of the project (versions 2011.06, 2011.08, 2011.11, 2012.03, and 2012.06), each with about 20,200 sequences. The database releases were compared, and all proteins associated with identified peptides were present in all releases, so the results could be safely combined. We used sequence-reversed databases to estimate error thresholds [26]. After adding 179 common contaminant sequences, reversed sequences were constructed and concatenated, giving final protein databases of roughly 40,800 sequences. Peptide identification was performed with SEQUEST/PAWS [27], using no enzyme specificity, an average parent mass tolerance of 2.5 Da, a monoisotopic fragment ion mass tolerance of 1.0 Da, and a variable modification of +16 Da on methionine residues with a maximum of 2 modifications per peptide. The data from each cell line corresponded to a single liquid chromatography (LC)-MS run. Because target/decoy statistical error control methods are difficult to apply to small datasets, we pooled the target and decoy scores from all 22 datasets (20 cell lines) into combined score histograms and set search score thresholds, applied uniformly across datasets, that excluded the main decoy score distributions and achieved an overall false discovery rate (FDR) of 4.65%. Score thresholds were set independently for peptides of different charge states (1+, 2+, and 3+). Individual sample FDRs varied from the overall FDR depending on the number of peptides present in each sample preparation and ranged from 2.8% to 7.8%. The MS proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRoteomics IDEntifications (PRIDE) database partner repository [28] with the dataset identifier PXD006406.
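As an illustration of the score-threshold step, the sketch below pools target and decoy PSM scores by charge state and picks, for each charge, the lowest threshold whose estimated FDR does not exceed a chosen level. It estimates FDR as decoys divided by targets above the threshold, one common convention for concatenated target/decoy searches. This is a simplified, hypothetical stand-in; the PAW pipeline's actual threshold-setting procedure is not reproduced, and the function names and toy scores are assumptions.

```python
from collections import defaultdict


def estimated_fdr(scores, threshold):
    """scores: list of (score, is_decoy). FDR estimated as decoys / targets."""
    targets = sum(1 for s, decoy in scores if s >= threshold and not decoy)
    decoys = sum(1 for s, decoy in scores if s >= threshold and decoy)
    return decoys / targets if targets else 1.0  # no targets: treat as failing


def lowest_passing_threshold(scores, max_fdr):
    """Lowest observed score at which the estimated FDR is <= max_fdr."""
    for t in sorted({s for s, _ in scores}):
        if estimated_fdr(scores, t) <= max_fdr:
            return t
    return None


def thresholds_by_charge(psms, max_fdr):
    """psms: (charge, score, is_decoy) tuples pooled over all datasets."""
    pooled = defaultdict(list)
    for charge, score, is_decoy in psms:
        pooled[charge].append((score, is_decoy))
    return {z: lowest_passing_threshold(v, max_fdr) for z, v in sorted(pooled.items())}


psms = [(2, 5.0, False), (2, 4.0, False), (2, 3.0, False), (2, 2.5, False),
        (2, 2.8, True), (2, 1.0, True), (3, 6.0, False), (3, 2.0, True)]
print(thresholds_by_charge(psms, max_fdr=0.0465))  # {2: 3.0, 3: 6.0}
```

Taking the lowest passing threshold retains the most target PSMs at the requested error level; note that the estimated FDR is not guaranteed to decrease monotonically as the threshold rises.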
For the identification of recently predicted mutation-containing HLA peptides [29], the Thermo LTQ RAW files were converted to compressed text files using MSConvert from the ProteoWizard toolkit (version 3.0.11383) [30]. Comet (version 2017.01 rev. 2) [31] was used to search a custom FASTA database constructed from predicted HLA peptides [29] to assign peptide spectrum matches (PSMs). The Comet parameters were an average parent ion mass tolerance of 2.5 Da, a monoisotopic fragment ion mass tolerance of 1.0005 Da, no static amino acid modifications, variable oxidation of methionine (M+15.9949 Da), and no enzyme specificity for protein digestion. The FASTA database was created from the peptides listed in Supplemental Table 4 of [29]. The FASTA accession (key) was constructed by concatenating the cell line name, the protein identifier, and the peptide sequence, and the protein sequence was the predicted HLA peptide sequence. There were 40,813 predicted peptides in total. Common contaminants (179 sequences) were included, and decoy sequences were added by sequence reversal. The final FASTA file had 81,984 sequences.
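The FASTA construction described above can be sketched as follows. The underscore separator in the accession, the `REV_` decoy prefix, and the placeholder protein identifier and contaminant in the usage example are assumptions for illustration. The assertion checks the reported database size: (40,813 predicted peptides + 179 contaminants) × 2 = 81,984 sequences.

```python
def build_search_fasta(predicted, contaminants, sep="_", decoy_prefix="REV_"):
    """predicted: (cell_line, protein_id, peptide) tuples.
    contaminants: (accession, sequence) tuples.
    Returns targets followed by sequence-reversed decoys."""
    # Accession = cell line + protein identifier + peptide (separator is assumed).
    targets = [(sep.join((cell, prot, pep)), pep) for cell, prot, pep in predicted]
    targets += list(contaminants)
    # One reversed decoy per target or contaminant entry.
    decoys = [(decoy_prefix + acc, seq[::-1]) for acc, seq in targets]
    return targets + decoys


def to_fasta_text(entries, width=60):
    """Render (accession, sequence) pairs as FASTA text."""
    lines = []
    for acc, seq in entries:
        lines.append(">" + acc)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Size check against the numbers reported in the text.
assert (40_813 + 179) * 2 == 81_984

entries = build_search_fasta([("MCF7", "P00000", "SLYNTVATL")],  # hypothetical entry
                             [("CONTAM_1", "IVGGYTCG")])          # hypothetical contaminant
print(to_fasta_text(entries))
```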
The Comet search scores were processed with the PAW pipeline [27]. PSMs were separated by cell line (a single mass spectrometer RAW file) and by charge state (1+, 2+, or 3+) and filtered to allow only matches to the full-length predicted HLA peptides (shortened peptide forms are also scored during a no-enzyme search). Each PSM list was then sorted from highest to lowest discriminant function score, and matches to the HLA peptides were accepted down to the first decoy match. Ranking by discriminant score and acceptance filtering were performed in Excel.
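The acceptance rule (keep full-length matches only, rank PSMs by discriminant score within each cell line and charge state, and stop at the first decoy) can be expressed in a few lines of Python. This is a hypothetical re-implementation of a step the study performed in Excel; the `PSM` record type, its field names, and the toy peptides are assumptions.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    cell_line: str
    charge: int
    score: float       # discriminant function score
    is_decoy: bool
    full_length: bool  # True if the match spans the whole predicted peptide
    peptide: str


def accept_until_first_decoy(psms):
    """Group by (cell line, charge), rank by score, keep targets above the first decoy."""
    groups = {}
    for p in psms:
        if p.full_length:  # discard shortened forms scored in the no-enzyme search
            groups.setdefault((p.cell_line, p.charge), []).append(p)
    accepted = []
    for key in sorted(groups):
        for p in sorted(groups[key], key=lambda q: q.score, reverse=True):
            if p.is_decoy:
                break  # everything ranked below the first decoy is rejected
            accepted.append(p)
    return accepted


psms = [
    PSM("HCC1806", 2, 9.0, False, True, "PEPA"),
    PSM("HCC1806", 2, 7.0, False, True, "PEPB"),
    PSM("HCC1806", 2, 6.0, True, True, "DECOY"),
    PSM("HCC1806", 2, 5.0, False, True, "PEPC"),
    PSM("HCC1806", 2, 8.0, False, False, "PEPD"),
    PSM("HCC1806", 3, 4.0, False, True, "PEPE"),
]
print([p.peptide for p in accept_until_first_decoy(psms)])  # ['PEPA', 'PEPB', 'PEPE']
```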
To identify altered peptides, including peptides with insertions and deletions, we also used the Enosi pipeline [32]. The tool combines sample-specific RNA-seq data with RNA-seq data from the TCGA project to create a custom database encoding mutations, splice variants, and other genomic lesions, and uses a two-stage, FDR-controlled pipeline to identify altered peptides that are not part of the global reference.

HLA typing
RNA-seq data for HLA typing were obtained from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48213). The HLA type of each cell line was identified using paired-end reads in fastq format as input to the seq2HLA tool [33].

MHC class I binding predictions and distribution of MHC class I allele-specific peptides
MHC class I binding predictions were made using the IEDB analysis resource tool, which combines predictions from the Consensus method [34], consisting of Artificial Neural Network (ANN) [35,36], Stabilized Matrix Method (SMM) [37], and Scoring Matrices derived from Combinatorial Peptide Libraries (Comblib) [38], with those of the NetMHCpan [39] method. To predict peptide:MHC-I binding affinity, we used the Consensus method (ANN, SMM, and Comblib) whenever a corresponding predictor was available for the MHC molecule; otherwise, NetMHCpan was used. Based on predictor availability and previously observed predictive performance, this selection aims to use the best available method for a given MHC molecule. Thus, multiple methods were used to predict peptide binding affinity. If two methods predicted the binding affinity of the same peptide, the consensus value was taken.
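The method-selection rule can be sketched as follows. The sketch assumes each predictor returns a percentile rank and that the consensus value is the median of the available ANN/SMM/Comblib ranks; that choice of median is our assumption about how the consensus is formed. The nested-dictionary layout of `predictors`, the function name, and the toy predictors in the usage example are hypothetical and are not the IEDB tool's API.

```python
from statistics import median

CONSENSUS_METHODS = ("ann", "smm", "comblib")


def predict_rank(allele, peptide, predictors):
    """Return (method_used, percentile_rank) for one peptide/allele pair.

    predictors maps method name -> {allele: callable(peptide) -> percentile rank}.
    """
    available = [m for m in CONSENSUS_METHODS if allele in predictors.get(m, {})]
    if available:
        # Consensus (assumed): median of the available per-method percentile ranks.
        ranks = [predictors[m][allele](peptide) for m in available]
        return "consensus", median(ranks)
    # Fallback when no consensus predictor covers this allele.
    return "netmhcpan", predictors["netmhcpan"][allele](peptide)


toy = {  # hypothetical predictors returning fixed ranks
    "ann": {"HLA-A*02:01": lambda p: 0.5},
    "smm": {"HLA-A*02:01": lambda p: 1.5},
    "comblib": {},
    "netmhcpan": {"HLA-A*02:01": lambda p: 9.0, "HLA-C*07:02": lambda p: 3.0},
}
print(predict_rank("HLA-A*02:01", "SLYNTVATL", toy))  # ('consensus', 1.0)
print(predict_rank("HLA-C*07:02", "SLYNTVATL", toy))  # ('netmhcpan', 3.0)
```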

ICC and RNA-seq analyses of breast cancer cell lines
In addition to MHC class I-positive breast cancer cells, we sought to identify HLA-A*02-positive cell lines because the HLA-A*02 allele occurs frequently in all ethnic groups: HLA-A*02 has been identified in 35% of African-Americans and in 50% of Caucasians [40]. Each cell line was stained with the mouse anti-human HLA (clone W6/32) and HLA-A*02-specific (clone BB7.2) antibodies followed by Alexa Fluor 488-conjugated donkey anti-mouse IgG. Control staining was performed with non-specific mouse IgG antibodies. Representative images of the stained MCF7 and MDA-MB-231 cells are shown in Figure 1A. As expected, staining with control mouse IgG showed no signal. Typical quantitative data for HLA-A*02 in MCF7 and MDA-MB-231 cells are shown in Figure 1B. Quantitative levels of total MHC class I and HLA-A*02 expression in all analyzed breast cancer cell lines are shown in Supporting Information Figure S1.
The level of MHC class I and HLA-A*02 expression in MDA-MB-231 cells was arbitrarily set to 100%, and the expression levels of MHC class I and HLA-A*02 in the other lines were calculated as a percentage of MDA-MB-231 staining (Table 1). Throughout the manuscript, cell lines are grouped by subtype and by measured HLA-A*02 protein level. MHC class I status in selected cell lines was also determined by RNA-seq analysis. The data showed that, in most cases, MHC class I staining (protein expression) did not correlate with the level of MHC class I mRNA. In addition, MHC class I staining was below the limit of detection in MDA-MB-157, JIMT1, CAL-51, ZR75B, SKBR3, and BT474 cells even though MHC class I mRNA was present in these cells.
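A minimal sketch of the normalization to MDA-MB-231 (set to 100%), together with a rank correlation that could be used to compare protein staining with mRNA levels, is shown below. The text does not state which correlation statistic was used; Spearman's rank correlation is our assumption, and the function names and staining values are hypothetical.

```python
def normalize_to_reference(signals, reference="MDA-MB-231"):
    """Express each cell line's signal as a percentage of the reference line."""
    ref = signals[reference]
    if ref <= 0:
        raise ValueError("reference signal must be positive")
    return {line: 100.0 * value / ref for line, value in signals.items()}


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks.

    Requires that neither input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


staining = {"MDA-MB-231": 200.0, "MCF7": 50.0, "T47D": 120.0}  # hypothetical values
print(normalize_to_reference(staining))  # {'MDA-MB-231': 100.0, 'MCF7': 25.0, 'T47D': 60.0}
```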
We identified MHC class I alleles in the breast cancer cell lines from RNA-seq data, using seq2HLA [33] to map RNA-seq reads against a reference database of HLA alleles [29] (Table 1). Our HLA typing agrees with the data published by Boegel et al. [29]; however, our data and those used by the TRON Cell Line Portal (http://celllines.tron-mainz.de/) disagree for the MCF-7 cell line. This discrepancy could arise if the MCF-7 breast cancer cell line contains multiple, genetically distinct subpopulations, as described by Tan et al. [42]. The validity of our HLA typing was also confirmed by the identical HLA-typing profile observed in LY2 cells, as the MCF7 and LY2 cell lines were derived from the same patient [43]. For the HLA-A*02 allele, protein expression and genotype generally agreed; however, some cell lines were phenotypically HLA-A*02-negative even though HLA-A*02 mRNA was present (Table 1).

MHC class I immunoprecipitation and elution of MHC class I-loaded peptides
To identify MHC class I-restricted peptides expressed on the cell surface, MHC molecules immunoprecipitated as described in Methods were analyzed by Western blotting using anti-MHC class I antibodies (Supporting Information Figure S2). The main purpose of this Western blot analysis was to confirm the robustness of our protocol for eluting MHC class I ligands from the surface of breast cancer cells. The results obtained with MDA-MB-231 cells showed that the selected antibodies were able to immunoprecipitate MHC class I molecules. In addition, we found that the selected antibodies did not deplete MHC class I molecules from the lysate of approximately 10^8 cells. Because the total level of MHC class I expression determined by immunocytochemistry is comparable in most analyzed cell lines (Table 1), we concluded that 10^8 cells expressed enough MHC class I molecules to saturate the amount of MHC class I antibody used.
Flow-through solutions of eluates from control IgG and anti-MHC class I samples were subjected to MS analysis. The IgG control sample from HCC1187 cells contained few peptides (Supporting Information Figure S3). In contrast, the anti-MHC class I sample from HCC1187 cells contained several hundred peptides. Immunoprecipitation and elution of peptides were repeated in independent experiments for all selected cell lines (Table 1).
We used the PAW processing pipeline [27] to identify cell surface peptides and control peptide false discovery rates. Peptide summary reports are presented in Supporting Information Table S1. Peptides that map to a single protein are indicated as unique (TRUE in the "Unique" column), while redundant peptides are indicated as non-unique (FALSE in the "Unique" column). Common contaminating peptides and peptides eluted from both the control IgG agarose and empty agarose were removed from Table S1. In addition, we found that some peptides were derived from the MHC class I proteins themselves. We removed these peptides from our dataset because we could not exclude the possibility that they were captured non-specifically owing to degradation of MHC class I proteins during immunoprecipitation and acid treatment. Indeed, we identified a few mouse IgG-derived peptides in the eluted samples that had been captured non-specifically during the treatment. Peptides and corresponding proteins were arranged according to the frequency with which the peptides were observed (total MS/MS spectral counts). The total numbers of identified peptide spectral counts from target and decoy database matches are also tallied in a tab of Supporting Information Table S1. Peptide FDRs were estimated from decoy (reversed) protein entries [26].
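The peptide clean-up steps described above amount to removing several exclusion sets and ranking what remains by spectral count. The sketch below assumes peptides are keyed by sequence and that ties in spectral count are broken alphabetically; the function name and inputs are hypothetical.

```python
def clean_and_rank(spectral_counts, contaminants, control_peptides, mhc_derived):
    """spectral_counts: dict peptide -> total MS/MS spectral count.

    Drops contaminants, peptides seen in control IgG/empty agarose eluates and
    MHC class I-derived peptides, then ranks by spectral count (ties by sequence).
    """
    excluded = set(contaminants) | set(control_peptides) | set(mhc_derived)
    kept = [(pep, n) for pep, n in spectral_counts.items() if pep not in excluded]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


counts = {"PEPA": 5, "PEPB": 12, "PEPC": 3, "PEPD": 12}  # hypothetical counts
print(clean_and_rank(counts, contaminants={"PEPC"}, control_peptides=set(),
                     mhc_derived={"PEPA"}))  # [('PEPB', 12), ('PEPD', 12)]
```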
The data in Table 1 suggest that high levels of cell surface MHC class I protein expression do not guarantee high levels of peptide loading. For example, in three independent experiments, only a small number of peptides was recovered from MDA-MB-231 cells, which expressed high levels of MHC class I. We did not identify any peptides shared among these three experiments, most likely because the antigen was in large excess relative to the capacity of the antibodies to precipitate it (Figure S2). In addition, we propose that MDA-MB-231 cells have impaired MHC class I function. In support of this hypothesis, the available RNA-seq data showed that MDA-MB-231 cells harbor mutations in the genes bach1, cpsf1, and ier5, which are directly involved in antigen processing and presentation [44]. Mutations in these genes most likely result in defective presentation of MHC class I-loaded peptides on the cell surface. This implies that MDA-MB-231 cells mostly express MHC class I proteins without loaded peptides, like T2 cells, which have defects in the transporter associated with antigen processing (TAP) proteins. Thus, it would be hard to identify shared peptides presented via a defective MHC class I presentation machinery. A total of 3,196 peptides were eluted from MHC class I molecules immunoprecipitated from the surface of cells from the 20 breast cancer cell lines. After removal of duplicate peptides, i.e., the same peptide eluted from more than one cell line, the total numbers of unique peptides and corresponding proteins were 2,740 and 1,921, respectively. The identity and frequency of shared peptides in all cell lines are presented in Supporting Information Table S2, which also includes gene expression data for all proteins from which the eluted peptides were derived.
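The distinction between total identifications and unique peptides can be sketched with a counter over per-cell-line peptide sets. Applied to the study's data, the gap between 3,196 total and 2,740 unique peptides corresponds to 456 repeated identifications across cell lines. The function name and toy peptide sets below are hypothetical.

```python
from collections import Counter


def summarize_elution(peptides_by_line):
    """peptides_by_line: dict cell line -> set of eluted peptide sequences.

    Returns (total identifications, unique peptides, {peptide: n lines} for shared ones).
    """
    total = sum(len(peps) for peps in peptides_by_line.values())
    occurrences = Counter(pep for peps in peptides_by_line.values() for pep in peps)
    shared = {pep: n for pep, n in occurrences.items() if n > 1}
    return total, len(occurrences), shared


example = {"MCF7": {"PEPA", "PEPB", "PEPC"},   # hypothetical peptide sets
           "T47D": {"PEPB", "PEPC", "PEPD"},
           "HCC1806": {"PEPC"}}
total, unique, shared = summarize_elution(example)
print(total, unique, total - unique)  # 7 4 3
print(sorted(shared.items()))          # [('PEPB', 2), ('PEPC', 3)]
```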
Next, we asked whether the eluted peptides contained antigenic mutations, which are primary targets for T cell responses. For this purpose, our raw MS data were screened against a catalog of predicted mutation-containing HLA peptides (Supporting Information Table S3) [29]. We determined that 85 of the eluted peptides contain a mutation in their sequence. In addition, we used the Enosi pipeline, which uses RNA-seq data from the TCGA project to detect peptides derived from mutated proteins, from proteins with deletions and insertions, or from aberrantly translated genes [32]. We identified two peptides derived from reverse-translated genes, tap1 (ATAPGLGGGPEPLGR) and ikbkap (EIISDPGVQGYSR), in HCC1806 cells, as well as one peptide derived from translation of the +1 frame-shifted gene cp4 (AVASINSSEALR) in UACC812 cells and one peptide derived from the +2 frame-shifted gene clipr1 (TAFESITSSDQR) in HCC1806 cells (Supporting Information Table S4).
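To illustrate how a frame-shifted peptide can arise, the following sketch translates a DNA sequence in each of the three forward reading frames using the standard genetic code and reports which frames contain a given peptide. Treating a +1 or +2 frame shift as an offset of one or two nucleotides from the annotated frame is our simplifying assumption; this is not the Enosi pipeline, and the toy sequence is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, offset=0):
    """Translate dna from the given nucleotide offset (0, 1 or 2).

    Codons containing non-ACGT characters are rendered as 'X'.
    """
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(offset, len(dna) - 2, 3))


def frames_containing(dna, peptide):
    """Offsets (0, 1, 2) whose translation contains the peptide."""
    return [f for f in range(3) if peptide in translate(dna, f)]


toy_dna = "AATGGCCTAA"  # hypothetical sequence
print([translate(toy_dna, f) for f in range(3)])  # ['NGL', 'MA*', 'WP']
print(frames_containing(toy_dna, "MA"))          # [1]
```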

Data mining and validation
Analysis of our data showed that some peptides were frequently presented on the surface of different breast cancer cells (Supporting Information Table S2). As expected, peptides derived from proteins expressed at high levels in both cancer and normal cells, such as elongation factor 2 (EEF2), fructose-bisphosphate aldolase A (ALDOA), E3 ubiquitin-protein ligase RNF213 (RNF213), cytoplasmic dynein 1 heavy chain 1 (DYHC1), helicase with zinc finger domain 2 (HELZ2), and eukaryotic translation initiation factor 3 subunit D (EIF3D), were most frequently presented in the context of MHC class I. However, many of the peptides were derived from low-abundance proteins; furthermore, for a subset of these, the corresponding mRNA levels were below the threshold of detection. Even mutation-containing peptides were derived from genes with both high and low levels of expression in the analyzed breast cancer cells. The presence of peptides from low-abundance genes implies some selection process. Notably, identical peptides were eluted from the surface of different cell lines, confirming the reproducibility of the results. To explore this further, we identified peptides that were either specific to or shared between different breast cancer subtypes, which could be of interest for designing therapies targeted to specific breast tumor types (Supporting Information Table S5).
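Identifying subtype-specific and commonly shared peptides reduces to set operations over peptides pooled by subtype. In the sketch below, a peptide is "specific" if it occurs in exactly one subtype and "common" if it occurs in every subtype; these definitions, the function name, and the toy data are assumptions rather than the criteria used for Table S5.

```python
def specific_and_common(peptides_by_subtype):
    """peptides_by_subtype: dict subtype -> set of peptides pooled over its cell lines.

    Returns ({subtype: peptides seen only in that subtype}, peptides seen in every subtype).
    """
    specific = {}
    for subtype, peps in peptides_by_subtype.items():
        others = set().union(*(p for s, p in peptides_by_subtype.items() if s != subtype))
        specific[subtype] = peps - others
    common = set.intersection(*peptides_by_subtype.values()) if peptides_by_subtype else set()
    return specific, common


example = {"Luminal": {"PEPA", "PEPB", "PEPC"},  # hypothetical peptide sets
           "HER2": {"PEPB", "PEPC", "PEPD"},
           "Basal": {"PEPC", "PEPE"}}
specific, common = specific_and_common(example)
print(specific)  # {'Luminal': {'PEPA'}, 'HER2': {'PEPD'}, 'Basal': {'PEPE'}}
print(common)    # {'PEPC'}
```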
To determine whether the eluted MHC class I-bound peptides had been identified in previous studies, we searched the IEDB, which contains more than 167,000 unique human peptides. The IEDB is sponsored by the National Institute of Allergy and Infectious Diseases (NIAID), which launched a large-scale antibody and T cell epitope discovery program. Data sources integrated into the IEDB include publications in peer-reviewed journals, published patents, and direct submissions from institutions or companies. Each publication can also be classified by its general topic (e.g., infectious diseases, autoimmunity, allergy, transplantation, HIV, or cancer). Thus, the IEDB includes epitopes from a variety of sources and constitutes the most comprehensive database/resource for HLA ligands [45]. We determined that, of the 2,740 unique eluted peptides, 1,751 have previously been shown to bind MHC class I proteins. Importantly, 16 of the peptides in our data set were active in previously published T cell activation assays (Table 2), and half of these immunogenic peptides were shared between different breast cancer cell lines. Surprisingly, we found that these immunogenic epitopes were seldom related to breast cancer, and one of them was previously identified as a target of T cells in chronic lymphocytic leukemia (CLL) patients [57].
To find new breast cancer-specific MHC class I-loaded epitopes that could activate a T cell response, we used gene expression profiling to determine which MHC class I-presented genes have alterations or elevated expression levels in breast tumors compared with normal cells. First, using the cBioPortal for Cancer Genomics, which contains large-scale cancer genomics data sets, we determined which genes are altered in invasive breast cancers by copy number amplification, homozygous deletion, mRNA upregulation or downregulation, or mutation (Supporting Information Table S6). We ranked all identified genes by the frequency of alterations in breast cancer samples and, for further analysis, selected genes altered in at least 20% of breast cancers.
Second, we used gene expression data from 708 breast tumors and 329 normal tissues from TCGA (20), EBI (21), and GEO (22) to identify, among MHC class I-presented genes, those preferentially expressed in breast cancer samples over normal samples. We averaged expression across all tumor and all normal samples for each gene and ranked genes by the level of differential expression between tumor and normal samples (Supporting Information Table S6). In this analysis, we selected genes with at least 2-fold higher expression in breast tumors than in normal tissues. We also attempted to select breast cancer-specific candidate genes using RNA-seq-based gene expression data sets for 62 breast carcinoma cell lines and 6 non-transformed cell lines (Supporting Information Table S6). We averaged expression for each gene across all breast carcinoma cell lines and across the non-transformed cells, and for further analysis we selected genes with at least 2-fold higher expression in transformed than in non-transformed cells. Interestingly, most genes that were overexpressed in breast tumors compared with normal tissues did not show a similar overexpression pattern in breast carcinoma cell lines compared with non-transformed cells.
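The 2-fold selection criterion can be sketched as a ratio of mean tumor to mean normal expression per gene. The sketch assumes expression values on a linear (not log) scale and skips genes with zero mean normal expression; on a log2 scale, the equivalent cutoff would be a difference of 1. The function name, gene names, and values are hypothetical.

```python
from statistics import mean


def overexpressed(tumor, normal, min_fold=2.0):
    """tumor, normal: dict gene -> list of per-sample expression values.

    Returns (gene, fold change) for genes whose mean tumor expression is at least
    min_fold times the mean normal expression, largest fold change first.
    """
    hits = []
    for gene in sorted(tumor.keys() & normal.keys()):
        t, n = mean(tumor[gene]), mean(normal[gene])
        if n > 0 and t / n >= min_fold:  # skip genes with zero normal expression
            hits.append((gene, t / n))
    return sorted(hits, key=lambda h: -h[1])


tumor = {"GENE1": [4.0, 6.0], "GENE2": [2.0, 2.0], "GENE3": [9.0, 9.0]}  # hypothetical
normal = {"GENE1": [2.0, 3.0], "GENE2": [2.0, 2.0], "GENE3": [1.0, 2.0]}
print(overexpressed(tumor, normal))  # [('GENE3', 6.0), ('GENE1', 2.0)]
```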
Next, we calculated the immunogenicity score for this set of cancer-associated proteins and compared it with that of the 1217 known immunogenic epitopes in the IEDB database. To determine the immunogenicity score, we employed a tool that was developed to predict the immunogenicity of viral MHC class I ligands [59]. Previous studies demonstrated that this tool can be used successfully to predict the immunogenicity of cancer-related peptides [60]. We found that our cancer-related dataset contains a number of peptides with scores higher than 0.20, which places them close to the top 20% most immunogenic known epitopes (Supporting Information Table S6). These peptides are good candidates for further testing in T cell-based immunogenicity assays.
The availability of comprehensive HLA-A, -B, and -C typing also allowed us to predict the binding probability of each peptide, including mutation-containing peptides and peptides derived from aberrantly translated genes, to the HLA molecules present in the corresponding cells (Supporting Information Table S7). For this purpose, we used the Consensus method [34] recommended by IEDB, which consists of ANN [35,36], SMM [37], Comblib [38], and NetMHCpan [39]. The consensus method has been shown to be superior to any single prediction technique tested [61]. The predicted affinity of each peptide was expressed as a percentile rank, generated by comparing the peptide's IC50 value for the corresponding HLA allele against those of a set of random peptides from the SWISSPROT database. Lower rank values indicate higher predicted affinities. Next, for each HLA allele we plotted the number of peptides (counts) against these percentile ranks (Figure 2). As strong binders, we selected peptides with ranks from 0.1 to 1. Our data showed that UACC812, CAMA-1, SUM185PE, HCC1806, and T47D cells use only one HLA allele to present almost all high-affinity peptides (rank 0.1-1), while other cells use two or more HLA alleles to load high-affinity peptides. To distinguish binders from non-binders, we used a rank cutoff of 10. Based on these selection criteria, we determined the number and proportion of peptides that are not predicted to bind any HLA allele. The proportion of non-binders was within the range of the FDR (Supporting Information Table 1), supporting the validity of rank 10 as the cutoff between binders and non-binders. We also determined the number of high-affinity peptides shared between HLA alleles in the corresponding cells and the proportion of these shared peptides among all eluted peptides (Figure 2). Our data showed that some high-affinity peptides are shared between different HLA alleles, while others are not.
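The percentile-rank idea and the rank cutoffs used above can be sketched as follows. Here the rank is simply the percentage of background peptides with a lower (tighter) IC50 than the query; the IEDB tool's exact background set and rank computation are not reproduced. The study reports strong binders in the 0.1-1 range, whereas the sketch treats any rank of 1 or below as strong. The function names, allele labels, and background values are hypothetical.

```python
from bisect import bisect_left


def percentile_rank(ic50, background_ic50s):
    """Percentage of background peptides with a lower (tighter) IC50 than the query."""
    ordered = sorted(background_ic50s)
    return 100.0 * bisect_left(ordered, ic50) / len(ordered)


def classify(rank, strong_cutoff=1.0, binder_cutoff=10.0):
    """Lower rank = stronger predicted binding."""
    if rank <= strong_cutoff:
        return "strong binder"
    if rank <= binder_cutoff:
        return "binder"
    return "non-binder"


def best_allele(ranks_by_allele):
    """Allele with the lowest (best) percentile rank for one peptide."""
    allele = min(ranks_by_allele, key=ranks_by_allele.get)
    return allele, ranks_by_allele[allele]


background = [float(x) for x in range(1, 1001)]  # hypothetical background IC50s (nM)
print(percentile_rank(5.5, background))          # 0.5
allele, rank = best_allele({"HLA-A*02:01": 12.0, "HLA-B*07:02": 0.4})
print(allele, classify(rank))                    # HLA-B*07:02 strong binder
```

Under this scheme, a peptide whose best rank across a cell line's alleles exceeds 10 would be counted as a non-binder when computing the non-binder proportion that is compared with the FDR.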
which are the focus of intensive molecular and phenotypic characterization [22,43,62]. Efforts to develop a breast cancer vaccine have largely focused on eliciting immune responses to HER2 [63][64][65]. It is unclear, however, whether HER2-derived epitopes are the most potent antigens for HER2-overexpressing breast cancer cells. Identification of new and potent T cell-activating epitopes expressed on the surface of cancer cells remains critical for the development of successful immunotherapies. In this paper, we have demonstrated that established cancer cell lines can be used as a source to identify MHC class I-restricted and immunogenic epitopes. The previously identified immunogenic peptides present in our dataset do not harbor any mutations in their sequences, so their ability to activate T cells presumably results from alterations in either their expression or their processing compared with normal cells.
We identified 2,740 unique peptides eluted from breast cancer cells and 1,921 corresponding proteins. Few papers have described surveys of tumor-associated immunopeptidomes with identification of more than 100 HLA-bound peptides [60,66,67]. For example, Bassani-Sternberg et al. [60] developed a high-throughput MS-based workflow that yielded more than 22,000 unique HLA peptides across seven cancer cell lines and primary cells. The authors used a similar approach, but with 5 times more cells than we used in our immunoprecipitation experiments. Another distinction from our approach is the covalent binding of the W6/32 antibody to Protein-A Sepharose beads, which could improve the yield and purity of the eluted HLA-I complexes. We believe that the larger number of cells and the use of W6/32 antibody covalently bound to the beads resulted in an approximately 10-fold improvement in the yield of eluted peptides compared with our protocol. On the other hand, the dataset published by Bassani-Sternberg et al. [60] contained only four previously identified immunogenic epitopes, while our dataset contains sixteen immunogenic epitopes identified in previous studies. This 4-fold difference in the number of immunogenic epitopes may result either from the types of cancer cells used in the MHC class I immunoprecipitation and epitope elution experiments or from differences in experimental technique. Most other related studies identified far fewer MHC class I-bound ligands. One explanation for the inability to elute a large number of peptides from the cell surface may be downregulation of the HLA ligand presentation machinery in cancer cells. Cancer cells use many different ways to escape immune attack, one of which is to reduce their ability to present HLA ligands on the cell surface.
For example, our data and results published by others showed that approximately 40% of breast cancers have lost MHC class I RNA expression [68]. MHC class I expression can also be downregulated at the post-transcriptional level; consistent with this, Table 1 shows that MHC class I RNA expression does not always correlate with MHC class I protein abundance. In addition, HLA ligand presentation can be downregulated by reduction or loss of TAP expression, which occurs in 29% of primary and 42% of metastatic breast cancers [68]. The available RNA-seq and immunohistochemistry data allowed us to relate HLA expression to the quantity of eluted peptides. Neither RNA-seq nor HLA protein expression data consistently correlated with the number of eluted peptides. We therefore conclude that RNA-seq and protein expression data alone cannot be used to estimate the level of HLA-loaded peptides on the cell surface, and that additional factors must be taken into account.
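Relating HLA expression to the number of eluted peptides across a panel of cell lines is a rank-correlation question. The sketch below computes Spearman's rho (the Pearson correlation of tie-averaged ranks) in plain Python; it is an illustration only, the function names are hypothetical, and the per-line values are placeholders rather than measurements from this study. The source does not state which statistic, if any, was used.

```python
# Hypothetical sketch: Spearman rank correlation between an HLA expression
# measure and the number of peptides eluted per cell line.
# All values below are illustrative placeholders, not data from this study.

def average_ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-cell-line values, listed in the same line order.
hla_expression = [120.0, 45.0, 300.0, 80.0, 210.0]
eluted_peptides = [150, 90, 110, 40, 260]

print(f"Spearman rho = {spearman(hla_expression, eluted_peptides):.2f}")
```

A rank-based statistic is chosen here because expression values and peptide counts need not be linearly related; with only about 20 cell lines, any such coefficient would carry wide uncertainty.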
Our data suggest that in most cases the identified peptides were expressed at high or moderate levels on the cell surface. This observation agrees with the recently published results of Bassani-Sternberg et al. [60], who demonstrated a strong correlation between protein abundance and HLA presentation. We did not observe antigen depletion after immunoprecipitation, so the more abundant MHC class I/peptide complexes could have occupied all antigen-binding sites of the antibodies used for immunoprecipitation. Because only stable MHC class I/peptide complexes are efficiently transported to the plasma membrane and have an extended half-life on the cell surface, the level of peptide presentation is closely linked to the peptide's affinity for the presenting MHC class I molecule [69]. Consistent with this, plotting the number of peptides against percentile rank showed that, for most of the HLA alleles preferentially used for presentation, the proportion of high-affinity peptides (rank 0.1-1) is higher than that of lower-affinity peptides. It is important, however, to identify low-affinity peptides as well, because they can be immunogenic [70]. Several approaches can address this issue. The first is to increase the antibody/antigen ratio and couple the antibodies to a solid support [60]. A second, far less specific approach is to use a lectin-coated resin to capture all glycosylated MHC class I molecules [71]. A third, non-specific approach is to elute peptides directly from cells, as described previously [67,72]. This technique is especially useful for clinical samples, where tumor material is often available only in very small quantities.
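The percentile ranks used in these plots come from comparing each peptide's predicted binding EC50 with the EC50 values of a set of random SWISSPROT peptides, as stated in the figure caption. The sketch below shows one way to compute such a rank and the fraction of high-affinity peptides (rank at most 1) per allele. It assumes a hypothetical reference set of 1,000 EC50 values (simply 1 to 1,000 nM) and hypothetical allele labels; it is not the prediction tool the authors used.

```python
from bisect import bisect_right

# Hypothetical reference set: predicted EC50 values (nM) for random
# peptides. Here simply 1..1000 nM so the arithmetic is easy to follow.
REFERENCE_EC50 = sorted(float(v) for v in range(1, 1001))

def percentile_rank(ec50, reference=REFERENCE_EC50):
    """Percent of reference peptides with EC50 <= this peptide's EC50.

    A lower EC50 means tighter predicted binding, so a small percentile
    rank (e.g. 0.5) places a peptide among the strongest 0.5 % of binders.
    `reference` must be sorted in ascending order.
    """
    return 100.0 * bisect_right(reference, ec50) / len(reference)

def high_affinity_fraction(ranks, cutoff=1.0):
    """Fraction of peptides with percentile rank <= cutoff (assumed 1.0)."""
    return sum(r <= cutoff for r in ranks) / len(ranks)

# Hypothetical predicted EC50 values (nM) of eluted peptides per allele.
ec50_by_allele = {
    "allele_X": [2.0, 5.0, 8.0, 40.0],
    "allele_Y": [15.0, 60.0, 300.0],
}

for allele, ec50s in ec50_by_allele.items():
    ranks = [percentile_rank(v) for v in ec50s]
    frac = high_affinity_fraction(ranks)
    print(allele, ranks, f"high-affinity fraction = {frac:.2f}")
```

Binary search over a sorted reference keeps each lookup logarithmic, which matters little for thousands of peptides but keeps the ranking rule explicit: ties count as "at least as strong".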
Recently, Boegel et al. [29] showed that breast carcinoma cells harbor, on average, approximately 300 mutation-containing genes per genome, and predicted that 60-70% of the corresponding mutation-containing peptides can be efficiently presented by MHC class I molecules in the same cells. To confirm that MHC class I molecules are loaded with mutation-containing peptides, we searched our raw MS data against a database of the predicted antigenic mutations [29]. Our eluted peptide pool included 85 mutation-containing peptides, 7 of which were shared between different breast cancer cell lines. These shared mutation-containing peptides could provide a basis for designing a multi-peptide cancer vaccine. To identify peptides that are not present in normal cells, we also employed an algorithm specifically designed to detect altered peptides [32]. We identified four non-normal peptides derived from alternative translations of genes: tap1 (reverse-frame translation), c4a (+1 frameshifted translation), ikbkap (reverse-frame translation), and glipr1 (+2 frameshifted translation). Altogether, our data indicate that altered or mutation-containing peptides account for about 2-3% of the peptides presented on the surface of breast cancer cells. Our data also suggest that it is feasible to identify immunogenic epitopes derived from wild-type genes expressed in breast cancer cells. Importantly, our data also show that both non-mutated immunogenic HLA ligands and mutation-containing neoepitopes can be shared between different cancer cells that do not have identical HLA alleles. The ideal candidate epitopes for vaccine development are those commonly presented by tumor cells. Our data also confirm that immunogenic epitopes can be shared between different cancer types, including leukemia and epithelial cancers.
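The 2-3% figure follows from the counts above: 85 mutation-containing plus 4 aberrant-translation peptides out of 3,196 eluted peptides (duplicates across cell lines included). The sketch below reproduces that arithmetic and shows a simple way to flag peptides shared between cell lines. The function name `shared_peptides`, the cell-line labels, and the peptide strings are hypothetical placeholders, not study data.

```python
from collections import defaultdict

def shared_peptides(peptides_by_line, min_lines=2):
    """Map each peptide seen in >= min_lines cell lines to those lines."""
    lines_by_peptide = defaultdict(set)
    for line, peptides in peptides_by_line.items():
        for pep in set(peptides):  # count each peptide once per line
            lines_by_peptide[pep].add(line)
    return {p: sorted(ls) for p, ls in lines_by_peptide.items()
            if len(ls) >= min_lines}

# Counts from the text: 85 mutation-containing + 4 aberrant-translation
# peptides, out of 3,196 eluted peptides (duplicates across lines included).
altered = 85 + 4
total_eluted = 3196
print(f"altered fraction = {100 * altered / total_eluted:.1f} %")  # 2.8 %

# Hypothetical per-line hits (placeholder strings, not study data).
hits = {
    "line_A": ["PEPTIDEAA", "MUTANTPEP"],
    "line_B": ["MUTANTPEP"],
    "line_C": ["OTHERPEPK"],
}
print(shared_peptides(hits))  # {'MUTANTPEP': ['line_A', 'line_B']}
```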
Knowing the HLA genotype of each cell line and the predicted binding probability of each peptide to the MHC class I alleles present in those cells allowed us to plot the distribution of ranked, allele-specific peptides for each breast cancer cell line. Our data showed that breast cancer cells use different numbers of MHC class I alleles to present high-affinity peptides (rank 0.1-1.0). For example, HCC1806, CAMA-1, SUM185PE, T47D, and UACC812 cells use only one MHC class I allele to present almost all high-affinity peptides, whereas other cells use two or more. Based on the inability of many MHC class I alleles to efficiently bind and present high-affinity peptides, we propose that during breast cancer progression malignant cells lose MHC class I functionality, which impairs the peptide presentation machinery and, as a consequence, allows them to avoid immune attack. This conclusion is supported by published data showing that approximately 40% of breast cancers have lost MHC class I RNA expression [68].
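Deciding how many alleles a cell line "uses" to present almost all high-affinity peptides requires a rule. The sketch below is one hypothetical rule, not the authors' procedure: each peptide is assigned to the allele with its lowest percentile rank, peptides with rank above 1.0 are ignored, and alleles are added in order of their counts until they cover at least 90% of the high-affinity peptides. The 90% coverage threshold, the function name, and the allele labels are all assumptions.

```python
from collections import Counter

def alleles_presenting(best_ranks, cutoff=1.0, coverage=0.9):
    """Smallest set of alleles (most frequent first) accounting for at
    least `coverage` of the high-affinity peptides.

    best_ranks: one (allele, rank) pair per peptide, where `allele` is the
    HLA allele with the lowest percentile rank for that peptide (assumed
    assignment rule). Peptides with rank > cutoff are ignored.
    """
    counts = Counter(allele for allele, rank in best_ranks if rank <= cutoff)
    total = sum(counts.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    for allele, n in counts.most_common():
        chosen.append(allele)
        covered += n
        if covered / total >= coverage:
            break
    return chosen

# Hypothetical cell lines (placeholder alleles and ranks, not study data).
line_1 = [("allele_1", 0.4)] * 19 + [("allele_2", 0.7)] + [("allele_2", 5.0)] * 6
line_2 = ([("allele_1", 0.5)] * 10 + [("allele_2", 0.9)] * 9
          + [("allele_3", 0.2)] + [("allele_3", 3.0)] * 4)

print(alleles_presenting(line_1))  # ['allele_1']
print(alleles_presenting(line_2))  # ['allele_1', 'allele_2']
```

Assigning each peptide to a single best allele ignores peptides predicted to bind several alleles equally well; the figure caption notes that such shared high-affinity peptides were counted separately.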
The current work was driven mostly by our optimized technique for eluting MHC class I-loaded peptides from different breast cancer cells, with the aim of building a potential HLA-I-typed antigen collection. A limitation of our work is the absence of immunogenicity analysis of the eluted peptides. We propose that the gap between the basic research and the clinical utility of this project may be effectively closed by developing a high-throughput protocol to identify immunogenic epitopes among the eluted peptides. To accomplish this task, we developed a technique to attach a patient's own PBMC-derived antigen-presenting dendritic cells (DCs) to an array matrix. The attached DCs can be matured to present pre-printed peptides on the cell surface (Fig. 3). Subsequent ELISPOT assays using the patient's own T cells will identify peptides that can induce a T cell-based immune response.
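In such an ELISPOT screen, each pre-printed peptide must eventually be called positive or negative against control wells. The sketch below uses a hypothetical positivity rule (mean spot count at least twice the negative-control mean and at least 10 spots above it); both thresholds, the function name, and the spot counts are illustrative assumptions, not criteria stated in this work.

```python
from statistics import mean

def elispot_positive(peptide_wells, control_wells, fold=2.0, min_delta=10.0):
    """Hypothetical positivity rule for one peptide.

    Positive if the mean spot count with peptide is at least `fold` times
    the negative-control mean AND exceeds it by at least `min_delta` spots.
    Both thresholds are illustrative assumptions.
    """
    pep, ctrl = mean(peptide_wells), mean(control_wells)
    return pep >= fold * ctrl and pep - ctrl >= min_delta

# Hypothetical spot counts per replicate well (placeholders, not data).
negative_control = [8, 10, 6]
screen = {
    "peptide_1": [45, 52, 48],
    "peptide_2": [9, 12, 7],
}

for name, wells in screen.items():
    print(name, "positive" if elispot_positive(wells, negative_control) else "negative")
```

Combining a fold-change with an absolute difference avoids calling a peptide positive when the background is very low and a handful of extra spots would otherwise satisfy the ratio alone.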
We envision that future personalized vaccines will be designed based on the HLA allele subtypes of the individual patient and the ability of these subtypes to present MHC class I-restricted peptides. Profiles of how known immunogenic epitopes are presented by each MHC class I allele type will likely inform the design of multi-peptide-based vaccines. In support of this approach, a recent report used nine peptides as the basis for a multi-peptide vaccine against renal cell carcinoma [18].

Conclusions
In summary and in agreement with previous studies [60], our results demonstrate that established cancer cell lines can be used to identify tumor-specific and immunogenic MHC class I-restricted peptides. Because it is extremely difficult in most cases to obtain enough fresh tumor for efficient immunoprecipitation and MS analysis of eluted peptides, we used a more feasible cell line-based approach to elute peptides from the cell surface followed by sequence determination using MS.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

Binding probabilities of each peptide to the HLA alleles present in the corresponding cells. Only 8-14-mer peptides were used for this analysis. The percentile rank for each peptide was generated by comparing the peptide's binding EC50 against those of a set of random peptides from the SWISSPROT database. Numbers to the right of each cell line name represent the number of analyzed peptides. Numbers below each MHC class I allele type indicate the ratio of ranked peptides to all peptides. High-affinity peptides (rank 0.1-1) are presented by two, three or four, and one MHC class I alleles. Numbers below the plots and numbers in brackets represent, respectively, the number of high-affinity peptides shared between different alleles and the ratio of these shared peptides to all eluted peptides. The number of peptides that are not predicted to bind any HLA allele (non-binders) and their ratio to all peptides are also indicated below the plots.

HLA phenotype and genotype in breast cancer cell lines.