Know thy immune self and non‐self: Proteomics informs on the expanse of self and non‐self, and how and where they arise

Abstract T cells play an important role in the adaptive immune response to a variety of infections and cancers. Initiation of a T cell mediated immune response requires antigen recognition in a process termed MHC (major histocompatibility complex) restriction. A T cell antigen is a composite structure made up of a peptide fragment bound within the antigen‐binding groove of an MHC‐encoded class I or class II molecule. Insight into the precise composition and biology of self and non‐self immunopeptidomes is essential to harness T cell mediated immunity to prevent, treat, or cure infectious diseases and cancers. T cell antigen discovery is an arduous task! The pioneering work in the early 1990s has made large‐scale T cell antigen discovery possible. Thus, advancements in mass spectrometry coupled with proteomics and genomics technologies make possible T cell antigen discovery with ease, accuracy, and sensitivity. Yet we have only begun to understand the breadth and the depth of self and non‐self immunopeptidomes because the molecular biology of the cell continues to surprise us with new secrets directly related to the source, and the processing and presentation of MHC ligands. Focused on MHC class I molecules, this review, therefore, provides a brief historic account of T cell antigen discovery and, against a backdrop of key advances in molecular cell biologic processes, elaborates on how proteogenomics approaches have revolutionised the field.

of the innate immune system is sufficient to return the host's internal milieu back to normalcy; when not, the adaptive immune system is engaged. One end-product of the innate immune response is the display of fragments, derived from agents that incite alterations in the host's internal milieu, on the surface of certain innate immune cells called dendritic cells (DCs). Such fragmentary end-products are recognised by T lymphocytes to initiate an adaptive immune response.

ESSENTIAL HISTORY OF THE FIELD
The 1980s and 1990s were exciting times for students of antigen processing and presentation, and T cell biology. By this time immunologists and geneticists had established that the antigen(s) coded by the MHC controlled allogeneic skin and tumour graft rejection both in mice and men [1,2]. As well, the 1970s witnessed the first descriptions of MHC restriction [3,4], a process that controlled host T and B cell responses to proteins, viruses, and bacteria. These two seemingly distinct immunologic recognition processes needed a biochemical definition. By the late 1970s and early 1980s Nathenson and colleagues had devised ways to cleave MHC-I molecules from cell surfaces and adapted a radiochemical method which, coupled with Edman degradation, unveiled the first primary structure of an MHC molecule: H-2Kb (H-2, histocompatibility-2, the MHC of the mouse).
Immediately thereafter, primary structures of several other MHC molecules were determined [5,6].
With the primary structures of several mouse and human MHC-I and MHC-II molecules unraveled, the stage was set to elucidate the biochemical basis of MHC restriction. Prior to this, the works of Unanue and colleagues had revealed that the activities of T lymphocytes were intimately linked to their interactions with macrophages [7][8][9], and the independent works of Unanue and colleagues, and Grey and coworkers demonstrated that the purpose of this macrophage-T cell intimacy was to process antigens [7][8][9][10][11][12]. It was also known that nucleo-cytoplasmic proteins, notably the SV40 T antigen and influenza A nucleoprotein and derived peptides, or proteins deliberately delivered to the cytosol by fusion of non-replicative influenza A virus or by osmotic shock (e.g., ovalbumin), were targets of MHC-I restricted CD8+ T cells [13][14][15][16][17]. The in vitro binding studies that followed [18][19][20][21] and the solution of the three-dimensional structure of a human MHC-

The radiochemical approach, invented to determine the amino acid sequences of peptides and proteins that were available in limited quantities [6], returned yet another time to unveil the biology of MHC molecules. The first three-dimensional structure of A*02:01 had revealed that the binding site was occupied by a conglomerate of ligands whose chemical identities eluded Bjorkman, Strominger, Wiley and colleagues [22]. They postulated, and the general notion that followed was, that not a few or several but numerous peptides were bound in the A*02:01 antigen-binding groove, indicating that the isolation of associated ligands in sufficient quantities to permit

These initial reports were shortly followed by direct amino acid sequence determination of individual peptides eluted from the MHC with the aid of the mass spectrometer [36][37][38]. A critical early application of this technology led to the discovery of antigenic phosphopeptides, which now have found use in cancer immunotherapy

HLA-I alleles and peptide binding motifs
To understand what feature/s in an antigen dictated its presentation by one MHC-I molecule and not the others, Rammensee and coworkers devised a simple but clever experiment. They immunoprecipitated different mouse MHC-I molecules with specific monoclonal antibodies and acid eluted the associated ligands. After separating the low molecular mass ligands associated with the MHC-I (presumably those that led to the extra density in the structure described above), the pooled ligands were subjected to Edman degradation. This experiment revealed that the ligands bound to the MHC-I were indeed peptides, and that they were short, 8-9 amino acid residues in length. The most astounding revelation was that, depending on the presenting MHC-I, the peptides contained two to three conserved residues at defined positions. That is, peptides bound to H-2Kb contained a conserved phenylalanine or tyrosine at position 5 and a hydrophobic, aliphatic residue (such as leucine, isoleucine, or valine) at the carboxy-terminus; similarly, those bound to H-2Db contained an invariant asparagine at position 5 and a hydrophobic, aliphatic residue (such as methionine or isoleucine) at the carboxy-terminus.
The remaining positions within the peptide accommodated any one of the 20 naturally occurring amino acid residues. Hence, peptides bound to an MHC-I contained a binding motif made of an internal and a terminal anchor residue [32]. In conclusion, a given MHC-I molecule can theoretically bind on the order of a tenth of a billion (∼20^6 8-mers) to a billion (∼20^7 9-mers) peptides that are structurally related at the anchors. So, then, if a cell displays ∼50-100 thousand MHC-I at the surface, is there a need to present millions and billions (as Carl Sagan would say about the stars in the sky!) of peptides? New molecular cell biology seems to hold some of the secrets to this question, perhaps to represent the internal milieu at the cell surface for an appraisal by T cells and to keep immune reactions against self in check.
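The anchor-motif logic and the size estimates above can be made concrete with a short sketch. It encodes only the two motifs described in the text (position 5 plus the carboxy-terminal residue) and counts the sequences available at the non-anchor positions. The names `MOTIFS`, `matches_motif` and `free_position_space` are illustrative assumptions, and the two test peptides (SIINFEKL from ovalbumin and ASNENMETM from influenza A nucleoprotein) are well-known ligands used here as examples; they are not taken from the text.

```python
# Minimal sketch, assuming a two-anchor motif model as described in the text.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Anchor positions are 1-based; the key -1 denotes the carboxy-terminus.
MOTIFS = {
    "H-2Kb": {5: set("FY"), -1: set("LIV")},
    "H-2Db": {5: set("N"), -1: set("MI")},
}


def matches_motif(peptide, motif):
    """Return True if every anchor position carries an allowed residue."""
    for position, allowed in motif.items():
        if position > len(peptide):
            return False  # peptide too short to contain this anchor
        residue = peptide[-1] if position == -1 else peptide[position - 1]
        if residue not in allowed:
            return False
    return True


def free_position_space(length, n_anchors=2):
    """Number of sequences possible at the non-anchor positions."""
    return len(AMINO_ACIDS) ** (length - n_anchors)


if __name__ == "__main__":
    for peptide in ("SIINFEKL", "ASNENMETM"):
        hits = [name for name, motif in MOTIFS.items() if matches_motif(peptide, motif)]
        print(peptide, "->", hits)
    print("8-mers:", free_position_space(8))  # 20**6
    print("9-mers:", free_position_space(9))  # 20**7
```

Counting only the free positions gives 20^6 = 64,000,000 for 8-mers and 20^7 = 1,280,000,000 for 9-mers; allowing several residues at each anchor would multiply these figures by the number of permitted anchor combinations.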

Excursion: Evolution of HLA-I peptidome diversity
The enormous capacity of MHC alleles to accommodate such high numbers of peptide ligands is motivated by the ability to cover the proteome diversity of pathogens. It is generally thought that the high polymorphism in the HLA locus is selected and maintained through a 'molecular arms race' [54][55][56]. In fact, characterization of immunopeptidomes of 18 individuals revealed that peptides bound to 27 highly prevalent HLA-I molecules were derived from 10% of the expressed genome. This 'hotspot' of self-presentation was driven by the HLA-I genotype of the individual, and increased promiscuity conveyed an improved coverage of self-protein presentation [57][58][59].
Further evidence for the overall benefit of MHC diversity and antigenic coverage is found in the analyses of determinants of positive immunotherapeutic cancer treatment outcomes: an increased MHC-associated peptide diversity, and the accompanying increase in the probability of neoantigen presentation, is a strong determinant of the outcome of immune checkpoint blockade in cancer [60,61]. Loss of heterozygosity in the HLA locus, leading to a restricted MHC allele diversity in the tumour, is a prevalent tumour escape mechanism and is associated with poor outcomes in checkpoint blockade therapy [62,63].
Whilst the expression levels of MHC-I are controlled by transcriptional, translational, and post-translational mechanisms [64,65], expression levels of certain HLA alleles may be inversely correlated with their ability to present a larger variety of peptide sequences, leading to higher expression of alleles that are more 'fastidious' [66]. Although this result is counterintuitive, it emphasizes the importance of evaluating quantitative aspects of antigen presentation and recognition. Insights so gained may unveil cause(s) and selection (evolution) of HLA diversity.

HLA-I supertypes and peptide binding supermotifs
Extensive HLA-Ia gene polymorphism is a major impediment to the rational design of T cell-targeted vaccines and a barrier to tissue transplantation [67][68][69][70]. There are over 9,300 HLA-I allotypes recorded, and there are numerous variants [71]. Consequently, the antigen-binding groove of numerous allelic products will have a unique physicochemical architecture [53,68] and, thereby, dictate the motif required for an epitope to bind it [72]. Because patterns of T cell epitope presentation and immune recognition in a given infection are different for individuals expressing different HLA molecules, development of a universal T cell vaccine is a challenge.
Brilliantly, Sette and colleagues as well as Buus and co-workers [68][69][70][73][74][75] discovered that all the currently known HLA-A and HLA-B molecules can be grouped into functional 'supertypes' predicated on pockets B and F of members of each supertype having a shared physico-chemical architecture [70]. Pockets B and F accommodate the dominant peptide anchors of HLA-I restricted epitopes: that is, the anchor at position 2 and the C-terminal anchor [72]. The discovery of HLA supertypes led to the description of common binding motifs within peptides that bind a supertype; these are collectively called 'supermotifs' [68,70]. Most importantly, peptide ligands predicted by algorithms that take supermotifs into account have led to the discovery of numerous virus-derived CD8+ T cell epitopes [76][77][78][79][80]. A recent in-depth study of naturally processed immunopeptidomes of 95 distinct HLA-A, HLA-B and HLA-C molecules by high-resolution mass spectrometry has further refined supertypes based on HLA-I binding submotifs. These 95 HLA-I molecules are expressed by 95% of the human population. In that analysis, a significant number of HLA-I molecules did not fit into a supertype or were removed from previous supertypes [49]. That notwithstanding, targeting epitopes commonly recognised by T cells of individuals of the same HLA-I supertype holds promise as a vaccine design strategy.

A NOTE ON METHODS
Given that a single allelic MHC-I can bind millions of peptides, can each one, to the last one in the antigen-binding groove, be extracted and identified? The answer is no because of limitations of the best of detergents to extract proteins from the cell membranes, efficiencies of downstream MHC-I purification and peptide elution methods, and the sensitivities and accuracies of detection, which currently uses state-

(c) mild acid extracts of peptides from cell surfaces [88][89][90]. Each of these peptide extraction methods has its own advantages and disadvantages, but, when used in combination, the approaches are complementary and yield significant information about the immune self and non-self [87].

Antigen, an agonistic substance recognised by lymphocyte receptors (for example, the T cell receptor in the context of this review, but also the B cell receptor and antibody/immunoglobulin); as such, not all antigens are immunogens.
Determinants, (archaic, ca. 1970s, 80s, 90s!) all peptides that bind to and are presented by MHC molecules; also called epitopes sensu lato in the current literature.
Epitope, sensu stricto, that aspect of an antigen that is recognised by a T or a B cell receptor.
Immunogen, an agonistic substance that elicits (induces/provokes) a T or a B cell response in a vertebrate host organism or in an in vitro culture model. As such, all immunogens are antigens, but not all antigens are immunogens.
Protective antigen, a pathogen-derived immunogen which elicits a T or a B cell response in a vertebrate host and confers protection against a lethal-dose challenge with the pathogen from which the immunogen was derived, but not against a pathogen that does not express the immunogen.

In the second experiment, two VACV-derived proteins, x and y (which contain two antigenic peptides, x' and y', discovered in the experiment above) were used as immunogens in prime-boost immunization of mice by the intraperitoneal (i.p.) route. At 14-72 days post boost, the CD8+ T cells so elicited recognised the peptide x' derived from the immunizing antigen x but not the other peptide y', and vice versa (see refs. [45,347]).
Hence, these two antigens are immunogens; in this example, the two immunogens are antigens as well.
In the third experiment, mice were prime-boost vaccinated 2 weeks apart with proteins x and y. At 14-72 days post boost, mice were challenged with a lethal dose of VACV via the intranasal route. Whilst both groups of mice elicited an immunogen-specific CD8+ T cell response, only mice prime-boost immunised with protein x survived the challenge, whereas the group that received protein y as the immunogen did not (see refs. [45,348]). Hence, x is a protective antigen, but y is not, even though both x and y immunogens are derived from VACV.

rapidly by the field, and Gibbs clustering tools as well as binding predictions can assist these stratifications. These approaches will benefit from the recent mass spectrometric profiling of HLA-I associated peptidomes in mono-allelic cells because the databases so created enable accurate peptide assignments and epitope prediction [45,49,113].

The basics
The process by which MHC-I molecules assemble, traffic, and display peptides is an excellent example of how a macromolecule utilizes the cell's topological biochemistry for antigen processing and presentation. Being type I integral membrane glycoproteins, MHC-I molecules assemble in the endoplasmic reticulum (ER) [114,115]. Whilst the heavy and light chains are co-translationally inserted into the ER owing to their N-terminal signal sequences, the peptide component of the MHC-I molecule is actively transported into this vesicular compartment by accessory protein channels [116][117][118][119][120]. Peptides that assemble with MHC-I molecules are predominantly of cytosolic origin, but ER, nuclear, mitochondrial and phagosomal/lysosomal proteins also contribute to the peptide pool. Regardless of their origin, MHC-I-binding peptides meet in the cytosol prior to entry into the ER.
The assembly of the MHC-I molecule is a complex, highly concerted and controlled process that ensures cell surface display of only those molecules that are assembled with high-affinity peptides (reviewed in ref. [121]). Display of peptide-associated MHC-I molecules at the cell surface is essential, as this pathway of antigen presentation evolved to apprise CD8+ T cells of cytosolic events so as to provide a mechanism to safeguard cells from intracellular invasion by viruses and bacteria, and from tumorigenic mutations.

The assembly line
The assembly of the MHC-I molecule is schematised in Figure 1.
Assembly begins with the co-translational insertion of the MHC-I heavy-chain into the ER. This heavy-chain co-translationally com-

Trimming to fit the groove
TAP heterodimers transport peptides from the cytosol to the lumen of the ER to overcome the topologic barrier between the compartments where cells generate peptides and the site where cells assemble MHC-I molecules. TAP has a loose ligand specificity: it binds peptides that contain carboxy-terminal hydrophobic or basic residues. Such carboxy-termini are known to bind to MHC-I molecules across all species. TAP transports peptides made up of 14-15 amino acid residues, which are therefore much longer than those that bind to MHC-I [116, 118-120, 132, 133].
The importance of peptide trimming in the ER was borne out in experiments in which peptides assembled with H-2Kb and H-2Db molecules in ERAAP-deficient and -sufficient cells were eluted and subjected to LC-MS/MS analyses. While retaining a good fraction of the peptides presented by MHC-I of wild type ERAAP-sufficient cells, ERAAP-deficient cells additionally displayed MHC-I-bound ligands of significantly altered composition and length. Further, the latter peptide set was extended at the amino-terminus and not at the carboxy-terminus. Consistent with these findings, wild type ERAAP-sufficient mice elicited a strong CD8+ T cell response against ERAAP-deficient spleen cells, indicating that the self immunopeptidomes displayed by MHC-I in ERAAP-deficient cells were immunogenic [134]. Similar features were also reflected in mouse cytomegalovirus (CMV)-derived peptides presented by cells devoid of functional ERAAP. What is more, the self and CMV peptides presented by ERAAP-deficient cells elicited a distinct CD8+ T cell response focused on the N-terminal extension of the peptide [142].
Together, these findings describe the profound effects ERAP/ERAAP has on the immunopeptidomes of healthy and diseased cells and reveal new targets to treat human diseases.
Human and mouse MHC consists of several clusters of multi-gene families that encode proteins that control both the innate and adaptive immune responses. The MHC-I molecules described thus far are products of the MHC-Ia cluster, which consists of genes that are highly polymorphic. In contrast to these, the MHC-Ib cluster consists of numerous genes that are highly conserved even across species. Genes in this cluster were once considered evolutionary vestiges but are now known to encode molecules that control both T cell and natural killer cell functions, for example, the human HLA-E and the orthologous mouse H-2Qa1, which are ligands of activating CD94/NKG2 heterodimeric receptors [158][159][160]. To begin to understand the immunopeptidomes of MHC-Ib and their biology, peptides were eluted from the surface of ERAAP-sufficient and -deficient cells, and their features determined in high-throughput mass spectrometry experiments. Peptidomes associated with MHC-Ia molecules have the features described above. Curiously, the number and immunogenicity of peptidomes presented by MHC-Ib molecules were substantially increased in ERAAP-deficient cells [161,162]. Hence, ERAAP trims a substantial repertoire of peptides to fit into MHC-Ib grooves. These findings convincingly implicate the ER as a major site for MHC-I associated immunopeptidome generation, a shift from the conventional notion that most MHC-I associated peptides are generated in the cytoplasm by the action of the proteasomes; more on this matter is below.

FIGURE 1 A schematic rendition of MHC-I biosynthesis and assembly with peptide cargoes. The assembly of MHC-I molecules begins with the co-translational insertion of the heavy chain into the lumen of the endoplasmic reticulum (ER). Herein the nascent heavy chain binds to the ER chaperone calnexin to facilitate initial folding and assembly with β2-microglobulin (β2m). This unstable heterodimer is stabilized by binding to a related ER chaperone, calreticulin. This interaction makes the complex receptive to the peptide loading complex (PLC). This association with the PLC stabilizes the empty heterodimer such that the antigen-binding groove adopts and maintains a conformation receptive to peptide loading. The PLC (consisting of the heavy chain-β2m heterodimer, calreticulin, tapasin, and the ER-resident thiol-oxidoreductase/disulphide isomerase ERp57) facilitates peptide binding to the heterodimer. Initial peptide-bound MHC-I undergoes architectural editing via tapasin in the PLC to ensure high-affinity peptide (p)/MHC-I complex formation prior to exiting the ER. TAP-binding protein related (TAPBPR), independent of the PLC, edits for high-affinity peptide binding to MHC-I by a poorly understood mechanism. Peptides generated in the cytosol (the sources of which and their production are explained in the text) are made available for pMHC-I assembly in the ER lumen by transporter associated with antigen processing (TAP)-1 and TAP-2. Many of the peptides that are delivered into the ER are longer than the preferred 8-10 residues; these undergo further trimming by ER aminopeptidases, human ERAP1 (mouse ERAAP) and/or human ERAP2. Finally, high-affinity pMHC-I complexes are released from the PLC, which then falls apart into constituent parts, available for the next round of pMHC-I assembly. Perhaps to make the process efficient, in addition to peptide translocation from the cytosol to the ER lumen, the TAP-1/TAP-2 heterodimer forms a scaffold that tethers two PLCs into a complex. pMHC-I released from the PLC quickly egresses from the ER and negotiates the Golgi apparatus en route to the cell surface for an appraisal by CD8+ T cells.
Several studies have found that components of the MHC-I restricted antigen processing pathway also impact MHC-II antigen presentation. One such study reported that ERAAP-deficiency altered the immunogenicity of certain cytosolic peptides presented by H-2Ab molecules ([163] and references therein). Mass spectrometry analyses of peptidomes found that H-2Ab molecules presented a pool of peptides derived from the cytosol of ERAAP-deficient cells [164].
Hence, ERAAP has effects on MHC-II associated peptidomes as well; how this occurs awaits investigation.

Editing for best fit
Tapasin and its homologue TAP-binding protein related (TAPBPR) function to facilitate peptide binding to assembling MHC-I molecules and also as editors, the former in the PLC and the latter independent of the PLC. Current evidence suggests that tapasin and TAPBPR quality-control the C-terminal end of the peptide. This editing function ensures that peptides of sufficient affinity are loaded into the antigen-binding groove to assure stable display of pMHC-I at the cell surface [165][166][167][168][169][170]. The editing function of both tapasin and TAPBPR also loads soluble MHC-I molecules with high-affinity peptides, a process capitalised upon to generate high-affinity pHLA-I tetramers by in vitro catalysis using TAPBPR [99,171]. Despite these very close functional similarities,

Cellular proteostasis: Roles for proteasomes, immunoproteasomes, & thymoproteasomes
It is generally thought that the natural turnover of proteins in the cytoplasm contributes a sizable fraction of peptides to the immunopeptidome. This assumption, however, is at odds with four features of peptides presented by MHC-I molecules: First, MHC-I immunopeptidomes contain peptides derived from long-lived proteins whose half-lives average ∼45 h [175,176]. Second, presentation of virus-derived peptides occurs even before virus proteins are detectable and assembly begins, and excess proteins turn over: for example, VSV-N (vesicular stomatitis virus nucleocapsid), VACV (vaccinia virus), and IAV (influenza A virus) [177][178][179][180][181]. Third, low copy number proteins (those that form a minor fraction of a given cell's proteome) are peptide sources and compete favourably against highly represented cellular proteins, which include supra-stoichiometrically generated proteins that either misfold or cannot find partners in multimeric proteins [182][183][184]. Fourth, although controversial, peptides are derived from genome hotspots [57,58], which is not observed in tumour cell lines [49,164]

Toxoplasma gondii or within cancer cells [47,96,164,191-196]. Whilst the immunologic consequences of the immunopeptidome shift in response to virus infection remain to be determined, IFN-γ did not influence the immunopeptidomes in another study [49]. In striking contrast, a high-throughput immunopeptidome analysis of nontransfected primary cells (thymocytes and professional antigen presenting DCs) of wild type and β2i/MECL-1- and β5i/LMP7-deficient mice showed a significant contribution of the immunoproteasome in sculpting the immunopeptidome. This study showed that the immunoproteasome has proclivities for cleavage site, not amino acid residue, and for unstructured regions in the substrate proteins [197].
Similar changes to cancer immunopeptidomes as a consequence of IFN-γ and TNF-α action were observed in other studies and, as a consequence, impacted tumour immunity or cancer immunotherapy [194][195][196]. Furthermore, there are reports that show that the immunoproteasomes can make or break epitopes with significant immunologic consequences [194,198-202]. So then, why the differences between these findings? One cause may be the use of different cell sources,

Peptide dicing and splicing adds to antigen diversity
An effort to identify the peptide epitope recognised by an HLA-A3-restricted renal cell carcinoma-specific CD8+ T cell clone led to a serendipitous finding that the epitope was generated by the splicing of the protein antigen FGF-5 (fibroblast growth factor-5) [211].
Whilst the evidence pointed to the cytosol as the site of protein splicing, shortly thereafter, the proteasome was shown to splice peptide epitopes together after proteolytic cleavage within its catalytic chamber. This notion was confirmed by incubation of purified 20S proteasomes with the precursor peptide RTKAWNRQLYPEW derived from the gp100/MEL melanocyte antigen and identification of one of the spliced products, RTK-QLYPEW, by mass spectrometry and T cell assay [212]. Additional spliced virus- and tumor-derived antigenic epitopes are known [213,214]. Such diced and spliced epitopes are also derived from minor histocompatibility (H) antigens, which mediate graft-versus-host responses in HLA-identical bone marrow transplant recipients. In the case of the minor H antigen SP110, cleavage of the threonine-alanine peptide bond within the STPKRRHKKKSLPRGTASSR fragment (the STPK and SLPRGT segments form the two parts of the spliced epitope) yielded the necessary energy for re-ligation of the two resulting fragments in reverse order to create the HLA-A*03:01-restricted, SP110-derived minor H epitope SLPRGT-STPK [215]. Spliced peptides are not peculiar to virus-, alloantigen- or cancer-derived epitopes but are derived from bacterial proteins as well, for example, from Listeria monocytogenes, a bacterium with an obligatory cytosolic lifestyle [216,217]. Thus, a novel antigen processing mechanism involving cleavage and re-ligation of peptide fragments within the proteasome was revealed. It is noteworthy that up until these discoveries, protein splicing was known only in plants

And that, albeit controversial [49,220,221], peptide splicing may contribute between 1% and 30% of peptides within the immunopeptidome [49,213,220,222-226].
The large range in the contribution of spliced peptides to the immunopeptidome reported from different works can be rationalised by understanding the methodology applied to their discovery [227].
LC-MS peptide identifications are generally made by assigning the most probable amino-acid sequence from a sequence database to a given spectrum, and the accuracy of this assignment is dependent on many factors including the spectral quality and the size and design of the sequence database. While the accuracy of such peptide-spectrum to sequence assignments may be controlled through parallel interrogation of randomised sequence databases for estimation of the false discovery rate, the designation of a specific amino acid sequence as a product of proteasomal splicing needs careful biological validation [213,226,228]. Since validation of spliced peptide assignments can be, and to date has been, performed only for subsets of peptide annotations, the true extent of spliced peptide sequences in the immunopeptidome remains as yet undetermined [227].
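The false discovery rate estimate mentioned above, obtained by searching a randomised (decoy) database in parallel, can be sketched as follows. This is a simplified illustration, assuming each peptide-spectrum match (PSM) carries a single score where higher is better; the function names and the toy scores are hypothetical and do not reproduce any particular search engine.

```python
# Minimal sketch of target-decoy FDR estimation (synthetic scores, illustration only).

def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoy hits / target hits among PSMs scoring >= threshold.

    psms is a list of (score, is_decoy) tuples.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def threshold_for_fdr(psms, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is <= max_fdr, or None."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, score) <= max_fdr:
            best = score
    return best


if __name__ == "__main__":
    # Synthetic example: six target matches and two decoy matches.
    toy_psms = [(10, False), (9, False), (8, False), (7, False), (6, False),
                (5.5, True), (5, False), (4, True)]
    print(fdr_at_threshold(toy_psms, 5.5))
    print(threshold_for_fdr(toy_psms, max_fdr=0.01))
```

Because the estimate depends on the number of decoy sequences that can match by chance, enlarging the search space (as any database that includes candidate spliced sequences does) changes the score threshold needed for a given FDR, which is one reason the text calls for independent validation.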
β5i/LMP7-containing immunoproteasomes enhance the production of a novel gp100/MEL epitope by peptide splicing: the RTKAWNRQLYPEW substrate is reverse spliced to QLYPEW-RTKAWNR and diced to the QLYPEW-RTK product epitope [229]. Similarly, the immunoproteasome was shown to enhance the production of the SP110-derived minor H epitopes as well [215,218]. Curiously, however, in large-scale studies IFN-γ had little if any influence on the nature of the peptides in the immunopeptidomes investigated, even though the cytokine induced components of the immunoproteasomes and accessory proteins in the PLC [49].
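The forward and reverse splicing events described in this section can be illustrated with a brute-force enumeration of every two-fragment ligation of a precursor peptide. This is a sketch under simplifying assumptions: both fragments come from the same precursor, any cut site is allowed, and no constraint is placed on fragment length or intervening distance. The function name `cis_spliced_products` is hypothetical; the precursor and product sequences are those quoted in the text.

```python
# Minimal sketch: enumerate two-fragment ligation products of one precursor.

def cis_spliced_products(precursor, min_fragment=1):
    """Return a set of (product, kind) pairs for all two-fragment ligations.

    kind is "forward" when the first fragment lies N-terminal to the second
    with at least one residue excised between them, and "reverse" when the
    first fragment lies C-terminal to the second in the precursor.
    """
    n = len(precursor)
    # Fragments as half-open index intervals [start, end).
    fragments = [(i, j) for i in range(n) for j in range(i + min_fragment, n + 1)]
    products = set()
    for a_start, a_end in fragments:
        for b_start, b_end in fragments:
            if a_end < b_start:
                kind = "forward"
            elif b_end <= a_start:
                kind = "reverse"
            else:
                continue  # overlapping or simply contiguous: not a splice
            products.add((precursor[a_start:a_end] + precursor[b_start:b_end], kind))
    return products


if __name__ == "__main__":
    gp100 = cis_spliced_products("RTKAWNRQLYPEW")
    sp110 = cis_spliced_products("STPKRRHKKKSLPRGTASSR")
    print(("RTKQLYPEW", "forward") in gp100)    # RTK-QLYPEW
    print(("QLYPEWRTK", "reverse") in gp100)    # QLYPEW-RTK
    print(("SLPRGTSTPK", "reverse") in sp110)   # SLPRGT-STPK
    print(len(gp100), len(sp110))
```

Even for these short precursors the number of candidate products is large, which gives a sense of how quickly a search space that includes spliced sequences grows.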
Then there are thymoproteasomes, those made with the β5t subunit, which assembles in cortical thymic epithelial cells in association with β1i and β2i. β5t assumes the place of β5 and β5i in these cells [230]. β5t-containing thymoproteasomes are thought to promote positive selection of CD8+ T cells, but the underlying mechanism remains unknown [230][231][232][233][234]. The chymotrypsin-like proteolytic activity of thymoproteasomes is low; consequently, they produce a distinct immunopeptidome [231,232]. Alternatively, as β5t, β5 and β5i are paralogues begotten from gene duplication (β5t and β5) and two rounds of whole genome duplications (β5 and β5i) [235,236], and because β5i enhances splicing of certain peptide epitopes [215,218,229], an intriguing possibility is that the thymoproteasomes may have increased peptide splicing activity. These predictions, however, require further investigation.

DEFECTIVE RIBOSOMAL PRODUCTS
The immunodominant CD8+ T cell epitope from VSV-N is generated within the first 45 min post infection of cells [177]. A similar observation was reported for the HIV-1 Gag protein, which is an incredibly stable protein [237]. Hence, the presentation of these epitopes occurs much sooner than the turnover of the two source proteins begins. As well, over 30% of newly synthesised proteins are turned over by the proteasomes [237]. This rapid protein turnover is consistent with the finding that the major substrates for TAP transport are generated from newly translated proteins [238]. These astute observations led Yewdell to postulate the DRiP hypothesis over 20 years ago [239].

immunodominance [179].
Estimates are that DRiPs contribute to >30% of the peptides in the immunopeptidomes [239]. It is noted that the DRiP hypothesis does not in any way refute the contribution of peptides emerging from the natural turnover of stable cellular proteins-proteins that retire from their function/s-to the immunopeptidome [240]. In the light of evolution, it makes perfect sense to generate and present microbial antigens at early stages of infection as discovered in the kinetic study above so as to achieve effective immune surveillance and to stymie an

Unconventional translation: Where shall I begin?
A few groups have reported MHC-I restricted presentation of cryptic peptides, that is, peptides that arise from polypeptides templated from the 5' and 3' untranslated regions (UTRs) and alternative reading frames (ARFs) that are generally thought not to be translated. Such cryptic peptides form targets for virus- and tumor-specific T cell-mediated immunosurveillance ([241][242][243] and reviewed in ref. [244]). It is estimated that cancer immunopeptidomes comprise 2%-20% cryptic peptides [242,245].
In studies designed to understand how cryptic epitopes arise, Shastri and colleagues discovered the surprising use of CUG in contrast to the conventional use of AUG as the initiator codon [246][247][248][249][250].
What is more, translation initiation at CUG used the elongator leucyl-tRNA, whose anticodon forms Watson-Crick base pairs with the CUG leucine codon. That is, the methionyl-initiator tRNA (tRNAiMet) is not used as the initiator tRNA in a wobble base pairing with the CUG codon.
Initiation at CUG required eIF2A (eukaryotic initiation factor-2A) to form the ternary complex [249]. This form of unconventional translation is enhanced by proinflammatory signals including virus infection [180,251] and appears to guide tumorigenesis, which upends conventional translation by the phosphorylation of eIF2α [252]. New evidence indicates that RPS28 (40S ribosomal protein S28) tunes peptide generation via unconventional translation [253]. These findings alert to immunoribosomes, those potentially dedicated to creating self and non-self immunopeptidome. More on this matter is below. such as CUG>GUG>AUG>>UUG for translation [252]. Together then, unconventional translation reduces conventional protein synthesis but biases the process toward cancer specific gene expression. Coupled with alternative initiator codon usage, the immune system has found a way for immunosurveillance so that tumours and infected cells have nowhere to hide.
Hence, translation from 5'-and 3'-UTRs (see refs. in the preceding paragraph), ARFs [243,255,256] including the negative strand (e.g., influenza virus) and translation initiation from a non-AUG but specifically CUG codon are all known to contribute to immunopeptidomes (reviewed in refs. [239,246,257]).
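The sources of cryptic peptides summarised above (alternative reading frames and initiation at CUG rather than AUG) can be illustrated with a toy open reading frame scan. This is a minimal sketch and not a method from the studies cited: the function name `find_orfs`, the `min_codons` cut-off, and the example sequence are assumptions made purely for illustration, and the first residue of a CUG-initiated product is written as leucine in keeping with the elongator leucyl-tRNA decoding described in the text.

```python
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_orfs(mrna, start_codons=("AUG", "CUG"), min_codons=8):
    """Return (start_index, frame, start_codon, peptide) for each complete ORF.

    Hypothetical helper: it scans all three forward frames, accepts any codon
    listed in `start_codons` as an initiator, and decodes CUG as leucine.
    The input is assumed to contain only A, C, G and U (or T).
    """
    mrna = mrna.upper().replace("T", "U")
    orfs = []
    for start in range(len(mrna) - 2):
        codon = mrna[start:start + 3]
        if codon not in start_codons:
            continue
        residues = []
        for pos in range(start, len(mrna) - 2, 3):
            aa = CODON_TABLE[mrna[pos:pos + 3]]
            if aa == "*":
                break  # in-frame stop codon closes the ORF
            residues.append(aa)
        else:
            continue  # ran off the end without a stop codon: skip
        if len(residues) >= min_codons:
            orfs.append((start, start % 3, codon, "".join(residues)))
    return orfs


# Toy transcript with a single CUG-initiated ORF and no AUG at all.
toy_mrna = "AA" + "CUG" + "GCA" * 8 + "UAA" + "GG"
cryptic = find_orfs(toy_mrna)
conventional_only = find_orfs(toy_mrna, start_codons=("AUG",))
```

With AUG as the only permitted initiator the toy transcript yields no product, whereas allowing CUG recovers one leucine-initiated polypeptide. This is the sense in which a search space restricted to annotated, AUG-initiated ORFs can miss cryptic peptides.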
An estimated 1% of the proteome carries misincorporated methionine residues, which arise when Met is misacylated onto non-methionyl-tRNAs.
Methionine misincorporation into the proteome not only protects proteins from oxidation but also expands the functional, expressed genome. As viruses, dead and live, enhance Met-misacylation via innate signalling mechanisms and reactive oxygen production, such Met-misacylated proteomes can contribute peptides with non-templated methionine(s) to immunopeptidomes [258]. This notion awaits formal evidence.
Inhibition of IDO-1 then prevents the generation of neoepitopes and, thereby, obviates antitumor immunity.

Nuclear translation: Translating introns and across intron-exon boundaries
Translation of introns and intron-exon junctions provides a source of DRiPs [244][262][263][264][265][266]. Two studies provide compelling evidence that antigenic peptides are generated via pioneer translation in the nucleus. In the first of these, inhibitors of RNA polymerase II (pol II) that prevented nuclear export of transcribed mRNA blocked cytoplasmic translation of a recombinant IAV neuraminidase (rNA) gene.
This recombinant protein generated an antigenic peptide engineered into the rNA stalk region despite undetectable cytoplasmic translation [267]. The second study used a model in which mRNA is exported super-rapidly from the nucleus through an HIV-1 Rev-dependent, CRM1-mediated pathway. Super-rapid mRNA export decreased the presentation of an antigenic peptide whose coding sequence was engineered into the intron of the β-globin gene, consistent with nuclear translation of antigenic epitopes. Further, in situ localization mapped the pioneer translation product to the perinuclear area in association with RPS6 and RPL7 [264].
So then, are there immunoribosomes?

Immunoribosomes-Gained in translation
The DRiP hypothesis postulated the presence of immunoribosomes as a means to channel protein synthesis toward peptide generation and TAP transport [268] (reviewed in refs. [269][270][271]). Initial evidence for the engagement of a distinct ribosome subset in translating DRiPs came from studies that inserted a premature termination codon downstream of a segment that encodes an antigenic peptide within the β-globin gene. This premature stop codon initiates the RNA quality control mechanism termed nonsense-mediated decay (NMD; see ref. [254]). the two 60S large subunit proteins inversely control DRiPs [253].
Deficiency in a third RP, RPS28, increased HLA-A2 levels at the cell surface; one plausible explanation for this increase is an increased peptide supply. A ribosome profiling (Ribo-Seq) experiment showed increased unconventional translation of uORFs from 5' and 3' UTRs initiated at non-AUG codons. Of immunologic consequence, the increased peptide supply gained from RPS28 deficiency engineered into a melanoma cell line made the tumour line much more sensitive to killing by CD8+ T cells specific for an NY-ESO-derived peptide presented by HLA-A2 [253]. Together, these data suggest, for the first time, the existence of immunoribosomes, which may play an important role in cancer immunosurveillance. Hence, the authors conclude that mutations in RP genes, which are common in cancers [272], may result in cancer immunoevasion.
In sum, unconventional translation, 'W-bumps' and translational stall and frameshift, translation of introns and intron-exon boundaries, and translation with immunoribosomes expand the sources of DRiPs and, consequently, diversify the self and non-self immunopeptidomes.
Such DRiPs can be easily missed in experiments that use the current proteogenomics methods. To overcome this limitation, proteomics approaches need to incorporate Ribo-Seq technologies [252,273,274] to characterize a homeostatic and cancer translatome so as to better define what immune self and non-self mean to T cells. New studies are beginning to address this need in the cancer immunopeptidome space [242,275,276].

Microbial epitope discovery
The different approaches to discover T cell epitopes have been reviewed recently [277,278] and, hence, are not belaboured here. The most popular of these is algorithm-based epitope prediction coupled with biochemical and immunologic validation. Over 40 such algorithms exist, which have recently been compared and reviewed by others [279]. Further, algorithms trained on naturally processed immunopeptidomes in addition to the traditional affinity-based tools have better predictive power, as shown by a recent study of 95 HLA-I immunopeptidomes consisting of over 185 thousand peptides [49]. Two other methods are gaining interest in T cell epitope discovery: one, a massive, high-density peptide array technology that allows identification of all possible peptides that have the potential to bind to and be presented by MHC molecules in the absence of a functional PLC [280,281]; and two, phage display of pMHC complexes and epitope identification with yeast display of TCRs [282].
Algorithm-based epitope discovery could lead to the discovery of mimotopes because it focuses mainly on MHC binding and antigen processing and presentation but does not account for the features of antigen-receptor interactions [283]. Accounting for this interaction is critical as the TCR is very sensitive: that is, the receptor can recognize and respond to one to ten molecules of an antigen [84,284]. As well, it can discriminate between two peptides differing by a methylene group, or by a methyl versus a hydroxyl group, in an accessory anchor, for example, H4 minor histocompatibility alloantigens [86,285,286]. This sensitivity, coupled with a rather loose 'recognition logic' and the micro- to millimolar binding affinity with which the TCR engages its cognate antigen, the p/MHC, is thought to make the TCR highly cross-reactive [286-290]. A case in point is the recognition of ∼100 different peptides by an H4b-reactive CD8+ T cell line [291]; yet the 100 mimotopes so identified did not contain the actual epitope [86,285]. This was not a peculiarity of an alloreactive TCR because the simian virus 40-derived epitope-4-specific and herpes simplex virus 1 glycoprotein B-reactive T cell clones also recognised over 50 mimotopes. A common feature within the three mimotope sets was the presence of a TCR-specific recognition motif consisting of one or two conserved, putatively solvent-exposed residues with the potential to interact with the TCR.
At the other extreme, a single autoimmune TCR was recently shown to recognize over a million different peptides within a broad cross-reactivity profile [292]. Such cross-reactivity is not peculiar to MHC-I-restricted TCRs, as several MHC-II-restricted TCRs were shown to cross-react in a similar manner (see refs. [282,293] and references therein). predictive power [282]. These learnt adaptations to epitope prediction algorithms have significantly enhanced T cell epitope discovery [283][295][296][297][298].
Epitope prediction is high-throughput and effective for microbes with small proteomes such as those of viruses, the largest of which express ∼250-300 open reading frames (ORFs). Experiments using the power and rapidity of predictive algorithms coupled with T cell-based validation have resulted in the discovery of numerous putative and actual immune epitopes that are deposited in the IEDB (Immune Epitope Database) [299]. In contrast, discovery of T cell epitopes from larger microbes such as M. tuberculosis and Plasmodium spp. by using prediction algorithms would be challenging because the expressed genome of these microbes can encode ∼4000-6000 proteins. In addition to the scale of the epitope screening problem (about a million potential determinants), these microbes might use their own proteasomes to destroy beneficial epitopes discovered by predictive methods even before they are available for presentation by MHC-I.
Consistent with this notion, only a few epitopes were presented by HLA-A2.1 molecules expressed by U937-A2 cells infected with M. tuberculosis strain H37Ra (3 nested/overlapping, HLA-A*02:01-restricted epitopes) [300] or by THP-1 cells infected with the M. bovis-derived strain BCG (12 A*02:01-restricted peptides) [301]. The large differences in the range of epitopes presented perhaps lie in the fact that viruses translate their ORFs and some of their ARFs on host ribosomes, the DRiPs generated from which are substantial sources of antigenic peptides [255,257].
By contrast, mycobacteria translate their genomes on their own ribosomes, wherein DRiPs may be lost to rapid degradation by microbial proteasomes [302,303] and, hence, unavailable for presentation.
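The scale of the screening problem noted above for larger microbes (about a million potential determinants) can be made concrete with simple arithmetic: a protein of length L contains L - k + 1 overlapping peptides of length k. The sketch below assumes, purely for illustration, a proteome of 4000 proteins with a uniform length of 330 residues; neither that length nor the helper name `count_candidate_peptides` comes from the studies cited.

```python
def count_candidate_peptides(protein_lengths, peptide_lengths=(8, 9, 10)):
    """Count overlapping peptide windows across a proteome.

    Hypothetical helper: every protein of length L contributes L - k + 1
    windows for each peptide length k (none if the protein is shorter than k).
    """
    return sum(
        max(length - k + 1, 0)
        for length in protein_lengths
        for k in peptide_lengths
    )


# Assumed proteome: 4000 proteins, each 330 residues long (illustrative values).
assumed_proteome = [330] * 4000

nonamers_only = count_candidate_peptides(assumed_proteome, peptide_lengths=(9,))
all_8_to_10mers = count_candidate_peptides(assumed_proteome)

print(nonamers_only)    # 1288000
print(all_8_to_10mers)  # 3864000
```

Under these assumptions, 9-mers alone give roughly 1.3 million candidates before any HLA-binding filter is applied, and including 8-mers and 10-mers triples the count. Real proteomes have variable protein lengths, so the figures are order-of-magnitude estimates only.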
In striking contrast to the relatively small number of HLA-A*02:01-restricted mycobacterial peptides identified, a tedious and thorough study identified a large number (195) of peptides encoded by T. gondii, the causative agent of toxoplasmosis, presented by A*02:01 on infected cells [47]. peptides should be TAP-independent, a notion that is easily tested.
Whether the features observed in T. gondii-encoded ligands are unique to this pathogen or are common to microbes and parasites contained within parasitophorous vacuoles awaits further study. In this context, the features of HLA-I-restricted T. gondii-derived peptides and other epitopes described earlier [304][305][306] bring perspective to two orphan studies, reported over 25 years ago, that described the association of peptides longer than the conventional 8-10-mers with a mouse and a human MHC-I molecule [307][308][309].
Naturally processed epitopes presented by several HLA-I molecules have been characterised with the aid of proteomics approaches [45][46][47]96,100,101,164,179,[191][192][193]301]. A theme that emerged from one of these studies is that VACV-infected cells generate many more epitopes than are antigenic, that is, recognised by CD8+ T cells that are elicited by virus infection. These antigenic epitopes were also immunogenic, that is, they elicited a CD8+ T cell response in the appropriate mouse strain. Whether the non-antigenic virus-derived peptides were immunogenic (for a definition of antigen and immunogen, see BOX 1) was not determined. Another study of VACV-derived epitopes by mass spectrometry revealed that the mouse MHC-I presents peptides derived from almost all 200 or so virus ORFs. Further, a large majority of these peptides were immunogenic, suggesting that the mouse has a large T cell repertoire directed against VACV peptides. Whether all of these peptides are antigenic was not determined [46]. It is unlikely that all of the naturally processed VACV peptides identified by mass spectrometry are antigenic because previous reports by this group showed that 49 VACV peptides accounted for all of the antigenic epitopes. Further, CD8+ T cell responses to five peptides accounted for up to 40%, and responses to all 49 peptides for up to ∼95%, of the total response to VACV in mice [310,311].
At first pass, it might seem that the infected cell wastes immense resources to generate and present so many different epitopes. But consider the following: if all of the readers of this manuscript are HLA-B*07:02 positive, but express different HLA-I molecules from the remaining five loci, our T cell repertoires would be as distinct and diverse as the number of individuals in the reader population. Hence, each repertoire will recognize a distinct, and potentially overlapping, set of epitopes. This is exactly what was observed in multiple studies [45,312,313], which we called variegated T cell antigen recognition [45,314]. This variegated recognition coupled with heterotypic immunity perhaps explains the success of vaccination with VACV against smallpox with the eventual eradication of the disease from the globe [101,310].

Proteogenomics for cancer antigen discovery
Preceding the advent of the proteogenomic approach, tumour-specific antigens were discovered with mass spectrometry of T cell active There are two strategies to integrate these data with immunopeptidomic analyses. Firstly, the translated mutant proteome is subjected to T cell epitope prediction using HLA-binding predictors, for example, NetMHCpan-4.1 [319]. This information then allows targeted acquisition of the predicted variant peptide sequences within the material eluted from a given MHC-I molecule using a multiple reaction monitoring (MRM) experiment.
For the resulting naturally processed tumor epitopes, immunogenicity was predicted in silico, and both immunogenicity and protection were validated in vivo [102,103,105]. Alternatively, the genomics information can be included in the protein databases used for interrogation of the LC-MS spectra from purified MHC-associated peptidomes. Here, depending on the quality and extent of the genomically refined protein sequence data, this approach allows for the discovery of non-canonical antigens in the context of disease as exemplified above.
The potential variation within each peptide that is caused by snSNPs is ascertained from the genomes or transcriptomes of allogeneic or cancer cells and validated in immunologic assays [57,104], and immune reactivity has been confirmed for endogenous retroviral and lncRNA peptide sequences. Using high-resolution mass spectrometry, such epitopes relevant to several cancers have been discovered making possible therapies based on harnessing antigen presentation by means of vaccination, as well as T cell expansion and cell therapy [320][321][322][323][324][325][326][327].
Some of the neoepitopes were generated from oncogenic driver mutations, lending themselves not only to highly personalised anti-cancer vaccination but also to 'off the shelf' vaccines for individuals expressing HLA alleles of a supertype [327][328][329][330]. Neoepitopes are generated not only by snSNPs but can also emerge from frameshift mutations via dysregulated alternative splicing and exitron splicing events, and microsatellite instability, all of which are hallmarks of tumorigenesis [331][332][333][334]. We are just beginning to understand how a single amino acid alteration in a neoepitope beats immune tolerance to elicit an anti-cancer response [335]. Until this is well understood, we remain reliant on combination therapies such as checkpoint blockade, chemotherapy, or radiotherapy, at the cost of collateral damage (reviewed in refs. [331,336]). In this regard, such therapies can benefit from oncolytic virus infections that cause immunopeptidome shifts, as alluded to above [45][46][47]96,100,101,164,179,[191][192][193]301].
This finding raises the intriguing possibility that oncolytic viruses, such as adenovirus or vaccinia virus [164], recombinant viruses that ferry innate immune adjuvants [337], or chemical agents that target specific cellular processes [338,339] can help coax the expression and generation of neoepitopes to promote tumor immunity.
Finally, current insights into the nature, and the depth and breadth of immune self and non-self [50,51]