Driving Forces of Proteasome-catalyzed Peptide Splicing in Yeast and Humans*

Proteasome-catalyzed peptide splicing (PCPS) represents an additional activity of mammalian 20S proteasomes recently identified in connection with antigen presentation. We show here that PCPS is not restricted to mammals but is also a feature of yeast 20S proteasomes, catalyzed by all three active site β subunits. No major differences in splicing efficiency exist between human 20S standard- and immuno-proteasomes or yeast 20S proteasomes. Using H₂¹⁸O to monitor the splicing reaction we also demonstrate that PCPS occurs via direct transpeptidation that slightly favors the generation of peptides spliced in cis over peptides spliced in trans. Splicing efficiency itself is shown to be controlled by proteasomal cleavage site preference as well as by the sequence characteristics of the spliced peptides. Using kinetic data and quantitative analyses of PCPS obtained by mass spectrometry, we developed a structural model with two PCPS binding sites in the neighborhood of the active Thr1.

and the fragment is liberated. In PCPS, the N terminus of another peptide fragment performs a nucleophilic attack on the acyl-enzyme intermediate, leading to "direct transpeptidation" and the final generation of the proteasome-generated spliced product (PSP). Although the transpeptidation model is widely accepted, direct experimental evidence for its validity during normal substrate processing is still missing. An implication of this model is that the reaction mechanism is not regulated by a particular sequence motif but can occur at any substrate cleavage site, and that splicing does not occur if the initial peptide cleavage is followed by hydrolysis and PCP release. Thus, PCPS would depend only on the proteasomal site-specific cleavage strength (SCS), which determines how frequently proteasomes cleave a specific peptide bond (5,6). Interestingly, Dalet et al. identified a single PSP generated in the absence of proteolysis, albeit with an extremely low efficiency, in a reaction that they defined as "condensation" (8) and that will be referred to here as hydrolysis + transpeptidation to discriminate it from the direct transpeptidation reaction.
Because of the novelty of PCPS as part of our understanding of the ubiquitin proteasome system function and some inherent technical difficulties, the biochemical models as well as the relevance of PCPS have so far been only partially investigated. Therefore, we carried out an in vitro study of the mechanism of PCPS and its driving forces, thereby obtaining sufficient elements to design a novel model of the PCPS process with specific structural features. Furthermore, the quantification of PSPs by an innovative method named QME (quantification with minimum effort) and the comparison of the PCPS activity of different human and yeast proteasome isoforms provided hints with regard to the physiological relevance of PCPS.

EXPERIMENTAL PROCEDURES
Peptides and Peptide Synthesis-The sequence enumeration for the polypeptides gp100 40-52 (RTKAWNRQLYPEW), gp100 35-57 (VSRQLRTKAWNRQLYPEWTEAQR) and gp100 201-230 (AHSSSAFTITDQVPFSVSVSQLRALDGGNK) refers to the human protein gp100 PMEL17, for the peptide pp89 16-40 (RLMYDMYPHFMPTNLGPSEKRVWMS) to the murine cytomegalovirus pp89 protein and for the peptide LLO 291-317 (AYISSVAYGRQVYLKLSTNSHSTKVKA) to the Listeriolysin O protein of Listeria monocytogenes. Peptide sequences of the 14 previously described PSPs (10) as well as the 25 new PSPs identified in the proteasomal processing of the four synthetic substrates are reported in supplemental Table S1. All peptides were synthesized using Fmoc solid phase chemistry as previously described (11). The purity of synthetic peptides was tested by amino acid analysis.
Cell Cultures-Lymphoblastoid cell lines (LcLs) are human B lymphocytes immortalized with Epstein-Barr virus (EBV), which mainly express i-proteasomes (12). The T2 cell line is a human T cell leukemia/B cell line hybrid defective in TAP1/TAP2 (transporter associated with antigen presentation) and β1i/β5i subunits. T2.27 is a cell line originating from T2 cells and transfected with murine β1i and β5i subunits (13).
In Vitro Digestion of Synthetic Peptide Substrates-Synthetic peptides at different concentrations (30-40 µM) were digested by 1-3 µg 20S proteasomes in 100 µl TEAD buffer (Tris 20 mM, EDTA 1 mM, NaN₃ 1 mM, dithiothreitol 1 mM, pH 7.2) over time at 37°C. For the experiments performed in H₂¹⁸O-TEAD buffer we used water with 97% ¹⁸O (Campro Scientific GmbH, Germany). To minimize undesired side reactions such as the acid-catalyzed ¹⁸O labeling of carboxyl groups (i.e. at the C terminus or at acidic amino acids) (16), we analyzed the samples by nano-liquid chromatography-matrix-assisted laser desorption ionization/time of flight/TOF-MS (LC-MALDI-TOF/TOF-MS) immediately after stopping the reaction by TFA acidification (0.3% final concentration). The relative quantification of the ratio direct transpeptidation/(hydrolysis + transpeptidation) was based on the isotopic pattern of the PSPs [RTK][QLYPEW] (gp100 40-42/47-52) and [VSRQL][VSRQL] (gp100 35-39/35-39) from the digestions, in H₂¹⁸O-TEAD buffer and by LcL and yeast wild type 20S proteasomes, of the polypeptides gp100 40-52 and gp100 35-57, respectively. The isotope patterns of these two PSPs generated in H₂¹⁶O-TEAD buffer and of the PCPs [QLYPEW] (gp100 47-52) and [RTKAWNR] (gp100 40-46) were used as reference controls. The congruence of the isotope patterns of the PSPs and PCPs with the theoretical isotope patterns evaluated according to their elemental composition (17) was computed as reported in supplemental material. In summary, the congruence of the isotope patterns of PSPs generated in H₂¹⁸O-TEAD buffer with the theoretical isotopic patterns represents the prevalence of direct transpeptidation, whereas the congruence of the isotope patterns of PSPs generated in H₂¹⁶O-TEAD buffer and of the PCPs with the theoretical isotopic patterns was used to estimate the accuracy of our measurements.
All experiments reported in this study were repeated and measured at least twice.
Analysis of ESI/MS/MS data was accomplished using Bioworks version 3.3 (Thermo Fisher Scientific, USA). Database searching was performed using SpliceMet's ProteaJ database version 1.0 released in 2010 (10) and the following parameters: no enzyme, mass tolerance for precursor ions 0.5 Da and for fragment ions 1 amu. Oxidations of methionine and tryptophan were considered and ruled out as artificial. We rejected the following masses for the MS/MS analysis: 370.9, 371.9, 372.9, 391.1, 392.1, 393.1. These masses belong to plasticizer material derived from the MS instrument. In time-dependent processing experiments (signal intensity versus time of digestion) we analyzed the kinetics of the identified peaks by using LCquan software version 2.5 (Thermo Fisher).
Analysis of MALDI-TOF/TOF-MS data was accomplished by the peaklist-generating software 4000 Series Explorer version 3.6 (Applied Biosystems) and by using MASCOT version 2.1 (Matrix Science, London, UK). Database search was performed using SpliceMet's ProteaJ database (10) and the following parameters: no enzyme, mass tolerance for precursors ± 80 ppm and for MS/MS fragment ions ± 0.3 Da.
The number of entries in the searched database varied between substrates because of their different sequence lengths. In particular, for the polypeptides gp100 40-52, gp100 35-57, gp100 201-230, pp89 16-40, and LLO 291-317 the number of database entries was 5810, 57982, 173355, 84255, and 112339, respectively. MALDI-TOF/TOF-MS/MS spectra, ESI-MS/MS spectra and extracted ion chromatograms of the identified PSPs are reported in supplemental Figs. S1-S4.
QME, Titration and Raw MS Methods-In order to estimate the absolute amount of the total proteasomal cleavage/splicing products (Σ PCP/PSP) within the proteasomal digestion of the substrates we developed QME and compared it with the titration and the raw MS methods in the representative proteasomal digestion of the substrate gp100 40-52. QME estimates the absolute content of Σ PCP/PSP based on their ESI-MS signal measured in the digestion sample. QME is based on the law of mass conservation and on MS instrument features. The parameters and parameter values of the QME algorithm were computed empirically (supplemental Figs. S5-S9). From the quantitation of Σ PCP/PSP we could compute the site-specific cleavage strength (SCS) by applying the SCS algorithm, which computes the frequency of proteasome cleavage after any given residue of the synthetic polypeptide substrate by analyzing the amount of any digestion product (18,19).
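The idea behind a cleavage-frequency computation can be illustrated with a minimal sketch. This is not the published SCS algorithm of refs. 18 and 19; it is a simplified assumption in which every terminus of a non-spliced fragment counts as one use of the cleavage site that produced it, weighted by the fragment's amount. The function name, the input layout and the toy amounts are hypothetical.

```python
from collections import defaultdict

def cleavage_frequencies(substrate_length, products):
    """Return {position: % of all observed cleavages after that residue}.

    products: list of (start, end, amount) tuples with 1-based, inclusive
    positions of each non-spliced fragment and its estimated amount.
    Simplified illustration only, not the published SCS algorithm.
    """
    usage = defaultdict(float)
    for start, end, amount in products:
        if start > 1:                  # N terminus created by a cut after start-1
            usage[start - 1] += amount
        if end < substrate_length:     # C terminus created by a cut after end
            usage[end] += amount
    total = sum(usage.values())
    return {pos: 100.0 * amt / total for pos, amt in sorted(usage.items())}

# Hypothetical amounts for a 13mer: fragments 1-7 and 8-13 (cut after residue 7)
# and fragments 1-3 and 4-13 (cut after residue 3).
toy_products = [(1, 7, 40.0), (8, 13, 40.0), (1, 3, 10.0), (4, 13, 10.0)]
freqs = cleavage_frequencies(13, toy_products)
```

With these toy amounts the cut after residue 7 accounts for most of the cleavage events, which is the kind of per-residue ranking the SCS is meant to provide.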
Estimation of the MHC Class I-restricted Potential Epitopes-The list of the 9-12mer Σ PCP/PSP, detected in the processing of all four synthetic substrates by 20S proteasomes, was screened by two MHC class I epitope prediction algorithms, i.e. SYFPEITHI (20) and IEDB (21), available on the Web. As thresholds to identify the best candidates we adopted a score of 20 for SYFPEITHI and IC50 = 500 nM for IEDB. In the Results and Discussion sections we mainly discuss the results obtained with the IEDB prediction because of its growing database, its prediction power and its recently reported superior performance (22,23).
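The screening step can be sketched as a simple filter on peptide length and predicted affinity. The sketch assumes that the predicted IC50 values have already been obtained from the prediction server and that a peptide passes when its IC50 is at or below the threshold; the record layout and the IC50 values shown are hypothetical and are not results of this study.

```python
def candidate_epitopes(peptides, ic50_threshold_nm=500.0, min_len=9, max_len=12):
    """Keep peptides of 9-12 residues whose predicted IC50 is at or below the threshold.

    peptides: list of dicts with keys 'seq', 'kind' ('PCP' or 'PSP') and
    'ic50_nm' (predicted affinity, taken from an external prediction tool).
    """
    return [
        p for p in peptides
        if min_len <= len(p["seq"]) <= max_len and p["ic50_nm"] <= ic50_threshold_nm
    ]

# Hypothetical predictions, for illustration only.
toy_peptides = [
    {"seq": "VSRQLVSRQL", "kind": "PSP", "ic50_nm": 120.0},   # 10mer, passes
    {"seq": "RTKAWNR", "kind": "PCP", "ic50_nm": 50.0},       # 7mer, too short
    {"seq": "QLYPEWTEA", "kind": "PCP", "ic50_nm": 2300.0},   # 9mer, weak binder
]
selected = candidate_epitopes(toy_peptides)
```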
Statistics-Statistical analyses of cis/trans PCPS (Table IIB) and the relative amount of the direct transpeptidation (Table III) were performed using Student's t test for independent samples, adjusted using Bonferroni correction. p < 0.05 was considered statistically significant. In each data set, homogeneity of variance was checked by Levene's test. All analyses were implemented using SPSS software. The means and S.D. reported in Table IIA and Table IVB represent the means, for each 20S proteasome, obtained from the sum of the degradation of the four substrates, and the S.D. over time. This type of statistical analysis is intended to better mimic the in vivo situation, in which proteasomes process different substrates at the same time and produce a single pool of peptides. The maximum and minimum frequency values of PSPs, 9-12mer PSPs and potential MHC class I PSP epitopes reported in the text refer to the time course means computed for each proteasome type and substrate.
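For readers unfamiliar with the Bonferroni adjustment, the following minimal sketch shows the arithmetic: each raw p value is multiplied by the number of comparisons and capped at 1 before being compared with the significance level. The analyses in this study were run in SPSS; the function below and the p values passed to it are purely illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, significant?) for each raw p value."""
    m = len(p_values)                                # number of comparisons
    adjusted = [min(1.0, p * m) for p in p_values]   # Bonferroni-adjusted p values
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]

# Hypothetical raw p values from three independent comparisons.
results = bonferroni([0.004, 0.03, 0.2])
```

Here only the first comparison remains below 0.05 after adjustment, although the second would have passed without correction.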
A complete description of the methods can be found in supplemental material.

RESULTS
To determine proteasomal cleavage and splicing preferences, and to investigate the quantitative relevance of PCPS and its underlying biochemical mechanisms, it was mandatory to compute the absolute amount of reactant peptides that are available for peptide splicing, i.e. the proteasome-generated cleaved peptides (PCPs), and the amount of proteasome-generated spliced peptides (PSPs) produced during the PCPS reaction. Therefore, we compared three different methods for quantifying the absolute amount of all proteasomal cleavage/splicing products (Σ PCP/PSP) generated over time by in vitro digestion of the human melanoma-derived synthetic 13mer polypeptide gp100 40-52 (RTKAWNRQLYPEW) by standard proteasomes (s-proteasomes) and i-proteasomes. The raw MS method, which because of its immediacy has been used in the past (e.g. by Cardinaud et al. (24)), assumes that the MS signal of each peptide directly corresponds to its amount, thereby setting the conversion factor between MS signal and the absolute amount for any peptide equal to that of the substrate gp100 40-52. The titration method computes the conversion factor between the MS signal and the peptide's absolute amount by titrating the synthetic peptides corresponding to all products of digestion. By applying this method to all PCPs and PSPs identified in the digestion of gp100 40-52 we observed that the conversion factors differed from peptide to peptide, de facto invalidating the assumption of the raw MS method (supplemental Fig. S5). The QME method, an algorithm-based method developed by us, computes the conversion factor between MS signal and the peptide's absolute amount by combining the MS signal, MS instrument features and biochemical principles such as the law of mass conservation (see Experimental Procedures and supplemental material). Of the three methods tested, QME provided the best mass conservation over the time course of the reaction, whereas with the titration method a notable gain of mass was observed (Fig. 1A).
In general, the QME and the titration methods resulted in a similar estimation of the amount of the generated Σ PCP and Σ PSP (Figs. 1B, 1C and supplemental Fig. S6). By focusing on the production (per nmol of substrate cleaved) of the two major PCPs, i.e. gp100 40-46 (RTKAWNR) and gp100 47-52 (QLYPEW) (Figs. 1D, 1E), we noted that with the titration method the values exceeded the limit of 1 nmol (the theoretical maximum, obtained when the substrate is always cleaved at the same position and thus generates only two PCPs), indicating that this method could in some cases overestimate the real Σ PCP/PSP amount. Application of raw MS, on the other hand, resulted in an underestimation of the amount of specific PCPs and PSPs, a phenomenon that was even more pronounced for shorter peptides (supplemental Fig. S6). We therefore considered QME the most suitable method and applied it in all subsequent analyses to determine the Σ PCP/PSP amount and the site-specific cleavage strength (SCS).
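The mass-conservation criterion used to compare the three methods can be made concrete with a small sketch. It assumes that conservation is checked at the level of amino acid residues: the residues recovered in all quantified products should equal the residues contained in the substrate that was consumed. The function name and the amounts are hypothetical; the actual QME computation is described in the supplemental material.

```python
def residue_balance(substrate_length, substrate_consumed, products):
    """Ratio of residues recovered in products to residues in consumed substrate.

    products: list of (peptide_length, amount) in the same unit as
    substrate_consumed (e.g. nmol). A ratio of 1.0 means perfect conservation;
    values above 1.0 indicate an apparent gain of mass.
    """
    consumed = substrate_length * substrate_consumed
    recovered = sum(length * amount for length, amount in products)
    return recovered / consumed

# Hypothetical 13mer cleaved once into a 7mer and a 6mer, 1 nmol consumed.
balanced = residue_balance(13, 1.0, [(7, 1.0), (6, 1.0)])
# Hypothetical overestimate in which both products exceed 1 nmol.
overestimated = residue_balance(13, 1.0, [(7, 1.2), (6, 1.1)])
```

A ratio above 1, as in the second call, corresponds to the gain of mass described for the titration method.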
PCPS is Catalyzed by All Active Sites of 20S Proteasomes-So far PCPS had been described only for human proteasomes. Therefore, we initially asked whether proteasomes of the yeast Saccharomyces cerevisiae were also able to catalyze peptide splicing or whether this was a function peculiar to human proteasomes, designed to broaden the peptide repertoire for MHC class I antigen presentation. Yeast proteasomes were also chosen because of the availability of well-characterized mutant strains lacking one or more active sites, which could provide information regarding the involvement of the different active sites in PCPS. To allow a more general conclusion we needed to collect data on a sizeable number of PSPs generated during the degradation of different synthetic polypeptides. We therefore searched 20S proteasomal degradation products for PSPs generated from the Listeriolysin-derived peptide substrate LLO 291-317, the human melanoma gp100-derived gp100 35-57 and gp100 201-230 polypeptides and the MCMV IE pp89-derived pp89 16-40 polypeptide by applying the previously developed PSP search method SpliceMet (10). Using this approach we identified 39 PSPs, which were generated by both human and yeast 20S proteasomes, thereby demonstrating that PCPS is not a function peculiar to human 20S proteasomes (supplemental Table S1 and supplemental Figs. S1-S4).
To identify the catalytic sites responsible for the splicing reaction we analyzed the Σ PCP/PSP generated from the four synthetic substrates by 20S proteasomes purified from yeast wild type and mutant strains, which harbor β subunit mutants with inactive (β1 T1A, β2 T1A, β1 and β2 T1A) or affected (β5 K33A) cleavage sites (supplemental Table S2). In the β1 and β2 T1A mutants the active Thr1 is replaced by Ala, thereby rendering the corresponding β subunits proteolytically inactive. The T1A replacement in the β5 subunit is lethal (25). Because Lys33 is required for autocatalytic β5 propeptide processing and for the subunit's peptide cleavage activity, Lys33 within the active site pocket is replaced by Ala. In consequence, the K33A mutation abolishes or impairs β5 subunit maturation, leading to the formation of a proteolytic intermediate with an inactive β5 subunit (25,26).
All types of yeast 20S proteasomes (wild type and the four active site mutants) produced either all or at least 50% of the PSPs generated by human 20S proteasomes, providing first evidence that all three different catalytic sites can carry out PCPS. For a more detailed investigation we next calculated the site-specific cleavage strength (SCS) for each polypeptide substrate. By comparing the SCS of the different yeast 20S proteasome mutants we determined which of the active sites were mainly responsible for cleavage after a given residue. We considered an active site mainly responsible for a specific cleavage when the β-subunit mutant proteasome showed a significant decrease in the frequency of cleavage after a given residue as compared with the wild type (see Experimental Procedures; Fig. 2 and supplemental Fig. S10). This information was compared with the amino acid sequences of all generated PSPs, thereby allowing the identification of the β subunits responsible for those cleavages that generate the N- and C-terminal residues of the PCPs, i.e. the splice-reactants, used in the formation of a new spliced peptide. We named them as follows: Pn = N-terminal residue of the N-terminal splice-reactant; P1 = C-terminal residue of the N-terminal splice-reactant; P1′ = N-terminal residue of the C-terminal splice-reactant; Pc = C-terminal residue of the C-terminal splice-reactant (Fig. 2A). Importantly, the P1 residues, suggested to form the acyl-enzyme intermediate with the active site threonine, were found to be generated by the β1, β2 and β5 subunits (Table I).
A quantitative comparison of the SCS and the amount of correlating PSPs also provided first insights regarding the driving forces of PCPS. For example, the cleavage after the residue Leu39 of the polypeptide gp100 35-57 was impaired in the 20S proteasome β5 mutant, suggesting that this subunit was mainly responsible for this cleavage. This is further confirmed by the enhanced site-specific proteolysis of the 20S proteasome β1-β2 mutant, which possesses only β5 subunits as active sites (Fig. 2B). Intriguingly, the PSP [VSRQL][VSRQL] (gp100 35-39/35-39), which has Leu39 as PSP residue P1, was generated with a lower efficiency by the β5 mutant proteasome (Fig. 2C), indicating a correlation between the amount of each produced PSP and the availability of the associated splice-reactant peptides. The inactivation of the active site β subunits in no instance completely abolished the usage of the identified substrate cleavage sites. Therefore, it was essential to quantify and compare the amount of each PSP produced by the different yeast mutant proteasomes, because a simple qualitative (yes/no) evaluation was not possible.
TABLE I Proteolytic β-subunits responsible for the generation of the N- and C-termini of the splice-reactants
The number of cleavage sites, ascribed to different active site β subunits, that produced the N- and C-termini of the splice-reactants of the four substrate polypeptides is reported here. We named them as follows: Pn = N-terminal residue of the N-terminal splice-reactant; P1 = C-terminal residue of the N-terminal splice-reactant; P1′ = N-terminal residue of the C-terminal splice-reactant; Pc = C-terminal residue of the C-terminal splice-reactant (Fig. 2A). The cleavages that generated the terminal residues of the splice-reactants, which were afterwards located at the PSP positions Pn, P1, P1′, and Pc, could be ascribed to all proteolytically active β subunits. We defined as "unknown" those cleavages that, based on the yeast 20S proteasome assays, were produced by all proteolytic β subunits to a similar extent (supplemental Fig. S10).

PCPS is Not a Rare Event-The data reported above gave first insight into the driving forces of PCPS but also suggest that PCPS in vitro, at least for specific PSPs, was not a rare event. For an overall evaluation of the relative number of PSPs within Σ PCP/PSP, we analyzed the degradation of the substrates gp100 35-57, gp100 201-230, pp89 16-40 and LLO 291-317 by 20S proteasomes purified from human erythrocytes and T2 cell lines (s-proteasomes) as well as from human spleen, T2.27 and LcL cell lines (mainly i-proteasomes). Overall, the average Σ PSP amount produced by all 20S proteasomes turned out to be 1.89% of Σ PCP/PSP, with notable differences between substrates (Σ PSP relative amount = 0.55-5.49%). Although the amount of specific PSPs varied largely between s- and i-proteasomes as shown in Fig. 3, no significant differences in the relative amount of Σ PSPs were observed when we compared s- and i-proteasomes (Table IIA). Because of the high number of trans PSPs identified so far (supplemental Table S1) we asked whether trans PCPS was a more frequent process than cis PCPS. To quantify the frequency of the trans PCPS reaction we performed in vitro digests in which the unmodified 13mer gp100 40-52 peptide was subjected to proteasomal processing in the presence of the same peptide carrying the heavy amino acid residues ¹³C₆-Lys, ¹³C₆-¹⁵N-Leu, and ¹³C₅-¹⁵N-Glu (RTK(+6)AWNRQL(+7)YPE(+6)W). We detected PSP variants resulting from cis (variants -A and -D) or from trans (variants -B and -C) PCPS and we computed their relative amount by comparing the MALDI-MS signals (Fig. 5 and supplemental Fig. S11). Interestingly, in experiments carried out with either s- or i-proteasomes, cis PSPs prevailed over trans PSPs with a small but statistically significant difference (Table IIB).
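The relative amounts of cis and trans products in the heavy/light mixing experiment can be sketched as follows, under the simplifying assumption that the light and heavy variants of a PSP give the same MS response, so that signal ratios approximate molar ratios. The function name and the signal values are hypothetical and do not reproduce the data of Table IIB.

```python
def cis_trans_share(signals):
    """Return (% cis, % trans) from the MS signals of the four PSP variants.

    signals: dict with keys 'A', 'B', 'C', 'D'. Variants A and D combine two
    fragments of the same isotopic type (cis); variants B and C combine one
    light and one heavy fragment (trans).
    """
    cis = signals["A"] + signals["D"]
    trans = signals["B"] + signals["C"]
    total = cis + trans
    return 100.0 * cis / total, 100.0 * trans / total

# Hypothetical MALDI-MS signal intensities (arbitrary units).
cis_pct, trans_pct = cis_trans_share({"A": 30.0, "B": 22.0, "C": 23.0, "D": 25.0})
```

If splicing partners were drawn at random from an equimolar light/heavy pool, the two shares would be expected to be equal, so a cis share above 50% points to a preference for splicing within the same substrate molecule.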
Human and Yeast 20S Proteasomes Catalyze Peptide Splicing with Similar Rates and with no Active Site Preference-To quantitatively analyze the yeast 20S proteasome-catalyzed splicing reaction and to verify whether any of the β subunits was predominantly involved in it, we measured the Σ PSP amount produced by yeast wild type and mutant 20S proteasomes processing the substrate gp100 35-57. The relative amount of Σ PSP within the Σ PCP/PSP generated by yeast wild type proteasome, i.e. 1.05% (± 0.38), was in the range of what had been measured for human s- and i-proteasomes, which was 1.83% (± 0.16) and 1.15% (± 0.42), respectively. When the quantitative relevance of the active site β subunits for PCPS was assessed by comparing the Σ PSP within the Σ PCP/PSP generated by yeast wild type or mutant proteasomes, no significant differences emerged (Table IIC), although the amount of specific PSPs was affected (Fig. 2C).
Sequence Preferences Regulate PSP Generation-We next asked whether PCPS was driven by the amount of the splice-reactants, which is a corollary of the transpeptidation model, or whether other factors were also involved. Therefore, we compared the amount of PSP pairs that shared one of the splice-reactants and the amount of the corresponding splice-reactants. This approach was mandatory, because by comparing only the amount of a single PSP with the amount of the corresponding splice-reactants we could not discriminate between possible sequence dependences of cleavage site usage and of the PCPS reaction. Among the Σ PCP/PSP generated by proteasomal digestion of the substrates gp100 35-57, gp100 201-230, and LLO 291-317 we identified several PSP pairs sharing one of the splice-reactants. To reduce the complexity of the comparison we focused on PSP pairs where one PSP was generated by the ligation of two identical PCPs, e.g. [VSRQL][VSRQL], whereas the other PSP was produced by ligation of the shared splice-reactant and a second, different splice-reactant, e.g. [QLYPEWTEA][VSRQL]. Furthermore, we computed for each example reported in Figs. 3 and 4 the amount of PCPs and PSPs per nmol of substrate cleaved to obtain data independent of the substrate degradation rate, which often differs between s- and i-proteasomes. For instance, for the 23mer […] (Fig. 3C). The shared reactant peptide [VSRQL] was generated more efficiently by T2.27 i-proteasomes than by T2 s-proteasomes (Fig. 3D), an observation that directly correlated with the increased efficiency of PSP gp100 35-39/35-39 formation (Fig. 3B). In contrast, T2 and T2.27 20S proteasomes generated the peptide [QLYPEWTEA] (gp100 47-55) with similar efficiencies and there was no difference in the generation of the PSP gp100 47-55/35-39, suggesting that, in this case, the reactant peptide driving the reaction was gp100 47-55 (Fig. 3E). The PCP gp100 47-55 was also produced in lower amounts than the other reactant peptide [VSRQL] by both 20S proteasome isoforms, suggesting that the reactant peptide produced in the lower amount was the rate-limiting factor of the splicing reaction. Similar conclusions emerged from the analysis of the LLO 291-317 polypeptide substrate, which was degraded only slightly faster by T2.27 20S proteasomes (Fig. 4A) […] LLO 291-293 and LLO 300-306, being the rate-limiting compounds. Notably, this dependence of the PCPS reaction on the splice-reactant present in lower amounts seemed to be independent of the position of the splice-reactant in the nascent PSP, because we observed this phenomenon for both N- (gp100 47-55) and C- (LLO 291-293 and LLO 300-306) terminal splice-reactants.
Although we found a correlation between the amount of the reactant peptides (i.e. the PCPs that will be spliced) and the products (i.e. the PSPs), we asked whether that was the only driving factor of PCPS or whether the sequence of the splice-reactants would also affect PCPS efficiency. Therefore, we computed the SCS for each substrate and the frequency of the N- and C-terminal residues of the splice-reactants. We focused our attention on PSP P1 residues, because they are thought to be directly linked to the active site involved in the splicing reaction, and on PSP P1′ residues (Fig. 2A), which are supposed to perform the nucleophilic attack on the acyl-enzyme intermediate (5). If PCPS were driven only by the amount of splice-reactants, the general frequency of cleavage (i.e. SCS) and the frequency of cleavage generating the PSP P1 and P1′ residues would be expected to be similar. In contrast, for all substrates and in all the digestions carried out by 20S proteasome isoforms of any origin, we observed a substantial difference between the SCS and the frequency of cleavage generating the PSP P1 and P1′ residues (Fig. 6). In fact, both the PSP P1 and P1′ residues often derived from minor cleavage sites, suggesting a sequence dependence of the PCPS process that is independent of the overall cleavage preferences.
Cis and Trans PCPS Occur by Direct Transpeptidation-Unexpectedly, the results shown above were partially in contradiction with the transpeptidation model as proposed by Vigneron et al., which had been deduced from the observation that the 20S proteasome did not catalyze the ligation of the peptides RTK and QLYPEW (5). Therefore we set out to experimentally study the transpeptidation model by performing the digestion of the substrates gp100 40-52 (RTKAWNRQLYPEW) and gp100 35-57 (VSRQLRTKAWNRQLYPEWTEAQR) by LcL and wild-type yeast 20S proteasomes in H₂¹⁸O-TEAD buffer.
In the direct transpeptidation reaction, which is the core of the transpeptidation model (5), the acyl-enzyme intermediate bound to the proteasomal Thr1 must be attacked by the C-terminal splice-reactant, thereby preventing the N-terminal fragment from being released by hydrolysis (Fig. 7A). In contrast, in the hydrolysis + transpeptidation reaction, the N-terminal splice-reactant is released from the active site β subunit Thr1 residue by hydrolysis and subsequently forms a new acyl-enzyme intermediate, followed by its ligation to the other splice-reactant and the formation of the final PSP. Therefore, the direct transpeptidation and hydrolysis + transpeptidation reactions can be discriminated by performing the digestions in H₂¹⁸O-TEAD buffer, because only during the hydrolysis step, which is peculiar to the hydrolysis + transpeptidation reaction, does the N-terminal peptide incorporate ¹⁸O […] fragment. Consequently, of the PSP molecules produced at the last step of this reaction, 50% will have the monoisotopic mass and 50% the monoisotopic mass + 2 Da because of the ¹⁸O labeling. Other condensation processes (not represented in Fig. 7) may occur, involving other proteasome sites instead of the active site β subunit Thr1, but producing the same final result. This experimental setup has the advantage that the proteasome can cleave the substrate along its sequence during in vitro digestion, and that direct transpeptidation as well as hydrolysis + transpeptidation can occur within the same sample, thereby allowing a relative quantification of the two reactions.
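How a +2 Da labeled fraction might be read out of an isotope pattern can be sketched with a simple two-component model. This is an assumption made for illustration and not the congruence computation described in the supplemental material: the observed pattern is modeled as a mixture of the theoretical unlabeled pattern and the same pattern shifted by two peaks (+2 Da, assuming peaks spaced 1 Da apart), and the mixing fraction is obtained by least squares. Incomplete ¹⁸O enrichment of the buffer is ignored, and all intensities are hypothetical.

```python
def labeled_fraction(theoretical, observed):
    """Least-squares estimate of the fraction of molecules shifted by +2 Da.

    theoretical: isotope peak intensities of the unlabeled peptide (M, M+1, ...).
    observed: measured intensities starting at the same monoisotopic peak,
    with at most two more peaks than `theoretical`.
    """
    n = len(theoretical) + 2
    if len(observed) > n:
        raise ValueError("observed pattern has too many peaks")
    t_sum = float(sum(theoretical))
    unlabeled = [x / t_sum for x in theoretical] + [0.0, 0.0]
    labeled = [0.0, 0.0] + [x / t_sum for x in theoretical]   # shifted by +2 Da
    o_sum = float(sum(observed))
    obs = [x / o_sum for x in observed] + [0.0] * (n - len(observed))
    diff = [l - u for l, u in zip(labeled, unlabeled)]
    num = sum((o - u) * d for o, u, d in zip(obs, unlabeled, diff))
    den = sum(d * d for d in diff)
    return max(0.0, min(1.0, num / den))                      # clip to [0, 1]

# Hypothetical theoretical pattern and a 50:50 mixture of unlabeled and +2 Da forms.
theo = [0.5, 0.3, 0.15, 0.05]
half_labeled = [0.25, 0.15, 0.325, 0.175, 0.075, 0.025]
f_mixed = labeled_fraction(theo, half_labeled)
f_unlabeled = labeled_fraction(theo, theo)
```

Under the 50% labeling expectation stated above, an estimated fraction near 0.5 would be compatible with hydrolysis + transpeptidation, whereas a fraction near 0 would be compatible with direct transpeptidation.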
We […] (gp100 47-52) and [RTKAWNR] (gp100 40-46), as additional controls of the reactions in H₂¹⁸O-TEAD buffer, and we calculated the congruence between the isotopic pattern of both PCPs and PSPs and the theoretical isotopic patterns (17). The congruence of the isotope patterns of PSPs generated in H₂¹⁸O-TEAD buffer with the theoretical isotopic patterns represents the prevalence of direct transpeptidation. The congruence of the isotope patterns of PSPs generated in H₂¹⁶O-TEAD buffer and of the PCPs with the theoretical isotopic patterns can, in addition, be used to estimate the accuracy of our measurements (see supplemental material).
The PCP [QLYPEW] (gp100 47-52), whose C terminus is not produced by cleavage, revealed a similar isotopic pattern in the digestions carried out by LcL proteasome in H₂¹⁶O- or H₂¹⁸O-TEAD buffer (Fig. 7B). In the same digestions, the PSP gp100 40-42/47-52 showed a similar isotopic pattern, too (Fig. 7C). This PSP has a calculated m/z of 1220.6 and, because the C terminus is not produced by cleavage, an increased amount of the isotopes with m/z of 1222.6, 1223.6, 1224.6, and 1225.6 would be due to the incorporation of one ¹⁸O after the K residue during hydrolysis + transpeptidation. In the same digestions, the PCP [RTKAWNR] (gp100 40-46) showed a clear shift of all isotopes of +2 Da (Fig. 7D). This mass shift was due to the incorporation of one ¹⁸O at the C terminus during the hydrolysis, as confirmed by MS/MS analysis (data not shown). A similar and expected shift of +2 Da of all isotopes in the digestion by LcL 20S proteasome of the substrate gp100 35-57 was also detected for the PSP gp100 35-39/35-39 (Fig. 7E) because of the incorporation of one ¹⁸O at the C terminus during the hydrolysis, as confirmed by MS/MS analysis (supplemental Fig. S12). As shown in Table III […] (Table III), leading to the conclusion that these two PSPs are both produced by direct transpeptidation because no significant incorporation of ¹⁸O […]
[…] class I antigen presentation, we also investigated whether PSPs might have a different prevalence than PCPs with respect to potential MHC class I epitopes. We therefore computed the frequency and absolute amount of Σ PCP/PSP with a length between 9 and 12 amino acids (roughly the size of standard MHC class I epitopes and precursors) within the Σ PCP/PSP produced by proteasomal processing of the substrates gp100 35-57, gp100 201-230, pp89 16-40 and LLO 291-317. Surprisingly, the frequency of 9-12mers was higher among PSPs than PCPs in both s- and i-proteasome reactions.
Indeed, 7.56% of the 9 -12mers generated by all 20S proteasomes were PSPs, with a considerable difference between different substrates, i.e. 0.62% (gp100 201-230 cleaved by T2.27 proteasomes) and 29.69% (LLO 291-317 cleaved by T2 proteasomes) (Table IIIA). Notably, PCPs and PSPs produced during the degradation of the four substrates exhibited a similar average length although the frequency of 9 -12mer was higher among PSPs because of the short length of the splice-reactants (Table IVB). PSPs also harbored a relatively higher percentage of potential MHC class I-restricted epitopes. Considering the epitope list selected by the prediction algorithm IEDB (21), we calculated that PSPs amounted to 11.61% of the potential MHC class I epitopes generated by all 20S proteasomes, corresponding to 15.58 pmol per nmol of substrate cleaved. Again, we observed a strong variation between substrates, with the frequency of ⌺ PSP among potential MHC class I epitopes varying between 0.25% (gp100 201-230 cleaved by T2.27 proteasomes) and 47.42% (LLO 291-317 cleaved by T2 proteasomes). Among the potential MHC class I epitopes, specific PSPs were better produced by proteasomes iso-forms. For example, the PSP [VSRQL][VSRQL], which was predicted to be precursor of a binder of the HLA-B * 2705 by both MHC class I prediction programs, was more efficiently produced by i-than s-proteasomes (Fig. 3B). Nevertheless, although s-and i-proteasomes differed significantly in their production of specific PSPs that were predicted to be MHC class I binders, no significant difference emerged when we considered the ⌺ PSP's amount among potential MHC class I epitopes (Table IVA). This information is here referred as the frequency of cleavage after the residue W 52 , which generates the T 53 at P1Ј position, considering in the computation only the PSPs and not the ⌺ PCP/PSP. Conversely, the frequency of the cleavage (i.e. SCS) after the residue W 52 is 5.7% (Ϯ 0.9) if we consider ⌺ PCP/PSP. 
The symbol "/" on the x axis refers to all splice-reactants whose N termini have been spliced without previous cleavage, such as [VSRQL] or [VSRQLRT]. This event is, of course, possible only for the P1′ positions. Digestions of 3 nmol of synthetic substrate in 100 µl reactions were carried out by 1.5 µg of 20S proteasome purified from human erythrocytes or spleen. The frequencies were computed by the SCS algorithm from the QME calculation of the ESI-MS data. Relative frequencies are reported in % and the bars represent the S.D. of three experiments measured three times each. Erythrocyte and spleen 20S proteasome data are reported in black and white histograms, respectively.
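The percentages plotted in this figure can be illustrated with a minimal sketch. It assumes, as a simplification that is not taken from the SCS or QME algorithms themselves, that the relative frequency of a site is its share of the summed product amounts and that the bars are the sample S.D. across replicates. The function names and the example amounts are hypothetical.

```python
from statistics import mean, stdev


def relative_frequencies(amounts):
    """Turn per-site product amounts (e.g. pmol) into percentages of the total."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total product amount must be positive")
    return {site: 100.0 * amount / total for site, amount in amounts.items()}


def summarize(replicates):
    """Mean and sample S.D. of each site's relative frequency across replicates."""
    per_replicate = [relative_frequencies(r) for r in replicates]
    summary = {}
    for site in per_replicate[0]:
        values = [freqs[site] for freqs in per_replicate]
        summary[site] = (mean(values), stdev(values))
    return summary


# Hypothetical amounts for three replicates; these are NOT data from the paper.
replicates = [
    {"L39": 60.0, "E51": 25.0, "W52": 15.0},
    {"L39": 62.0, "E51": 24.0, "W52": 14.0},
    {"L39": 64.0, "E51": 23.0, "W52": 13.0},
]
for site, (avg, sd) in summarize(replicates).items():
    print(f"{site}: {avg:.1f}% (S.D. {sd:.1f})")
```

Restricting the input dictionary to the products of interest (for example, only PSPs or the whole Σ PCP/PSP) changes the denominator, which is why the same cleavage site can show different frequencies in the two computations.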

DISCUSSION
In the present study, we identified a large number of new PSPs, which were generated by 20S proteasomes from four different polypeptide substrates. The large number of new PSPs and the number of different polypeptide substrates in combination with QME, a new algorithm-based method facilitating the computation of the absolute amount of all proteasomal processing products, enabled us to define the proteasomal active sites involved in PCPS, to identify mechanisms and driving factors of the reaction as well as to estimate the potential physiological relevance of PCPS.
It is worth mentioning that PCPS does not simply represent a reverse reaction of substrate proteolysis because it combines noncontiguous fragments to generate novel peptides (Fig. 8). The 20S proteasome and its isoforms could therefore catalyze, to different extents, reactions that differ in the equilibrium between proteolysis and PCPS, thereby resulting in different total amounts of PSP (Σ PSP). Consequently, the comparison of the Σ PSP generated from a larger number of substrate polypeptides by different 20S proteasome isoforms, as performed here, allowed further insight into the PCPS mechanism and into the structural features of the proteasome that support PCPS.
For example, our experiments demonstrate that 20S proteasomes of the yeast Saccharomyces cerevisiae also catalyze peptide splicing, suggesting that PCPS represents an evolutionarily conserved intrinsic property of proteasomes, rather than a specific activity of mammalian 20S proteasomes developed to extend the diversity of MHC class I epitopes. Taking advantage of the different yeast 20S proteasome active site mutants, our experiments also revealed that all proteolytic β subunits are able to catalyze PCPS, without any obvious preference for a given active site. Because of the structural similarity between yeast and mammalian 20S proteasomes, this observation most likely also applies to the different active sites of mammalian 20S proteasomes (27). Interestingly, irrespective of whether yeast wild type or β subunit mutant 20S proteasomes were studied, similar Σ PSP per nmol of cleaved substrate were generated. This at first sight surprising result may be explained by the fact that the major splicing sites, such as the residues Q38, L48, P50, E51 and T53 in the polypeptide gp100 35-57, were generated by all three proteolytic β subunits to a similar extent (supplemental Figs. S10 and S13).

TABLE III cis and trans PSPs occur by direct transpeptidation
The congruence of the isotope patterns of the PSPs gp100 40-42/47-52 and gp100 35-39/35-39 with the theoretical isotope patterns for each type of experiment is reported as a percentage. For PSPs generated in H₂¹⁸O-TEAD buffer the congruence of the isotope patterns represents the prevalence of direct transpeptidation, whereas in H₂¹⁶O-TEAD buffer it represents the accuracy of the measurements. Indeed, in the experiments performed in H₂¹⁶O-TEAD buffer, the congruence of the measured isotope pattern with the theoretically estimated distribution of the isotopic peaks (17) of the PSPs is supposed to be 100%, and any deviation from this value should therefore be due to technical issues. In contrast, in the experiments performed in H₂¹⁸O-TEAD buffer the congruence of the measured isotope patterns of the PSPs with the theoretical isotope patterns can also be affected by the incorporation of ¹⁸O at the splice site, if PCPS occurs via hydrolysis + transpeptidation, and therefore it also represents the prevalence of direct transpeptidation. Congruence with the theoretical isotope patterns was computed as described in the supplemental material by elaborating the area of the isotopic peaks of PSPs gp100 40 […]

[…] (Figs. 3, 4) and demonstrated that the amount of the less abundant splice-reactant peptide is one of the rate-limiting factors of PCPS, independent of whether the splice-reactant is located at the N or C terminus of the generated PSP.
However, in striking contrast to Vigneron's model, which also implied that PCPS is not restricted by a particular sequence motif and can occur at any major substrate cleavage site used by 20S proteasomes, our experiments demonstrate that the main cleavage sites within the substrate sequence are often not the main ligation sites of the PSPs. The degradation analysis of gp100 35-57 by spleen 20S proteasome may serve as a representative example (Fig. 6). Indeed, whereas the cleavages after Leu39 represented 62.5% of the total proteasomal cuts, only 16.4% and 7.7% of cleavages behind Leu39 produced the P1 and P1′ residues of PSPs, respectively. Conversely, Glu51, Trp52, and Thr53 were often P1 and P1′ splice sites, despite the fact that they represented only minor cleavage sites. Hence, the sequence specificity that determines cleavage site usage and the sequence specificity that determines ligation efficiency both affect PCPS, although they do not necessarily overlap.

(A) The content of PCP and PSP 9-12mers generated by different 20S proteasomes during the processing of the four substrates is reported in the second and third column. The amount of the PCP and PSP potential MHC class I epitopes is shown in the fourth and fifth column (as predicted by SYFPEITHI (20)) as well as in the sixth and seventh column (as predicted by IEDB (21)). The content is expressed as pmol Σ PCP or Σ PSP per nmol cleaved substrate (± S.D.) and it is the mean of PCPs or PSPs of all four substrates (gp100 35-57, gp100 201-230, pp89 16-40, LLO 291-317) as computed by QME for each time point of the reactions. S.D. is the standard deviation among different time points of the means of all experiments performed with all substrates. Notably, PCPs and PSPs produced during the degradation of the four substrates exhibited a similar average length but different statistical variance, with a higher number of PSPs close to the standard length of MHC class I epitopes and precursors. Such a phenomenon could explain the higher frequency of 9-12mers among PSPs and might be due to the significantly shorter length of the splice-reactants in comparison to the entire pool of PCPs. (B) The average length of PCPs and PSPs identified among the Σ PCP/PSP of the substrates gp100 35-57, gp100 201-230, pp89 16-40, LLO 291-317 is reported here and should be compared with the average length of the PCPs that are spliced by proteasomes, thereby generating the PSPs. This calculation is based on the number of PSPs and PCPs identified and not on their amount.

The novel structural model of the PCPS catalytic pocket described in Fig. 9 may explain the majority of our results. As shown in Fig. 9A, the active Thr1 is localized between two PCPS binding sites. The PCPS binding site γ binds the N-terminal splice-reactant and the PCPS binding site δ binds the C-terminal splice-reactant. Because the stability of the acyl-enzyme intermediate is a key factor in PCPS and, at least in part, depends on the binding of the P1-P2-P3 residues of the N-terminal splice-reactant to the nonprimed substrate binding site of the catalytic site (27), the PCPS binding site γ and the nonprimed substrate binding site very likely coincide. In contrast, the PCPS binding site δ and the primed substrate binding site might be distinct. Indeed, one might hypothesize that the PCPS binding site δ and the primed substrate binding site are both located within the proteolytic pocket with their grooves ending at the active Thr1. The proteolytic pocket of the proteasome could indeed have sufficient room to accommodate both the substrate and the C-terminal splice-reactant, as suggested by the crystal structure of mouse proteasomes (27). A three-dimensional representation of such a hypothesis is described for the chymotryptic-like pocket of the mouse i-proteasome in supplemental Fig. S14.
Therefore, according to our model, while the substrate is entering the catalytic pocket and positioning its sequence at the nonprimed and primed substrate binding sites, the C-terminal splice-reactant could already be bound at the PCPS binding site δ and perform the nucleophilic attack on the acyl-enzyme intermediate as soon as (or while) the C-terminal substrate fragment leaves the primed substrate binding site (Fig. 9B). Being occupied by the splice-reactants, the catalytic pocket could therefore bring the splice-reactants into closer proximity, thereby providing, together with the surrounding proteasome surface, a molecular crowding environment that facilitates PCPS (9). In addition, the retention time of the N-terminal splice-reactant as acyl-enzyme intermediate could be extremely short, with the C-terminal splice-reactant already being in close proximity to the active Thr1. Furthermore, we found that the average size of the N- and C-terminal splice-reactants is around 6 amino acids (Table IVB). These data fit with the hypothesis that the PCPS binding site γ and the nonprimed substrate binding site coincide. Indeed, amino acids located up to five residues before and after a cleavage position have been shown to determine specific cleavage usage, suggesting that the substrate binding grooves might accommodate around five substrate residues (3,28).
Nevertheless, the frequencies of cleavages generating Σ PCP and the Σ N- and C-terminal splice-reactants substantially diverge (Fig. 6 and supplemental Fig. S13). This apparent contradiction could be explained by hypothesizing that (i) the PCPS binding site δ and the primed substrate binding site do not coincide and therefore can exhibit distinct peptide specificities (Fig. 9), and (ii) the retention time of the N-terminal splice-reactant may have an opposite effect on cleavage and PCPS rates. Indeed, a short retention time of a given substrate sequence, which reflects its low affinity for the proteasomal substrate binding sites, leads to a null or low cleavage rate after the P1 peptide bond (3). Hence, a longer retention time of this sequence at the substrate binding sites produces a high cleavage of the peptide bond. However, a prolonged stabilization of the acyl-enzyme intermediate might slow down the activity of the proteolytic β subunit, leading to an overall reduced cleavage of this peptide bond. Conversely, the endergonic nature of peptide ligation renders the process energetically unfavorable (9). A longer retention time of the splice-reactants in proximity to the proteasome catalytic site (Thr1) might hence be expected to be compulsory for the reaction. Therefore, an increase of the life span of the peptide bound to the nonprimed substrate binding site could lead to an overall reduced cleavage and remarkable ligation of the peptide bond. Our experimental data supporting this hypothesis and the relative interpretation are reported in the supplementary material.

FIG. 8. Biochemical representation of the cleavage and splicing reactions. A biochemical model of proteasomal cleavage and splicing reactions of a hypothetical substrate ABCD. By cleavage after each residue (reaction k₁) the substrate ABCD is transformed into the products A, B, C, D. A reverse reaction ligates the product A to B to C to D, thereby reconstituting the original substrate ABCD (reaction k₋₁). Ligation reactions between the products of k₁ generate new compounds: AC (reaction k₂), BD (reaction k₃), CA (reaction k₄). For each of these new compounds a proteolytic reaction can reconstitute the product pool A + B + C + D, i.e. the reactions k₋₂, k₋₃ and k₋₄, respectively. Although the reactions k₂, k₃ and k₄ are ligations, like k₋₁, they are all different reactions because they yield different products. Proteasomes, like any enzyme, catalyze reactions that each have their own equilibrium, and they can only accelerate a reaction according to the enzyme-substrate affinity. Nevertheless, proteasomes can accelerate one reaction more than others. Which reaction is favored by the proteasome depends on its active site conformation and its affinity for the reactants. Therefore, different proteasome isoforms might accelerate reactions that have different equilibria between reactants and products. For example, in our experiments we sometimes observed that the amount of specific PSPs was higher than the amount of the less abundant reactant peptides. This was the case for the PSP […]. Thus, one may legitimately say that, in the biochemical model represented here, the higher catalysis of the reactions k₂/k₋₂ rather than the reactions k₁/k₋₁ or the reactions k₃/k₋₃ by one proteasome isoform, because of a higher affinity for the peptides A and C, might lead to a substantially different amount of Σ PSP as well as of the spliced peptide AC.
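The reaction scheme described in the legend of Fig. 8 can be written compactly as follows. This is only a notational sketch of the hypothetical substrate ABCD and its products as named in the legend; the use of paired forward and reverse arrows is an assumption about layout, not a reproduction of the figure.

```latex
\begin{align*}
\mathrm{ABCD} &\underset{k_{-1}}{\overset{k_{1}}{\rightleftharpoons}} \mathrm{A} + \mathrm{B} + \mathrm{C} + \mathrm{D} \\
\mathrm{A} + \mathrm{C} &\underset{k_{-2}}{\overset{k_{2}}{\rightleftharpoons}} \mathrm{AC} \\
\mathrm{B} + \mathrm{D} &\underset{k_{-3}}{\overset{k_{3}}{\rightleftharpoons}} \mathrm{BD} \\
\mathrm{C} + \mathrm{A} &\underset{k_{-4}}{\overset{k_{4}}{\rightleftharpoons}} \mathrm{CA}
\end{align*}
```

Written this way, the ligations k₂, k₃ and k₄ are visibly distinct from the reverse reaction k₋₁, because each yields a product other than the original substrate.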
Such a model would explain the correlation between the amount of splice-reactants and PSP (Figs. 3, 4), the discrepancy between the SCS and the frequency of cleavages generating PSP P1 and P1′ residues (Fig. 6), and also why the ligation sites are often represented by minor cleavage sites (Fig. 6 and supplemental Fig. S13). Nevertheless, the structural model reported in Fig. 9 and supplemental Fig. S14 calls for further structural and mechanistic studies to better understand the biochemical process and to verify the location of the PCPS binding site δ.
The structural model of PCPS with two binding sites that preferentially accommodate small peptides might support the hypothesis that the MHC class I pockets evolved according to the features of the PCPs and PSPs produced by the proteasome. This hypothesis would be in agreement with the idea that, by exploiting a preexisting process, PCPS might have contributed to maximizing the diversity of antigenic peptides at low energy cost for cells during evolution (9,29). Indeed, both human and yeast 20S proteasomes generated a relatively high amount of PSPs suitable for presentation on MHC class I molecules, because the limited length of the splice-reactants rendered the PSPs better MHC class I-restricted epitope candidates than the conventional PCPs (Table IVA).
This observation also implies that an important part of the MHC class I-restricted antigenic pool, produced by PCPS, may have been ignored so far.

FIG. 9. Model of PCPS binding sites. A, Illustration of the PCPS binding sites γ and δ and the primed substrate binding site converging on the active Thr1 of the proteasome. The PCPS binding site γ most likely coincides with the nonprimed substrate binding site (27), whereas the PCPS binding site δ could be distinct from the primed substrate binding site, as illustrated here. Both PCPS binding sites γ and δ have a pocket that can accommodate peptides of 5-6 residues, which can have an N- or C-terminal extension of undefined length, respectively. The substrate is represented here by black (N-terminal to the cleavage) and white (C-terminal to the cleavage) circles, whereas gray circles represent the C-terminal splice-reactant. Each circle symbolizes an amino acid. B, The substrate binds with its N terminus to the PCPS binding site γ/nonprimed substrate binding site and with its C terminus to the primed substrate binding site. The C-terminal splice-reactant binds to the PCPS binding site δ with its N terminus in proximity to the active Thr1. Substrate and C-terminal splice-reactant might bind the binding grooves of a catalytic β subunit at the same time. During the cleavage of one of the substrate peptide bonds by Thr1, the acyl-enzyme intermediate is formed and the C-terminal fragment of the substrate is released (step T1). Consequently, the C-terminal splice-reactant, which is already in proximity to the Thr1, might perform the nucleophilic attack with its N terminus on the acyl-enzyme intermediate, leading to the formation of the new PSP (step T2). A 3D representation of the proposed model is shown in supplemental Fig. S14.