Ubiquitinated Proteome: Ready for Global?

Ubiquitin (Ub) is a small, highly conserved protein that can be covalently attached to protein substrates. Ubiquitination is one of the major post-translational modifications and regulates a broad spectrum of cellular functions. Advances in mass spectrometers, together with the development of new affinity purification tools, have greatly expedited proteome-wide analysis of several post-translational modifications (e.g. phosphorylation, glycosylation, and acetylation). In contrast, large-scale profiling of lysine ubiquitination remains a challenge. Most recently, new Ub affinity reagents such as the Ub remnant antibody and tandem Ub binding domains have been developed, allowing relatively large-scale detection of several hundred lysine ubiquitination events in human cells. Here we review different strategies for the identification of ubiquitination sites and discuss several issues associated with data analysis. We suggest that careful interpretation and orthogonal confirmation of MS spectra are necessary to minimize false-positive assignments by automatic search algorithms.

Ubiquitin (Ub) is a small protein modifier that can be covalently attached to protein substrates to regulate their stability, localization, and/or activity. In the canonical reaction, ubiquitination is catalyzed by an enzymatic cascade composed of a Ub-activating enzyme (E1), a Ub-conjugating enzyme (E2), and a Ub E3 ligase (1-4). Substrate specificity is determined primarily by the E3 ligase (5), although the E2 enzyme may also contribute (6,7). The ubiquitination reaction generates an isopeptide bond between the carboxyl terminus of Ub and the ε-amino group of a lysine in the substrate (8); in rare cases, the N-terminal amine or a Cys residue may be modified instead (9,10). Ubiquitination is reversed by deubiquitinating enzymes (DUBs), which cleave Ub moieties from the substrate (11,12). The human proteome contains more than 500 Ub E3 ligases and around 100 DUBs, which together act on potentially thousands of substrates (13-15). In addition, cells express proteins carrying Ub binding domains (UBDs), which further expand the functional diversity of the Ub proteasome system signaling network (16,17). Substrates can be modified by different forms of Ub: they can be mono-ubiquitinated at one or multiple lysine residues, or poly-ubiquitinated with poly-Ub chains. Ub contains seven internal lysine residues, all of which can serve as conjugation sites to build poly-Ub chains that may impart different functions (18,19). Poly-Ub chains can signal either proteolytic (e.g. K11 and K48 linkages) or nonproteolytic (K63 linkage) fates of the substrate (20-22), and other linkages may also regulate proteasome-mediated protein degradation (18, 23-25). Furthermore, a head-to-tail linear poly-Ub chain formed through the N terminus of Ub has been proposed, which may directly activate protein kinases in the immune response (26,27). However, direct chemical evidence from mass spectrometry supporting such a chain is still lacking.
In addition to these homogeneous poly-Ub chains, heterogeneous poly-Ub chains with mixed linkages that form forked structures have also been detected in vivo (28-30). Some of these uncommon structures may be resistant to proteasome-mediated degradation (31,32).
Pinpointing the lysine residue(s) used for Ub conjugation is essential for a molecular understanding of ubiquitination. The identification of ubiquitination sites provides the ultimate proof that a putative substrate is indeed ubiquitinated. Large-scale analysis of ubiquitination sites has been a daunting task, with only a few successful attempts in the past (28-30, 33-38). Because the abundance of ubiquitinated species is generally too low for direct detection by mass spectrometry, enrichment strategies are required. A conventional way to enrich ubiquitinated proteins is to use epitope-tagged Ub. Recently, several affinity reagents have been developed for enriching ubiquitinated substrates and peptides, including Ub chain-specific antibodies, the ubiquitination remnant antibody, and tandem Ub binding domains (29, 30, 39-41).
Once enriched and purified, the ubiquitinated species are identified by mass spectrometry (MS). Applying fast, high-resolution, high-mass-accuracy mass spectrometers to "shotgun" proteomics has greatly increased peptide identification and can substantially reduce false-positive assignments (42-45). Powered by these technological advances, we and others have identified several hundred lysine ubiquitination sites in human cells. At the same time, the large number of spectra collected at high precursor mass accuracy now poses new challenges for the accurate assignment of ubiquitination sites by automatic search algorithms.
Here we review different affinity purification strategies for ubiquitination enrichment and discuss technical issues that may be encountered during data analysis. We present examples of mis-assignment by automatic searches and offer practical solutions to these problems. We suggest that careful manual inspection of individual MS/MS spectra is still an important final step in eliminating false-positive identifications from automatic searches.
Different Strategies to Isolate Ubiquitinated Proteins-Because of their relatively low cellular abundance, ubiquitinated substrates must be enriched with Ub affinity reagents for large-scale identification of ubiquitination sites. Following enrichment, the isolated samples are fractionated to reduce complexity before MS analysis. Currently there are at least three strategies for large-scale ubiquitination profiling (Figs. 1A-1C): (A) Ub epitope-tag expression systems, (B) Ub binding domains, especially tandem UBDs that offer high poly-Ub binding affinity, and (C) Ub antibodies (either poly-Ub linkage-specific antibodies or ubiquitination remnant-recognizing antibodies).
The Ub epitope-tagging strategy was initially applied to profiling the yeast ubiquitome. Using an engineered yeast strain in which all endogenous Ub genes are deleted and the expression level of exogenous Ub is controlled, Peng et al. identified 110 ubiquitination sites, an impressive number at the time, using an LCQ mass spectrometer (28). A similar approach was adopted in a mammalian system, in which a tandem His-biotin-tagged Ub was stably expressed in HeLa cells for efficient substrate isolation; in this experiment, ~50 ubiquitination sites were identified with a linear trap quadrupole (LTQ)-Orbitrap mass spectrometer (38). Most recently, Danielsen et al. reported the identification of ~750 lysine ubiquitination sites from two human cell lines stably expressing HA-Ub, by far the largest collection of ubiquitination sites to date (46). Notably, all MS/MS spectra in this study were collected on an LTQ Orbitrap Velos using higher-energy collisional dissociation (HCD), in which both precursor and fragment ions are analyzed in the Orbitrap with high mass accuracy; together with the wide fragment-ion mass range inherent to HCD, this improves overall spectral quality.
One of the major advantages of the Ub epitope-tagging strategy is that substrates can be purified under protein-denaturing conditions, which largely eliminates copurifying nonsubstrate proteins. However, tagging Ub in animal tissues or pathological specimens is difficult and has met with limited success (47-49). In addition, it remains uncertain to what extent Ub overexpression interferes with normal cellular functions. Although this may not be a major concern for the engineered yeast strain, in which the endogenous Ub protein is eliminated and the level of exogenous Ub can be properly controlled, expressing tagged Ub in mammalian systems may affect cell physiology by changing the kinetics of the ubiquitination enzymatic reaction, interfering with the cell cycle, and/or altering the subcellular localization of exogenous Ub (e.g. sequestering the exogenous Ub in the nucleus). Expressing a tagged lysine-mutant Ub is an even less satisfactory way to investigate linkage-specific substrates: competition between endogenous Ub and the exogenous mutant for conjugation (producing mixed poly-Ub chains) makes the data difficult to interpret. Because the mammalian genome contains four Ub loci, analyzing substrates in a clean Ub gene knockout background is a daunting task. Recently, however, a strategy was developed that uses tetracycline-inducible RNAi to eliminate endogenous Ub while expressing an exogenous K63R mutant Ub, which may partially circumvent this problem (50). This method holds the potential to profile linkage-specific ubiquitination substrates.
UBDs can be used to isolate endogenous poly-ubiquitinated proteins. UBDs are small structural modules with affinity for Ub, and more than 20 different UBDs have been discovered (16,17,51,52).

FIG. 1. A schematic presentation of three different strategies to isolate ubiquitinated substrates.
A, The affinity purification strategy using a Ub tag, which allows purification under protein-denaturing conditions. B, Tandem Ub binding domains for endogenous substrate enrichment. C, Poly-Ub chain-specific antibody or ubiquitination signature remnant antibody. D, Three types of contaminants arise during purification: proteins precipitated during incubation, nonspecific binders to the matrix, and proteins that bind specifically to the affinity bait or to poly-Ub conjugates.
The existence of a variety of UBDs implies functional diversity within the Ub proteasome system signaling network. Although all UBDs bind Ub, their binding affinities vary greatly, with Kd values ranging from a few micromolar to several hundred micromolar (16). The inherently low affinity of a natural UBD is therefore generally insufficient for large-scale purification of substrates.
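Why a low-affinity UBD captures so little can be illustrated with the simple 1:1 binding isotherm, θ = [bait] / ([bait] + Kd), for a Ub conjugate exposed to an excess of bait. This is a minimal sketch under stated assumptions: the 1 µM bait concentration and the three Kd values are hypothetical, and the model ignores avidity (the very effect tandem UBDs exploit), ligand depletion, and dissociation during washes.

```python
def fraction_captured(bait_uM: float, kd_uM: float) -> float:
    """Equilibrium fraction of a Ub conjugate bound by a bait present in large excess.

    Simple 1:1 hyperbolic model: theta = [bait] / ([bait] + Kd).
    Ignores avidity, ligand depletion, and loss during washing.
    """
    if bait_uM < 0 or kd_uM <= 0:
        raise ValueError("bait must be non-negative and Kd positive")
    return bait_uM / (bait_uM + kd_uM)


# Hypothetical bait concentration and Kd values (uM), for illustration only.
BAIT_UM = 1.0
for label, kd in [("weak UBD", 300.0), ("strong UBD", 5.0), ("antibody-like", 0.01)]:
    theta = fraction_captured(BAIT_UM, kd)
    print(f"{label:>14}: Kd = {kd:g} uM -> fraction bound = {theta:.3f}")
```

Under these assumptions a Kd of several hundred micromolar leaves well under 1% of the conjugate bound, whereas nanomolar, antibody-like binding captures nearly all of it.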
The development of engineered tandem UBDs with antibody-like affinity makes them attractive reagents for substrate purification. We have used such a quadruple UBD (qUBD) to detect close to 300 lysine ubiquitination sites from human cells not treated with proteasome inhibitors, using a high-speed LTQ Orbitrap Velos mass spectrometer. A complication of this approach is that purification cannot be performed under denaturing conditions; thus, "contaminant proteins" (especially UBD-containing proteins that bind poly-Ub chains) are inevitably copurified and sequenced by MS, which may mask the identification of low-abundance substrates. Moreover, UBDs favor poly-Ub chains over mono-Ub, so this strategy could be biased toward polyubiquitinated substrates. An intriguing open question is whether UBDs prefer specific poly-Ub linkage types or chain lengths, which remains to be fully investigated. A better understanding of UBD-Ub interactions may facilitate the design of poly-Ub linkage-specific reagents with higher affinity and selectivity.
High-affinity Ub antibodies are powerful tools for ubiquitination profiling. Previous attempts to generate high-affinity Ub antibodies in mice and rabbits were not successful. Recently, several linkage-specific monoclonal antibodies selected from phage display libraries have been developed and shown to be useful for immunoprecipitation and Western blotting (39,40). Whether these antibodies can efficiently profile linkage-specific substrates remains unclear, but future experiments are expected to reveal their utility soon.
Although tandem UBDs and presumably Ub antibodies are effective tools for substrate purification, comprehensive ubiquitination site mapping ideally requires an additional step to separate ubiquitinated peptides from unmodified ones. An excellent reagent for this purpose was recently introduced by Xu et al.: the Gly-Gly remnant antibody, generated against Gly-Gly signature peptides (29). Because this antibody specifically recognizes the modified peptides, it is more efficient for ubiquitination site profiling than methods that isolate entire substrates. The authors further demonstrated the power of this reagent by profiling hundreds of ubiquitination sites in human cells. Notably, the Gly-Gly remnant antibody does not distinguish ubiquitination from certain Ub-like modifications such as ISG15 or NEDD8; it may therefore be necessary to couple it with another purification method to obtain a homogeneous population of ubiquitinated species for peptide immunoprecipitation (IP).
Precautions to be Taken in Affinity Purifications-One of the major challenges in large-scale substrate purification is overcoming interference from contaminant proteins. There are three main types of protein contaminants in Ub affinity purification (Fig. 1D): (1) nonspecific binders to the solid matrix (e.g. agarose beads) that supports the affinity reagent, (2) precipitated proteins that accumulate during incubation, and (3) specific binders to poly-ubiquitinated proteins and/or Ub itself.
Although it is very difficult to eliminate them completely, contaminants can be significantly reduced by (1) carefully removing insoluble precipitates by high-speed centrifugation, (2) shortening the affinity incubation time, (3) increasing protein solubility with a stronger detergent such as 0.1% SDS, and (4) washing the beads with a higher salt concentration.
The second challenge during purification is to preserve ubiquitination by restricting DUB activity. Cells contain abundant DUBs that readily remove Ub from conjugates. DUB inhibitors, including both Cys-alkylating agents (such as iodoacetamide, chloroacetamide, or N-ethylmaleimide) and zinc chelators (1,10-phenanthroline), can alleviate the loss of poly-Ub and preserve intact modified substrates (53). In addition, using freshly prepared cell lysate may help preserve ubiquitinated species.
It has been shown that excessive alkylation with iodoacetamide (IAA; 55 mM), intended for Cys, can produce a chemically induced lysine adduct with the same atomic composition as the Gly-Gly remnant (C4H6N2O2; 114.043 Da), which mimics lysine ubiquitination (54). Using an alternative alkylating reagent such as chloroacetamide may avoid this artifact. A further study suggests that the pseudo-ubiquitination signature can be significantly reduced or eliminated by alkylating at room temperature or at a low IAA concentration (such as 1 mM) (18). When IAA is used as the alkylating reagent for DUB inactivation, it is critical to monitor the conditions during sample preparation to avoid excessive alkylation and/or high temperatures that induce ubiquitination artifacts.
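The artifact arises because two carbamidomethyl groups (C2H3NO each) add up to exactly the elemental composition of the Gly-Gly remnant. The sketch below checks this from standard monoisotopic element masses; the formula dictionaries and helper functions are written only for this illustration.

```python
# Monoisotopic masses of the most abundant isotopes (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

GLY_GLY = {"C": 4, "H": 6, "N": 2, "O": 2}          # Gly-Gly remnant added to Lys
CARBAMIDOMETHYL = {"C": 2, "H": 3, "N": 1, "O": 1}  # mass added per IAA alkylation


def mono_mass(formula: dict) -> float:
    """Monoisotopic mass of an elemental formula given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())


def add_formulas(*formulas: dict) -> dict:
    """Sum several elemental formulas into one."""
    total: dict = {}
    for f in formulas:
        for el, n in f.items():
            total[el] = total.get(el, 0) + n
    return total


double_cam = add_formulas(CARBAMIDOMETHYL, CARBAMIDOMETHYL)
print(double_cam == GLY_GLY)            # identical composition, hence identical mass
print(f"{mono_mass(GLY_GLY):.4f} Da")
print(f"{mono_mass(double_cam):.4f} Da")
```

Because the compositions are identical, no mass accuracy, however high, can separate a doubly carbamidomethylated lysine from a genuine Gly-Gly-modified lysine; only the chemistry of sample preparation can.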
Potential Caveats of Ubiquitination Site Assignments and False Discovery Rate Estimation-Ubiquitination can be identified by mass spectrometry through detection of a mass shift of 114.043 Da, the ubiquitination signature derived from the di-glycine remnant of Ub left after trypsin cleavage. Ubiquitinated peptides (and sites) are identified by searching a protein sequence database for tryptic peptide sequences carrying the ubiquitination signature mass on a particular amino acid, typically Lys. Such identifications are then evaluated with scores derived from various statistical models. In some cases, a scoring threshold is set with an empirical filter, followed by manual verification (55-58). A more widely practiced strategy is "target-decoy" searching, in which hits to decoy sequences are considered incorrect and are used to filter unreliable matches (59,60). The target-decoy strategy is useful for estimating the false discovery rate (FDR), which allows identical criteria to be applied across various conditions. This approach has proven successful for peptide sequence identification but becomes complicated when post-translational modifications (PTMs) are considered. PTMs inevitably expand the search space; however, the impact of this expansion may differ between PTMs. For instance, because amino acid masses are discrete, phosphorylation (monoisotopic mass of 79.966 Da) cannot be confused with any amino acid and thus may have a limited impact on the search space. This is not the case for the ubiquitination remnant (114.043 Da), which is di-glycine by nature and has a mass indistinguishable from that of Asn. As a result, the search space is potentially expanded, because an observed 114.043 Da increase on a candidate peptide can be assigned either to lysine ubiquitination or to an extra Asn or Gly-Gly in the candidate sequence.
The current implementation of FDR calculation does not take this situation into account. The actual error rate among modified peptides is therefore expected to differ from (and may exceed) the nominal FDR, and the extent of the discrepancy depends on the nature of the modification.
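The contrast between phosphorylation and the Gly-Gly remnant can be checked directly: ask which single residues, or pairs of residues, have a mass within some tolerance of the modification's mass shift. This is a minimal sketch; the 0.01 Da default tolerance is an assumption, and only the 20 standard unmodified residues are considered.

```python
# Standard monoisotopic residue masses (Da) of the 20 unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

GG_DELTA = 114.04293       # Gly-Gly remnant
PHOSPHO_DELTA = 79.96633   # HPO3


def isobaric_explanations(delta: float, tol: float = 0.01) -> list:
    """Residues or residue pairs whose summed mass lies within tol (Da) of delta."""
    hits = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tol]
    aas = sorted(RESIDUE_MASS)
    for i, a in enumerate(aas):
        for b in aas[i:]:                       # unordered pairs, including a == b
            if abs(RESIDUE_MASS[a] + RESIDUE_MASS[b] - delta) <= tol:
                hits.append(a + b)
    return hits


print("Gly-Gly:", isobaric_explanations(GG_DELTA))
print("phospho:", isobaric_explanations(PHOSPHO_DELTA))
print("Gly-Gly at ~1 Da tolerance:", isobaric_explanations(GG_DELTA, tol=1.0))
```

A phosphate shift has no residue-level alias, whereas the Gly-Gly shift is matched exactly by Asn and by Gly+Gly, and at ion-trap-like tolerances also by Leu, Ile, and Asp. Each alias is an additional way for the search engine to explain the same spectrum.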
A considerable number of false-positive identifications of ubiquitinated peptides were observed at 1% FDR. We searched 62 liquid chromatography-tandem MS (LC-MS/MS) runs to tentatively identify ubiquitinated peptides using the two most popular search engines, Sequest and Mascot (61,62). Interestingly, at 1% FDR, Sequest (filtered on Xcorr alone) identified ~500 unique ubiquitinated peptides, whereas Mascot (filtered on ion score) identified 223, which were much more reliable. Manual verification, however, could confidently confirm only 47% of the unique identifications (IDs) recovered by the two algorithms combined at 1% FDR (data not shown).
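The target-decoy filtering described above can be sketched as follows. This assumes a concatenated target-decoy search and uses one common estimator (decoy hits divided by target hits above the threshold); other estimators exist. Estimating the threshold separately on the Gly-Gly-modified subset, rather than reusing the global cut-off, reflects the concern that a global 1% FDR does not transfer to ubiquitinated peptides. All scores and flags below are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Optional, Tuple

PSM = Tuple[float, bool]  # (score, is_decoy); a higher score means a better match


def score_threshold(psms: Iterable[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Lowest score cut-off at which the estimated FDR (decoys / targets) is <= max_fdr.

    Tied scores are processed together, because a cut-off keeps all PSMs with
    score >= cut-off. Returns None if no cut-off satisfies max_fdr.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, group in groupby(ranked, key=lambda p: p[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold


# Hypothetical PSMs: (score, is_decoy, carries_GG_remnant).
all_psms = [
    (95, False, False), (90, False, False), (88, False, True), (85, False, False),
    (80, True, True), (78, False, True), (75, False, False), (70, False, False),
]
global_cut = score_threshold(((s, d) for s, d, _ in all_psms), max_fdr=0.25)
gg_cut = score_threshold(((s, d) for s, d, gg in all_psms if gg), max_fdr=0.25)
print("global cut-off:", global_cut, "| GG-only cut-off:", gg_cut)
```

In this toy example the decoy hit falls among the modified PSMs, so the Gly-Gly subset requires a much stricter cut-off than the pooled data would suggest.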
We also noticed that different algorithms may have different sensitivity and specificity for ubiquitination IDs. For example, Xu et al. (29) observed that the overlap of ubiquitinated peptide IDs from two different algorithms was less than 60%, and we made a similar observation (Fig. 2). As expected, IDs shared by the two algorithms tend to pass manual verification readily, whereas those unique to one algorithm contain more false positives, especially the assignments made by Sequest (Xcorr). Combining ubiquitination IDs from different algorithms is therefore very risky, and the 1% FDR as defined for the search of unmodified peptides does not apply to the search of ubiquitinated peptides.
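The practical consequence (trust shared IDs more, send engine-specific IDs to manual review) can be expressed as a simple triage. This is a sketch: the identifier strings are hypothetical placeholders for (peptide, site) pairs, and the Jaccard index (shared / union) is only one possible definition of "overlap"; the cited studies may have used another.

```python
def triage(ids_engine_a, ids_engine_b):
    """Split IDs from two search engines into shared and engine-specific sets.

    Returns (shared, needs_manual_review, jaccard), where jaccard = |A & B| / |A | B|.
    """
    a, b = set(ids_engine_a), set(ids_engine_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    needs_review = sorted(union - shared)     # found by only one engine
    return sorted(shared), needs_review, jaccard


# Hypothetical site identifiers of the form "peptide@lysine_position".
engine_a = ["PEP1@K5", "PEP2@K3", "PEP3@K8"]
engine_b = ["PEP2@K3", "PEP3@K8", "PEP4@K2", "PEP5@K6"]
shared, review, jac = triage(engine_a, engine_b)
print("shared:", shared)
print("manual review:", review)
print(f"overlap (Jaccard): {jac:.2f}")
```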
Because a single score used in a target-decoy search may not provide enough resolution to separate correct from incorrect assignments, composite scores that incorporate multiple factors (e.g. mass accuracy, peptide charge and length distribution, and enzymatic specificity) may increase the sensitivity, and presumably the specificity, of peptide identification (63,64). It is also worth noting that factors such as the type of mass spectrometer, the precursor ion fragmentation pattern, and the target-decoy strategy used for the search (separate, concatenated, or shuffled-sequence database) may affect the final results (65-67). A comprehensive evaluation of search parameters for ubiquitination will help estimate FDR more accurately.
When low-resolution ion traps such as the LTQ are used (where precursor mass accuracy is typically on the order of 1 Da), false-positive assignments may occur for peptides that contain internal lysine residues adjacent to certain amino acids (such as Leu, Asn, or Asp, whose residue masses are similar to that of Gly-Gly) (68). Most such ambiguous assignments can be resolved when the precursor ions are analyzed on a high-resolution mass spectrometer with a mass accuracy of several parts per million (ppm), although Asn, being isobaric with Gly-Gly, cannot be distinguished by precursor mass alone. High-mass-accuracy mass spectrometers are therefore preferable for proteome-wide PTM mapping (45). Measuring fragment ions with high mass accuracy in the Orbitrap may also aid ubiquitination site identification, although reduced sensitivity is generally expected.
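The benefit of ppm-level accuracy can be quantified by expressing the mass gap between the Gly-Gly remnant and each near-isobaric residue relative to the whole precursor. This is a simplified sketch under stated assumptions: the competing sequence is assumed to differ from the Gly-Gly-modified one only by replacing the remnant with a single residue, the 1500 Da precursor is hypothetical, and "resolvable" is crudely taken to mean that the gap exceeds the instrument tolerance.

```python
GG = 114.04293  # Gly-Gly remnant (Da)
ALTERNATIVES = {"Leu": 113.08406, "Asn": 114.04293, "Asp": 115.02694}


def separation_ppm(precursor_mass: float, residue_mass: float) -> float:
    """Gap between the GG and alternative assignments, in ppm of the precursor mass."""
    return abs(GG - residue_mass) / precursor_mass * 1e6


def resolvable(precursor_mass: float, residue_mass: float, tolerance_ppm: float) -> bool:
    """Crude criterion: the two candidate masses differ by more than the tolerance."""
    return separation_ppm(precursor_mass, residue_mass) > tolerance_ppm


PRECURSOR = 1500.0                       # hypothetical neutral mass (Da)
ORBITRAP_PPM = 10.0                      # high-accuracy search tolerance
ION_TRAP_PPM = 1.0 / PRECURSOR * 1e6     # ~1 Da expressed in ppm at this mass

for name, mass in ALTERNATIVES.items():
    gap = separation_ppm(PRECURSOR, mass)
    print(f"{name}: gap {gap:6.1f} ppm | Orbitrap: {resolvable(PRECURSOR, mass, ORBITRAP_PPM)}"
          f" | ion trap: {resolvable(PRECURSOR, mass, ION_TRAP_PPM)}")
```

At 10 ppm, the Leu and Asp alternatives are separated by more than 600 ppm and are easily excluded, while at ~1 Da they are not; Asn remains indistinguishable at any precursor accuracy.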
Notably, even among high-mass-accuracy identifications that pass 1% FDR with high probability scores, false positives arise from mislocalized modification sites or ambiguous fragmentation patterns. For example, as shown in Fig. 3A, because Asn has the same elemental composition and mass (114.043 Da) as the Gly-Gly remnant, a missed cleavage at a Lys-Asn bond, either at the very carboxyl terminus of a protein or within a Lys-Asn-Lys/Arg motif, can be mis-assigned as ubiquitination of the adjacent lysine. Moreover, when iodoacetamide is used for Cys alkylation and carbamidomethylation (+57.021 Da) is set as a dynamic modification, the combined mass of two carbamidomethyl groups (2 × 57.021 = 114.043 Da) equals the ubiquitination signature, which may lead the program to assign two Cys as unmodified and place the remnant on a lysine instead (Fig. 3B). Indeed, in our observation, virtually all C-terminal ubiquitination candidates are improperly assigned. In addition, a significant proportion (>30%) of the 1% FDR data from the Sequest search consists of ambiguous or weak assignments (Fig. 3C), although this occurs to a much lesser extent when Mascot is used. Fig. 3D shows a more challenging case that is difficult to judge from the MS/MS pattern alone. However, two other important parameters suggest that it is more likely a false-positive assignment. First, the precursor mass error (ΔM) of the spectrum is >10 ppm, an outlier in the mass error distribution (which is approximately Gaussian and within 2 ppm). Second, the sequence coverage of the protein is very low (<3%): only this "modified" peptide, and no other peptide derived from this protein, was identified. Peptide coverage is therefore informative and should be used as an additional constraint for modification assignment.
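Two of the checks above lend themselves to simple automated pre-screening before manual inspection: flagging lysines in the Asn-prone motifs, and flagging precursor mass errors that fall outside the run's error distribution. This is a sketch; the function names, the example sequence, and the mean and standard deviation of the error distribution are hypothetical, and the 3-SD cut-off is an assumption rather than the rule used for Fig. 3D.

```python
import re


def asn_risk_sites(protein_seq: str) -> list:
    """1-based positions of Lys in K-N-[KR] motifs or in K-N at the protein C-terminus.

    A Gly-Gly assignment on these lysines may instead be an uncleaved Lys-Asn bond.
    The lookahead lets adjacent motifs overlap.
    """
    return [m.start() + 1 for m in re.finditer(r"K(?=N(?:[KR]|$))", protein_seq)]


def mass_error_ppm(observed: float, theoretical: float) -> float:
    """Signed precursor mass error in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def is_mass_outlier(err_ppm: float, mean_ppm: float, sd_ppm: float, n_sd: float = 3.0) -> bool:
    """True if the error lies more than n_sd standard deviations from the run mean."""
    return abs(err_ppm - mean_ppm) > n_sd * sd_ppm


# Hypothetical protein sequence and run-level error statistics.
print(asn_risk_sites("MAKNKLLKN"))
err = mass_error_ppm(1000.012, 1000.0)
print(f"{err:.1f} ppm, outlier: {is_mass_outlier(err, mean_ppm=0.0, sd_ppm=2.0)}")
```

Flags from such a screen are prompts for manual inspection, not grounds for automatic rejection.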
Although trypsin has been reported to cleave C-terminal to the ubiquitinated lysine in synthetic Ub peptides (69), we have not observed such spectra, consistent with a previous report using synthetic Gly-Gly peptides (68). We did detect a larger ubiquitination remnant, LRGG (383.228 Da), which results from a missed tryptic cleavage within the Ub C-terminal tail, on a small proportion of poly-Ub peptides (Figs. 4A-4C). The LRGG-modified poly-Ub peptides tend to be slightly more hydrophilic and elute earlier than the typical Ub-linkage peptides (data not shown). Adding LRGG-modified lysine (+383.228 Da) as an additional modification in the database search might increase the number of ubiquitination identifications.
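Both remnants follow from digesting the Ub C-terminal tail (residues 66-76, TLHLVLRLRGG). Full cleavage after Arg74 leaves GG on the substrate lysine, and a missed cleavage there, with cleavage after Arg72, leaves LRGG. Because the remnant is attached through an isopeptide bond, its added mass is simply the sum of its residue masses. The sketch below uses the classic "after K/R, not before P" trypsin rule and ignores its known exceptions.

```python
UB_TAIL = "TLHLVLRLRGG"  # Ub residues 66-76
RESIDUE_MASS = {"G": 57.02146, "L": 113.08406, "R": 156.10111}  # monoisotopic (Da)


def trypsin_sites(seq: str) -> list:
    """Cut positions (0-based slice indices) after internal K/R not followed by Pro."""
    return [i + 1 for i, aa in enumerate(seq)
            if aa in "KR" and i + 1 < len(seq) and seq[i + 1] != "P"]


def remnant(tail: str, missed: int = 0) -> str:
    """Tail residues left on the substrate Lys, given missed cleavages at the C-terminal end."""
    sites = trypsin_sites(tail)
    if missed >= len(sites):
        raise ValueError("not enough cleavage sites in the tail")
    return tail[sites[-(missed + 1)]:]


def remnant_mass(seq: str) -> float:
    """Mass added to Lys: the sum of residue masses (the isopeptide bond has lost water)."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


for missed in (0, 1):
    r = remnant(UB_TAIL, missed)
    print(f"missed cleavages = {missed}: remnant {r}, +{remnant_mass(r):.3f} Da")
```

The two printed deltas are the values one would enter as variable modifications on Lys in a database search.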
Concluding Remarks-The methods described in this review have opened the door to profiling ubiquitinated substrates. However, very few published studies have focused on the quantitative aspects of ubiquitination analysis, such as quantification of ubiquitinated substrates, Ub E3 ligase complexes, or chain linkages (70-73). The greatest challenge is to systematically identify E3 ligase or DUB substrates by quantitative proteomics so that substrates can be paired with their enzymes. To achieve this goal, the first step is to comprehensively identify ubiquitinated substrates under different conditions. It seems likely that most ubiquitinated species remain undetected, as suggested by the rather small overlap among the ubiquitination data sets recently reported by three groups (29,30,46), although this can partly be attributed to differences in purification strategy or data quality. A genetic approach based on high-throughput fluorescence measurements, called Global Protein Stability Profiling, has been successfully developed to correlate protein level and stability with potential Ub E3 ligases (74). This, however, is an indirect method that would miss substrates whose ubiquitination does not directly control protein turnover. A direct method would require quantitative profiling of the global "substrate ubiquitination status," which could reveal all classes of substrates.
With advances in Ub affinity purification and high-resolution protein and peptide separation, paralleled by improvements in mass spectrometers, the day of proteome-level ubiquitination profiling is fast approaching. Major challenges for Ub proteomics include matching substrates to the corresponding Ub enzymes and studying the dynamics of ubiquitinated proteomes under different biological conditions, such as different genetic backgrounds or specific drug treatments. As ubiquitination data accumulate, it is equally important to improve current statistical methods for more accurate FDR estimation. Although careful verification of each MS/MS spectrum that tentatively identifies a ubiquitination site is the most reliable method, it requires experience and will become a bottleneck for global profiling. Ub proteomics is now at an "exploding" stage, with numerous exciting opportunities and new challenges.