Intrinsically Disordered Segments Affect Protein Half-Life in the Cell and during Evolution

Summary
Precise control of protein turnover is essential for cellular homeostasis. The ubiquitin-proteasome system is well established as a major regulator of protein degradation, but an understanding of how inherent structural features influence the lifetimes of proteins is lacking. We report that yeast, mouse, and human proteins with terminal or internal intrinsically disordered segments have significantly shorter half-lives than proteins without these features. The lengths of the disordered segments that affect protein half-life are compatible with the structure of the proteasome. Divergence in terminal and internal disordered segments in yeast proteins originating from gene duplication leads to significantly altered half-life. Many paralogs that are affected by such changes participate in signaling, where altered protein half-life will directly impact cellular processes and function. Thus, natural variation in the length and position of disordered segments may affect protein half-life and could serve as an underappreciated source of genetic variation with important phenotypic consequences.


INTRODUCTION
Protein degradation is the endpoint of gene expression, and correct turnover of proteins is essential for cellular function. Indeed, protein half-life impacts virtually all cellular processes including the cell cycle (Pagano et al., 1995), DNA repair (Lakin and Jackson, 1999), apoptosis and cell survival (Rutkowski et al., 2006), alternative splicing (Irimia et al., 2012), circadian rhythm (van Ooijen et al., 2011), cell differentiation (Ramakrishna et al., 2011), development (Hirata et al., 2004), and immunity (Babon et al., 2006). Altered protein half-life can lead to abnormal development and diseases such as cancer and neurodegeneration (Ciechanover, 2012). For instance, artificially extending the half-life of the Hes7 transcription factor by ~8 min severely disorganizes embryonic development in mice (Hirata et al., 2004). Missense mutations in succinate dehydrogenase that increase turnover rates contribute to neuroendocrine tumors (Yang et al., 2012).
The proteasome mediates controlled and selective degradation of most proteins in eukaryotic cells, and access to the proteasome is key to controlling the half-life of substrates (Goldberg, 2003; Hershko and Ciechanover, 1998). Substrate recruitment to the proteasome is primarily mediated through their polyubiquitination by ubiquitin ligases (Komander and Rape, 2012; Ravid and Hochstrasser, 2008; Varshavsky, 2012). This mechanism regulates the half-life of proteins, which ranges from seconds to days (Belle et al., 2006; Kristensen et al., 2013; Schwanhäusser et al., 2011). The large number of ubiquitin ligases and deubiquitinating enzymes encoded in eukaryotic genomes highlights the importance of this system (Hutchins et al., 2013; Komander et al., 2009). Although the role of ubiquitination in delivering proteins to the proteasome is well established, it remains unclear to what extent intrinsic structural features of substrates influence their half-life once bound to the proteasome and whether such features have been exploited to alter half-life during evolution.
An important feature implicated in affecting protein half-life is the presence of polypeptide regions that do not adopt a defined 3D structure, typically called intrinsically disordered, or unstructured regions (van der Lee et al., 2014). Disordered regions are present in a large number of eukaryotic proteins and play key roles in protein function along with structured domains (Babu et al., 2012). A number of genome-scale studies have investigated the relationship between the overall fraction of disordered residues of a protein and its half-life, but these have yielded contradictory results ranging from no correlation (Yen et al., 2008) to weak correlation (Tompa et al., 2008) to a strong effect (Gsponer et al., 2008). The reason for the inconsistencies is perhaps that these studies investigated correlations without the guidance provided by the biochemical mechanism by which disordered regions might contribute to protein turnover.
In this work, we develop a theory of how disordered segments influence protein half-life, through a systematic analysis of multiple data sets describing sequence, structure, expression, evolutionary relationships, and experimental half-life measurements from both unicellular and multicellular organisms. We present evidence that proteins with a long terminal or internal disordered segment have a significantly shorter in vivo half-life in yeast on a genomic scale. The same relationship is found in mouse and human. Upon gene duplication, divergence in terminal and internal disordered segments leads to altered half-life of paralogous proteins. Many affected paralogs participate in signaling pathways, where altered half-life will influence signaling outcomes. We suggest specific biochemical mechanisms by which disordered segments may influence degradation rates, how these changes might modulate cellular function and phenotype, and how natural variation in the length and position of intrinsically disordered protein regions may contribute to the evolution of protein half-life.

RESULTS
To investigate the relationship between the structural architecture of proteins and their cellular stability, we inferred the disorder status of every residue in the proteomes of yeast, mouse, and human using the DISOPRED2 (Ward et al., 2004), IUPRED (Dosztányi et al., 2005), and PONDR VLS1 (Obradovic et al., 2005) software. In vivo protein half-life data for yeast were obtained from a study that used strains in which proteins expressed from their endogenous promoter contained a tandem affinity purification (TAP) tag at the C terminus (Belle et al., 2006). After inhibition of protein synthesis, protein abundance was measured at three time points by western blotting with TAP antibodies. Protein turnover in mouse and human cells was measured using isotope labeling and mass spectrometry (Kristensen et al., 2013; Schwanhäusser et al., 2011). We combined the information on protein half-life, and other large-scale data sets, with the position and length of disordered segments and analyzed the data using appropriate statistical tests (3,273 proteins in yeast, 4,502 in mouse, and 3,971 in human; Experimental Procedures; Table S1A; Figure S1A).

Long N-Terminal Disordered Segments Contribute to Short Protein Half-Life In Vivo
We first classified yeast proteins into two groups depending on the length of the disordered termini, treating the N and C termini separately: those with short (≤30 residues) and those with long (>30 residues) disordered tails (Figure 1A). The length cutoff was based on recent molecular models of the proteasome (da Fonseca et al., 2012; Lander et al., 2012; Lasker et al., 2012) and on in vitro biochemical studies using purified proteasomes showing that there is a critical minimum length of ~30 residues that allows a disordered terminus of a ubiquitinated substrate to efficiently initiate degradation (Inobe et al., 2011). Indeed, analysis of the yeast data confirms that protein half-life does not depend linearly on the length of disordered segments (Figure S1B; Supplemental Experimental Procedures).
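The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each protein is represented as a per-residue list of booleans (True for a residue predicted to be disordered) and that the N-terminal disordered segment is the uninterrupted run of disordered residues starting at the first residue. The function names and the input representation are hypothetical.

```python
from typing import Sequence

TERMINAL_CUTOFF = 30  # residues; the terminal cutoff stated in the text


def n_terminal_disorder_length(disordered: Sequence[bool]) -> int:
    """Count consecutive disordered residues starting at the first residue."""
    length = 0
    for flag in disordered:
        if not flag:
            break
        length += 1
    return length


def classify_n_terminus(disordered: Sequence[bool],
                        cutoff: int = TERMINAL_CUTOFF) -> str:
    """Return 'long' if the N-terminal disordered run exceeds the cutoff."""
    return "long" if n_terminal_disorder_length(disordered) > cutoff else "short"


# Toy protein (made-up data): 35 disordered residues, then 100 ordered ones.
toy_protein = [True] * 35 + [False] * 100
print(classify_n_terminus(toy_protein))
```

The same functions can be applied to the C terminus by passing the reversed list.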
Proteins with a long disordered N terminus have a significantly shorter half-life compared to proteins with a short disordered N terminus (p = 5 × 10⁻⁶, Mann-Whitney U test, a nonparametric test for assessing whether two samples come from the same underlying distribution [H0]; Figure 1B). The approach for measuring half-lives in yeast involved C-terminal tagging with a TAP tag, which is 186 amino acids long and largely structured. Since all proteins had identical C termini due to the TAP tag, we should see little difference in half-life between proteins with long and short C-terminal disorder as characterized from the original genome sequence. Indeed, these groups display similar distributions of protein half-life (p = 0.99, Mann-Whitney U test; Figure 1C).
In order to assess to what extent the disordered state of the N terminus affects half-life, we performed three analyses. First, we investigated proteins with a highly structured N terminus (>30 residues predicted to be structured) and found that they display a longer half-life compared to proteins with a long disordered N terminus (p = 2 × 10⁻⁷, Mann-Whitney U test; Figure 1D). Second, we classified the proteome into three groups of roughly equal size, based on their half-life: (1) short-lived proteins (half-life ≤ 30 min), (2) medium half-life proteins (31-70 min), and (3) long-lived proteins (>70 min) (Figure S1A). The distributions of the length of N-terminal disorder differ significantly across the three groups in a manner consistent with the above observations: proteins with a shorter half-life tend to have longer N-terminal disordered segments (p = 3 × 10⁻⁶, Kruskal-Wallis test, which extends the Mann-Whitney U test to three or more groups; Figures 1E and S1F). Again, this relationship is not true for the C terminus, because the TAP tag causes all proteins to have the same C terminus (p = 0.2, Kruskal-Wallis test; Figures S1E and S1F). Third, we quantified the effects of disordered segments on half-life by comparing conditional probabilities for finding proteins with and without long N-terminal disorder within specific half-life ranges. The likelihood of finding a protein with a short half-life among those that have long N-terminal disorder was more than two times higher than the ''reverse'' probability of finding proteins with long N-terminal disorder among those with short half-life (p[short half-life given long N-terminal disorder] = 0.44; p[long N-terminal disorder given short half-life] = 0.18; Tables 1A and S3A). This indicates that the presence of a long disordered N terminus often results in short half-life but proteins with short half-life need not always have a long N-terminal disordered segment.
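For readers unfamiliar with the Mann-Whitney U test, the sketch below shows how the statistic and an approximate two-sided p value can be computed from two samples of half-lives. It is a simplified, standard-library illustration rather than the procedure used in the study: it relies on the normal approximation and omits the tie and continuity corrections, so p values are approximate. The function name and the toy half-lives are hypothetical; in practice a library routine such as SciPy's `mannwhitneyu` would normally be used.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, approximate two-sided p value).

    Assumptions of this sketch: normal approximation, no tie correction
    of the variance, no continuity correction.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Toy half-lives in minutes (made-up numbers, not data from the study).
long_disorder = [12.0, 18.0, 25.0, 31.0, 40.0]
short_disorder = [35.0, 48.0, 60.0, 75.0, 90.0]
u, p = mann_whitney_u(long_disorder, short_disorder)
print(u, p)
```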
Thus, the presence of a disordered N terminus is linked to short half-life, but other properties also affect protein turnover (see Discussion).
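The two conditional probabilities compared above differ only in their denominator, which the following sketch makes explicit. It assumes a hypothetical input of (half-life in minutes, has long N-terminal disorder) pairs and uses the half-life ranges given in the text; the function names and toy values are illustrative and are not taken from the study.

```python
from typing import Iterable, Tuple


def half_life_group(half_life_min: float) -> str:
    """Bin a half-life (minutes) using the three ranges given in the text."""
    if half_life_min <= 30:
        return "short"
    if half_life_min <= 70:
        return "medium"
    return "long"


def conditional_probabilities(
    proteins: Iterable[Tuple[float, bool]]
) -> Tuple[float, float]:
    """Return (P[short half-life | long disorder], P[long disorder | short half-life]).

    Each protein is a (half_life_min, has_long_n_terminal_disorder) pair.
    Both groups must be non-empty, otherwise a ZeroDivisionError is raised.
    """
    n_long = n_short = n_both = 0
    for half_life, has_long in proteins:
        is_short = half_life_group(half_life) == "short"
        n_long += has_long
        n_short += is_short
        n_both += has_long and is_short
    return n_both / n_long, n_both / n_short


# Toy data (made-up): three proteins with long disorder, five without.
toy = [(10, True), (20, True), (90, True),
       (15, False), (25, False), (28, False), (50, False), (100, False)]
print(conditional_probabilities(toy))
```

Because the joint count is shared, the two probabilities are equal only when the two groups are the same size, which is why a feature can be a strong predictor of short half-life without being present in most short-lived proteins.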

Internal Disordered Segments Also Contribute to Short Protein Half-Life
The proteasome not only digests proteins starting from their termini but also can cleave or initiate from disordered regions in the middle of the chain (Fishbain et al., 2011; Liu et al., 2003; Piwko and Jentsch, 2006; Prakash et al., 2004; Takeuchi et al., 2007; Zhao et al., 2010). The catalytic residues for proteolysis are buried deep within the proteasome core particle, accessible only through a long narrow channel, and the same is true for the ATPase motor that drives protein substrates through the degradation channel (da Fonseca et al., 2012; Lander et al., 2012; Lasker et al., 2012). To reach these sites, a disordered segment in the middle of a protein has to be longer than a segment at a protein terminus (Fishbain et al., 2011). Therefore, to investigate whether the presence of internal disorder influences protein half-life, we identified proteasome-susceptible internal disordered segments as continuous stretches of at least 40 disordered amino acids (see Discussion). Proteins that contain such an internal disordered segment have a significantly shorter half-life than proteins that do not (p = 3 × 10⁻²⁹, Mann-Whitney U test; Figure 2A). This observation is robust to our choice of cutoff used for detecting internal disordered segments, but systematically varying the length cutoff revealed that the maximal difference in median half-life is obtained for a value of 40 amino acids (Table S1E). Further, the relationship is independent of N-terminal disorder, as the half-life of proteins with internal disordered segments is significantly lower than that of those without, regardless of the length of the disordered terminus (Figure 2B).
Figure 1. (A) A total of 3,273 yeast proteins were grouped based on the length of the disordered segment at the N terminus. Long (dark red) and short (light red) terminal disordered segments were defined as stretches of >30 and ≤30 disordered residues. (B-D) Boxplots of protein half-life distributions. Proteins were classified based on the length of the disordered segment at the N terminus (B) or the C terminus (C) and the presence of N-terminal disordered or structured segments (D, long N-terminal structured regions [dark gray] were defined as >30 structured residues). (E) Boxplots of the distributions of N-terminal disorder length for different half-life groups, indicated with schematic exponential degradation curves (from short half-life [dark green] to long half-life [light green]). Central boxplot notches mark the median and the 95% confidence interval. Colored boxes represent the 50% of data points above (×0.75) and below (×0.25) the median (×0.50). Vertical lines (whiskers) connected to the boxes by the horizontal dashed lines represent the largest and the smallest nonoutlier data points. Outliers are not shown to improve visualization. p values reported are from Mann-Whitney U (B-D) and Kruskal-Wallis (E) tests. p values, the number of data points (n), and differences between the half-life medians of the compared groups (ΔH) are shown to the right. See also Figures S1 and S3 and Table S1.
To quantify the contribution of internal disorder to protein half-life, we computed conditional probabilities for finding proteins with and without internal disordered segments within specific half-life ranges. The probability of observing a protein with a short half-life among those that contain an internal disordered segment is high and comparable to the ''reverse'' probability of finding a protein containing an internal disordered segment among those with short half-life (p[short half-life given internal disordered segment] = 0.45; p[internal disordered segment given short half-life] = 0.49; Tables 1B and S3B). This suggests that presence or absence of an internal disordered segment is an important determinant of the half-life of a protein.
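Detecting internal disordered segments as defined in this section (continuous stretches of at least 40 disordered residues) amounts to a run-length scan over the per-residue disorder predictions. The sketch below is an illustration under two assumptions that the text does not spell out: disorder is given as a per-residue list of booleans, and a run counts as internal only if it touches neither terminus. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

INTERNAL_CUTOFF = 40  # residues; the internal cutoff stated in the text


def disordered_runs(disordered: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (0-based, end exclusive) of disordered runs."""
    runs = []
    start = None
    for i, flag in enumerate(disordered):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(disordered)))
    return runs


def internal_segments(disordered: Sequence[bool],
                      min_length: int = INTERNAL_CUTOFF) -> List[Tuple[int, int]]:
    """Keep runs of at least min_length that touch neither terminus (assumption)."""
    n = len(disordered)
    return [(s, e) for s, e in disordered_runs(disordered)
            if e - s >= min_length and s > 0 and e < n]


# Toy protein (made-up): a 45-residue internal run, a 10-residue run that is
# too short, and a 60-residue run that reaches the C terminus.
toy = ([False] * 20 + [True] * 45 + [False] * 50
       + [True] * 10 + [False] * 30 + [True] * 60)
print(internal_segments(toy))
```

The length of the returned list gives the number of internal disordered segments per protein, which is the quantity used when comparing proteins with one versus several segments.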

Terminal and Internal Disordered Segments Have Combined Effects on Half-Life
Interestingly, proteins with multiple internal disordered segments have even shorter half-lives than proteins with a single segment (Figures 2C and S2C). This prompted us to investigate the combinatorial effects of terminal and internal disordered segments. Indeed, proteins that have both a long terminal disordered segment and an internal disordered segment tend to have the shortest half-lives (Figure 2D). Furthermore, the probability of having either a terminal or an internal disordered segment given that a protein has a short half-life is the highest (p[long N-terminal or internal disorder given short half-life] = 0.57; Table 1C). Consistent with this observation, we find that the probability of having both terminal and internal disordered segments among proteins with a long half-life is very low (p[long N-terminal and internal disorder given long half-life] = 0.04; Table 1C). Taken together, these results suggest that disordered segments are modular in their ability to affect protein half-life and that these segments can act in a combinatorial manner to accentuate their effects.

The Effects of Disordered Segments on Half-Life Are Independent of the Overall Disorder Degree
So far, we have investigated the effects of continuous stretches of disordered residues (i.e., disordered segments) on protein turnover. However, the fraction of disordered residues (i.e., overall degree of disorder), which is an estimate of the packing, folding, and structural stability of a protein, also correlates with half-life, although previous studies disagree on the extent of the effect (Gsponer et al., 2008; Tompa et al., 2008; Yen et al., 2008). Proteins with a greater overall disorder degree generally contain longer terminal and internal disordered segments (Figure S3A). To determine whether the effects of disordered segments on protein turnover (Figures 1 and 2) are independent of the overall degree of disorder, we matched proteins that have a similar fraction of disordered residues but have varying combinations of disordered segments (long or short N-terminal disorder and/or presence or absence of internal disordered segments; Supplemental Results; Figure S3B).
Comparison of the half-life distributions of proteins from different classes with similar overall disorder degrees (Figure S3C) reveals similar trends as the analysis that uses all proteins (Figure 2D): proteins with both long N-terminal and internal disordered segments typically have the shortest half-lives, followed by proteins with either long internal or long N-terminal disordered segments. Proteins without disordered segments typically have the longest half-lives. The effect sizes of the differences between the half-life distributions are comparable when using all or only proteins with matched overall disorder degree (Figure S3D, upper triangles). Furthermore, most half-life distributions are significantly different, though p values are less significant due to smaller sample sizes (Figure S3D, lower triangles).
Figure 2. (A) Boxplots of protein half-life distributions for different groups of yeast proteins that contain (dark red) or lack (light red) an internal disordered segment (defined as a continuous stretch of ≥40 disordered residues), subclassified based on (B and D) the length of N-terminal disorder (as in Figure 1: long, >30 residues or short, ≤30 residues) and (C) the number of internal disordered segments (from zero, top, to three or more, bottom). Each protein is present in only one category per panel. See Figure 1 for further information. See also Figures S2 and S3 and Tables S1 and S4.
These results indicate that long disordered segments at the N terminus or internally are important intrinsic features that contribute to shorter protein half-life in living cells and that these effects are independent of the fraction of disordered residues across the whole protein. It should, however, be noted that this does not rule out an additional effect of the overall disorder degree on half-life, i.e., among proteins that do or do not have a disordered segment, proteins with higher degrees of overall disorder tend to have a lower half-life compared to those with a lower degree of disorder (see Discussion).

Disordered Segments Have Direct Effects on Half-Life Rather than Acting Indirectly by Embedding Destruction Signals
Disordered segments could influence half-life either indirectly, by embedding short peptide motifs that serve as destruction signals such as ubiquitination sites or docking sites for ubiquitinating enzymes (Ravid and Hochstrasser, 2008), or directly, by better initiating degradation by the proteasome (Gödderz et al., 2011; Inobe et al., 2011; Peña et al., 2009; Piwko and Jentsch, 2006; Prakash et al., 2004; Takeuchi et al., 2007; Verhoef et al., 2009; Zhao et al., 2010). To investigate the indirect effects, we collected data on four known destruction signals: experimentally determined ubiquitination sites as well as predicted KEN box motifs, destruction box motifs, and PEST sequences. More than half (56%) of all proteins with long terminal or internal disordered segments do not contain any of these destruction signals in their disordered segments that could account for the short half-life (Supplemental Results). Consistently, half-life distributions of proteins with and without predicted destruction signals within the disordered regions were not significantly different (p = 0.1 for N-terminal disorder; p = 0.2 for internal disorder; Mann-Whitney U test; Supplemental Results). Indeed, the majority of experimentally determined ubiquitination sites involved in degradation are in structured rather than disordered regions (Hagai et al., 2011).
Furthermore, sequence analysis revealed that disordered segments of proteins with short half-life lack enriched, uncharacterized sequence motifs that could result in rapid degradation, for example by serving as docking sites for ubiquitin ligases (Supplemental Results). Together, these findings suggest that disordered segments do not affect half-life primarily indirectly by embedding destruction motifs. Rather, the general characteristics of disordered segments seem to directly result in short half-life by forming initiation sites for degradation by the proteasome.

Disordered Segments Have Similar Effects on Protein Turnover in Mouse and Human
Given that the ubiquitin-proteasome system and the architecture of the proteasome itself are conserved from yeast to mammals (da Fonseca et al., 2012; Lasker et al., 2012), we hypothesized that the observed relationships may be evolutionarily conserved. We investigated the effects of terminal and internal disordered segments on protein degradation in mouse NIH 3T3 fibroblasts (Schwanhäusser et al., 2011) and in human THP-1 myelomonocytic leukemia cells (Kristensen et al., 2013) and found similar trends: the presence of N-terminal and internal disordered segments is linked with significantly faster protein turnover in both mouse (shorter half-lives) and human (higher degradation rates) (Figure 3; Table S5). In the mouse and human studies, protein degradation was monitored using isotope labeling and mass spectrometry, so that proteins did not need to be tagged at either terminus (in contrast to the yeast study). Therefore, we could assess the contribution of the disordered segment at the C terminus and found that proteins with a long C-terminal disordered segment display increased turnover in mouse and human, though in mouse, the effect seems smaller than for N-terminal disorder and is not statistically significant (Figures 3B and 3E). These results collectively suggest that the effects of disordered regions on protein half-life are evolutionarily conserved.

Divergence in Disordered Segments during Evolution Can Impact Protein Half-Life
Our observations suggest that protein turnover rates could be tuned by divergence in terminal or internal disordered segments during evolution (Figure 4A). To test this, we investigated protein pairs in yeast that arose from gene duplication (i.e., paralogs) (Experimental Procedures). Since paralogs are encoded within the same genome, this makes it possible to compare half-lives between evolutionarily related proteins under similar conditions. We specifically asked whether paralogs diverged in the length of N-terminal disorder or in the number of internal disordered segments (but are otherwise largely similar) and, if they did, whether this corresponded to changes in their half-life. Protein half-life data are available for both paralogs of 1,440 pairs (Table S7), and many of these paralog pairs have diverged in the length and number of terminal and internal disordered segments (Figures 4B and 4C; Tables S7 and S8).
We classified the pairs of paralogs into (1) those that during evolution maintained N-terminal disorder of roughly equal length (i.e., both proteins have a short [≤30 residues] or both have a long [>30 residues] disordered segment at the N terminus; 1,049 pairs) and (2) pairs with disordered N termini of different length (i.e., one protein of the pair has a short and the other has a long disordered segment; 391 pairs). Paralogous protein pairs that diverged in the length of N-terminal disorder show significantly larger differences in half-lives than pairs that maintained roughly equal N-terminal disorder (p = 9 × 10⁻⁶, Mann-Whitney U test; Figure 4B), in a manner that agrees with the trends reported above: the protein with the longer N-terminal disordered segment usually has a shorter half-life than its paralog with a shorter disordered segment. More precisely, (1) paralogous proteins with similar length of terminal disorder tend to have similar half-life values (median difference in half-life is close to zero; Figure 4B, bottom boxplot; Table S6A), and (2) the half-life of proteins with longer N-terminal disordered regions tends to be 14 min shorter (median) than that of their paralogous partners (Figure 4B, top boxplot; Table S6A). The converse is also true, as paralogous pairs with large half-life changes show a large divergence in the length of N-terminal disorder (Supplemental Results; Figure S4A).
Figure 3. Boxplots of the distributions of half-life values in Mus musculus (A-C), and relative degradation rates in Homo sapiens (D-F), for proteins with long and short N-terminal, C-terminal, and internal disordered segments. Note that the scale for protein half-life is in hours for mouse, rather than minutes as in yeast. Values are reversed for the human data: proteins with a short half-life have a high relative degradation rate. See Figure 1 for further information. See also Table S5.
A 14 min difference in half-life between paralogs is substantial in the context of yeast biology, as this is comparable to the time from division to budding (G1 phase) in laboratory strains growing exponentially in rich media at 30°C, which is 15-37 min (Di Talia et al., 2007). Thus, altered half-life due to divergence in the length of terminal disorder could have a significant impact on the duration for which a protein can impart its function in a cell and thus affect cellular behavior.
Paralogous proteins that differ in the number of internal disordered segments also show significantly larger changes in half-life than pairs with the same number of internal disordered regions (p = 1 × 10⁻⁵, Mann-Whitney U test; Figure 4C). The half-life of proteins with more internal disordered regions tends to be 7 min shorter (median) than that of their paralogs (Table S6A), which again can be a considerable amount of time considering the doubling time of yeast. In paralogous pairs that have the same number of internal disordered segments but diverged in the total internal disorder length (i.e., the sum of all internal disordered segments), the half-life of the protein with the longer total internal disorder also tends to be shorter (median half-life difference is 5 min; Table S6A).
Analysis of conditional probability values allowed us to quantify the trends (Tables 2 and S8). The majority (73%) of paralogous pairs that diverged in the length of terminal or number of internal disordered segments show a consistent change in half-life: the paralog with the longest terminal disordered segment or largest number of internal disordered segments tends to have the shorter half-life (p[shorter half-life given divergence of N-terminal or internal disorder] = 0.73; Table 2C). This effect is large even if only segments at the N terminus or only internal segments are considered (p[shorter half-life given divergence of N-terminal disorder] = 0.64 and p[shorter half-life given divergence of internal disorder] = 0.58; Tables 2A and 2B). Again, the converse is also true as the probability of observing a paralogous pair that has diverged in both terminal and internal disorder, and in which the paralog with the longest terminal disordered segment and most internal disordered segments has the longer half-life, is very small (p[divergence of N-terminal and internal disorder given longer half-life] = 0.01; Table 2C). Taken together, the results suggest that the gain or loss of long terminal or internal disordered segments can significantly influence the half-life of a protein upon gene duplication during evolution. Thus evolution of intrinsic features such as disordered segments may be an important contributor to the degradation rate of proteins.
Figure 4. Divergence in Disordered Segments during Evolution Can Impact Protein Half-Life (A) Schematic depiction of how the half-life of paralogs could be altered by changes in N-terminal and/or internal disordered segments during evolution. The dark and light green degradation curves denote a short and long half-life. This schematic is not intended to cover all possible scenarios for divergence of disordered segments between paralogs. (B) Distributions of half-life differences (ΔH) in pairs of yeast paralogs, grouped according to the difference in the length of their N-terminal disordered segments. Top: one paralog has a short and the other paralog a long disordered N terminus (SL). Bottom: both paralogs have short (both ≤30 residues; SS) or both have long (both >30 residues; LL) disordered N termini. (C) Distribution of half-life differences (ΔH) in paralog pairs, grouped according to the difference in the number of internal disordered regions (ΔI). Top: pairs where one of the two paralogs has a higher number of internal disordered segments (ΔI ≥ 1). Bottom: pairs with identical numbers of internal disordered segments (ΔI = 0). Each paralog pair is arranged so that ΔL = L1 − L2 (B) and ΔI = I1 − I2 (C) are always positive (i.e., L1 ≥ L2 and I1 ≥ I2). This order is used for the ΔH calculation (that is, the half-life of the paralog with the shortest N-terminal disorder, or the smallest number of internal disordered segments, will be subtracted from the half-life of the other one; ΔH = H1 − H2). As a result, ΔH will be negative for pairs where an increase in N-terminal or internal disorder coincides with a shorter half-life (Experimental Procedures). For ΔI = 0 (C, bottom), two ΔH distributions were obtained by ordering the paralogs within a pair according to the total length of all internal disordered segments (increasing and decreasing to simulate gain and loss of internal disorder length during evolution; Table S6A). See Figure 1 for further information. See also Figure S4 and Table S6.
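The ordering convention used for the paralog comparison (paralog 1 is the one with the longer N-terminal disorder, and the half-life difference is H1 − H2) can be sketched as follows. This is an illustrative reading of the Figure 4 legend, not the authors' code: the `Paralog` record, the function names, and the toy values are all hypothetical.

```python
from statistics import median
from typing import NamedTuple


class Paralog(NamedTuple):
    n_term_disorder: int   # length of the N-terminal disordered segment (residues)
    half_life_min: float   # measured half-life (minutes)


def delta_h(a: Paralog, b: Paralog) -> float:
    """Half-life difference H1 - H2, with paralog 1 having the longer disorder."""
    first, second = (a, b) if a.n_term_disorder >= b.n_term_disorder else (b, a)
    return first.half_life_min - second.half_life_min


def is_sl_pair(a: Paralog, b: Paralog, cutoff: int = 30) -> bool:
    """True if exactly one paralog has a long (>cutoff) N-terminal segment."""
    return (a.n_term_disorder > cutoff) != (b.n_term_disorder > cutoff)


# Toy pairs (made-up values, not data from the study).
pairs = [
    (Paralog(45, 20.0), Paralog(5, 40.0)),   # diverged (SL)
    (Paralog(10, 60.0), Paralog(50, 50.0)),  # diverged (SL)
    (Paralog(3, 30.0), Paralog(8, 33.0)),    # both short (SS)
]
sl_deltas = [delta_h(a, b) for a, b in pairs if is_sl_pair(a, b)]
print(median(sl_deltas))
```

With this convention a negative median indicates that the paralog with the longer disordered N terminus tends to be the shorter-lived one.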
Functional Analysis and Literature Evidence Support the Role of Disordered Segments in Governing Protein Half-Life and Phenotype
Disordered segments that affect half-life could be important for governing phenotypes, because precise protein turnover is important for many cellular processes. An analysis of function annotations of proteins with long N-terminal or internal disordered segments revealed enrichment for protein kinases and phosphoproteins and associations with regulatory and transcription functions, as well as cell-cycle processes (Tables S1F and S4B). Paralogs that have diverged in terminal or internal disordered segments have similar functions and are additionally involved in, for example, ATP and nucleotide binding and ubiquitin conjugation activities (Table S6C). These are all functions involved in signaling and regulation, where alteration of protein half-life can significantly affect the duration of activity of the protein and thereby impact cellular phenotype (Legewie et al., 2008) (see Discussion).
A literature search revealed several examples where changes in disordered segments lead to phenotypic differences through altered protein half-life. Stabilizing the half-life of the yeast kinase Ime2, a positive regulator of meiosis, by deletion of an internal disordered region results in altered sporulation efficiency (Guttmann-Raviv et al., 2002). Similarly, deletion of a highly disordered 47-amino-acid stretch at the N terminus of yeast Cdc6 prevents its degradation, although in this case, the deletion also abolishes the interaction with a ubiquitin ligase complex (Drury et al., 1997). Deletion of the first 31 residues of the human nuclear receptor Nurr1 significantly reduces its degradation by the ubiquitin-proteasome pathway and consequently leads to increased activation as a transcription factor (Alvarez-Castelao et al., 2013). Interestingly, the deleted region corresponds completely to a putative disordered segment at the N terminus and the size of the deletion could now explain the effects on half-life. These selected examples illustrate the importance of disordered segments for maintaining correct protein turnover.

DISCUSSION
Ubiquitination by E3 ligases has a dominant role in deciding when a protein gets targeted for proteasomal degradation, but it has remained unclear how intrinsic features affect the lifetime of a protein and whether such features have been exploited to alter half-life during evolution. Here, we uncovered genome-scale principles of how intrinsically disordered segments influence protein turnover in the cell and during evolution. On a genomic scale, in vivo, sufficiently long disordered regions at the termini or in the middle of proteins can directly decrease half-life (Figure 5). A large number of control calculations confirmed that the reported trends are independent of confounding factors, such as the cutoffs used to group the proteins, the disorder prediction method, the statistical tests used, protein abundance and length, subcellular localization, membrane proteins, and the nature of the N-terminal residue (Supplemental Results). Finally, we found that changes in the length and number of disordered segments upon gene duplication are linked with altered half-life, suggesting that such variation can contribute to the tuning of half-life during evolution.
Table legend: See Figure 4 for a description of the definitions. N_SL denotes pairs where one paralog has a short and the other paralog a long N-terminal disordered segment. I_(n => n+x) denotes pairs where one of the two paralogs has a higher number of internal disordered segments (ΔI ≥ 1). See also Tables S8A (for part A) and S8B (for part B).
The Structure and Composition of the Proteasome Suggest Molecular Mechanisms to Explain the Observations
The structure of the 19S regulatory particle (da Fonseca et al., 2012; Lander et al., 2012; Lasker et al., 2012) provides insights into the mechanisms by which disordered segments may increase the efficiency of proteasomal degradation and affect protein half-life. The distance between the two ubiquitin receptors, Rpn10 and Rpn13, and the ATPase unfolding channel is ~70-80 Å. The essential deubiquitinating enzyme Rpn11 sits ~60 Å from the ATPase ring. A terminal disordered segment of 30 residues would comfortably span these distances and could serve as a degradation initiation site. Similarly, 40 residues would be enough for an internal disordered segment to reach into the ATPase ring of the regulatory particle, even when folding back on itself. The precise distance requirements for a disordered segment to serve as an initiation site will depend on specific properties of the proteasome and the geometry and binding position of the substrate. For example, at least five different substrate receptors associate with the yeast proteasome, and some of them exhibit extensive conformational flexibility (Finley, 2009). Substrate-specific aspects that affect the distances include the state of the termini, which are frequently subject to maturation through cleavage and trimming (Lange and Overall, 2013), and properties of the polyubiquitin tag such as the linkage type (e.g., K48 and K11) and number of ubiquitin moieties (Komander and Rape, 2012), and the attachment point to the substrate (Hagai et al., 2011; Inobe et al., 2011).
This could explain why, with in vivo data for thousands of different proteins, we do not observe a strict length cutoff for when disordered segments influence protein half-life: cutoffs of about 30 terminal and 40 internal disordered residues produce the largest differences between the half-lives of proteins with and without disordered segments, but shorter and longer segments also contribute to shorter half-life (Table S1E). Thus, individual proteins are likely to have specific length requirements of disordered segments that depend on a variety of factors and contribute to the range of lengths at which disordered segments decrease protein half-life on a global scale. The ATP-independent regulators PA28 and PA200 can also facilitate opening of the 20S proteasome entry gate and contribute to substrate degradation (Stadtmueller and Hill, 2011). The ATPase complex p97/VCP may also serve as an alternative cap that directly binds the 20S core particle (Barthelme and Sauer, 2012). All these complexes may have different requirements for disordered segments in the substrate proteins. In fact, it has been suggested that p97/VCP may unfold proteins lacking disordered regions (Beskow et al., 2009). In vitro, the 20S proteasome core particle by itself can degrade highly disordered proteins in a process termed degradation by default, and it may also be able to do so in vivo (Tsvetkov et al., 2008). The average distance between the entry pore and the proteolytic sites in the 20S core particle is ~70 Å (da Fonseca et al., 2012; Lasker et al., 2012). An internal disordered segment of at least 40 residues is able to span twice this distance and thus could be cleaved by the core particle alone (Figure S2D). Thus, proteins with disordered segments of specific length may be processed quickly due to efficient initiation of degradation as discussed.
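The length arguments above reduce to simple arithmetic, which the following minimal sketch makes explicit. It assumes a fully extended chain contributing about 3.5 Å per residue (the value used for the 20S estimate in Figure S2D); the function names and the `passes` parameter are illustrative and not part of the original analysis. Because real disordered segments are rarely fully extended, the spans are upper bounds and the residue counts are lower bounds.

```python
import math

# Assumption: a fully extended polypeptide spans ~3.5 Å per residue.
ANGSTROM_PER_RESIDUE = 3.5


def max_span(n_residues, rise=ANGSTROM_PER_RESIDUE):
    """Upper bound on the reach (in Å) of a fully extended segment."""
    return n_residues * rise


def min_residues(distance, passes=1, rise=ANGSTROM_PER_RESIDUE):
    """Fewest residues needed to cover `distance` Å `passes` times.

    passes=1 models a terminal segment entering from its free end;
    passes=2 models an internal segment that enters as a loop and
    therefore has to cover the distance twice.
    """
    return math.ceil(passes * distance / rise)


# A 30-residue terminal segment versus the ~70-80 Å receptor-to-channel distance.
print(max_span(30))                 # 105.0
# An internal segment reaching the ~70 Å from the 20S entry pore and back.
print(min_residues(70, passes=2))   # 40
```

Under these assumptions, 30 extended residues (105 Å) exceed the ~70-80 Å distance, and 40 residues are the minimum for a loop that covers ~70 Å twice, in line with the cutoffs discussed in the text.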
The overall degree of disorder of a substrate might further affect its half-life. Upon initiation of degradation, the proteasome may quickly degrade proteins with high overall levels of disorder, because once these proteins are engaged, its ATPase subunits spend less time unfolding them than they spend on structured proteins of similar length, which must be unfolded before they can be processed. Indeed, biochemical evidence suggests that increasingly structured and stable substrates have higher turnover times and energy costs (Henderson et al., 2011; Peth et al., 2013).

Disordered Segments Influence Half-Life as an Intrinsic Feature that Can Be Modulated by Other Mechanisms
Although proteins with long terminal or internal disordered segments tend to have a short half-life, various factors can increase or decrease the half-life of individual proteins (see also Supplemental Discussion). For example, the presence of a highly structured N-terminal domain may shield proteins with internal disordered segments from degradation (Simister et al., 2011). Disordered proteins may also be protected by forming protein complexes or through interactions with other proteins. For instance, several specialized proteins have been shown to bind to and stabilize disordered proteins (Tsvetkov et al., 2009). Furthermore, specific low-complexity sequences or tandem repeats in the degradation initiation site can attenuate initiation or progression of degradation, thereby affecting substrate half-life (Sharipo et al., 1998; Tian et al., 2005; Zhang and Coffino, 2004). This is consistent with the idea that, although disordered segments are all similar in that they lack the ability to independently fold into a compact structure, many types of sequences exist within this definition that have different biophysical and conformational characteristics. For example, some disordered sequences are relatively globular and collapsed while others are expanded, and this is determined by the overall charge and sequence composition of the disordered region (Mao et al., 2013). Indeed, the proteasome has clear amino acid sequence preferences, as degradation of model substrates that differ only in their disordered initiation regions varies over at least an order of magnitude (Fishbain et al., 2014). The broad distributions of half-lives observed in our study support this, as they reflect the combined properties of many possible subtypes of disordered segments, some of which are able to efficiently initiate degradation, while others may not.
In short, one can distinguish at least three distinct determinants of protein half-life. (1) Sequence motifs and the presence of regulatory proteins such as ubiquitin ligases contribute to the overall half-life of a protein by determining when substrates will be ubiquitinated and hence targeted to the proteasome. (2) Upon recognition of the ubiquitinated substrate by the proteasome, the presence of disordered segments of sufficient size either at the terminus or internally may facilitate efficient degradation initiation, thereby leading to lower half-life. (3) The overall degree of disorder may contribute to a general trend in lowering half-life by increasing the processivity of degradation upon engagement (after recognition and initiation) by the proteasome. Thus, our observations suggest that disordered segments influence protein half-life as an underlying factor that can be modulated by other cellular mechanisms, sequence determinants and the structural stability of the substrate.

Disordered Segments Could Influence the Dynamics and Regulation of Signaling Pathways
Disordered regions are prominent in regulatory and signaling proteins (Tables S1F, S4B, and S6C) (van der Lee et al., 2014). Since divergence in disordered segments may affect protein half-life, this could influence the response kinetics of signaling and regulatory pathways involving such proteins (see also Supplemental Discussion). In fact, among the paralogous pairs that show the largest divergence in the length of N-terminal disorder and half-life in yeast (Tables S6C and S7), there are several regulatory protein kinases such as MAP kinases (MKK1, MKK2, HOG1, and STE7), serine/threonine kinases (YPK1, YPK2, and KIN28), and cyclin-dependent kinases (PHO85 and CAK1). Paralogs that have diverged in terminal or internal disordered segments are generally enriched in nucleotide binding, kinase regulatory activity, and phosphoproteins. Alterations in the degradation rate of kinases, for instance, can have significant implications for the dynamics of signaling networks (Legewie et al., 2008; Purvis and Lahav, 2013). Such effects have been shown for the yeast kinase Ime2 (Guttmann-Raviv et al., 2002) and mouse transcription factor Hes7 (Hirata et al., 2004), where mutations in disordered segments lead to changes in protein half-life, which in turn severely deregulate signaling and development, respectively.
Our observations raise the possibility that proteins with a long terminal disordered segment might be better presented as antigens to the immune system. This is because the immunoproteasome, a variant of the canonical proteasome, may better process such proteins into peptides for presentation by the major histocompatibility complex (MHC) molecules (Groettrup et al., 2010). In line with this idea, it has been shown that (1) N-extended epitopes are efficiently processed by the immunoproteasome and serve as better substrates for antigen presentation (Cascio et al., 2001) and (2) the presence of a disordered region determines the direction of degradation, which in turn determines the spectrum of generated peptides (Berko et al., 2012). It is tempting to speculate that one could improve vaccine efficiency by adding or extending terminal disordered regions to epitope-containing proteins. Our findings also call for careful interpretation of half-life measurements made on proteins that are tagged at their termini using constructs with a varying degree of structure or intrinsic disorder (e.g., GFP, TAP tag, His-tag).

Divergence in Disordered Segments Provides a Means for Tuning Protein Half-Life during Evolution and Could Generate Phenotypic Variation
We observed that divergence in disordered regions might influence protein half-life among paralogs. An outstanding question is whether such changes in half-life through divergence of disordered segments are under selective pressure. Natural variation leading to alteration of disordered regions may provide a simple means for regulatory subfunctionalization of paralogous proteins upon gene duplication. It also suggests a mechanism for divergence of half-life among orthologous proteins between species. Thus, while it is clear that the emergence of destruction signals such as ubiquitination sites and dedicated ubiquitin ligases affects targeting of a protein for degradation, variation in disordered segments may provide a simple evolutionary mechanism for fine-tuning protein turnover rates.
Several genetic and molecular mechanisms may generate diversity in terminal or internal disordered segments. These include repeat expansion, alternative splicing, and alternative transcription start sites, all of which can influence the length of terminal and/or internal disorder of protein products, thereby potentially influencing the half-life. This idea is supported by the observation that protein disorder is common in insertions and deletions (Light et al., 2013). Furthermore, given that in multicellular eukaryotes (1) alternative transcription start sites commonly generate variation in N termini (Carninci et al., 2006) and (2) alternatively spliced exons are enriched in intrinsic disorder (Buljan et al., 2013), it is likely that such events that generate diversity in protein sequences in different cell types within an individual will have an effect on protein half-life. Similarly, given that disordered regions often contain homopolymeric repeat sequences (Tompa, 2003), and because tandem repeats in DNA sequences can lead to expansion or deletion of genetic material through strand slippage during replication (Levinson and Gutman, 1987), it is plausible that individuals in a population harbor genetic variants that code for proteins with altered length of disordered segments and thus have different half-lives. Changes in protein turnover in turn may disturb protein abundance and could lead to disease (Babu et al., 2011; Hirata et al., 2004; Yang et al., 2012), especially in the case of pleiotropic, regulatory, or signaling proteins. Thus, mechanisms that generate diversity in the length or number of disordered segments could serve as a source of genetic variation that may have important phenotypic consequences.

EXPERIMENTAL PROCEDURES

Protein Half-Life Data and Calculation of Disordered Segments
Protein half-life data and other data (Table S1A) were collected for yeast (Saccharomyces cerevisiae), mouse, and human. Intrinsic disorder was predicted for all reviewed protein sequences of these organisms (downloaded from UniProtKB/Swiss-Prot; http://www.uniprot.org/) using three complementary methods: DISOPRED2, IUPred long, and PONDR VSL1. The presence and length of N-terminal, C-terminal, and internal disordered segments were then calculated using different algorithms and integrated with the half-life data. Proteomes were classified into groups according to the length of disordered segments: (1) proteins with short and long disordered termini (length cutoff 30 residues, treating the N and C termini separately) and (2) proteins with and without internal disordered segments (at least 40 disordered residues). The overall degree of disorder of a protein was calculated as the fraction of disordered residues (number of disordered residues divided by sequence length). The distributions of half-life values and protein disorder were analyzed using appropriate statistical tests.
See the Supplemental Experimental Procedures for more details.
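
As an illustration of the grouping described above, the following sketch takes per-residue disorder calls for one protein (a list of booleans, as could be derived from a predictor's output) and computes the terminal and internal segment summaries. The function names, the boolean input representation, the use of `>=` at the 30-residue terminal cutoff, and the rule that an internal segment must not touch either terminus are assumptions made here for illustration; the Supplemental Experimental Procedures define the algorithms actually used.

```python
from itertools import groupby

TERMINAL_CUTOFF = 30   # residues; assumed to be inclusive (>=) for this sketch
INTERNAL_CUTOFF = 40   # residues; "at least 40 disordered residues"


def disordered_runs(calls):
    """Return (start, length) for every run of consecutive disordered residues."""
    runs, pos = [], 0
    for is_disordered, group in groupby(calls):
        n = len(list(group))
        if is_disordered:
            runs.append((pos, n))
        pos += n
    return runs


def summarize(calls):
    """Summarize terminal and internal disorder for one protein."""
    size = len(calls)
    runs = disordered_runs(calls)
    # A terminal segment is the run that starts at residue 0 or ends at the last residue.
    n_term = next((n for s, n in runs if s == 0), 0)
    c_term = next((n for s, n in runs if s + n == size), 0)
    # Internal segments touch neither terminus and meet the length cutoff.
    internal = [n for s, n in runs
                if s > 0 and s + n < size and n >= INTERNAL_CUTOFF]
    return {
        "n_term_len": n_term,
        "c_term_len": c_term,
        "long_n_term": n_term >= TERMINAL_CUTOFF,
        "long_c_term": c_term >= TERMINAL_CUTOFF,
        "n_internal": len(internal),
        "disorder_fraction": sum(calls) / size if size else 0.0,
    }


# Hypothetical 200-residue protein: 35 disordered, 50 ordered, 45 disordered,
# 60 ordered, 10 disordered.
example = [True] * 35 + [False] * 50 + [True] * 45 + [False] * 60 + [True] * 10
print(summarize(example))
```

For the hypothetical protein above, the N terminus counts as long (35 residues), the C terminus as short (10 residues), and there is one internal disordered segment.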

Paralog Data and Calculations
Yeast paralog pairs were obtained from an all-against-all sequence comparison using BLASTClust. More divergent paralogs from the yeast whole-genome duplication event (Wolfe and Shields, 1997) were added to the list. To calculate the differences in half-life (ΔH) and N-terminal disorder length (ΔL) between the individual proteins in a paralog pair, ΔL is defined to be always positive and obtained by subtracting the N-terminal disorder length of paralog 2 from the N-terminal disorder length of paralog 1 (ΔL = L1 − L2; L1 ≥ L2). To calculate ΔH, the order of paralogs in a pair is maintained, so that ΔH can be positive or negative (ΔH = H1 − H2). Thus, ΔH is negative whenever the relationship "longer disordered N terminus = shorter half-life" holds true. Similarly, the difference in the number of internal disordered segments ΔI is defined to always be positive (ΔI = I1 − I2; I1 ≥ I2), and ΔH is calculated accordingly. Paralog pairs were separated into categories according to the divergence in N-terminal (pairs that maintained or that diverged N-terminal disorder) or internal disordered segments (pairs with an identical or different number of segments).

Figure S1A. Description of the yeast data used in our study
Figure S1B. The raw data show the non-linear relationship between the length of disordered segments and protein half-life
Figure S1C. The results for N-terminal disorder are independent of the average disorder scores
Figure S1D. The results are independent of the average disorder scores of the long N-terminal disordered segments
Figure S1E. C-terminal disorder lengths for proteins in different half-life groups do not differ significantly as a result of the experimental design used for half-life measurements
Figure S1F. Terminal disorder lengths for proteins in different half-life groups, divided along the median
Figure S1G. The results are independent of protein length
Figure S1H. The relationship between N-terminal disorder and half-life does not appear connected to the N-end rule
Table S1A. Compendium of datasets used in our study
Table S1B. Summary of boxplot parameters and significance estimates for the effects of disordered segments on protein half-life in yeast (DISOPRED2)
Table S1C. The results are independent of the method used to predict intrinsic protein disorder: IUPred
Table S1D. The results are independent of the method used to predict intrinsic protein disorder: PONDR VSL1
Table S1E. Disordered segments of different lengths influence protein half-life
Table S1F. Function enrichment analysis of proteins with long N-terminal disorder, long C-terminal disorder, and long N-terminal structure
Table S1G. The results are independent of protein length
Table S1H. The results are independent of protein abundance
Table S1I. The trends are not affected by cytoplasmic or nuclear localization
Table S1J. Removal of membrane proteins does not affect the observed trends
Table S1K. Using the Kolmogorov-Smirnov test to assess half-life differences yields equivalent results
Table S3A. Extended conditional probabilities for N-terminal disorder and protein half-life
Table S3B. Extended conditional probabilities for internal disorder and protein half-life
Figure S2B. The results are independent of the average disorder scores of the internal disordered segments
Figure S2C. Proteins with multiple internal disordered segments generally have shorter half-lives than those with fewer
Figure S2D. Calculation of the minimal number of residues that may constitute an internal disordered segment that is directly cleavable by the 20S proteasome
Table S4A. The results for internal disorder are independent of the length of N-terminal disorder of the proteins
Table S4B. Function enrichment analysis of proteins with internal disordered segment(s)
Figure S3. The effects of terminal and internal disordered segments on protein half-life are independent of the overall degree of disorder
Figure S3A. The overall degree of disorder correlates with the length of disordered segments
Figure S3B. Proteins with various combinations of disordered segments were matched by their overall degree of disorder
Figure S3C. The combined effects of disordered segments on protein half-life are independent of the overall degree of disorder
Figure S3D. Statistics for the differences between half-life distributions are comparable when assessing only proteins with similar overall disorder degree or when assessing all proteins
Figure S3E. Proteins with and without disordered segments were paired based on their overall degree of disorder
Figure S3F. The individual effects of disordered segments on protein half-life are independent of the overall degree of disorder
Figure S3G. The number of unique proteins in pairs with and without disordered segments
Figure S3H. Statistics of half-life differences of paired proteins with and without disordered segments
Figure S3I. Proteins with different numbers of internal disordered segments were matched by their overall degree of disorder
Figure S3J. The effects of multiple internal disordered segments on protein half-life are independent of the overall degree of disorder
Figure S3K. Statistics for the differences between half-life distributions are comparable when assessing only proteins with similar overall disorder degree or when assessing all proteins
Table S5. Summary of boxplot parameters and significance estimates for the effects of disordered segments on protein turnover in mouse and human
Figure S4B. Results of the paralogous protein analysis, without paralogs resulting from the ancestral yeast whole-genome duplication
Table S6A. Divergence of N-terminal or internal disordered segments is linked to protein half-life changes during evolution
Table S6B. Approximate permutation tests for N-terminal disorder in paralog pairs
Table S6C. Function enrichment analysis of paralog pairs that diverged in N-terminal or internal disordered segments

Boxplots of the distribution of half-life values and lengths of disordered segments for the different groups of proteins in S. cerevisiae. For further information about the plot content see Figure 1 in the main text.

(A) Description of the yeast data used in our study
The histograms show, from left to right, the distributions of protein half-life (log-scale), N-terminal disorder length, C-terminal disorder length, and internal disorder length. For clarity the axes were limited to 60 (N- and C-terminal disorder length) and 100 residues (internal disorder length); several longer disordered segments do occur, but only at low frequencies. Note that multiple internal disordered regions may occur in one protein, which increases the number of data points in the internal disorder panel.

(B) The raw data show the non-linear relationship between the length of disordered segments and protein half-life
Scatterplots of protein half-life versus N-terminal disorder length (left), C-terminal disorder length (middle), and the length of the longest internal disordered segment (right). Insets show the same plots, zoomed in on cluttered areas. Red dashed lines correspond to the cutoffs used for grouping the proteins into those with long and those with short disordered segments: 30 residues for both N- and C-terminal disorder, and 40 residues for internal disorder. In the N-terminal and C-terminal disorder plots, one data point above 600 residues in length was not shown to improve visualization. The same is true for one data point above 750 residues in length for the internal disorder plot.

(C) The results for N-terminal disorder are independent of the average disorder scores
Proteins were divided into tertiles according to the average disorder score for the whole protein as reported by the three disorder predictors: DISOPRED2 (left), IUPred (middle) and PONDR VSL1 (right). P values (Mann-Whitney U test) shown compare the disordered group to its non-disordered counterpart within the same score tertile.

(D) The results are independent of the average disorder scores of the long N-terminal disordered segments
Proteins were divided into tertiles according to the average disorder score for the long N-terminal disordered segments as reported by the three disorder predictors: DISOPRED2 (left), IUPred (middle) and PONDR VSL1 (right). The left-most boxplot in each panel (dark green) contains proteins without the disorder type of interest, i.e. proteins without a long disordered N-terminus. P values (Mann-Whitney U test) shown compare each score tertile to the non-disordered group.
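
The tertile grouping used in these panels can be sketched as below. This is a minimal illustration using only the Python standard library; the function name, the tuple-based input, and the treatment of scores that fall exactly on a cut point (assigned to the lower group) are assumptions, since the legend does not specify them.

```python
import statistics


def split_into_tertiles(scored):
    """Split (protein_id, average_disorder_score) tuples into three score groups."""
    # statistics.quantiles with n=3 returns the two cut points between tertiles.
    low_cut, high_cut = statistics.quantiles([s for _, s in scored], n=3)
    groups = {"low": [], "middle": [], "high": []}
    for protein_id, score in scored:
        if score <= low_cut:
            groups["low"].append(protein_id)
        elif score <= high_cut:
            groups["middle"].append(protein_id)
        else:
            groups["high"].append(protein_id)
    return groups


# Hypothetical proteins p1..p9 with scores 1..9.
demo = [(f"p{i}", i) for i in range(1, 10)]
print(split_into_tertiles(demo))
```

Each tertile would then be compared with the non-disordered group, for example with a Mann-Whitney U test as stated in the legend.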

(E) C-terminal disorder lengths for proteins in different half-life groups do not differ significantly as a result of the experimental design used for half-life measurements
The nature of the experimental design used to measure protein half-life necessarily ensured that all proteins have identical C-termini (i.e. the TAP-tag) (Belle et al., 2006). Therefore, as expected, the distributions of C-terminal disorder lengths, which were predicted based on the sequence of the untagged genomic proteins, are not significantly different (P = 0.2, Kruskal-Wallis test) between different half-life groups (indicated with exponential degradation curves; from short half-life, dark green, to long half-life, light green). Outliers with C-terminal disorder length >60 residues are not shown to improve visualization.

(F) Terminal disorder lengths for proteins in different half-life groups, divided along the median
Plots are shown for N-terminal disorder length (left) and C-terminal disorder length (right). Proteins were divided into two half-life groups: (i) shorter than or equal to the median (≤42 minutes, short half-life), and (ii) longer than the median (>42 minutes, long half-life). Division of the proteins into two half-life groups, instead of three (Figures 1E and S1E), does not affect the conclusions. Outliers with a length of >60 residues are not shown to improve visualization.

(G) The results are independent of protein length
Proteins were classified based on, from left to right: overall length (i.e. size), both the length of N-terminal disorder and overall length, both the length of C-terminal disorder and overall length, and both the presence of an internal disordered segment and overall length. Outliers with half-life >250 minutes are not shown to improve visualization.

(H) The relationship between N-terminal disorder and half-life does not appear connected to the N-end rule
Frequency distributions of N-terminal residues next to the initiator methionines in yeast (top-left), mouse (top-right) and human (bottom) for the entire proteome (blue), proteins with a long disordered N-terminus (dark red), and proteins with a short disordered N-terminus (light red). The N-end rule defines non-destabilizing and destabilizing (primary, secondary, and tertiary) residues. See also Supplemental Results S9.

The P values shown compare the disordered group to its non-disordered counterpart within the same score tertile, and were calculated using the Mann-Whitney U test.

(B) The results are independent of the average disorder scores of the internal disordered segments
Proteins were divided into tertiles according to the average disorder score for the internal disordered segment as reported by the three disorder predictors: DISOPRED2 (left), IUPred (middle) and PONDR VSL1 (right). The left-most boxplot in each panel (dark green) contains proteins without the disorder type of interest, i.e. proteins without an internal disordered segment. The P values shown compare each score tertile to the non-disordered group, using the Mann-Whitney U test.

(C) Proteins with multiple internal disordered segments generally have shorter half-lives than those with fewer
Proteins were classified based on the number of internal disordered segments (defined as continuous stretches of ≥40 disordered residues; from zero, to three or more, shown below each boxplot) as reported by disorder predictors IUPred (left) and PONDR VSL1 (right). The corresponding DISOPRED2 panel can be found in Figure 2C in the main text.

(D) Calculation of the minimal number of residues that may constitute an internal disordered segment that is directly cleavable by the 20S proteasome
The average distance from the 20S entry pore to the three proteolytic sites inside the proteasome is ~70 Å. An extended polypeptide unit spans ~3.5 Å. Because an internal segment enters as a loop and must cover this distance twice (2 × 70 Å / 3.5 Å per residue = 40 residues), a conservative estimate of the minimum number of internal disordered residues that can be cleaved directly by the 20S proteasome is ~40-45 amino acids. Distance measurements were done on the crystal structure of the 20S yeast proteasome (PDB ID: 1RYP (Groll et al., 1997)) using the PyMOL Molecular Graphics System, Version 1.3 (Schrodinger, 2010).

cerevisiae. Proteins were classified into those that have a long disordered N-terminus (N_L I_a), those that have an internal disordered segment (N_S I_p), those that have both a terminal and an internal disordered segment (N_L I_p), and those that have neither (N_S I_a). For each protein with a long N-terminal disordered segment that also has a long internal disordered segment (N_L I_p, 226 cases, the minority class), one protein from each of the other classes that is closest to the N_L I_p protein in terms of the fraction of disordered residues was selected. P values were calculated using the Wilcoxon Signed-Rank test, which is a non-parametric test for assessing difference between two paired samples. P values and median differences are also given in the top panel of Figure S3D.

values are reported in the scatterplots. Disorder degree density estimates were calculated using a Gaussian kernel with a smoothing bandwidth given by Silverman's rule of thumb (~0.056 for N_L I_a - N_S I_a; ~0.038 for N_S I_p - N_S I_a; ~0.050 for N_L I_p - N_S I_a). Correlations and density overlays indicate almost perfect linear correlation between the overall disorder degree of the various pairings, demonstrating that paired proteins selected using the above described method indeed have almost identical overall degrees of disorder.

(F) The individual effects of disordered segments on protein half-life are independent of the overall degree of disorder
Boxplots of the distribution of half-life values for proteins with and without disordered segments, paired on similar overall degree of disorder. Proteins were classified into those that have a long disordered N-terminus (N_L I_a), those that have an internal disordered segment (N_S I_p), those that have both a terminal and an internal disordered segment (N_L I_p), and those that have neither (N_S I_a). Each N_S I_a was paired with a protein from each of the other classes based on the overall degree of disorder, making N_L I_a - N_S I_a (left), N_S I_p - N_S I_a (middle), and N_L I_p - N_S I_a pairs (right). Reported P values were calculated using the Wilcoxon Signed-Rank test, which is a non-parametric test for assessing differences between two paired samples. reports the difference between the half-life medians of the compared groups.

(G) The number of unique proteins in pairs with and without disordered segments
The approach for pairing proteins with disordered segments to proteins with no disordered segments (N_S I_a) based on overall disorder degree similarity can result in an N_S I_a protein being sampled more than once. 'Pairs' shows the total number of pairs and equals the number of proteins in the class that has the fewest proteins (in all pairings this is the class with disordered segments). 'Unique N_S I_a proteins' shows the number of unique proteins without disordered segments (N_S I_a); the difference from the 'Pairs' column indicates the number of pairs containing N_S I_a proteins that have been sampled more than once (i.e. in multiple pairs). 'Maximal occurrences of non-unique N_S I_a proteins' shows how often the most frequently sampled N_S I_a protein occurs and is an indication of the number of duplicate data points resulting from the pairing based on disorder degree similarity.
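
The pairing procedure and the bookkeeping reported in this panel can be sketched as follows. This is an illustrative nearest-neighbor match on the fraction of disordered residues in which controls (proteins without disordered segments) may be reused; the function names, the tuple-based input, and the tie-breaking rule (the first closest control wins) are assumptions and are not taken from the study.

```python
from collections import Counter


def match_by_disorder(cases, controls):
    """Pair each case with the control closest in overall disorder degree.

    cases, controls: lists of (protein_id, disorder_fraction).
    A control can be selected for several cases, so duplicates may occur.
    """
    pairs = []
    for case_id, case_fraction in cases:
        control_id, _ = min(controls, key=lambda c: abs(c[1] - case_fraction))
        pairs.append((case_id, control_id))
    return pairs


def pairing_stats(pairs):
    """Counts corresponding to the three quantities described in the panel."""
    usage = Counter(control_id for _, control_id in pairs)
    return {
        "pairs": len(pairs),
        "unique_controls": len(usage),
        "max_occurrences": max(usage.values(), default=0),
    }


# Hypothetical data: two cases are closest to the same control "x".
cases = [("A", 0.30), ("B", 0.32), ("C", 0.70)]
controls = [("x", 0.31), ("y", 0.10), ("z", 0.65)]
pairs = match_by_disorder(cases, controls)
print(pairs)
print(pairing_stats(pairs))
```

In this toy example, three pairs contain only two unique controls, which is the kind of duplication that the 'Pairs' and 'Unique' columns are meant to expose.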

(H) Statistics of half-life differences of paired proteins with and without disordered segments
Protein pairs containing a protein without disordered segments (N_S I_a) that occurs in multiple pairs were randomly removed until individual proteins occur only once. The procedure was repeated 1000 times to assess the robustness of the results. 'Pairs with unique proteins' shows the number of protein pairs that remain after removing pairs containing non-unique N_S I_a proteins. Effect sizes (difference in median half-life) and P values (Wilcoxon Signed-Rank test) were calculated for the differences between half-life distributions of proteins with and without disordered segments (comparing N_L I_a with N_S I_a, top row; N_S I_p with N_S I_a, middle row; and N_L I_p with N_S I_a, bottom row). 'Original' columns show the effect sizes and P values for the full paired dataset (i.e. including non-unique N_S I_a proteins; Figure S3F). 'Median', 'Mean', 'Min', and 'Max' describe the effect size and P value distributions resulting from the 1000 randomizations.

(I) Proteins with different numbers of internal disordered segments were matched by their overall degree of disorder
Scatterplots (bottom-left triangle) and polygon overlays of the kernel density estimates (top-right triangle) of the overall disorder degree of pairs of proteins with different numbers of internal disordered segments (from zero to three or more, indicated in the diagonal squares). Pearson r statistics and P values are reported in the scatterplots. Disorder degree density estimates were calculated using a Gaussian kernel with a smoothing bandwidth given by Silverman's rule of thumb (~0.045 for all combinations) and were normalized to 1. The blue and red polygons correspond to the densities of the protein class to the left and to the bottom side of the plot, respectively. Correlations and density overlays indicate almost perfect linear correlation between the overall disorder degree of pairs of proteins from different classes, demonstrating that the matched proteins indeed have almost identical overall degree of disorder.

(J) The effects of multiple internal disordered segments on protein half-life are independent of the overall degree of disorder
Boxplots of the distribution of half-life values for proteins with different numbers of internal disordered segments (from zero to three or more) in S. cerevisiae. For each protein with three or more internal disordered segments (203 proteins, the minority class), one protein from each of the other classes that is closest in terms of the fraction of disordered residues was selected. P values were calculated using the Wilcoxon Signed-Rank test, which is a non-parametric test for assessing difference between two paired samples. P values and median differences are also given in the top panel of Figure S3K.

(K) Statistics for the differences between half-life distributions are comparable when assessing only proteins with similar overall disorder degree or when assessing all proteins
P values (bottom-left triangles) and absolute differences in half-life medians (effect size; top-right triangles) between half-life distributions of groups of proteins with different numbers of internal disordered segments. The top panel shows statistics based on matched proteins with similar overall degrees of disorder. Here, P values were calculated using the Wilcoxon Signed-Rank test, which is a non-parametric test for assessing difference between two paired samples. The bottom panel shows statistics based on all proteins. Here, P values were calculated using the Mann-Whitney U test.

Figure 4 in the main text.
The number of paralogous protein pairs resulting from the ancestral yeast whole genome duplication alone and for which we have half-life and disorder data was too small to make meaningful comparisons of their ΔH and ΔL. Therefore, an analysis of whole-genome paralogs alone could not be performed.

Tables

Table S1, Related to Figures 1 and 2

Table S1A. Compendium of datasets used in our study
Type of information [source]: Description of the method used to obtain the data

Disorder predictions (Dosztanyi et al., 2005; Obradovic et al., 2005; Ward et al., 2004): The disorder status of every residue in the yeast, mouse, and human proteomes was inferred using the DISOPRED2, IUPred, and PONDR VSL1 predictors.

Protein half-life
Yeast (Belle et al., 2006): In vivo protein half-lives were determined by first inhibiting protein synthesis with the antibiotic cycloheximide and then monitoring the abundance of each C-terminally TAP-tagged protein in the yeast genome by quantitative Western blotting at three time points.
Mouse (Schwanhausser et al., 2011): In vivo protein half-lives in NIH3T3 mouse fibroblasts were derived from the ratio between the heavy and light peptides, measured using mass spectrometry at different time points after the transfer of cells from light to heavy medium.
Human (Kristensen et al., 2013): In vivo relative degradation rates in human THP-1 myelomonocytic leukemia cells, under conditions that stimulate cell proliferation, were determined using a similar SILAC MS approach as in mouse, above.

(Wolfe and Shields, 1997): A list of paralogous proteins in yeast was obtained from an all-against-all protein sequence comparison followed by clustering using the program BLASTClust. This list was supplemented with more divergent pairs of paralogs that arose from the whole genome duplication event in yeast.

Degradation signals
Ubiquitination (Uniprot-Consortium, 2011): Experimentally determined ubiquitination sites were obtained from UniProtKB/Swiss-Prot.
(Liu et al., 2012; Pfleger and Kirschner, 2000): KEN box and destruction box motifs were predicted using GPS-ARM 1.0.
(Rice et al., 2000; Rogers et al., 1986): PEST regions were predicted using epestfind with default parameters, as included in EMBOSS 6.5.7.
(Bachmair et al., 1986): Frequencies of amino acids at the second N-terminal residue (after removal of the initiator methionine) were calculated for the yeast, mouse and human proteomes.

(Davey et al., 2010; Neduva et al., 2005): SLiMFinder and DiliMot were used to analyze the amino acid sequences of disordered segments of proteins with short half-life for shared sequence patterns that potentially facilitate rapid degradation.
(Newman et al., 2006): Protein levels during log-phase growth of yeast were obtained by flow cytometry measurements of GFP-tagged strains.
(Christie et al., 2004): Subcellular localization information (cytoplasm and nucleus) was obtained from the Saccharomyces Genome Database.
Membrane proteins (Melen et al., 2003; Osterberg et al., 2006): Membrane protein topologies were previously determined by measurements of the location of the C-terminus (cytosolic or extracellular, determined using tagged constructs), which was used to constrain topology predictions by TMHMM.
Median values ± confidence interval (C.I., calculated from the IQR and n, where IQR is the interquartile range and n the group sample size) are reported for different groups of proteins. The IQR is calculated as the difference between the data points at the 0.75 and 0.25 quantiles. P values for the differences in the distributions of half-life values between the different groups ('long' versus 'short' and 'yes' versus 'no') were calculated using the Mann-Whitney U test and are reported in parentheses.
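The legend above defines the confidence interval in terms of the IQR and the group size n. As a minimal sketch, assuming the common boxplot-notch convention C.I. = 1.57 × IQR / √n (the constant 1.57, the quantile interpolation method, and the function name `median_with_ci` are assumptions, not taken from the tables), the summary for one group could be computed as follows:

```python
import math
import statistics


def median_with_ci(values, k=1.57):
    """Return (median, half-width of the confidence interval).

    Assumes C.I. = k * IQR / sqrt(n); k = 1.57 is the usual boxplot-notch
    constant and is an assumption here, not a value from the source tables.
    """
    # Quartiles via linear interpolation that includes the extreme data points.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return med, k * iqr / math.sqrt(len(values))


# Hypothetical half-lives (minutes) for one group of proteins.
half_lives = [10, 20, 30, 40, 50]
median, ci = median_with_ci(half_lives)
print(f"median = {median:.1f} min, C.I. = +/- {ci:.1f} min")
```

Passing a different `k` reproduces other notch conventions without changing the rest of the calculation.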

Table S1B. Summary of boxplot parameters and significance estimates for the effects of disordered segments on protein half-life in yeast (DISOPRED2)
Related to Figures 1B, 1C and 2A.

Short (≤30 min; H S ) Medium (31-70 min; H M ) Long (>70 min; H L ) Total
The relatively higher value suggests that proteins with long N-terminal disorder are likely to be rapidly degraded.
(i) long N-terminal disorder among those that have long half-life: The low value suggests that proteins that have a long half-life are unlikely to have long N-terminal disorder.
(ii) long half-life among those that have long N-terminal disorder: This suggests that proteins with long N-terminal disorder do not usually have a long half-life, indicating that such proteins are often rapidly degraded.
(i) short N-terminal disorder among those that have short half-life: This suggests that proteins with short N-terminal disorder can still be actively degraded by other mechanisms.
(ii) short half-life among those that have short N-terminal disorder: This suggests that proteins with short N-terminal disorder usually do not have short half-lives.
(i) short N-terminal disorder among those that have long half-life: The high value suggests that most long-lived proteins have short N-terminal disorder.
(ii) long half-life among those that have short N-terminal disorder: This suggests that proteins with short N-terminal disorder do not necessarily have long half-life, indicating that they can still be degraded by other mechanisms.     For information about the table content see Table S1 above.  Figure 2B.
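The statements above are conditional probabilities read in two directions, for example P(short half-life | long N-terminal disorder) versus P(long N-terminal disorder | short half-life). A minimal sketch of how such values follow from a table of counts is given below; the counts and the helper name `conditional_probability` are hypothetical and serve only to illustrate the calculation.

```python
def conditional_probability(records, event, condition):
    """Estimate P(event | condition) from (disorder_class, half_life_class) records."""
    given = [r for r in records if condition(r)]
    if not given:
        raise ValueError("no records satisfy the condition")
    return sum(1 for r in given if event(r)) / len(given)


# Hypothetical counts: (N-terminal disorder class, half-life class).
records = (
    [("long", "short")] * 4 + [("long", "medium")] * 2 + [("long", "long")] * 2
    + [("short", "short")] * 3 + [("short", "medium")] * 4 + [("short", "long")] * 5
)

# P(short half-life | long N-terminal disorder)
p_short_given_long = conditional_probability(
    records,
    event=lambda r: r[1] == "short",
    condition=lambda r: r[0] == "long",
)
# P(long N-terminal disorder | short half-life)
p_long_given_short = conditional_probability(
    records,
    event=lambda r: r[0] == "long",
    condition=lambda r: r[1] == "short",
)
print(p_short_given_long, p_long_given_short)
```

The two directions generally differ, which is why both (i) and (ii) are reported for each combination.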

Table S5. Summary of boxplot parameters and significance estimates for the effects of disordered segments on protein turnover in mouse and human
Note that the scale for protein half-life is in hours for mouse, rather than minutes as in yeast. Mouse values are half-lives, while human values are relative degradation rates. Thus, values are reversed for the human data: proteins with a short half-life have a high relative degradation rate and the other way around.

Extended conditional probabilities for N-terminal disorder and protein half-life in pairs of paralogs
The relatively low value suggests that regulatory mechanisms other than divergence in N-terminal disorder control the turnover of individual proteins after gene duplication.

Results S1. The effects of disordered segments on half-life are independent of the overall disorder degree (extended)
The fraction of disordered residues (i.e. overall degree of disorder), which is an estimate of the packing, folding and structural stability of a protein, by itself correlates with protein half-life, although previous studies disagree on the extent of the effect (Gsponer et al., 2008;Tompa et al., 2008;Yen et al., 2008). Proteins with a greater overall disorder degree generally contain longer terminal and internal disordered segments ( Figure S3A). To determine whether the effects of continuous stretches of disordered residues (i.e. disordered segments) on protein turnover (Figures 1 and 2) are independent of the overall degree of disorder, we matched proteins that have a similar fraction of disordered residues but have varying combinations of disordered segments (long or short N-terminal disorder and/or presence or absence of internal disordered segments; Figure S3B).
For each protein with a long N-terminal disordered segment that also has a long internal disordered segment (N L I p , 226 cases, the minority class) we selected one protein from each of the other classes that is closest to the N L I p protein in terms of the overall disorder degree (i.e. fraction of disordered residues across the full protein). This yields combinations of proteins from different classes (N L I p -N S I p -N L I a -N S I a ) with almost identical overall degrees of disorder (Figure S3B). Correlation coefficients (R) between the overall disorder degrees of pairs of proteins from various classes are ~1 (almost perfect linear correlation), demonstrating that paired proteins selected using the above method indeed have almost identical overall degrees of disorder. Comparison of the half-lives of proteins from different classes with similar overall disorder degrees (Figure S3C) reveals similar trends as the analysis that uses all proteins (Figure 2D). Proteins with both long N-terminal and internal disordered segments (N L I p ) typically have the shortest half-lives, independent of whether all proteins are considered, or only proteins with highly similar overall disorder degree. Then come proteins with either long internal (N S I p ), or long N-terminal disordered segments (N L I a ). Proteins with no long disordered segments (N S I a ) typically have the longest half-lives.
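The matching step described above can be sketched as a nearest-neighbour lookup on the fraction of disordered residues. The protein identifiers, disorder fractions, and the function name `match_by_disorder` below are hypothetical, and the sketch assumes that a protein may be selected as the closest match for more than one query protein.

```python
def match_by_disorder(minority, other_classes):
    """For each minority-class protein, pick the closest protein from every other class.

    minority: list of (protein_id, disorder_fraction)
    other_classes: dict mapping class label -> list of (protein_id, disorder_fraction)
    Returns one dict per minority protein: {"query": id, class_label: matched_id, ...}.
    """
    matches = []
    for name, fraction in minority:
        row = {"query": name}
        for label, candidates in other_classes.items():
            # Closest candidate by absolute difference in overall disorder degree.
            closest = min(candidates, key=lambda c: abs(c[1] - fraction))
            row[label] = closest[0]
        matches.append(row)
    return matches


# Hypothetical proteins with their fraction of disordered residues.
nlip = [("P1", 0.42), ("P2", 0.18)]
others = {
    "NSIp": [("Q1", 0.40), ("Q2", 0.20)],
    "NLIa": [("R1", 0.45), ("R2", 0.10)],
    "NSIa": [("S1", 0.41), ("S2", 0.19), ("S3", 0.05)],
}
for row in match_by_disorder(nlip, others):
    print(row)
```

Each returned row corresponds to one combination of proteins from the four classes whose half-lives can then be compared with a paired test.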
The effect sizes of the differences between the half-life distributions are comparable when proteins are grouped based on the overall degree of disorder or when all proteins are considered (Figure S3D, upper triangles). Furthermore, most half-life distributions are significantly different, though P values for proteins grouped by overall disorder degree are less significant than when using all proteins due to smaller sample sizes (Figure S3D, lower triangles). Alternative comparison of proteins by the number of internal disordered segments (Figures S3I, S3J and S3K), and divided into structural classes based on the scores of the various disorder predictors (Figures S1C, S1D, S2A and S2B) confirms that the observed effects are independent of the overall degree of disorder. These results indicate that long disordered segments (continuous stretches of disordered residues) at the N-terminus or internally contribute to shorter protein half-life in living cells, and that this effect is independent of the fraction of disordered residues across the whole protein. It should however be noted that this does not rule out an additional effect of the overall degree of disorder on protein half-life, i.e. among proteins that do or do not have a disordered segment, proteins with higher degrees of overall disorder tend to have a lower half-life compared to those with a lower degree of disorder (see Discussion in the main text).

Comparing individual classes of proteins with disordered segments
We also compared the individual classes of proteins with disordered segments (N L I p , N S I p , or N L I a ) to proteins without any such segments (N S I a ). For that, we paired each N S I a protein with a protein from each of the other classes that has the closest overall degree of disorder (making N L I a -N S I a , N S I p -N S I a , and N L I p -N S I a pairs, Figures S3E and S3F). This is different from the approach above, which makes combinations of N L I p -N S I p -N L I a -N S I a proteins. We then compared the half-lives. One confounding factor in the approach for pairing proteins with disordered segments to proteins with no disordered segments (N S I a ) based on the overall disorder degree is that an N S I a protein can be sampled more than once if it happens to be closest in terms of overall disorder degree to multiple proteins with disordered segments. This effect is most pronounced in the N S I p -N S I a and N L I p -N S I a pairs (Figure S3G), but does not seem to be a problem, since many N S I a proteins occur only a few times (more than once, but not very often), rather than a small number of N S I a proteins accounting for a large share of the data points. For example, in the N S I p -N S I a pairs, 10.2% of the non-unique proteins account for 34.6% of all data points, and in the N L I p -N S I a pairs, 9.6% of the non-unique proteins account for 27.8% of all data points. None of the three pairings has one or a few proteins that are present an extreme number of times (Figure S3G, 'Maximal occurrences' column).
Nevertheless, to control for potential biases caused by having duplicate data points in the group of proteins with no disordered segment, we randomly removed protein pairs containing a multiple-occurring N S I a protein, until individual proteins occur only once. We then calculated effect sizes and statistical differences between half-life distributions of proteins with and without disordered segments as before. The procedure was repeated 1000 times to estimate the robustness of the results. The means and medians of the resulting effect sizes (difference in median half-life) are close to the original values for N L I a -N S I a and N L I p -N S I a (Figure S3H). The effect sizes for the N S I p -N S I a comparison differ considerably more between the analysis that includes multiple-occurring N S I a proteins (Figure S3F) and the analysis that excludes such duplicates (Figure S3H): not a single one of the 1000 N S I p -N S I a sets results in an effect size as large (28 minutes) as that of the set that includes duplicates. This likely has to do with the larger number of duplicates in the N S I p -N S I a comparison than in N L I a -N S I a and N L I p -N S I a (Figure S3G). Importantly, however, the effect sizes of the comparisons without duplicates are in all cases very similar to the values when using all data, i.e. not combined by disorder degree similarity (minimally 8.5 min for N L I a -N S I a , 14 min for N S I p -N S I a , and 27 min for N L I p -N S I a in Figure S3H, compared to 7, 15, and 22 min, respectively, in the upper triangle of the 'All proteins' part of Figure S3D).
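The randomization described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure as implemented for Figure S3H: pairs are shuffled and the first pair seen for each control protein is kept (one reading of "randomly removed until individual proteins occur only once"), the effect size is taken as the difference between the median half-lives of the two groups, and the significance test is omitted. All identifiers and half-life values are hypothetical.

```python
import random
import statistics


def drop_duplicate_controls(pairs, rng):
    """Randomly keep exactly one pair per control (no-disorder) protein.

    pairs: list of (control_id, disordered_half_life, control_half_life)
    """
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    seen, kept = set(), []
    for pair in shuffled:
        if pair[0] not in seen:
            seen.add(pair[0])
            kept.append(pair)
    return kept


def effect_size(pairs):
    """Median half-life of control proteins minus that of proteins with disorder."""
    return statistics.median(p[2] for p in pairs) - statistics.median(p[1] for p in pairs)


def randomized_effect_sizes(pairs, repeats=1000, seed=0):
    """Repeat the random de-duplication and collect the resulting effect sizes."""
    rng = random.Random(seed)
    return [effect_size(drop_duplicate_controls(pairs, rng)) for _ in range(repeats)]


# Hypothetical pairs; control proteins "A" and "C" occur more than once.
pairs = [("A", 20, 50), ("A", 30, 50), ("B", 25, 40),
         ("C", 10, 60), ("C", 35, 60), ("C", 15, 60)]
sizes = randomized_effect_sizes(pairs)
print(statistics.median(sizes), statistics.mean(sizes), min(sizes), max(sizes))
```

The four printed summaries correspond to the 'Median', 'Mean', 'Min', and 'Max' columns used to describe the distribution of effect sizes over the 1000 randomizations.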
The P values are generally less significant due to a reduction in the number of data points (Figure S3H). About three quarters of the 1000 generated N S I a sets for the N L I a -N S I a pairing have significantly higher half-life distributions than the corresponding N L I a set at a significance level of 5%. The smallest effect size is still 8.5 minutes, which is substantial in the context of the yeast cell cycle and comparable to the difference found in the original comparison of half-lives between proteins with long and short N-terminal disordered segments (Figure 1, 9 minutes). Furthermore, all 1000 tested N S I a sets in the N S I p -N S I a and N L I p -N S I a pairings have significantly longer half-lives compared to the paired proteins with a disordered segment, with large effect sizes (Figure S3H). Together, these results again indicate that the presence of disordered segments leads to shorter protein half-life and that this is likely to be independent of the overall degree of disorder.
Our approach for selecting protein pairs with similar overall disorder degree made sure that the differences between the overall disorder degrees of the individual proteins are fully controlled for (Figures S3B and S3E). We did however ask how often, within the protein pairs with similar overall disorder degree, the protein with a disordered segment (N L I p , N S I p , or N L I a ) has the highest fraction of disordered residues and the protein without a disordered segment (N S I a ) the lowest, because if the proteins with a disordered segment always have the higher overall disorder degree, then overall disorder degree might still be the cause of their shorter half-life (even though this is highly unlikely due to the negligible differences in overall disorder degree between protein pairs). In all three pairings, the percentage of pairs where the protein with a disordered segment has the higher overall disorder value compared to the N S I a protein is close to 50%, being 44.7% (N L I a ), 48.1% (N S I p ), and 53.5% (N L I p ). Therefore, we can discard biases towards higher overall disorder degree values in proteins with disordered segments as a possible reason for why these proteins typically have shorter half-lives compared to proteins without such segments.

Results S2. The results are independent of known degradation signals
Aside from ubiquitination, several signals have been shown to regulate protein degradation. For example, short sequence motifs, such as the destruction box and the KEN box act as recognition surfaces for ubiquitin ligases and thereby signal for the destruction of specific proteins, mostly involved in the cell cycle (Pfleger and Kirschner, 2000). Another feature of proteins that has been proposed to lead to rapid degradation is the presence of regions that are enriched in proline, glutamic acid, serine, and threonine residues (PEST regions), although the mechanism is unknown (Rogers et al., 1986).
To investigate if the effects of disordered segments on protein half-life could be explained by the presence of such destruction signals, we collected data describing the presence of four known signals: experimentally determined ubiquitination sites, KEN box motifs, destruction box motifs, and PEST sequence regions. Experimentally determined ubiquitination sites were obtained from UniProtKB/Swiss-Prot release 2011_04 (Uniprot-Consortium, 2011). PEST regions were predicted using epestfind with default parameters, as included in EMBOSS 6.5.7 (Rice et al., 2000). Matches marked as "potential" were included, while lowest-confidence "poor" matches were excluded. KEN box and destruction box motifs were predicted using GPS-ARM 1.0 with default parameters (Liu et al., 2012). When determining overlap between a motif and a disordered region, partial overlap was considered sufficient.
Only 13% of experimentally determined yeast ubiquitination sites fall into the long terminal or internal disordered segments examined in our study. Similarly, only 28% of predicted KEN box motifs, 24% of destruction box motifs, and 45% of PEST sequences fall into these disordered segments. Furthermore, more than half (56% of all proteins, 54% of proteins with short half-life) of the proteins with long terminal or internal disordered segments do not contain a single destruction signal (experimentally determined ubiquitination sites, KEN box, destruction box, or PEST sequence) in these disordered segments. This number is conservative since many of the predicted motifs will not be biologically relevant, and because it describes the presence of any of these destruction signals. The fraction of proteins with no individual destruction signals in the long terminal or internal disordered segments is much higher: 99% for ubiquitination sites, 93% for KEN box, 90% for destruction box, and 66% for PEST regions. Consistent with this, the distributions of half-lives of proteins with and without predicted destruction signals within the disordered regions were not significantly different (P = 0.1 for N-terminal disorder; P = 0.2 for internal disorder; Mann-Whitney U test; data not shown). Finally, the probability that a protein has a short half-life, given that it has a long terminal or internal disordered segment, is similar regardless of whether we consider the presence of destruction signals in the whole protein (0.39) or not (0.43, Table 1C). These results indicate that many disordered segments examined in our study contain no predicted destruction motifs or experimentally determined ubiquitination sites that could explain the effects of these segments on protein half-life.
Thus, the short half-life of proteins with long disordered segments is likely due to the direct effects of these segments on proteasomal degradation, rather than due to indirect effects by incorporating destruction signals.

Results S3. Disordered segments of proteins with short half-life lack enriched, uncharacterized sequence motifs that could explain the rapid degradation
We have shown that the effects of disordered segments on protein half-life are unlikely to result from the presence of known destruction signals (ubiquitination sites, KEN box, destruction box, or PEST sequence) in these regions (Supplemental Results S2). Another possible explanation for the short half-life of proteins with long disordered segments is that these segments are enriched for uncharacterized sequence patterns such as short linear motifs (SLiMs) that confer susceptibility to rapid degradation. These hypothetical motifs could represent novel protein degradation biology as they might facilitate for example interaction with the proteasome or serve as docking motifs for ubiquitin ligases, leading to faster turnover. To discover such uncharacterized motifs that might be responsible for the effects of disordered regions on protein degradation, we used SLiMFinder (Davey et al., 2010) and DiliMot (Neduva et al., 2005) to analyze the amino acid sequences of long N-terminal disordered segments (210 sequences in 210 proteins) and long internal disordered segments (999 sequences in 564 proteins) of proteins with short half-life (half-life ≤30 minutes).
SLiMFinder uses BLAST to identify short linear motifs that are shared by unrelated proteins. We used SLiMFinder version 4.5 with the following settings (parameters that are not reported were set to their default values):
-dismask=F. We chose not to mask structured regions as we preferred to rely on the definitions of disordered and structured protein regions by DISOPRED2 used in our other analyses (Supplemental Experimental Procedures), which were already used to select sequences in which to search for over-represented motifs (i.e. only the sequences of long disordered regions of proteins with short half-lives were searched).
-ftmask=F. The whole sequence corresponding to a disordered region was considered for searching motifs, rather than masking parts of the sequences that correspond to annotated UniProt features such as transmembrane helices and protein domains.
-compmask=5,8. Prevents the detection of low-complexity repeat-like motifs, such as poly-Q stretches, which are common in disordered regions (Jorda et al., 2010; Simon and Hancock, 2009).
-metmask=T/F. This mask is activated for the detection of motifs in N-terminal disordered regions, because artefactual motifs starting with a methionine were reported otherwise. The mask is disabled, however, for internal disordered sequences.
-consmask=F. Less conserved parts of the sequences were not masked as we considered the whole sequence corresponding to a disordered segment for searching motifs.
-posmask=F. We have no reason to assume over-representation of certain position-specific amino acids. For example, we found no enrichment of alanines after the N-terminal initiator methionine in the N-terminal disordered sequences analyzed for motifs.
-slimlen=10. Annotated instances of short linear motifs are usually 3-10 amino acids long (Dinkel et al., 2012).
-maxwild=3. The vast majority (>90%) of consecutive wildcard positions in definitions of known motif classes in the ELM database are up to three residues in length.
The motif KR..[DE] occurs 27 times in 201 sequence clusters of long N-terminal disordered segments from proteins with short half-life (P corrected = 3.9 × 10^-2, enrichment after Bonferroni-like correction for testing multiple motifs). SLiMFinder also detects 25 occurrences of the overlapping motif L.{0,1}KR (which itself is not significantly enriched, P corrected = 5.7 × 10^-2). Together, these two motifs are present in 39 of 210 (~19%) long N-terminal disordered segments of short-lived proteins, which means that the majority of such regions (more than 80%) do not contain the enriched motifs.
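The two motif definitions above are written in a regular-expression-like syntax, so their presence in a set of sequences can be checked directly with a regex. The sketch below counts sequences that contain each motif and sequences that contain at least one of them; the example sequences are hypothetical, and unlike SLiMFinder the sketch counts raw sequences rather than clusters of unrelated proteins and performs no enrichment statistics.

```python
import re

# Motif definitions as reported; "." matches any residue.
MOTIFS = {
    "KR..[DE]": re.compile(r"KR..[DE]"),
    "L.{0,1}KR": re.compile(r"L.{0,1}KR"),
}


def count_sequences_with_motif(sequences, pattern):
    """Number of sequences containing at least one match of the pattern."""
    return sum(1 for seq in sequences if pattern.search(seq))


def count_sequences_with_any(sequences, patterns):
    """Number of sequences containing at least one of the patterns."""
    return sum(1 for seq in sequences if any(p.search(seq) for p in patterns))


# Hypothetical disordered-segment sequences (one-letter amino acid code).
sequences = ["MSKRAAD", "MLKRQ", "MLAKRSSE", "MSTPG"]

for name, pattern in MOTIFS.items():
    print(name, count_sequences_with_motif(sequences, pattern))
n_any = count_sequences_with_any(sequences, MOTIFS.values())
print("any motif:", n_any, "of", len(sequences))
```

Counting the union rather than summing the per-motif counts matters because the two motifs overlap and can occur in the same segment.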
Long internal disordered segments from proteins with short half-life are enriched for several groups of motifs composed of short and simple overlapping instances (typically three residues in length with all positions defined, i.e. no wildcard positions). They contain largely positively charged motifs (e.g. KRK), serine/proline-rich motifs (e.g. SLP), and several singular motifs such as the negatively charged DEE motif. These motifs are in agreement with the general sequence preferences of disordered regions: enrichment for charged and polar amino acids and depletion of hydrophobic amino acids (Romero et al., 2001). Thus, these motifs seem to reflect general sequence characteristics of protein disorder rather than being sequences that for example regulate specific interactions between the proteasome and its substrates. Furthermore, combined, the enriched motifs are present in 462 of the 999 long internal disordered segments of proteins with short half-life (~46%), which again shows that most disordered segments of short-lived proteins do not even contain the enriched motifs.
DiliMot (Neduva et al., 2005) did not detect any enriched motif for either long N-terminal or internal disordered sequences of short-lived proteins. DiliMot was set to detect motifs that are fixed in at least two positions (L parameter), could be up to 10 residues in length (W parameter), and occur in at least three of the sequences searched. We tried various combinations of settings: removing or keeping parts of the sequence that (i) overlap with known domains and (ii) show similarity with other sequences in the set, using or not using information on evolutionary conservation based on (i) only other yeast species or (ii) all available species including species that are distant from Saccharomyces cerevisiae such as human and mouse.
Taken together, sequence analysis indicates that the majority of long disordered segments from proteins with short half-lives lack enriched, uncharacterized sequence motifs that could facilitate degradation. Furthermore, even the identified motifs that are enriched in the disordered sequences of short-lived proteins are unlikely to represent uncharacterized degradation motifs but rather reflect the general sequence properties of disordered regions. However, different subtypes of protein disorder exist that could each have a different effect on protein half-life: some types might be able to interact with the proteasome to speed up degradation, while others might not (see Discussion in the main text). The broad distributions of half-lives observed in our study support this idea as they reflect the combined properties of many possible subtypes of disordered segments, some of which are able to efficiently initiate degradation, while others may not.

Results S4. Paralogous pairs with a negative half-life change have larger divergence in the length of N-terminal disorder
Paralogous protein pairs that diverged during evolution in the length of their N-terminal disordered segments generally show changes in half-life that agree with the relationship reported in the main text: the protein of a paralogous pair with longer N-terminal disorder usually has a shorter half-life compared with its paralog (Figure 4B).
We also calculated, for every pair of paralogs, the difference in the half-life of the proteins. Pairs where the paralog with the longer disordered N-terminus has the shorter half-life of the two are assigned to one group (ΔH ≤0 minutes), whereas pairs where the paralog with the longer disordered N-terminus has the longer half-life (ΔH >0 minutes) are assigned to the other group. Consistently, we find that paralogous pairs with a negative half-life change show significantly larger divergence in the length of N-terminal disorder (P = 1 × 10^-5, Mann-Whitney U test, Figure S4A). This means that if the half-life of two paralogous proteins differs in a manner that is in agreement with the previous observations (i.e. longer disordered terminus, shorter half-life), then the changes in the length of N-terminal disorder are generally much bigger compared to paralogs that differ in their half-lives the other way around. Thus, it appears that divergence in N-terminal disorder does indeed result in a change in half-life of paralogous proteins in a manner consistent with what is reported in the main text.
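The grouping described above can be sketched as follows. For each pair, ΔL is taken as the difference in N-terminal disorder length (longer minus shorter) and ΔH as the half-life of the paralog with the longer disordered N-terminus minus that of its partner. The data are hypothetical, pairs with identical disorder lengths are skipped (an assumption, since the treatment of ties is not described here), and the Mann-Whitney U test is left out to keep the example dependency-free.

```python
import statistics


def divergence(pair):
    """Return (delta_L, delta_H) for a pair, or None if disorder lengths are equal.

    pair: ((nterm_disorder_length_a, half_life_a), (nterm_disorder_length_b, half_life_b))
    """
    (len_a, hl_a), (len_b, hl_b) = pair
    if len_a == len_b:
        return None  # no paralog has the longer disordered N-terminus
    if len_a < len_b:
        (len_a, hl_a), (len_b, hl_b) = (len_b, hl_b), (len_a, hl_a)
    return len_a - len_b, hl_a - hl_b


def group_by_half_life_change(pairs):
    """Split delta_L values into the delta_H <= 0 and delta_H > 0 groups."""
    negative, positive = [], []
    for pair in pairs:
        result = divergence(pair)
        if result is None:
            continue
        delta_l, delta_h = result
        (negative if delta_h <= 0 else positive).append(delta_l)
    return negative, positive


# Hypothetical paralog pairs: (N-terminal disorder length in residues, half-life in minutes).
pairs = [((80, 20), (10, 60)), ((5, 90), (60, 30)),
         ((40, 70), (35, 50)), ((12, 45), (12, 30))]
negative, positive = group_by_half_life_change(pairs)
print("median delta_L, delta_H <= 0:", statistics.median(negative))
print("median delta_L, delta_H > 0:", statistics.median(positive))
```

The two lists of ΔL values are the samples that a two-sample test such as the Mann-Whitney U test would then compare.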

Results S5. The reported trends are independent of confounding factors
We performed a number of control calculations to ensure that the observations are independent of confounding factors. The findings are independent of the method used to predict intrinsic disorder in proteins: the DISOPRED2 calculations were repeated using two alternative methods, IUPred and PONDR VSL1, which employ distinct prediction strategies and gave consistent results (Supplemental Experimental Procedures, Figures S1C, S1D and S2A-C, Tables S1B-D). The conclusions are also independent of different criteria and cutoffs used to group the proteins, including the overall degree of disorder (Supplemental Results S1 and Figure S3), the average disorder scores both for the entire protein ( Figures  S1C and S2A) and for the disordered regions alone (Figures S1D and S2B), and the cutoffs used for detecting terminal and internal disordered segments (Table S1E) and different half-life groups (Figures 1E, S1E and S1F). Furthermore, outlier half-life values (Supplemental Results S6), protein length (Supplemental Results S7, Figure S1G, Table S1G), protein abundance (Table S1H), subcellular localization (cytoplasm versus nucleus, Table S1I), and the removal of membrane proteins, which may be degraded in a proteasome-independent manner (Supplemental Results S8 and Table  S1J), did not affect the observed trends. The results are independent of known degradation signals (Supplemental Results S2) and uncharacterized sequence motifs that could facilitate degradation (Supplemental Results S3). An analysis of residues following the initiator methionine indicates that the nature of the N-terminal residue does not account for the global difference in half-life between proteins with long or short N-terminal disorder (Supplemental Results S9 and Figure S1H).
The observations on paralogs in yeast are similar if we do not consider paralogs that originated from the ancestral whole genome duplication ( Figure S4B). Moreover, the results on paralogous pairs are robust in different kinds of permutation tests (Table S6B and Supplemental Experimental Procedures). We could not perform the paralog analysis in mouse or human because the mass spectrometry strategy used for measuring half-life is unable to differentiate between similar proteins with identical short peptide regions, such as paralogs and different splice forms (Supplemental Experimental Procedures). Finally, though the distributions in our analyses are broad and overlap, most differences are significant with both the Mann-Whitney U test and the Kolmogorov-Smirnov test (Supplemental Experimental Procedures and Table S1K), which are two distinct non-parametric statistical tests for evaluating whether two samples of observations come from the same distribution or not. Thus, the reported trends on protein half-life appear attributable to the presence and number of sufficiently long terminal and internal disordered segments.

Results S6. Highly stable proteins with undetermined or outlier half-life are generally less disordered
We discarded 366 proteins with a half-life of exactly 300 minutes, as the original paper (Belle et al., 2006) assigned this value to stable proteins for which degradation curves could not be fitted by an exponential decay function and thus half-life could not be determined. Moreover, to make sure that clear outliers would not affect the statistics, we removed seven proteins with extremely long half-lives of >6000 minutes from the data.
To make sure that removal of these 373 proteins did not bias our analyses, we investigated the presence and length of disordered segments at various locations in these 'highly stable' proteins as well as their overall degree of disorder. We found that:
1. The removed, highly stable proteins contain significantly less disorder than the 3273 proteins in our main dataset: they tend to have shorter N-terminal disordered segments, less total internal disorder, and be more structured overall (P = 2 × 10^-2, P = 5 × 10^-8, P = 3 × 10^-6, respectively, Mann-Whitney U tests, data not shown).
2. The discarded, stable proteins less often have a long (>30 residues) N-terminal disordered segment (51/373=13.7%) than proteins in our main dataset (479/3273=14.6%), although this difference is not statistically significant (odds ratio = 0.94; P = 0.7, chi-squared test).
3. There is a significant difference in the number of proteins that have a long internal disordered segment (94/373=25% of discarded proteins, 1260/3273=38% of included proteins; odds ratio = 0.66; P = 6 × 10^-7, chi-squared test).
4. As mentioned in the main text, the experimental method for measuring protein half-lives involved C-terminal tagging with a TAP-tag (Belle et al., 2006). As a result, proteins with long and short C-terminal disordered segments display similar distributions of protein half-life (Figure 1C). The distribution of lengths of the disordered segments at the C-terminus as characterized from the original genome sequence is highly similar for proteins in our main dataset and for discarded highly stable proteins (P = 0.3, Mann-Whitney U tests, data not shown), which falls in line with the idea that the TAP-tag shields the contribution of the disordered segment at the C-terminus to half-life.
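The proportions in points 2 and 3 can be recomputed directly from the reported counts. The sketch below derives the percentages together with two common ways of comparing them: the ratio of the two proportions and the cross-product odds ratio. Which definition underlies the reported values of 0.94 and 0.66 is not stated here, so both are shown for comparison; the function name `proportion_summary` is hypothetical.

```python
def proportion_summary(a_hits, a_total, b_hits, b_total):
    """Compare the fraction of 'hits' between group a and group b."""
    p_a = a_hits / a_total
    p_b = b_hits / b_total
    odds_a = a_hits / (a_total - a_hits)
    odds_b = b_hits / (b_total - b_hits)
    return {
        "p_a": p_a,
        "p_b": p_b,
        "ratio_of_proportions": p_a / p_b,
        "odds_ratio": odds_a / odds_b,
    }


# Counts as reported: discarded proteins (a) versus main dataset (b).
n_terminal = proportion_summary(51, 373, 479, 3273)   # long N-terminal disorder
internal = proportion_summary(94, 373, 1260, 3273)    # long internal disorder

for label, summary in (("N-terminal", n_terminal), ("internal", internal)):
    print(label, {key: round(value, 3) for key, value in summary.items()})
```

The percentages reproduce the values quoted in the list, and the comparison shows how much the summary statistic depends on whether proportions or odds are compared.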
In short, the 373 highly stable proteins that we discarded in our main analyses generally contain shorter and less disordered segments, which, in line with the findings from our study, corresponds to them having long half-lives. Thus, removal of these highly stable proteins with long half-lives does not bias the results of our analyses, but rather strengthens our conclusions.

Results S7. The results are independent of protein length
Size is one of the determinants of the in vivo degradation rate of a protein, with large proteins being degraded more quickly than smaller ones (Dice et al., 1973). This observation has been confirmed by analyses of large-scale protein half-life data (Belle et al., 2006; Tompa et al., 2008). To ensure that our observations regarding the presence of disordered segments and protein half-life are not influenced by protein length, we classified the proteome into three groups of roughly equal size: (i) small proteins (≤350 residues), (ii) medium size proteins (351-600 residues), and (iii) large proteins (>600 residues). Indeed, large proteins have a significantly shorter half-life than small proteins (Figure S1G). For each length group, we further divided the proteins into those that contain a long or short N-terminal disordered segment, a long or short C-terminal disordered segment, or an internal disordered segment. Where sample size is sufficient, we find that, regardless of protein size, the distribution of half-life values for proteins with long N-terminal disorder is significantly smaller than that of proteins with short N-terminal disorder (Figure S1G and Table S1G). There is generally no difference in half-life between proteins with long or short C-terminal disorder (Figure S1G and Table S1G). An exception to this appears to be the group of small proteins, although the number of small proteins with a long disordered C-terminus is small (92 cases). Finally, proteins that contain an internal disordered segment have a significantly shorter half-life, irrespective of their length (Figure S1G and Table S1G). These results suggest that our observations are independent of protein length.
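The stratification described above can be illustrated with a minimal Python sketch. The length cutoffs (350 and 600 residues) and the 30-residue cutoff for a long N-terminal disordered segment are taken from the text; the record layout (dictionary keys `length`, `n_term_disorder`, `half_life`) and the function names are hypothetical and are not part of the original analysis, which was carried out in R.

```python
from collections import defaultdict
from statistics import median


def length_group(n_residues):
    """Assign a protein to one of the three length groups used in the text."""
    if n_residues <= 350:
        return "small"
    if n_residues <= 600:
        return "medium"
    return "large"


def median_half_life_by_stratum(proteins, disorder_cutoff=30):
    """Median half-life per (length group, long/short N-terminal disorder).

    `proteins` is a list of dicts with hypothetical keys 'length',
    'n_term_disorder' and 'half_life' (minutes).
    """
    strata = defaultdict(list)
    for p in proteins:
        disorder = "long" if p["n_term_disorder"] > disorder_cutoff else "short"
        strata[(length_group(p["length"]), disorder)].append(p["half_life"])
    return {key: median(values) for key, values in strata.items()}
```

Within each length group, the half-life distributions of the "long" and "short" strata would then be compared with a non-parametric test, as described in the statistical methods.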

Results S8. Degradation of membrane proteins
The majority of transmembrane proteins in eukaryotic cells are degraded in the lumen of lysosomes, independent of the proteasome (Piper and Katzmann, 2007;Raiborg and Stenmark, 2009). Ubiquitin functions as the signal that specifies which membrane proteins should be degraded. Following ubiquitination, endocytosis brings membrane proteins inside the cell into early endosomes. Subsequently, a variety of protein sorting machines, such as the endosomal sorting complex required for transport (ESCRT), sort the ubiquitin-flagged proteins to multivesicular endosomes (MVEs) or bodies (MVBs). These then fuse with a lysosome, where proteases in the acidic lumen digest the vesicles. It should be noted that proteasome-mediated degradation of membrane proteins does occur to some extent through the process of endoplasmic reticulum-associated degradation (ERAD) (Meusser et al., 2005;Vembar and Brodsky, 2008). However, this mechanism seems to apply mainly to damaged or misfolded membrane proteins and does not seem to be responsible for degradation of the majority.
To ensure that our observations regarding the effects of disordered segments on protein half-life do indeed apply to degradation by the proteasome, we performed control calculations on datasets that lack membrane proteins. We obtained a set of yeast membrane proteins from Österberg et al., subtracted these proteins (255) from our original dataset and redid the analysis. We find that the membrane proteins, which are degraded mainly in a proteasome-independent manner, do not influence our conclusions (Table S1J).
Even though the effects of disordered segments on protein half-life are primarily mediated through interaction with the proteasome, which is not the main route for degradation of transmembrane proteins, we were still interested to look into a potential connection between disordered regions in membrane proteins and their turnover rates. Therefore we asked: Do membrane proteins with long terminal or internal disordered segments in their cytosolic loops have shorter half-lives? To answer this question, we obtained information on membrane protein topology from Kim et al. (Kim et al., 2006). In that study, the location of the C-terminus of multiple-spanning (two or more transmembrane helices) membrane proteins in yeast was determined using C-terminally tagged constructs. Cytosolic or extracellular location of the C-terminus was used to constrain topology predictions by TMHMM (Melen et al., 2003). We inferred the disorder status of every residue in the predicted cytosolic loops and integrated this information with the half-life data (220 proteins in total). We then grouped proteins as in the original analyses: (i) by the length of the disordered termini if these are present on the cytosolic side (short, ≤30 residues; long, >30 residues), treating the N- and C-termini separately, and (ii) by the presence or absence of long internal disordered segments (at least 40 residues) in the cytosolic loops.
Membrane proteins with a cytosolic, long disordered N-terminus (27 in total) have a significantly shorter half-life compared to the group with no cytosolic N-terminus, or with a cytosolic, short disordered N-terminus (P = 7 × 10⁻³, Mann-Whitney U test, data not shown). Half-lives of membrane proteins with cytosol-localized long disordered segments at the C-terminus (which is tagged with a TAP-tag for the half-life measurements) are similar to half-lives of membrane proteins with no cytosolic C-terminus, or with a short disordered segment at the C-terminus (P = 0.3, Mann-Whitney U test, data not shown). These results are similar to the observed effects of terminal disordered segments on half-life for non-membrane proteins. However, proteins with one or more cytosol-localized long internal disordered segments (15 in total) have similar half-lives to proteins without such segments (P = 0.76, Mann-Whitney U test, data not shown). This is in contrast to the typically strongly decreased half-life of proteins with internal disordered segments for non-membrane proteins.
As discussed above, these results might be less biologically meaningful due to the established difference in mechanisms by which membranous and intracellular proteins are degraded. It should also be noted that the numbers of membrane proteins in each 'disorder' category are very small (27, 28 and 15), and much smaller than the numbers for non-membrane proteins. Furthermore, the disorder prediction algorithms that are the basis for calculating the presence of disordered segments have been developed for cytosolic proteins and are biased towards the amino acid composition of such proteins. Membrane proteins have different sequence compositions, which means that the confidence in the identified cytosolic disordered segments is lower. Thus far, no predictor has been developed specifically for identifying structural disorder in the loops of membrane proteins. Taking these points into account, the current observations could mean that membrane proteins with long terminal disordered regions might be more susceptible to degradation, though these effects are not seen for internal disordered segments.

Results S9. The relationship between N-terminal disorder and half-life does not appear connected to the N-end rule
One mechanism regulating protein stability is the N-end rule, which links the identity of the N-terminal residue of a protein to its half-life. According to the N-end rule, a protein is stable if the exposed N-terminal residue is a small amino acid and unstable if it is large and bulky (Varshavsky, 2011). To establish whether the difference in half-life between proteins with long or short N-terminal disorder could be explained by differences in degradation dynamics due to the N-end rule, we compared the frequencies of amino acids in the second N-terminal residue (after removal of the initiator methionine by methionine aminopeptidases) in proteins with long or short disordered N-termini in yeast, mouse and human (Figure S1H). We also calculated the frequency of each amino acid in the second N-terminal residue for the entire proteomes.
In all analyzed organisms, the distributions of destabilizing amino acid frequencies (primary, secondary, or tertiary destabilizing, according to the N-end rule, see Figure S1H) are not significantly different between proteins with long N-terminal disordered segments and the entire proteomes (P = 0.2 in yeast, P = 0.5 in mouse, P = 0.1 in human, chi-squared test). The same is true in yeast and mouse for the distributions of destabilizing amino acid frequencies between the groups of proteins with long or short N-terminal disorder (P = 0.1 in yeast, P = 0.2 in mouse, chi-squared test). In human, the distributions of destabilizing amino acid frequencies are different between proteins with long or short N-terminal disorder (P = 1 × 10⁻²). It is not the case, however, that all types of destabilizing N-terminal amino acids are more common in human proteins with long N-terminal disordered regions, and thus account for their shorter half-life. In fact, several destabilizing residue types are more common in proteins with short N-terminal disorder (Figure S1H). These results indicate that the N-end rule does not account for the global differences in half-life among proteins with long or short N-terminal disorder.
It should be noted that this analysis makes the simplifying assumption that the initiator methionine is removed from all expressed proteins. The action of the N-terminal methionine aminopeptidase pathway, or the activity of proteases such as signal peptidases and caspases, removes the first methionine of most proteins or cleaves an internal recognition site and exposes the amino acid next to the methionine or any internal residue next to a cleavage site, respectively (Meinnel et al., 2006). Although a large fraction of the proteins in most proteomes are estimated to lose their N-terminal methionine, it is not clear which proteins are trimmed by exopeptidases after translation in vivo. Moreover, it is often unclear to what extent the N-terminus is trimmed. Recently, Lange and Overall assembled a database that includes results from several "terminomics" studies in which in vivo information about the actual protein N- and C-termini is collected (Lange and Overall, 2011). We used this database to find proteins for which there is experimental evidence for the removal of the first methionine residue. We then analyzed the distribution of amino acids at the N-terminus for these proteins with proven records for in vivo trimming. As before, we compared the groups of proteins with long or short N-terminal disordered segments for which protein half-life is available. In yeast and human, too few proteins combine a proven record of in vivo methionine removal, a long N-terminal disordered segment, and half-life information to give reliable results. However, amino acid frequency distributions could be calculated for mouse, and statistical analysis revealed no significant difference in the frequencies of N-terminal amino acids after trimming between groups of proteins with long or short N-terminal disordered regions (P > 0.1, Wilcoxon signed-rank test).
These results, combined with our analyses above (Figure S1H), indicate that the N-end rule is unlikely to account for the global differences in half-life of proteins with long or short N-terminal disorder.

SUPPLEMENTAL DISCUSSION
Discussion S1. Proteins without disordered segments can still be degraded quickly
Not only can proteins with long disordered segments still have a long half-life, proteins without disordered segments can still be degraded quickly (Table 1C). Several factors may target a protein more efficiently to the proteasome and thus shorten its half-life. For example, post-translational modifications such as phosphorylation, methylation, N-acetylation, and ubiquitination itself may destabilize or unfold protein regions (Hagai et al., 2011; Hagai and Levy, 2010; Hwang et al., 2010; Kim et al., 2014; Lee et al., 2012) or direct the activity of accessory factors that unfold substrates and present them to the proteasome as discussed in the main text for the p97/VCP ATPase (Beskow et al., 2009). Other factors that contribute to shorter half-life of individual or groups of proteins are the availability of ubiquitinating enzymes and sequence determinants (e.g. KEN box and destruction box motifs, N-end rule, PEST sequences) (Bachmair et al., 1986; Pfleger and Kirschner, 2000; Rogers et al., 1986). Nevertheless, our observations suggest that, upon recruitment of a substrate to the proteasome, terminal or internal disordered segments influence protein half-life as an underlying factor and that this can be modulated by other cellular mechanisms, sequence determinants and the structural stability of the substrate.

Discussion S2. Disordered segments could influence the dynamics and regulation of signaling pathways (extended)
Internal disordered segments in proteins can be cleaved by the proteasome to generate functionally active partial fragments that have crucial regulatory functions as demonstrated for the transcription factors NF-κB and Ci (Chen et al., 1999; Palombella et al., 1994; Piwko and Jentsch, 2006; Tian et al., 2005). If the protein functions in a homo- or a heteromeric complex, then the cleaved fragments can have important regulatory properties such as inducing a switch-like behavior by sequestering full-length proteins and acting in a dominant negative manner (Buchler and Louis, 2008). Furthermore, since the presence of a subunit containing disordered segments can target an entire protein complex for degradation (Prakash et al., 2009), variation in disordered segments of individual subunits may also influence the half-life of their interaction partners and that of the homo- or hetero-oligomeric complexes involving such proteins (Lin et al., 2000), thereby regulating the abundance of entire protein complexes.
Disordered segments could also be exploited to design signaling or transcription circuits composed of proteins with defined turnover rates, thereby generating networks with desired properties. For example, engineered kinases or transcription factors with a long terminal disordered segment may turn over rapidly and hence contribute to an ultrasensitive or all-or-none response (Alon, 2007;Kiel et al., 2010).

Yeast protein half-life data
Data on in vivo protein half-life for Saccharomyces cerevisiae was obtained from Belle et al. (Belle et al., 2006), who measured protein turnover by Western blot analysis of TAP-tagged genes as a function of time following the inhibition of protein synthesis. We discarded 366 proteins with a half-life of exactly 300 minutes, as the original paper assigned this value to stable proteins for which degradation curves could not be fitted by an exponential decay function and thus half-life could not be determined. Moreover, to make sure that clear outliers would not affect the statistics, we removed seven proteins with extremely long half-lives of >6000 minutes from the data. Removal of these 373 highly stable proteins with long half-lives does not bias the results of our analyses (see Supplemental Results S6).

Half-life groups
The yeast proteome was classified into three groups of roughly equal size based on half-life: (i) short-lived proteins (half-life ≤30 minutes), (ii) medium half-life proteins (half-life 31-70 minutes), and (iii) long-lived proteins (half-life >70 minutes). We analyzed the length of N- and C-terminal disorder within each group (Figures 1E and S1E). In order to assess the robustness of our results, we performed control calculations with variations on the half-life cutoffs: we grouped the proteome into two half-life groups, based on the median (≤42 minutes, short half-life; >42 minutes, long half-life; Figure S1F).

Disorder calculations and groups
All validated ORFs in the yeast genome were downloaded from UniProtKB/Swiss-Prot release 2010_11 (Uniprot-Consortium, 2011). Intrinsic disorder was predicted for all protein sequences using the DISOPRED2 (Ward et al., 2004) software, with default settings. DISOPRED2 is a support vector machine-based classifier, trained on missing electron density in solved crystal structures, which performs well in CASP assessments (Bordoli et al., 2007;Noivirt-Brik et al., 2009). Based on the predicted disorder, we calculated several properties relating to the length and location of disordered segments in the protein sequences. We then used this information to divide the yeast proteome into different groups:

(a) The length of the disordered segment at the protein N- and C-termini
We counted the number of residues predicted to be disordered at the protein termini, treating the N- and C-termini separately. For this, we allowed for minor (up to three consecutive residues) stretches of structured residues. This means that we considered a disordered region to have ended when encountering a stretch of at least four structured residues. Thus, continuous stretches of three or fewer structured residues were regarded as belonging to the disordered terminus and included in the calculation of the length of the disordered terminus (except when they were the start of stretches of four or more structured residues that would end the disordered region). Depending on the length of the disordered terminus, we classified the proteins into two groups: (i) those that have short (≤30 residues) and (ii) those that have long (>30 residues) disordered termini (Figure 1). We based the cutoff for long and short disordered termini on recent molecular models of the proteasome (da Fonseca et al., 2012; Lander et al., 2012; Lasker et al., 2012; Matyskiela et al., 2013; Sledz et al., 2013), and in vitro experimental studies using purified proteasomes showing that there is a critical minimum length of about 30 residues that allows a disordered protein terminus to efficiently initiate degradation (Inobe et al., 2011). In order to assess the robustness of our results, we performed control calculations with variations on the length cutoff (Table S1E).
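The counting rule above can be sketched in Python as follows. This is an illustrative reading of the rule, not the code used in the study. It assumes that the per-residue prediction has already been reduced to a string of 'D' (disordered) and 'S' (structured) characters, a hypothetical encoding, and it assumes that a short structured stretch at the very start of the chain is tolerated in the same way as one inside the disordered region.

```python
def terminal_disorder_length(states, max_gap=3):
    """Length of the disordered segment at the start of `states`.

    `states` is a string of 'D' (disordered) and 'S' (structured), read
    starting from the terminus of interest. Runs of up to `max_gap`
    structured residues are counted as part of the disordered terminus;
    a run of more than `max_gap` structured residues ends it.
    """
    length = 0  # residues counted up to the last disordered residue seen
    gap = 0     # length of the current run of structured residues
    for i, state in enumerate(states):
        if state == "D":
            gap = 0
            length = i + 1
        else:
            gap += 1
            if gap > max_gap:
                break
    return length


def classify_terminus(length, cutoff=30):
    """'long' if the disordered terminus exceeds the 30-residue cutoff."""
    return "long" if length > cutoff else "short"
```

For the C-terminus the same function can be applied to the reversed string, for example `terminal_disorder_length(states[::-1])`.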

(b) The presence of an internal disordered segment
The proteolytic sites are buried deep within the proteasome core particle, accessible only through a long narrow channel, and the same is true for the ATPase motor that drives protein substrates through the degradation channel (da Fonseca et al., 2012; Lander et al., 2012; Lasker et al., 2012; Matyskiela et al., 2013; Sledz et al., 2013). To investigate if the presence of internal disordered regions in a protein influences its half-life, we identified internal disordered segments as continuous stretches of at least 40 disordered amino acids in the middle of a protein (see main text). As for terminal disordered regions, we allowed for minor stretches of up to three structured residues between the disordered residues (see above). We discarded any N- and C-terminal disordered segments as defined above from the calculation of internal disordered segments. According to these definitions, we grouped the yeast proteome into two groups: (i) proteins that contained internal stretches of intrinsic disorder of at least 40 residues, and (ii) proteins that did not (Figure 2). In order to assess the robustness of our results, we performed control calculations with variations on the length cutoff (Table S1E). Systematically varying the length cutoff for identifying an internal intrinsically disordered segment revealed that maximal difference in median half-life and statistical significance was obtained for a value of 40 amino acids. We also investigated the half-lives of proteins with multiple internal disordered regions (Figures 2C and S2C).
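A minimal Python sketch of this segment definition is given below, under stated assumptions: per-residue predictions are encoded as a string of 'D' and 'S' characters (a hypothetical encoding), the lengths of the terminal disordered segments have been computed beforehand and are passed in, and the tolerated structured stretches are counted towards the segment length. The function name and return format are illustrative only.

```python
def internal_disordered_segments(states, n_term_len=0, c_term_len=0,
                                 min_len=40, max_gap=3):
    """Return (start, end) index pairs (0-based, inclusive) of internal
    disordered segments of at least `min_len` residues.

    Terminal disordered segments of the given lengths are excluded first.
    Runs of up to `max_gap` structured residues between disordered
    residues are merged into the surrounding segment.
    """
    core = states[n_term_len:len(states) - c_term_len]
    segments = []
    start = None   # start of the segment currently being extended
    last_d = None  # index of the last disordered residue in that segment
    gap = 0        # current run of structured residues inside a segment
    for i, state in enumerate(core):
        if state == "D":
            if start is None:
                start = i
            last_d = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                segments.append((start, last_d))
                start, gap = None, 0
    if start is not None:
        segments.append((start, last_d))
    # Keep only long segments and shift back to full-sequence coordinates.
    return [(a + n_term_len, b + n_term_len)
            for a, b in segments if b - a + 1 >= min_len]
```

The number of returned segments gives the count of internal disordered regions per protein, and an empty list places the protein in the group without a long internal disordered segment.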

(c) The overall degree of disorder
The fraction of disordered residues (i.e. overall degree of disorder), which is an estimate of the packing, folding and structural stability of a protein, by itself correlates with protein half-life, although previous studies disagree on the extent of the effect (Gsponer et al., 2008;Tompa et al., 2008;Yen et al., 2008). Proteins with a greater overall disorder degree contain longer terminal and internal disordered segments ( Figure S3A). To determine whether the effects of continuous stretches of disordered residues (i.e. disordered segments) on protein turnover (Figures 1 and 2) are independent of the overall degree of disorder, we calculated disorder degree as the fraction of residues in the protein that was predicted to be disordered (number of disordered residues divided by sequence length). We then matched proteins that have a similar fraction of disordered residues but have varying combinations of disordered segments (long or short N-terminal disorder and/or presence or absence of internal disordered segments). The results are shown in Figure S3 and are reported in the main text (section "The effects of disordered segments on protein half-life are independent of the overall disorder degree"), and in an extended section in Supplemental Results S1. We also made classifications based on the scores reported by the disorder predictors, both for the average score of the entire protein (Figures S1C and S2A) and of the disordered regions alone (Figures S1D and S2B).
Unless otherwise noted, the disorder predictor used in our analyses is DISOPRED2. To ensure that our results are independent of the method used for intrinsic disorder prediction, we repeated the same calculations using disorder information from the IUPred (Dosztanyi et al., 2005) and PONDR VSL1 (Obradovic et al., 2005) predictors (Figures S1C-D and S2A-C, Tables S1B-D). IUPred was run in 'long' disorder prediction mode, and the disorder thresholds were adjusted to a score of 0.4 for IUPred and 0.6 for PONDR. These thresholds were chosen to maximize agreement with the average level of disorder observed in the DISPROT database release 5.7 (Sickmeier et al., 2007). We chose to complement our DISOPRED2 calculations with IUPred and PONDR because the three methods are very different, and are therefore likely to provide good control data. DISOPRED2 is a support vector machine-based classifier, trained on missing electron density in solved crystal structures, while IUPred is a sequence-based method that estimates inter-residue interactions. Sequences with less favorable predicted pairwise interaction energies are more likely to be disordered, due to a lack of stabilizing contacts. The third predictor, PONDR VSL1, employs logistic regression models of various sequence attributes, and is trained on missing electron density in crystal structures and disordered regions identified by other means. Consistently, we get very similar results with the three different disorder predictors (Figures S1C-D and S2A-C, Tables S1B-D).
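As an illustration of how the adjusted thresholds turn per-residue scores into a disorder assignment, consider the short Python sketch below. The cutoffs of 0.4 and 0.6 come from the text; the choice of "greater than or equal to" at the cutoff, the 'D'/'S' encoding, and the function name are assumptions made for this example, and the scores are taken as a plain list of numbers rather than parsed from any predictor's output format.

```python
# Score cutoffs quoted in the text. Treating a score equal to the cutoff
# as disordered is an assumption of this sketch.
THRESHOLDS = {"IUPred": 0.4, "PONDR VSL1": 0.6}


def scores_to_states(scores, predictor):
    """Convert per-residue disorder scores into a 'D'/'S' string."""
    cutoff = THRESHOLDS[predictor]
    return "".join("D" if score >= cutoff else "S" for score in scores)
```

The resulting strings can be fed into the same terminal and internal segment calculations regardless of which predictor produced the scores.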

Data integration and description
Integration of the data on yeast protein half-life, and various other datasets (Table S1A), with the length and location of intrinsic disorder within the protein sequences resulted in a dataset of 3273 proteins. This covers about two-thirds of the complete yeast proteome (Christie et al., 2004). The data for our DISOPRED2-based analyses is available in Table S2.
The distribution of half-life values is approximately log-normal (Figure S1A) with an enrichment for proteins with a very fast turnover rate of 2 and 3 minutes, as was also noted by Belle et al. (Belle et al., 2006). The half-lives have a mean of 98 minutes and a median of 43 minutes. N- and C-terminal disorder length distributions are similar to each other (Figure S1A): there are a large number of proteins with very short terminal disorder, and far fewer proteins with very long terminal disorder. The means for N- and C-terminal disorder length are 19 and 13 residues, respectively. The medians are 8 and 3 residues. In both datasets the frequency of encountering a terminal disordered segment of a certain length decreases rapidly with increasing length. Linear regression analysis of the two distributions plotted in log-log fashion showed a reasonable fit (N-terminal: R² = 0.86; C-terminal: R² = 0.84, data not shown), which could suggest a power law distribution (Stumpf and Porter, 2012). Larger lengths are observed for internal disordered segments (Figure S1A): the mean length of an internal disordered segment is 24 residues, while the median length is 14 residues. The shape of the distribution is similar to the terminal disorder types, and linear regression analysis of a log-log plot also shows a reasonable fit (R² = 0.87, data not shown).
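The log-log regression mentioned above can be sketched in plain Python. Because the underlying analysis is not shown, the details here are assumptions: segment lengths of zero are dropped (their logarithm is undefined), each distinct length is treated as one data point without further binning, and an ordinary least-squares line is fitted to log10(frequency) against log10(length).

```python
import math
from collections import Counter


def loglog_fit(lengths):
    """Least-squares fit of log10(frequency) versus log10(length).

    Returns (slope, r_squared). Requires at least two distinct positive
    lengths whose frequencies are not all identical.
    """
    counts = Counter(length for length in lengths if length > 0)
    xs = [math.log10(length) for length in counts]
    ys = [math.log10(count) for count in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared
```

A high R² on such a plot is compatible with a power law but does not establish one, which is why the text phrases the conclusion cautiously.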

Statistical methods and estimation of significance
All analyses and statistics employed in our study were selected to be highly robust. As discussed above, most datasets are not normally distributed (Figure S1A). Therefore we exclusively employed statistical tests that do not assume the data to come from a specific type of probability distribution and do not infer parameters of such distributions (such as the mean and variance for Gaussian distributions), i.e. we exclusively employed non-parametric statistics for estimating statistical significance (Mann-Whitney U, Wilcoxon Signed-Rank, Kruskal-Wallis, and Kolmogorov-Smirnov tests). Although parametric statistics generally have more power, which means that they have a smaller chance to commit a type II error (failure to reject a false null hypothesis), they commonly assume normally distributed data and violations of these and other assumptions can lead to misleading results. In contrast, non-parametric statistics rely on few assumptions about the data and, given that our data is not normally distributed, these are more robust than parametric statistics.
We primarily use the Mann-Whitney U test to compare half-life distributions of different groups of proteins and to estimate significance (but reach the same conclusions using the Kolmogorov-Smirnov test, Table S1K). The Mann-Whitney U test, also known as the Wilcoxon Rank-Sum test, evaluates whether two samples are likely to come from the same underlying distribution (H 0 ), and can be used to assess whether the medians of two distributions are significantly different. The Mann-Whitney U test assumes that the compared distributions have similar shapes, which is fair for our data, because the groups that we compare are always subsets of the whole data and have very similar overall shapes as demonstrated by the various boxplots throughout the paper. Boxplots are non-parametric visualizations for gaining insights into various properties of the distribution, such as the median, interquartile range, minimum and maximum, outliers, and skewness. The Kolmogorov-Smirnov test also evaluates whether two samples of observations come from the same distribution or not, and does so by determining the maximum vertical deviation between the empirical distribution functions of the two samples. The Kruskal-Wallis test extends the Mann-Whitney U test to three or more sample groups. When comparing half-life distributions of proteins paired by overall disorder degree, P values were calculated using the Wilcoxon Signed-Rank test, which is a non-parametric test for assessing difference between two paired samples.
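To make the rank-based logic of the Mann-Whitney U test concrete, the following Python sketch computes the U statistics from midranks and a two-sided P value from the normal approximation with a correction for ties. It is a teaching example, not the R routine used for the analyses in this study: it applies no continuity correction and no exact small-sample calculation, so its P values can differ slightly from those of standard statistical packages. The function names are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b, two-sided P) for samples `a` and `b`.

    P uses the normal approximation with tie correction and no
    continuity correction; it is undefined if all values are identical.
    """
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, u_b, p_value
```

Here `a` and `b` would be, for example, the half-lives of proteins with long and with short N-terminal disordered segments; a small U for one group indicates that its values tend to rank below those of the other group.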
Throughout our analyses, we report medians and interquartile ranges, which are robust measures of central tendency and dispersion, rather than means and standard deviations, which are sensitive to outliers and less resistant to errors produced by deviations from assumptions. Furthermore, we not only report P values of statistical differences between distributions, but also show that the magnitudes of the differences (effect sizes, which we report as the difference between the medians of the compared distributions) are of a biologically relevant order of magnitude.
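A brief Python sketch of these summary statistics is shown below. The quartile convention (the "inclusive" method of the standard library) is a choice made for this example; different conventions give slightly different interquartile ranges for small samples. The function names are hypothetical.

```python
from statistics import median, quantiles


def robust_summary(values):
    """Median and interquartile range of a sample (at least two values)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": q3 - q1}


def median_difference(group_a, group_b):
    """Effect size reported in the text: difference between group medians."""
    return median(group_a) - median(group_b)
```

For instance, `median_difference(long_group, short_group)` with half-lives in minutes gives the effect size in minutes, which can be read alongside the P value of the corresponding test.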
Our primary analyses rely on binning and cutoffs for the data describing structural disorder, rather than for example on linear regression and correlation analyses, because this approach best captures the biology of protein degradation influenced by disordered segments: half-life does not depend linearly on the length of the disordered segment, which becomes relevant only above a minimal value (see main text and Figure S1B). Thus, we do not presume the existence of a linear relationship, but simply investigate whether the group of proteins with disordered segments has a different half-life when compared to the group without disordered segments by binning proteins into groups based on the critical cutoff found in biochemical studies. Grouping of data points into classes and assessing the difference between the distributions is a powerful way to identify the existence of a relationship without assuming any underlying model of correlation.
Even if one assumes a linear relationship, calculation of the best linear fit for a large number of experimentally determined data points is unlikely to yield high correlation values. Indeed, although the length of disordered segments at various positions in proteins negatively correlates with half-life, the r values are of a very small magnitude (Pearson r = -0.02 for the correlation between the length of N-terminal disorder and protein half-life, r = -0.01 for C-terminal disorder, and r = -0.05 for the longest internal disordered segment, Figure S1B). Only the correlation between internal disorder length and protein half-life is statistically significant (P = 3 × 10⁻³). These results agree with our reported observations suggesting an inverse relationship between the presence of disordered segments and protein half-life, but, more importantly, underscore that correlation analyses and assumptions about a linear relationship are insufficient to capture the biology of protein degradation influenced by disordered segments.
Plots of the data and all statistical tests to estimate significance were carried out using the R statistical package (R Development Core Team).

Paralog calculations
A complete list of paralogous proteins in yeast was obtained in two steps:
1. First, we ran the program BLASTClust on the S. cerevisiae proteome. BLASTClust works by performing pairwise sequence comparisons among all yeast proteins and subsequently grouping the proteins by single-linkage clustering. It accepts various parameters affecting the stringency of clustering; in this study, sequences were registered as a pairwise match when they are at least 25% identical (parameter S) over an area covering 60% of the length (parameter L). This ensured, on the one hand, that pairs are sufficiently similar to reduce the number of false-positive paralog pairs and, on the other hand, that the genes could have diverged enough to allow for their half-lives and/or disordered regions to change. The heuristic list of paralogs was obtained by forming all possible pairs within each cluster.
2. In order to include more divergent paralogs that were not picked up by the procedure in step 1, we added to the list all known paralog pairs that resulted from the whole genome duplication in yeast (Wolfe and Shields, 1997). This additional data was obtained from the Yeast Gene Order Browser (http://wolfe.gen.tcd.ie/ygob/) Version 3 (Gordon et al., 2009).
Table S7 shows all 1440 pairs for which protein half-life data is available for both paralogs. To calculate the differences in half-life ΔH and N-terminal disorder length ΔL between the individual proteins in a paralog pair, the following convention is made: ΔL is defined to be always positive (Figure 4). In other words, we define "paralog 1" to be the paralog with the longer N-terminal disorder and "paralog 2" the protein with the shorter N-terminal disorder of the paralog pair. ΔL is then obtained by subtracting the N-terminal disorder length of paralog 2 from the N-terminal disorder length of paralog 1 (ΔL = L₁ − L₂, with L₁ ≥ L₂).
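The grouping and pair-forming logic of step 1 can be sketched in Python as follows. This does not reimplement BLASTClust or its sequence comparisons: it assumes that the pairwise matches passing the identity and coverage criteria are already available as a list of identifier pairs (a hypothetical input), and it only illustrates single-linkage clustering followed by the enumeration of all pairs within each cluster.

```python
from itertools import combinations


def single_linkage_clusters(ids, matches):
    """Group `ids` so that any two proteins connected by a chain of
    pairwise matches end up in the same cluster (union-find)."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def all_pairs_within_clusters(clusters):
    """Form every possible pair of members within each cluster."""
    return [pair for cluster in clusters
            for pair in combinations(sorted(cluster), 2)]
```

Note that single linkage places two proteins in one cluster even if they match only through an intermediate, so some enumerated pairs need not satisfy the pairwise criteria directly. Clusters of size 1 correspond to singletons, that is, proteins without a detected paralog.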
To calculate ΔH, the order of paralogs in a pair is maintained, so that ΔH can be positive or negative (ΔH = H₁ − H₂). Thus, ΔH is negative whenever the relationship "longer disordered N-terminus = shorter half-life" holds true. We separated the paralog pairs into two groups according to the difference in the length of their N-terminal disordered segments: pairs where one paralog has a short and the other paralog a long disordered N-terminus, and pairs where both paralogs have short or both have long disordered N-termini.
Similarly, for internal disorder, we define the difference in the number of internal disordered regions (ΔI) to be always positive (ΔI = I₁ − I₂, with I₁ ≥ I₂). We separated the paralog pairs into two categories: pairs with an identical number of sites (ΔI = 0), and pairs where one paralog has one or more sites more than the other (ΔI ≥ 1). Since the ΔI = 0 pairs cannot be arranged based on the number of internal disordered segments (i.e. I₁ = I₂ and thus I₁ > I₂ is never true), we ordered the members of such pairs by the total number of residues that make up all internal disordered segments (internal disorder length, IL) in these proteins (analogous to the N-terminal disorder length ordering used above). We did this twice to simulate two different evolutionary scenarios: once we subtracted the half-life of the paralog with the longer total internal disorder from the half-life of the paralog with the shorter total internal disorder (length of internal disorder increased during evolution), and once the other way around (length of internal disorder decreased during evolution). Thus, for each ΔI = 0 pair, we once calculate ΔH as H₁ − H₂ where IL₁ ≥ IL₂, and once as H₁ − H₂ where IL₁ ≤ IL₂.
We calculated the half-life and N-terminal disorder length differences for several randomized controls to ensure that the trend observed in the paralog pair analysis is not merely a product of chance. Specifically, we randomized (i) the disordered N-terminus length, (ii) the protein half-life and (iii) both values among all proteins (Table S6B). Thus, the overall distribution of values remains intact during the randomization, while the individual assignment to paralog pair groups may change. To make sure that the observed effect is specific to paralog pairs, we also generated a random set of protein pairs from the singletons (clusters of size 1, corresponding to proteins that lack a paralog) in the BLASTClust analysis (Table S6B).
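The sign convention for ΔH and control (ii), in which half-life values are reassigned among proteins, can be illustrated with a short sketch. The data are invented, the function names are hypothetical, and the summary statistic (the fraction of pairs with ΔH < 0) is an assumption chosen for illustration; it is not necessarily the statistic reported in Table S6B.

```python
import random

def delta_h(pairs, nterm_len, half_life):
    """ΔH = H1 - H2 per pair, where paralog 1 has the longer N-terminal disorder."""
    out = []
    for a, b in pairs:
        if nterm_len[a] < nterm_len[b]:
            a, b = b, a
        out.append(half_life[a] - half_life[b])
    return out

def fraction_negative(values):
    """Share of pairs in which the longer disordered N-terminus has the shorter half-life."""
    return sum(v < 0 for v in values) / len(values)

def shuffled_half_life(half_life, rng):
    """Reassign the same set of half-life values to proteins at random.

    The overall distribution of values is preserved; only the assignment changes.
    """
    names = list(half_life)
    values = [half_life[n] for n in names]
    rng.shuffle(values)
    return dict(zip(names, values))

# Hypothetical toy data (half-lives in arbitrary units).
pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
nterm_len = {"A1": 50, "A2": 3, "B1": 2, "B2": 45, "C1": 38, "C2": 0}
half_life = {"A1": 12.0, "A2": 80.0, "B1": 95.0, "B2": 20.0, "C1": 15.0, "C2": 60.0}

observed = fraction_negative(delta_h(pairs, nterm_len, half_life))

rng = random.Random(0)  # fixed seed so the control is reproducible
null = [
    fraction_negative(delta_h(pairs, nterm_len, shuffled_half_life(half_life, rng)))
    for _ in range(1000)
]
print(observed, sum(null) / len(null))
```

Shuffling the N-terminal disorder lengths instead, or both quantities at once, gives controls (i) and (iii) by the same pattern.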

Mouse and human data
To assess the effects of terminal and internal disordered segments on protein half-life in organisms other than yeast, we performed the same analyses with mouse protein half-life data and human relative degradation rates, with the exception of the paralog analysis (see section "The experimental design used to measure protein half-life in mouse and human does not permit a confident investigation of half-life differences among paralogous proteins", below). Reviewed protein sequences for mouse and human were downloaded from UniProtKB/Swiss-Prot release 2011_4 (Uniprot-Consortium, 2011). Half-life data for ~4500 proteins in NIH3T3 mouse fibroblasts were obtained from Schwanhäusser et al. (2011). Relative degradation rates for ~4000 proteins in human THP-1 myelomonocytic leukemia cells, under conditions that stimulate proliferation, were obtained from Kristensen et al. (2013). Both studies make use of stable isotope labeling by amino acids in cell culture (SILAC) in combination with mass spectrometry. Upon transferring cells from light to heavy medium, newly synthesized proteins incorporate heavy-labeled amino acids, while the pre-existing proteins remain in the light form. Protein half-lives and degradation rates were derived from the ratio between the heavy and light peptides, measured using mass spectrometry at different time points after the transfer of cells to heavy medium.
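To make the link between heavy/light ratios and half-life concrete, the sketch below assumes simple first-order decay of the light pool at constant total protein level, so that ln(1 + H/L) = k·t and the half-life is ln(2)/k. This is a simplified model for illustration only: the cited studies may apply additional corrections (for example for dilution by cell division) that are not reproduced here, and the function names and data are hypothetical.

```python
import math

def degradation_rate(times, heavy_to_light):
    """Least-squares slope through the origin of ln(1 + H/L) against time.

    Assumes first-order decay of the light (pre-existing) pool and a
    constant total protein level, so that H/L = exp(k * t) - 1.
    """
    y = [math.log(1.0 + r) for r in heavy_to_light]
    numerator = sum(t * v for t, v in zip(times, y))
    denominator = sum(t * t for t in times)
    return numerator / denominator

def half_life_from_rate(k):
    """Half-life of a first-order process with rate constant k."""
    return math.log(2) / k

# Synthetic, noise-free ratios generated from an assumed rate constant.
true_k = 0.05                      # per hour (hypothetical)
times = [2.0, 6.0, 12.0, 24.0]     # hours after the switch to heavy medium
ratios = [math.exp(true_k * t) - 1.0 for t in times]

k = degradation_rate(times, ratios)
print(k, half_life_from_rate(k))
```

With real measurements the ratios are noisy, which is why several time points are combined in a fit rather than inverting a single ratio.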

The experimental design used to measure protein half-life in mouse and human does not permit a confident investigation of half-life differences among paralogous proteins
Protein turnover in mouse (Schwanhausser et al., 2011) and human (Kristensen et al., 2013) has been measured using stable isotope labeling by amino acids in cell culture (SILAC) in combination with mass spectrometry. Upon transferring cells from light to heavy medium, newly synthesized proteins incorporate heavy-labeled amino acids, while the pre-existing proteins remain in the light form. Protein half-lives and degradation rates were derived from the ratio between the heavy and light peptides, measured using mass spectrometry at different time points after the transfer of cells to heavy medium.
Peptides that are detected using mass spectrometry need to be assigned to proteins. In most cases, unique peptides can be mapped to proteins with reasonable confidence. However, some proteins give rise to identical peptides, in which case it becomes hard to determine which protein the peptides originated from (Li et al., 2009; Nesvizhskii and Aebersold, 2005). Most methods for assigning MS peptides to proteins require that multiple, distinct peptides are present in order for a protein to be identified. Problems therefore arise when proteins are very similar, as they may not give rise to enough unique peptides to be quantified reliably. Since paralogous proteins are similar to each other by definition, and have in some cases hardly diverged during evolution, such sequences can be hard or impossible to differentiate using mass spectrometry. Similarly, proteins arising from alternative splicing or alternative initiation during transcription or translation are difficult to characterize. Thus, the current inability of mass spectrometry to reliably distinguish peptides arising from proteins with similar sequences prevents us from performing an analysis of half-life differences among paralogous proteins in mouse and human.
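The shared-peptide problem can be illustrated with a small in-silico digest. The sketch below uses a simplified trypsin rule (cleave after K or R unless the next residue is P) and a minimum peptide length of 6 residues; both are assumptions made for illustration, and the two sequences are invented toy paralogs that differ by a single residue.

```python
def tryptic_peptides(sequence, min_length=6):
    """Set of peptides from a simplified tryptic digest without missed cleavages.

    Cleaves after K or R unless the following residue is P; peptides shorter
    than min_length are discarded (the threshold is an assumed value).
    """
    peptides = set()
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        cleave = residue in "KR" and not is_last and sequence[i + 1] != "P"
        if cleave or is_last:
            peptide = sequence[start:i + 1]
            if len(peptide) >= min_length:
                peptides.add(peptide)
            start = i + 1
    return peptides

# Hypothetical toy paralogs differing by one substitution (E -> D).
paralog_a = "MSTLEQKAVDFGHIKLLNPEESRGGWTYAAK"
paralog_b = "MSTLEQKAVDFGHIKLLNPEDSRGGWTYAAK"

peptides_a = tryptic_peptides(paralog_a)
peptides_b = tryptic_peptides(paralog_b)

shared = peptides_a & peptides_b
unique_a = peptides_a - peptides_b
unique_b = peptides_b - peptides_a
print(len(shared), sorted(unique_a), sorted(unique_b))
```

In this toy case three of the four peptides are shared, leaving a single discriminating peptide per paralog, which would fall short of an identification rule that requires multiple distinct peptides.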