Malleable ribonucleoprotein machine: protein intrinsic disorder in the Saccharomyces cerevisiae spliceosome

Recent studies revealed that a significant fraction of any given proteome is represented by proteins that lack a unique 3D structure, either as a whole or in significant parts. These intrinsically disordered proteins possess dramatic structural and functional variability and are especially enriched in signaling and regulatory functions, since their lack of fixed structure enables them to interact with several proteins and to be re-used in multiple pathways. Among the recognized disorder-based protein functions are interactions with nucleic acids and multi-target binding, i.e., functions ascribed to many spliceosomal proteins. Therefore, the spliceosome, a multimegadalton ribonucleoprotein machine catalyzing the excision of introns from eukaryotic pre-mRNAs, represents an attractive target for a focused analysis of the abundance and functionality of intrinsic disorder in its proteinaceous components. In yeast cells, the spliceosome consists of five small nuclear RNAs (U1, U2, U4, U5, and U6) and a range of associated proteins. Some of these proteins constitute the cores of the corresponding snRNA-protein complexes known as small nuclear ribonucleoproteins (snRNPs). Other spliceosomal proteins have various auxiliary functions. To gain a better understanding of the functional roles of intrinsic disorder, we studied the prevalence of intrinsically disordered proteins in the yeast spliceosome using a wide array of bioinformatics methods. Our study revealed that, similar to the proteins associated with human spliceosomes (Korneta & Bujnicki, 2012), proteins found in the yeast spliceosome are enriched in intrinsic disorder.


INTRODUCTION
Eukaryotic genes are typically characterized by a mosaic architecture, being organized into a line of alternating exons and introns. The EXONs are those EXpressed regiONs that become the mRNA, and the INTRONs are those INTRagenic regiONs that are located inside the gene and are removed in the process of making a mature messenger RNA (mRNA) from its precursor (pre-mRNA). Therefore, the process of eukaryotic mRNA maturation includes a very important step of splicing, which takes place after or concurrently with pre-mRNA transcription, and which ensures that introns are removed and exons are joined. Here, the pre-mRNA is spliced at splice junctions found at the extreme ends of each and every intron. Although some exons are constitutively spliced; i.e., they are present in every mRNA produced from a given pre-mRNA, there are multiple ways of how exons are joined during the RNA splicing, and many pre-mRNAs are alternatively spliced to generate variable forms of mRNA from a single pre-mRNA species.
Alternative (or differential) splicing is very ubiquitous in eukaryotes (e.g., ∼95% of multiexonic genes in humans are alternatively spliced (Pan et al., 2008)), where it is believed to contribute to the greatly increased biodiversity of proteins that can be encoded by the genome (Black, 2003). In fact, since the different mRNAs generated from a single pre-mRNA can be translated into different protein isoforms, a single gene may code for multiple proteins. For example, >500 isoforms of the calcium-activated potassium channel Slo that are translated from the different mRNAs produced by the alternative splicing of a single slo gene define the ability of ears to detect a remarkable range of frequencies (Black, 1998;Graveley, 2001;Xu et al., 2007). The Drosophila melanogaster gene Dscam (a drosophila homolog of human Down syndrome cell adhesion molecule, DSCAM) could potentially have 38,016 splice variants which are crucial for the specificity of neuronal connectivity (Schmucker et al., 2000;Celotto & Graveley, 2001;Kreahling & Graveley, 2005). In human titin, which is an extremely large elastic protein (>4,200 kDa) found in heart and skeletal muscle, over a million splice pathways can be potentially derived from the PEVK region alone (so called for its high content of proline (P), glutamate (E), valine (V), and lysine (K) residues) (Wang, 1996;Maruyama, 1997;Gregorio et al., 1999;LeWinter et al., 2007;Guo et al., 2010). Therefore, alternative splicing defines the increased diversity of eukaryotic proteomes compared to their corresponding genomes (Nilsen & Graveley, 2010). Also, aberrant pre-mRNA splicing constitutes the basis of some human diseases or contributes to the severity of other human maladies (Novoyatleva et al., 2006;Ward & Cooper, 2010).
Pre-mRNA splicing takes place in all eukaryotic organisms investigated to date, from yeast to metazoans. In some organisms splicing might occur spontaneously, with the pre-mRNA acting as a ribozyme that folds on itself, cleaves itself, and removes the intron by itself. For the majority of eukaryotic introns, however, splicing of pre-mRNA is carried out in a series of reactions catalyzed by the multimegadalton ribonucleoprotein (RNP) complex known as the spliceosome (Brow, 2002;Wahl, Will & Luhrmann, 2009). The canonical assembly of the spliceosome occurs anew on each pre-mRNA that contains specific sequence elements (such as the 5' splice site, the branch point sequence, the polypyrimidine tract, and the 3' splice site) that are recognized and utilized during spliceosome assembly.
There are two spliceosome types, the major spliceosome, which contains five small nuclear ribonucleoproteins (snRNPs, often pronounced as snurps, the U1, U2, U4/U6, and U5 snRNPs) as the main building blocks, and which is responsible for removing the vast majority of pre-mRNA introns; and the minor spliceosome, which is present in some metazoan species and plants, and which is composed of the compositionally distinct but functionally analogous U11/U12 and U4atac/U6atac snRNPs, with the U5 snRNP shared between the machineries (Patel & Steitz, 2003). The major spliceosome is composed of five small nuclear RNA (snRNA) molecules: U1, U2, U4, U5 and U6, and a number of core proteins. A common feature of all spliceosomal snRNPs except U6 is the presence of seven mutually related Sm proteins. U6 contains a set of related "like-Sm" (Lsm) proteins (Veretnik et al., 2009). In the spliceosomal snRNPs, the Sm or Lsm proteins form a ring structure whereas a U-rich sequence in the snRNA binds in the positively charged central hole of this ring (Kambach, Walke & Nagai, 1999;Kambach et al., 1999). This core structure is further enhanced by 80-150 proteins that are abundant in the human spliceosome and are essential to the process of spliceosome-dependent splicing (Agafonov et al., 2011).
Based on the proteomic analysis of the yeast spliceosome, it has been concluded that the yeast splicing machinery likely contains the evolutionarily conserved core set of spliceosomal proteins that are required for constitutive splicing (Fabrizio et al., 2009). On the other hand, the number of proteins found in the yeast B, B act and C complexes was noticeably lower than that in the corresponding metazoan complexes (Fabrizio et al., 2009;Will & Luhrmann, 2011). For example, there were only ∼60 proteins in yeast pre-catalytic B complexes (compared to ∼110 in human and D. melanogaster spliceosomes), including essentially all U1, U2, and U4/U6.U5 tri-snRNP proteins together with proteins of the nineteen complex (NTC) and the mRNA retention and splicing (RES) complex (Fabrizio et al., 2009). Similarly, yeast C complexes contained only ∼50 proteins compared to ∼110 in metazoan C complexes. Therefore, this analysis revealed that yeast spliceosomes contain ∼90 proteins, almost all of which have homologs in higher eukaryotes (Fabrizio et al., 2009). Many of the remaining ∼80 proteins found in human and D. melanogaster spliceosomes but not detected in yeast were shown to play a role in alternative splicing, a process that is essentially absent in yeast (Fabrizio et al., 2009). The much lower number of proteins in the yeast spliceosome compared to its metazoan counterpart suggests that yeast possesses a different, or at least simplified, splicing mechanism. For example, it is likely that this reduction is related to the extremely small amount of spliceable genetic material (there are only about 250 introns in S. cerevisiae).
The highly dynamic conformation and composition of the spliceosomal proteins determine the accuracy and flexibility of the splicing machinery (Will & Luhrmann, 2011). The major constituents and regulators of the spliceosome (snRNPs and related non-snRNP proteins) are mostly conserved from yeast to metazoan (Fabrizio et al., 2009). In yeast, the spliceosome assembly on its pre-mRNA substrate represents a highly ordered and regulated process that starts with recognition of the 5' end of the intron (5' splice site, 5'ss) of the pre-mRNA by the U1 snRNP. Next, the U2 snRNP binds to the pre-mRNA's branch site, forming complex A. This complex A then binds the preformed U4/U6.U5 tri-snRNP to produce penta-snRNP complex B, which contains a full set of five snRNAs in a pre-catalytic state. Complex B is then activated for catalysis by a major rearrangement of its RNA network and by global changes of its overall structure, where the association of U4 with U6 is destabilized, enabling U6 to isomerize into a base-pairing interaction with U2 to form part of the catalytic center of the spliceosome. This remodeling also includes dissociation of the U1 and U4 snRNAs and binding of a set of specific proteins leading to the formation of the activated spliceosome (B act ).
Step 1 of splicing takes place in the catalytically activated complex, B*. Here, the adenosine at the branch site attacks the 5'ss of the pre-mRNA, generating a cleaved 5'-exon and an intron-3'-exon intermediate. Finally, complex C is formed via binding of another set of specific proteins. Complex C catalyzes step 2 of splicing, in which the intron is cleaved at the 3'-splice-site (3'ss) with concomitant ligation of the 5' and 3' exons (Fabrizio et al., 2009;Will & Luhrmann, 2011).
Importantly, although the RNA acts as a catalyst in snRNPs, the spliceosomal proteins are not just passive building blocks that hold the RNA in the correct configuration to stabilize it, but carry out essential recognition and catalytic functions during the assembly of the spliceosome and splicing-related catalytic reactions (Abelson, 2008;Pyle, 2008;Fabrizio et al., 2009), and also play a crucial role in the selection of intron substrates during the alternative splicing (Caceres & Kornblihtt, 2002). It is also important to remember that in addition to the five snRNAs, pre-mRNA splicing requires the activity of a large number of proteins, often called pre-mRNA processing proteins (Prps). Many spliceosomal and non-spliceosomal proteins are believed to have important activities related to the specificity, accuracy, and regulation of the spliceosome (Russell et al., 2000). Since these proteins are involved in numerous protein-protein and protein-RNA interactions, there is a great chance that at least some of them might belong to the class of intrinsically disordered proteins.
Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are key players in various protein-protein interaction networks, being especially abundant among hub proteins and their binding partners (Dunker et al., 2005;Dosztanyi et al., 2006;Ekman et al., 2006;Haynes et al., 2006;Patil & Nakamura, 2006;Singh et al., 2006). Furthermore, regions of pre-mRNA that undergo alternative splicing commonly encode disordered regions (Romero et al., 2006). This association of alternative splicing and intrinsic disorder helps proteins to avoid folding difficulties and provides a novel mechanism for developing tissue-specific protein interaction networks (Romero et al., 2006;Uversky, Oldfield & Dunker, 2008).
The hypothesis that the spliceosomal proteins might be enriched in intrinsic disorder is supported by the results of a bioinformatics analysis of the correlation between Swiss-Prot functional keywords and protein intrinsic disorder, which clearly showed that mRNA processing and mRNA splicing were among the top 20 biological processes associated with protein intrinsic disorder (Xie et al., 2007a). Furthermore, the functional keyword spliceosome was at position #4 among the top 20 cellular components strongly correlated with predicted disorder (Vucetic et al., 2007). There are also several case studies in which intrinsic disorder was found in individual spliceosomal proteins. For example, NMR analysis revealed that the flanking N-terminal (residues 1-20) and C-terminal (residues 100-125) regions of the protein p14 are unstructured (Spadaccini et al., 2006). This protein is a subunit of the essential splicing factor 3b (SF3b) present in both the major and minor spliceosomes (Will et al., 2001); it is located near the catalytic center of the spliceosome and is responsible for the first catalytic step of the splicing reaction (Query, Strobel & Sharp, 1996). Serine/arginine-rich (SR) splicing factors are important spliceosomal IDPs, which, besides their significance for both constitutive and alternative splicing (Zahler et al., 1992), play key roles in spliceosome assembly by facilitating recruitment of spliceosome components via protein-protein interactions (Roscigno & Garcia-Blanco, 1995) that are potentially mediated by the disordered SR domains of these splicing factors (Haynes & Iakoucheva, 2006). Finally, a recently reported systematic bioinformatics analysis of the abundance of intrinsic disorder in the proteome of the human spliceosome provided strong support for the "disordered spliceosome" hypothesis (Korneta & Bujnicki, 2012).
Since metazoan spliceosomes are rather different from their yeast counterparts (for example, yeast spliceosomes have radically fewer proteins than metazoan spliceosomes, typically possessing fewer than half as many proteins per spliceosomal complex (Fabrizio et al., 2009)), and since the protein sequence homology between yeast and human spliceosomal proteins ranges from 36 to a little over 50% (Ben-Yehuda et al., 2000), data on the abundance of intrinsic disorder in the human spliceosomal proteome cannot be directly projected onto the yeast spliceosomal proteome. Therefore, in the present work we studied the prevalence of intrinsic disorder in the yeast spliceosome using a wide array of bioinformatics methods. Our study showed that, similar to the proteins associated with human spliceosomes (Korneta & Bujnicki, 2012), proteins found in the yeast spliceosome are relatively enriched in intrinsic disorder.

Dataset
In this work we studied the presence of intrinsically disordered proteins (IDPs) in the yeast spliceosome. The first step was to search the UniProt database (http://www.uniprot.org) for known proteins in the spliceosome of baker's yeast (Saccharomyces cerevisiae). This query returned 140 proteins, from which 109 reviewed entries were selected to make sure that the proteins chosen for analysis were manually annotated and reviewed by UniProtKB curators. The amino acid sequences of all these 109 yeast spliceosomal proteins were retrieved from the UniProt database in FASTA format and used in subsequent analysis.
At the next stage, we compared this dataset with a set of yeast spliceosomal proteins found via the comprehensive proteomic analysis of the yeast spliceosomal complex B, activated B act , and step 1 complex C (Fabrizio et al., 2009). This experimentally determined set contained 89 proteins directly assigned to different spliceosomal components and complexes. Table 1 groups these proteins according to their functional/structural annotations and also lists 20 extra spliceosomal proteins found via the UniProt search.
Notes.
Proteins that have information on 3-D structures for entire proteins or some of their parts are highlighted in light blue. Highly disordered proteins selected for detailed functional and structural analysis are highlighted in light red. Highly disordered proteins selected for detailed functional analysis that have information on 3-D structures are highlighted in light pink.

Analysis of the amino acid composition of yeast spliceosomal proteins
To gain insight into the relationships between sequence and disorder, amino acid compositions of different datasets were compared using an approach recently developed for IDPs (Dunker et al., 2001;Vacic et al., 2007a). To this end, the fractional difference in composition between a given set of proteins (either the set of yeast spliceosomal proteins or a set of disordered proteins from the DisProt database (Vucetic et al., 2005;Sickmeier et al., 2007)) and a reference set of ordered proteins was calculated for each amino acid residue. The fractional difference was calculated as (C_X − C_order)/C_order, where C_X is the content of a given amino acid in the query protein set and C_order is the corresponding content in the set of ordered proteins, and was plotted for each amino acid. In the corresponding plots, the amino acids were arranged from the most order-promoting to the most disorder-promoting (Radivojac et al., 2007).
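The fractional difference defined above can be illustrated with a minimal Python sketch. It is not the Composition Profiler implementation; the function names and the toy sequences in the usage note are hypothetical, and the only thing taken from the text is the formula (C_X − C_order)/C_order.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequences):
    """Fraction of each standard residue pooled over a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def fractional_difference(query_seqs, ordered_seqs):
    """(C_X - C_order) / C_order for every residue type.

    Residues absent from the ordered reference set give NaN,
    because the ratio is undefined there (a choice made for this sketch).
    """
    c_x = composition(query_seqs)
    c_order = composition(ordered_seqs)
    return {
        aa: (c_x[aa] - c_order[aa]) / c_order[aa] if c_order[aa] > 0 else math.nan
        for aa in AMINO_ACIDS
    }
```

With the toy input `fractional_difference(["AAKK"], ["AAAK"])`, lysine is enriched (positive value) and alanine is depleted (negative value) relative to the reference, which is how positive and negative bars in a composition profile are read.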

Per residue disorder scores
The intrinsic disorder propensities of the spliceosomal proteins were evaluated by several different disorder predictors, such as PONDR® VLXT (Dunker et al., 2001), PONDR® VSL2 (Peng et al., 2005), PONDR® VL3 (Peng et al., 2006), FoldIndex (Prilusky et al., 2005), IUPred (Dosztanyi et al., 2005a), TopIDP (Campen et al., 2008), RONN (Yang et al., 2005), and PONDR® FIT (Xue et al., 2010). These predictors are briefly described below. PONDR® VLXT applies various compositional probabilities and hydrophobic measures of amino acids as the input features of artificial neural networks for the prediction (Romero et al., 2001). PONDR® VLXT applies three different neural networks, one for each terminal region and one for the internal region of the sequence. Each neural network is trained on a specific dataset containing only the amino acid residues of that specific region. The final prediction result uses the individual predictors in their respective regions. The transition from one predictor to another is accomplished by computing the average scores of the two predictors for a short region of overlap at the boundary between the two regions. The input features of the neural networks include selected compositions and profiles from the primary sequences. PONDR® VLXT may underestimate the occurrence of long disordered regions in proteins. Although it is no longer the most accurate predictor, it is very sensitive to local compositional biases. Hence, this method has significant advantages in finding potential binding sites (Oldfield et al., 2005a;Cheng et al., 2007).
PONDR® VL3 employs ten neural networks and selects the final prediction by simple majority voting. The input features of these predictors are various sequence profiles. This predictor has higher accuracy in predicting longer disordered regions (Peng et al., 2006). PONDR® VSL2 is a combination of neural network predictors for both short and long disordered regions. A length limit of 30 residues divides short and long disordered regions. Each individual predictor is trained on a dataset containing sequences of that specific length, and the final prediction is a weighted average determined by a second-layer predictor. PONDR® VSL2 applies not only the sequence profile, but also the results of sequence alignments from PSI-BLAST and secondary structure predictions from PHD and PSIPRED. This predictor is one of the most accurate predictors in the PONDR family (Peng et al., 2005).
IUPred assumes that globular proteins have larger numbers of effective inter-residue interactions (negative free energy) than disordered proteins due to the different types of amino acids involved in possible residue contacts. Based on this idea, a composition-based pair-wise interaction matrix was shown to give values similar to those obtained from a structure-based interaction matrix. Structured and disordered proteins were compared by this approach, with the structured proteins found to have a significantly lower free energy estimate, thus giving a means to predict whether a protein is structured or disordered using amino acid sequence as input (Dosztanyi et al., 2005a).
FoldIndex is a method developed from charge-hydropathy plots (Uversky, Gillespie & Fink, 2000) by rearranging the terms in the basic equation and by adding the technique of sliding windows (Prilusky et al., 2005). The charge-hydropathy plot was designed to determine if a protein is disordered or not. By applying a sliding window of 21 amino acids centered at a specific residue, the position of this segment on charge-hydrophobicity plot can be calculated, and the distance of this position away from the boundary line is taken as an indication whether the central residue is disordered or not (Prilusky et al., 2005).
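The sliding-window idea behind FoldIndex can be sketched in Python as follows. This is an illustration under stated assumptions, not the FoldIndex server code: it uses the commonly quoted form of the score, 2.785⟨H⟩ − |⟨R⟩| − 1.151, with ⟨H⟩ the mean Kyte-Doolittle hydropathy rescaled to 0-1 and ⟨R⟩ the mean net charge (K and R counted as +1, D and E as −1). Truncating the window at the chain termini and the function name `fold_index_profile` are choices made for this sketch, and only the 20 standard residues are accepted.

```python
# Kyte-Doolittle hydropathy scale (range -4.5 .. 4.5)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def fold_index_profile(seq, window=21):
    """Per-residue score; positive suggests folded, negative suggests disordered."""
    seq = seq.upper()
    half = window // 2
    scores = []
    for i in range(len(seq)):
        # Window centered at residue i, truncated at the termini (assumption).
        segment = seq[max(0, i - half): i + half + 1]
        mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in segment) / len(segment)
        mean_r = sum(CHARGE.get(aa, 0.0) for aa in segment) / len(segment)
        # Signed distance from the charge-hydropathy boundary line.
        scores.append(2.785 * mean_h - abs(mean_r) - 1.151)
    return scores
```

A poly-isoleucine stretch scores positive at every position, whereas a poly-glutamate stretch scores negative, matching the expectation that hydrophobic, uncharged segments fall on the ordered side of the charge-hydropathy boundary.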
TopIDP is a numerical scale giving the order-disorder propensity for each amino acid. This scale was determined by maximizing the differences in conditional probabilities for structured versus disordered regions of proteins for the central residues in windows of 21 residues (Campen et al., 2008).
RONN is the regional order neural network software that applies the "biobasis function neural network" pattern recognition algorithm for the detection of natively disordered regions in proteins. It predicts disordered structures based on the sequence alignments (Yang et al., 2005).
Binary disorder predictions. Cumulative distribution function (CDF) curves (Oldfield et al., 2005b) were generated for each dataset using PONDR® FIT scores for each of the spliceosomal proteins. CDF analysis discriminates between order and disorder by means of a boundary value (Xue et al., 2009). This value can be interpreted as a measure of the proportion of residues with low and high disorder predictions. Additionally, charge-hydropathy distributions (CH-plots) were analyzed for these proteins using the methods described in Uversky, Gillespie & Fink (2000).
α-MoRF predictions. The predictor of α-helix forming Molecular Recognition Features, α-MoRF, is based on observations that predictions of order in otherwise highly disordered proteins correspond to protein regions that mediate interaction with other proteins or nucleic acids. This predictor focuses on short binding regions within long regions of disorder that are likely to form helical structure upon binding (Oldfield et al., 2005a). It uses a stacked architecture, where PONDR® VLXT is used to identify short predictions of order within long predictions of disorder, and a second-level predictor then determines whether the order prediction is likely to be a binding site based on attributes of both the predicted ordered region and the predicted surrounding disordered region. An α-MoRF prediction indicates the presence of a relatively short (20 residues), loosely structured helical region within a largely disordered sequence (Oldfield et al., 2005a;Cheng et al., 2007). Such regions gain functionality upon a disorder-to-order transition induced by binding to partners (Mohan et al., 2006;Vacic et al., 2007b).
ANCHOR analysis. In addition to MoRF identifiers, potential binding sites in disordered regions can be identified by the ANCHOR algorithm (Dosztanyi, Meszaros & Simon, 2009;Meszaros, Simon & Dosztanyi, 2009). This approach relies on the pairwise energy estimation approach developed for the general disorder prediction method IUPred (Dosztanyi et al., 2005a;Dosztanyi et al., 2005b), and is based on the hypothesis that long regions of disorder contain localized potential binding sites that cannot form enough favorable intrachain interactions to fold on their own, but are likely to gain stabilizing energy by interacting with a globular protein partner (Dosztanyi, Meszaros & Simon, 2009;Meszaros, Simon & Dosztanyi, 2009). Here we use the term ANCHOR-indicated binding site (AIBS) to identify a region of a protein suggested by the ANCHOR algorithm to have significant potential to be a binding site for an appropriate but typically unidentified partner protein.
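As an illustration of the CDF analysis described under the binary disorder predictions, the following Python sketch builds a CDF curve from per-residue disorder scores and averages its distance from a boundary. The number of bins, the function names, and in particular the boundary values passed in are assumptions of this sketch; the published boundary points (Oldfield et al., 2005b; Xue et al., 2009) are not reproduced here. The sign convention assumed is that a curve lying above the boundary (many low-scoring residues) indicates order.

```python
def cdf_curve(scores, n_bins=20):
    """Fraction of residues with a disorder score <= each threshold in (0, 1]."""
    if not scores:
        raise ValueError("empty score profile")
    thresholds = [(k + 1) / n_bins for k in range(n_bins)]
    n = len(scores)
    return [(t, sum(1 for s in scores if s <= t) / n) for t in thresholds]


def mean_distance_from_boundary(curve, boundary):
    """Average of (CDF value - boundary value) over all thresholds.

    `boundary` is a list of hypothetical boundary values, one per threshold.
    Positive result: curve mostly above the boundary (order-like);
    negative result: curve mostly below it (disorder-like).
    """
    if len(curve) != len(boundary):
        raise ValueError("curve and boundary must have the same length")
    return sum(cdf - b for (_, cdf), b in zip(curve, boundary)) / len(curve)
```

For a profile in which 80% of residues have low scores, the curve rises early and the mean distance from a flat placeholder boundary of 0.5 is positive; a mostly disordered profile would give a negative value.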

Structural and functional annotation of selected proteins
We selected the 24 most disordered spliceosomal proteins according to an average between the disorder scores calculated by different predictors for more focused analysis of their structures, disorder propensities, and functions. In addition to the level of predicted intrinsic disorder, these proteins were chosen to represent all the major components and complexes comprising the yeast spliceosome. These proteins were researched for their function, structures, location within the spliceosome, etc. This information was obtained from the UniProtKB, and validated through the literature search.
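The ranking step of this selection can be sketched as below. The sketch covers only the averaging over predictors; the additional requirement that the chosen proteins represent all major spliceosomal components was a manual criterion and is not modeled. The input layout (protein name mapped to one summary score per predictor) and the function name are hypothetical.

```python
from statistics import mean


def rank_by_mean_disorder(scores_by_protein, top_n=24):
    """Return the top_n (protein, mean score) pairs, most disordered first.

    scores_by_protein maps a protein name to a dict of
    {predictor name: summary disorder score for that protein}.
    """
    averaged = {
        name: mean(per_predictor.values())
        for name, per_predictor in scores_by_protein.items()
    }
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)[:top_n]
```

Averaging across predictors damps the bias of any single method, which is the reason a consensus score is a reasonable basis for picking proteins for closer inspection.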

Evaluation of the abundance of intrinsic disorder in yeast spliceosomal proteins
To test for a correlation between the yeast spliceosomal proteins and intrinsic disorder, a dataset of 109 proteins associated with the yeast spliceosome was extracted from UniProt as described in Materials and Methods. Next, this set of proteins was analyzed using a broad spectrum of computational tools for the evaluation of intrinsic disorder in proteins. Results of this analysis are discussed below.
Analysis of the compositional biases. Since the amino acid sequences and compositions of IDPs and IDPRs are significantly different from those of ordered proteins and folded domains, a simple analysis of the amino acid composition biases can provide interesting information on the nature of a protein. For example, the amino acid compositions of extended IDPs (i.e., those disordered proteins that do not have almost any residual structure and behave as native coils and native pre-molten globules (Dunker et al., 2001;Uversky, 2002a;Uversky, 2002b;Uversky, 2003;Uversky & Dunker, 2010)) are characterized by low mean hydropathy and high mean net charge, which define the highly unstructured and extended state of these proteins, since high net charge leads to strong electrostatic repulsion, and low hydropathy prevents efficient compaction (Uversky, Gillespie & Fink, 2000). Overall, IDPs/IDPRs are known to be significantly depleted in so-called order-promoting amino acids, C, W, I, Y, F, L, H, V, and N, and substantially enriched in disorder-promoting residues, A, G, R, T, S, K, Q, E, and P (Dunker et al., 2001;Romero et al., 2001;Williams et al., 2001;Radivojac et al., 2007;Vacic et al., 2007a). Therefore, the evaluation of the amino acid biases in a set of proteins can be used as a fast and informative way to evaluate their intrinsically disordered nature. This analysis can be done using a computational tool, Composition Profiler (Vacic et al., 2007a), which is based on the calculation of a normalized composition of a given protein or protein dataset in the (C x − C order )/C order form, where C x is a content of a given residue in a query dataset, and C order is the corresponding value for the set of ordered proteins from PDB Select 25 (Berman et al., 2000).
Results of this analysis are shown in Fig. 1A, which illustrates that, in comparison with typical ordered proteins, yeast spliceosomal proteins are moderately depleted in some order-promoting residues (e.g., C, W, Y, F, H, and V, see orange bars in Fig. 1A) and are moderately enriched in some major disorder-promoting residues (e.g., D, K, Q, S and E). On the other hand, some order-promoting residues (I, L and M) are rather common in these proteins, whereas some disorder-promoting residues (G, A, and P) are clearly underrepresented in yeast spliceosome. Both depletion in major order-promoting residues and enrichment in major disorder-promoting residues suggest that the yeast spliceosomal proteins might contain multiple signatures characteristic for the disordered proteins.
Abundance of long disordered regions in yeast spliceosomal proteins. A previous study revealed that intrinsic disorder is very abundant in signaling proteins, and that this abundance can be evaluated by estimating the fraction of proteins with long disordered regions (Iakoucheva et al., 2002). In fact, the application of PONDR® VLXT (Romero et al., 2001) showed that 66% of cell-signaling proteins contain predicted regions of disorder of 30 residues or longer (Iakoucheva et al., 2002). Therefore, we applied a similar approach and systematically analyzed the intrinsic disorder tendencies in four protein datasets: (1) 109 yeast spliceosomal proteins (spliceosome); (2) 2,329 signaling proteins collected by the Alliance for Cellular Signaling (AfCS); (3) 53,630 eukaryotic proteins from SWISS-PROT (EU SW); and (4) a set of 1,138 non-homologous protein segments with well-defined 3-D structure from the Protein Data Bank Select 25 (O PDB S25). Figure 1B illustrates that intrinsic disorder is prevalent in the yeast spliceosomal proteins, being comparable with the prevalence observed for signaling and eukaryotic proteins. In fact, the percentages of proteins with 30 or more consecutive residues predicted to be disordered were 53% for the spliceosomal proteins, 66% for AfCS, 47% for EU SW, and 13% for O PDB S25. In other words, the fraction of yeast spliceosomal proteins with long regions of predicted disorder is 4-fold higher than that of non-homologous ordered proteins from PDB (Iakoucheva et al., 2002), and is also somewhat higher than the corresponding fraction in eukaryotic proteins.
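The statistic reported here, the fraction of proteins with at least 30 consecutive residues predicted to be disordered, can be computed from per-residue scores as in the following sketch. The 0.5 cutoff is the conventional threshold for PONDR-type scores and is an assumption of this example, as are the function names and the input layout (protein name mapped to a list of per-residue scores).

```python
def longest_disordered_run(scores, threshold=0.5):
    """Length of the longest stretch of consecutive residues scoring >= threshold."""
    longest = current = 0
    for score in scores:
        if score >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def fraction_with_long_disorder(profiles, min_length=30, threshold=0.5):
    """Fraction of proteins containing a predicted disordered region >= min_length."""
    if not profiles:
        raise ValueError("no proteins supplied")
    hits = sum(
        1
        for scores in profiles.values()
        if longest_disordered_run(scores, threshold) >= min_length
    )
    return hits / len(profiles)
```

Applied to a dataset of per-residue predictions, `fraction_with_long_disorder` yields the kind of percentage compared across the four datasets above.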
Figure 1 Evaluation of abundance of intrinsic disorder in the yeast spliceosome. A. Fractional difference in the amino acid composition between the different yeast spliceosomal proteins and a set of completely ordered proteins calculated for each amino acid residue (compositional profiles). The fractional difference was evaluated as (C_x − C_order)/C_order, where C_x is the content of a given amino acid in a query set, and C_order is the corresponding content in the dataset of fully ordered proteins. Composition profile of typical IDPs from the DisProt database is shown for comparison (black bars). Positive bars correspond to residues found more abundantly in the query proteins, whereas negative bars show residues in which the query proteins are depleted. Amino acid types are ranked according to their increasing disorder-promoting potential (Radivojac et al., 2007). B. Abundance of predicted long disordered regions in yeast spliceosomal proteins (black bars) in comparison with long disordered regions in 2,329 proteins involved in cellular signaling (AfCS, red bars), 53,630 eukaryotic proteins from SWISS-PROT (EU SW, green bars), and 1,138 sequences corresponding to ordered parts of proteins from PDB Select 25 (O PDB S25, yellow bars).

Disorder propensity of yeast spliceosomal proteins studied by the binary disorder predictors.
Sequences of the 109 yeast spliceosomal proteins were used to predict whether these proteins are likely to be mostly disordered using two binary predictors of intrinsic disorder: the charge-hydropathy plot (CH-plot) (Uversky, Gillespie & Fink, 2000;Oldfield et al., 2005b) and cumulative distribution function (CDF) analysis (Oldfield et al., 2005b). Both methods perform binary classification of whole proteins as either mostly disordered or mostly ordered, where mostly ordered indicates proteins that contain more ordered residues than disordered residues and mostly disordered indicates proteins that contain more disordered residues than ordered residues (Oldfield et al., 2005b). Figure 2 represents the results of the combined CH-CDF analysis of the spliceosomal proteins and shows that ∼50% of these proteins are mostly disordered. In this plot, the coordinates of each spot are calculated as the distance of the corresponding protein in the CH-plot from the boundary (Y-coordinate) and the average distance of the respective CDF curve from the CDF boundary (X-coordinate) (Mohan et al., 2008;Xue et al., 2009;Huang et al., 2012).
The primary difference between these two binary predictors (i.e., predictors which evaluate the predisposition of a given protein to be ordered or disordered as a whole) is that the CH-plot is a linear classifier that takes into account only two parameters of the particular sequence (charge and hydropathy), whereas CDF analysis is dependent on the output of the PONDR® predictor, a nonlinear classifier, which was trained to distinguish order and disorder based on a significantly larger feature space. According to these methodological differences, CH-plot analysis is predisposed to discriminate proteins with a substantial amount of extended disorder (random coils and pre-"molten globules") from proteins with compact conformations ("molten globule"-like and rigid well-structured proteins). On the other hand, PONDR-based CDF analysis may discriminate all disordered conformations, including molten globules and mixed proteins containing both disordered and ordered regions, from rigid well-folded proteins. Therefore, this discrepancy in the disorder prediction by CDF and CH-plot provides a computational tool to discriminate proteins with extended disorder from potential molten globules and mixed proteins.
Positive and negative Y values in Fig. 2 correspond to proteins predicted within CH-plot analysis to be natively unfolded or compact, respectively. On the other hand, positive and negative X values are attributed to proteins predicted within the CDF analysis to be ordered or intrinsically disordered, respectively. Thus, the resultant quadrants of CDF-CH phase space correspond to the following expectations: Q1, proteins predicted to be disordered by CH-plots, but ordered by CDFs; Q2, ordered proteins; Q3, proteins predicted to be disordered by CDFs, but compact by CH-plots (i.e., putative molten globules or mixed proteins); Q4, proteins predicted to be disordered by both methods (i.e., proteins with extended disorder). Figure 2 shows that ∼50% of the yeast spliceosomal proteins are predicted to be disordered as a whole, with 33% and 13.8% of them being found in quadrants Q4 and Q3, respectively, and are therefore expected to behave as native coils or native pre-molten globules or native molten globules or mixed proteins in their unbound states. The fact that 46.7% of the spliceosomal proteins are expected to be mostly disordered (being located within quadrants Q3 and Q4) is a very important observation since this value noticeably exceeds the corresponding value evaluated for the yeast proteins in general (13.3%) (Mohan et al., 2008).
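The quadrant assignment described above can be sketched as a small function. The function names and the toy coordinates are hypothetical, and the treatment of points lying exactly on a boundary (a coordinate of 0) is an assumption, since the text does not specify it.

```python
def ch_cdf_quadrant(x_cdf, y_ch):
    """Assign a protein to a quadrant of the CH-CDF plot.

    x_cdf: average distance of the CDF curve from the CDF boundary
           (positive = ordered by CDF, negative = disordered by CDF).
    y_ch:  distance from the CH-plot boundary
           (positive = natively unfolded by CH-plot, negative = compact).
    Assumption: a coordinate of exactly 0 is grouped with the negative side.
    """
    cdf_ordered = x_cdf > 0
    ch_unfolded = y_ch > 0
    if cdf_ordered and ch_unfolded:
        return "Q1"  # disordered by CH-plot, ordered by CDF
    if cdf_ordered:
        return "Q2"  # ordered by both methods
    if not ch_unfolded:
        return "Q3"  # disordered by CDF, compact by CH-plot
    return "Q4"      # disordered by both methods (extended disorder)

def fraction_mostly_disordered(points):
    """Share of proteins falling into Q3 or Q4, i.e., disordered according to CDF."""
    hits = sum(1 for x, y in points if ch_cdf_quadrant(x, y) in ("Q3", "Q4"))
    return hits / len(points)

# Toy (X, Y) coordinates invented for illustration
toy_points = [(0.2, -0.1), (-0.3, 0.05), (-0.1, -0.2), (0.1, 0.02)]
toy_fraction = fraction_mostly_disordered(toy_points)
```

With these four toy points, one protein falls into each quadrant, so half of them count as mostly disordered.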
Combined analysis of intrinsic disorder propensity by several computational tools. It was emphasized that the combined analysis of the intrinsic disorder propensity by several computational tools (especially by tools that utilize different attributes) provides additional advantages (Ferron et al., 2006; Bourhis, Canard & Longhi, 2007; He et al., 2009), allowing, for example, better visualization of the differences between the various protein groups (Uversky et al., 2006). Figure 3A illustrates the power of this approach and represents a plot where disorder contents in the yeast spliceosomal proteins were evaluated by PONDR-FIT, which is a meta-predictor that provides more accurate disorder content predictions when compared to several other recent disorder predictors (Xue et al., 2010), and PONDR® VLXT (Romero et al., 2001), which is no longer the most accurate predictor, but is very sensitive to the local compositional biases and is capable of identifying potential molecular interaction motifs (Oldfield et al., 2005a; Cheng et al., 2007). In our analysis, we used two arbitrary cutoffs for the levels of intrinsic disorder to classify proteins as highly ordered ([IDP score] < 10%), moderately disordered (30% > [IDP score] > 10%) and highly disordered ([IDP score] > 30%) (Rajagopalan et al., 2011). According to this separation, just 9 of the 109 proteins were predicted to be highly ordered by PONDR-FIT, with 48 and 52 proteins classified as moderately and highly disordered, respectively (see Fig. 3A). This grouping suggests that most of the proteins in the spliceosome are intrinsically disordered.
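The three-way grouping by the 10% and 30% cutoffs can be expressed as a short sketch. The function name and the toy scores are hypothetical; assigning values of exactly 10% or 30% to the moderately disordered class is an assumption, because the text leaves the boundary cases undefined.

```python
from collections import Counter

def disorder_class(percent_disordered):
    """Classify a protein by its predicted disorder content (in percent)."""
    if percent_disordered < 10:
        return "highly ordered"
    if percent_disordered <= 30:  # assumption: boundary values count as moderate
        return "moderately disordered"
    return "highly disordered"

# Toy disorder contents invented for illustration
toy_scores = [4.0, 12.5, 29.9, 30.0, 64.3]
tally = Counter(disorder_class(score) for score in toy_scores)
```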
Since PONDR-FIT is a metapredictor that includes PONDR® VLXT as one of its components, a linear relationship between the results of these two predictors was expected. Therefore, we used a more complex analysis, where the outputs of three truly independent approaches were compared. Figure 3B represents the results of this analysis and shows the 3D disorder distribution plot, where the outputs of PONDR-FIT, RONN and FoldIndex are used as three dimensions. This representation clearly shows that the outputs of three very different computational tools (see Materials and Methods for the description of these tools) generally agree with each other, since the points corresponding to the different spliceosomal proteins are mostly located on the diagonal of the FIT-RONN-FoldIndex space.

Figure 3 Combined analysis of intrinsic disorder propensities of the yeast spliceosomal proteins using the outputs of different disorder prediction tools.
A. PONDR-FIT vs. PONDR® VLXT plot representing the correlation between the disorder content evaluated by PONDR-FIT (y-axis) (Xue et al., 2010) and by PONDR® VLXT (x-axis) (Romero et al., 2001). Two arbitrary cutoffs for the levels of intrinsic disorder were used to classify proteins as highly ordered ([IDP score] < 10%, blue field), moderately disordered (30% > [IDP score] > 10%, pink field) and highly disordered ([IDP score] > 30%, red field) (Rajagopalan et al., 2011). Color coding of spliceosomal proteins reflects their relation to different components and complexes. B. 3D disorder distribution plot representing the PONDR-FIT vs. RONN vs. FoldIndex dependence.

Functions of IDPs and IDPRs in yeast spliceosome
Distribution of IDPs in different components of the yeast spliceosome. The spliceosome of any organism is a protein-rich molecular machine (Fabrizio et al., 2009). In fact, the major spliceosome contains five uridine-rich small nuclear RNAs (U1, U2, U4, U5, and U6) that are responsible for the catalysis of the pre-mRNA splicing and that are assisted by a wide array of proteins, the number of which ranges from ∼100 (in yeast) to more than 200 (in metazoans). Depending on their involvement in the formation of snRNPs, spliceosomal proteins can be grouped into two major categories: proteins associated with snRNPs and non-snRNP spliceosomal proteins. Since the spliceosome is a highly dynamic machine, its protein complement varies substantially from one stage of the splicing cycle to another (Fabrizio et al., 2009). For example, the transition from complex B to complex C is accompanied not only by the dissociation of U1 and U4 snRNAs from the spliceosome but also by a dramatic perturbation in the protein composition, where ∼35 proteins are removed and 12 new spliceosomal proteins are added to the complex (Bessonov et al., 2008; Fabrizio et al., 2009). Figure 4 illustrates compositional changes that take place at the different stages of the spliceosome assembly and action and shows the protein compositions of the yeast B, Bact, and C complexes determined by mass spectrometry (Fabrizio et al., 2009). Here, the involved proteins are color coded according to their intrinsic disorder content evaluated by PONDR-FIT, with highly ordered (ID score < 10%), moderately disordered (30% > ID score > 10%) and highly disordered proteins (ID score > 30%) being shown as blue, pink and red bars, respectively. Details of this analysis are further summarized in Table 1, which in addition to the major structural properties of the spliceosomal proteins lists their intrinsic disorder scores evaluated by four different disorder predictors.

Figure 4 (partial caption). Highly ordered ([IDP score] < 10%), moderately disordered (30% > [IDP score] > 10%) and highly disordered proteins ([IDP score] > 30%) are shown as blue, pink and red bars, respectively. Gray bars correspond to proteins that are present in complex B only and are not seen at subsequent stages; i.e., excluded from complexes Bact and C.

Application of the α-MoRF predictors reveals that molecular recognition features are highly abundant in yeast spliceosomal proteins, and Table 1 shows that ∼61% of spliceosomal proteins contain α-MoRFs. This value is almost 3-fold larger than the corresponding value evaluated for the yeast proteins in general (21.1%) (Mohan et al., 2008). On average, each protein associated with the yeast spliceosome contains 1.75 α-MoRFs, which is noticeably larger than 0.39 α-MoRFs per yeast protein in general (Mohan et al., 2008). Also, on average, each protein in the yeast proteome that was predicted to possess α-MoRFs was shown to have 1.84 molecular recognition features (Mohan et al., 2008). In the spliceosome, MoRF-possessing proteins contain 2.58 α-MoRFs per protein (see Table 1). Importantly, some long, highly disordered spliceosomal proteins have multiple predicted α-MoRF regions (Table 1) that may potentially serve as binding sites for multiple proteins. For example, Snu66 (587 amino acid residues) has 11 predicted α-MoRFs, whereas there are 7, 6, and 5 predicted α-MoRFs in Prp3 (469 amino acid residues), Spp381 (291 amino acids), and Yju2 (278 residues), respectively. All this suggests that the spliceosomal proteins are extremely enriched in disorder-based binding sites and therefore are involved in extensive interaction networks.
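The three summary statistics quoted in this paragraph (the share of proteins with at least one α-MoRF, the mean number of α-MoRFs per protein, and the mean number per MoRF-possessing protein) differ only in their denominators. A minimal sketch, assuming a hypothetical list of per-protein α-MoRF counts rather than the actual values from Table 1:

```python
def morf_summary(morf_counts):
    """Summarize per-protein alpha-MoRF counts with three related statistics."""
    n_proteins = len(morf_counts)
    with_morfs = [count for count in morf_counts if count > 0]
    total_morfs = sum(morf_counts)
    return {
        # share of proteins that contain at least one alpha-MoRF
        "fraction_with_morfs": len(with_morfs) / n_proteins,
        # mean over all proteins, including those without alpha-MoRFs
        "morfs_per_protein": total_morfs / n_proteins,
        # mean over MoRF-possessing proteins only
        "morfs_per_morf_protein": total_morfs / len(with_morfs) if with_morfs else 0.0,
    }

# Toy counts invented for illustration (five hypothetical proteins)
toy_counts = [0, 0, 2, 1, 5]
toy_summary = morf_summary(toy_counts)
```

Because proteins without α-MoRFs are excluded from the last denominator, the per-MoRF-protein mean is never smaller than the per-protein mean.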
Predictions of potential disorder-based binding sites, AIBSs. In addition to the PONDR-based MoRF identifiers, which find disorder-driven binding sites using the peculiarities of predicted disorder propensity distribution within a protein sequence, potential binding sites in disordered regions can be identified by the ANCHOR algorithm (Dosztanyi, Meszaros & Simon, 2009; Meszaros, Simon & Dosztanyi, 2009). In order to predict disordered binding regions, ANCHOR identifies segments (ANCHOR-identified binding sites, AIBSs) that reside in disordered regions, cannot form enough favorable intrachain interactions to fold on their own, and are likely to gain stabilizing energy by interacting with a globular protein partner (Dosztanyi, Meszaros & Simon, 2009; Meszaros, Simon & Dosztanyi, 2009). Therefore, methodologically and logistically, ANCHOR is very different from the MoRF identifiers. Table 1 represents the results of the ANCHOR-based analysis of the yeast spliceosomal proteins and shows that AIBSs are very common in these proteins. In fact, of the 109 yeast spliceosomal proteins analyzed in this study, 77 contained at least one AIBS. Therefore, AIBSs were found in ∼71% of yeast spliceosomal proteins. The data in Table 1 show that there is generally good agreement between the binding sites predicted by MoRF identifiers and by ANCHOR. For proteins containing disorder-based binding sites, there are typically more AIBSs than MoRFs. This is an expected result since MoRF identifiers are designed to find disordered regions that fold into α-helices upon interaction with the binding partners, whereas ANCHOR is a more general method which is not biased toward any type of the protein secondary structure in the bound state.

Structures and functions of some highly disordered spliceosomal proteins
Spliceosome assembly is a multistep process that involves sequential binding of snRNPs to the pre-mRNA in an order of U1, U2, then U4/U6 and U5 as a preformed tri-snRNP particle. A subsequent conformational rearrangement results in dissociation of U1 and U4, accompanied by new base pair formation between U2 and U6 and between U6 and the 5' splice site, leading to the formation of the active spliceosome on which the catalytic reactions take place (Chen et al., 2001). snRNAs (which are the central structural and functional units of spliceosomal snRNPs) have important roles in recognition and alignment of splice sites mediated through base pair interactions between snRNAs and the intron sequences during spliceosome assembly (Chen et al., 2001). Furthermore, it is believed that snRNAs of these snRNPs act as ribozymes, being responsible for the catalysis of the intron excision (Abelson, 2008;Pyle, 2008;Fabrizio et al., 2009). However, all the steps related to the spliceosome assembly and actions are known to be accompanied by the dramatic rearrangements of the spliceosomal protein composition. This suggests that protein-based interactions are crucial for the spliceosome function.
Pre-mRNA-splicing factor Cwc21 or complexed with Cef1 protein 21 (UniProt ID: Q03375). Cwc21 protein is a part of the U2-type spliceosome complex and its putative role is the stabilization of the catalytic site or the position of RNA substrate during the splicing process. In S. cerevisiae, Cwc21 binds to two key splicing factors, namely, Prp8 and Snu114, and docks directly to U5 snRNP. It was demonstrated that SRm300, the only SR-related protein known to be at the core of human catalytic spliceosomes, is a functional ortholog of Cwc21, which also interacts directly with Prp8 and Snu114 (Grainger et al., 2009). Thus, the function of Cwc21 is likely to be conserved from yeast to humans. Cwc21 also shows affinity for the protein Isy1, a splicing fidelity factor, indicating that, even though it is not an essential protein for the function and formation of the spliceosome (Hogg, McGrail & O'Keefe, 2010), it is required for the correct splicing (Khanna et al., 2009). Cwc21 is a small highly basic protein (pI 9.67, 135 residues), that interacts with Prp8 via SCwid domain (53-97 region) and Snu114 (via C-terminus) (Grainger et al., 2009). Figure 5A and Table 1 show that Cwc21 is predicted to be highly disordered by PONDR-FIT and possesses two α-MoRFs, one of which partially overlaps with the experimentally established Prp8 and Snu114 binding sites.

Figure 5 (partial caption). For all these proteins, the disorder propensity was evaluated by PONDR® FIT (red curves); PONDR® VLXT (blue curves); PONDR® VSL2B (dark green curves); and IUPred (dark cyan curves). Shadow around PONDR® FIT curves represents distribution of statistical errors. Bold pink lines correspond to the predicted α-MoRFs.

Pre-mRNA-splicing factor Ntc20 or Prp19-associated complex protein 20 (UniProt ID: P38302) and pre-mRNA-splicing factor Isy1 or Ntc30 (UniProt ID: P21374). The yeast S. cerevisiae Prp19 protein is an essential splicing factor and an important spliceosomal component. It is not tightly associated with small nuclear RNAs (snRNAs) but represents a core of a protein complex (NTC complex) consisting of at least eight proteins. Two proteins of this NTC/Prp19-associated complex, Ntc30 and Ntc20, associate with the spliceosome to mediate conformational rearrangement or to stabilize the structure of the spliceosome after U4 snRNA dissociation, which leads to spliceosome maturation (Ben-Yehuda et al., 2000; Chen et al., 2001; Chen et al., 2002; Chan et al., 2003). Null NTC30 or NTC20 mutants do not show an obvious growth phenotype. However, simultaneous deletion of both genes impaired yeast growth, resulting in accumulation of precursor mRNA and suggesting that Ntc30 and Ntc20 are auxiliary splicing factors, the functions of which may be related to the modulation of the NTC complex function required for stable association of U5 and U6 with the spliceosome after U4 is dissociated (Chen et al., 2001).
Ntc20 is a small acidic protein (pI 5.93, 140 residues), whereas Ntc30 (also known as Isy1) is an average size basic protein (pI 9.35, 235 residues). Ntc20 interacts with Cef1, Clf1, Isy1/Ntc30, Prp46, and Syf1 proteins, which are components of the NTC complex (Ben-Yehuda et al., 2000; Chen et al., 2001). Exact locations of the potential binding sites are unknown, but Ntc20 was shown to be phosphorylated at position Ser139 (Albuquerque et al., 2008). Ntc30 interacts with Cef1, Cwc2, Clf1, and Syf1 (Dix et al., 1999; Ben-Yehuda et al., 2000; Chen et al., 2001). Both Ntc30 and Ntc20 are predicted to contain a significant amount of disorder (see Table 1 and Figs. 5B and 5C).
Pre-mRNA-processing protein 45, Prp45 (UniProt ID: P28004). Prp45 is the yeast ortholog of the human Snw1/Skip transcription co-regulator, which regulates transcription elongation and alternative splicing, and was shown to genetically interact with alleles of the NTC family members Syf1, Clf1/Syf3, Ntc20, and Cef1, and the second step splicing factors Slu7, Prp17, Prp18, and Prp22 (Gahura et al., 2009). Prp45 was suggested to contribute to splicing efficiency of substrates non-conforming to the consensus via its interaction with the second step-proofreading helicase Prp22 (Gahura et al., 2009). The functional equivalency of Prp45 and Skip was verified by the rescue of the Prp45 deleted lethal mutants by the insertion of a functional copy of the Skip gene in yeast (Figueroa & Hayman, 2004). It was shown that Prp45 interacts with Prp46 in vitro, demonstrating that these proteins are spliceosome-associated throughout the splicing process and both are essential for pre-mRNA splicing (Albers et al., 2003). Prp45 is known to be associated with the spliceosome throughout the splicing reactions, until after the second catalytic step (Martinkova et al., 2002; Albers et al., 2003). Prp45 is a basic protein (pI 9.15) that consists of 379 residues. It is predicted to contain a significant amount of intrinsic disorder and three α-MoRFs (see Table 1 and Fig. 5D).
66 kDa U4/U6.U5 small nuclear ribonucleoprotein component (UniProt ID: Q12420). The yeast U4/U6.U5 tri-snRNP is a 25S snRNP particle similar in size, composition, and morphology to its counterpart in human cells (Stevens & Abelson, 1999). Stevens and Abelson purified this complex and showed that there are at least 24 proteins stably associated with this particle. In addition to the seven canonical core Sm proteins, there is a set of U6 snRNP specific Sm proteins, eight previously described U4/U6.U5 snRNP proteins, and four novel proteins. Two of the novel proteins have likely RNA binding properties, one has been implicated in the cell cycle, and one has no identifiable sequence homologues or functional motifs. One of the proteins associated with U4/U6.U5 tri-snRNP is Snu66, which is required for pre-mRNA splicing (van Nues & Beggs, 2001), being involved in interactions with the pre-mRNA-splicing helicase Brr2 and the ubiquitin-like modifier Hub1 (van Nues & Beggs, 2001; Wilkinson et al., 2004). Snu66 is a relatively large slightly acidic protein (with pI 6.35) that consists of 587 residues. Figure 5E and Table 1 show that this protein is predicted to be highly disordered and possesses a large number of α-MoRFs, clearly indicating that this disordered protein evolved to be involved in a large number of protein-protein interactions. In agreement with this hypothesis, a recent study showed that the N-terminal region of Snu66 contains two Hub1 binding motifs, which are highly similar HIND elements (72% identity) arranged in tandem (Mishra et al., 2011). The crystal structures of Hub1 in complexes with HIND-I (residues 1-31) and HIND-II (32-62) elements of Snu66 were solved (Mishra et al., 2011). Figures 6A and 6B show that both HIND-I and HIND-II elements adopt α-helical structure in the bound form, therefore providing experimental support to the α-MoRF computationally identified in this region.
Pre-mRNA-splicing factor Spp381 (UniProt ID: P38282). Over-expression of Spp381 has been shown to rescue temperature-sensitive mutants of the gene Prp38, which plays an important role in the U4 subunit release from the spliceosome (Lybarger et al., 1999). Over-expressed Spp381, however, does not rescue a null Prp38 allele, indicating that these two proteins cooperate but are not interchangeable. Spp381 is believed to interact with both the spliceosome and the RNA to be spliced. Immuno-precipitation experiments showed that, similar to Prp38, Spp381 is present in the U4/U6.U5 tri-snRNP particle, and two-hybrid analyses support the view that the C-terminal half of Spp381 directly interacts with the Prp38 protein (Lybarger et al., 1999). There is also a putative PEST motif within Spp381, which is one of the hallmarks of IDPs that are known to require tight regulation of their intracellular concentrations (Singh et al., 2006). Figure 5G shows that Spp381 (an acidic protein (pI 5.52) consisting of 291 residues) is predicted to be highly disordered and to contain 6 potential α-MoRFs.
Pre-mRNA-splicing factor Syf2 (UniProt ID: P53277). This protein is involved in pre-mRNA splicing and cell cycle control. It is another component of the NTC complex (or Prp19-associated complex), which associates with the spliceosome to mediate conformational rearrangement and/or to stabilize the structure of the spliceosome after U4 snRNA dissociation, leading to spliceosome maturation (Russell et al., 2000). Cells with defective Syf2 proteins suffer from cell cycle arrest, possibly due to the inefficient splicing of α-tubulin (Tub1) (Dahan & Kupiec, 2002). Syf2 was shown to interact with other spliceosomal proteins, such as Cef1, Clf1, Ntc20, Prp19, and Syf1. No crystal structure has yet been determined for this protein, and Syf2 is known to possess 4 phosphoserines. Syf2 has 215 residues, a pI of 9.34, a high level of intrinsic disorder and four α-MoRFs (see Table 1 and Fig. 5H).
Pre-mRNA-splicing factor Cwc26 (UniProt ID: P46947). This protein belongs to the pre-mRNA retention and splicing complex (Vincent et al., 2003), RES, a protein complex that is required for efficient splicing and prevents leakage of unspliced pre-mRNAs from the nucleus (named for pre-mRNA REtention and Splicing) (Dziembowski et al., 2004). In yeast, the complex consists of Ist3p, Bud13p, and Pml1p. Cwc26 has no posttranslational modification sites and no known crystal structure. It has been shown to interact with the proteins Ist3 and Pml1 (Dziembowski et al., 2004). Cwc26 is also known as Bud13 protein, since it may also be involved in positioning the proximal bud pole signal (Zahner, Harkins & Pringle, 1996; Ni & Snyder, 2001; Vincent et al., 2003; Dziembowski et al., 2004). It has 266 residues and is highly basic (pI 9.31). Its N-terminal half is predicted to be very disordered and is expected to contain two α-MoRFs (see Table 1 and Fig. 5I).
Pre-mRNA-splicing factor Slu7 (UniProt ID: Q02775). This is an essential protein which is involved in the second catalytic step of the pre-mRNA splicing, participating in the selection of 3'-type splice sites. This selection could be done via a 3'-splice site-binding factor, Prp16 (Frank & Guthrie, 1992; Ansari & Schwer, 1995; James, Turner & Schwer, 2002). The order of recruitment is believed to be Slu7, Prp18 and then Prp22. All three proteins are released from the spliceosome after step 2 concomitantly with the release of mature mRNA. Slu7 protein contains two functionally important domains: a zinc knuckle (122-CRNCGEAGHKEKDC-135) and a Prp18-interaction domain (215-EIELMKLELY-224) (Frank & Guthrie, 1992; Ansari & Schwer, 1995; James, Turner & Schwer, 2002). It has three phosphoserines and no determined crystal structure. Slu7 consists of 382 residues and is characterized by a pI of 8.89. Figure 5J shows that Slu7 is rather disordered and contains a number of α-MoRFs located in its N-terminal half. It is important to emphasize here that two of the predicted α-MoRFs (located at regions 111-128 and 213-230) significantly overlap with the aforementioned functional domains of Slu7 protein.
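The overlap between the predicted α-MoRFs and the functional domains of Slu7 can be checked directly from the residue ranges quoted above. The sketch below is illustrative; the helper name `overlap` and the dictionary layout are assumptions, while the coordinates are those given in the text.

```python
def overlap(range_a, range_b):
    """Number of residues shared by two inclusive residue ranges (start, end)."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    return max(0, end - start + 1)

# Residue ranges quoted in the text for Slu7
slu7_domains = {
    "zinc knuckle": (122, 135),
    "Prp18-interaction domain": (215, 224),
}
slu7_morfs = [(111, 128), (213, 230)]

# For every domain, record the largest overlap with any predicted alpha-MoRF
best_overlap = {
    name: max(overlap(domain, morf) for morf in slu7_morfs)
    for name, domain in slu7_domains.items()
}
```

Under these coordinates, the first α-MoRF shares 7 residues with the 14-residue zinc knuckle, and the second α-MoRF covers all 10 residues of the Prp18-interaction domain.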
Protein Cwc16 (UniProt ID: P28320). Similar to Cwc15 discussed above, Cwc16 (also known as Yju2) is a part of the CWC complex. It was shown that splicing factor Yju2 participates in spliceosome assembly, is associated with the components of the Prp19-associated complex (NineTeen Complex [NTC]) and is required for pre-mRNA splicing (Liu et al., 2007). NTC is known to be essential for pre-mRNA splicing, being required for the spliceosome activation by specifying interactions of U5 and U6 with pre-mRNA on the spliceosome after the release of U4. NTC contains at least eight protein components, including two tetratricopeptide repeat (TPR)-containing proteins, Ntc90 and Ntc77 (Chang, Chen & Cheng, 2009). Although Yju2 interacts with the spliceosome at almost the same time as NTC during the spliceosome assembly, these two spliceosome components are not entirely in association with each other (Liu et al., 2007). Furthermore, Yju2 is not required for the NTC binding to the spliceosome or for NTC-mediated spliceosome activation (Liu et al., 2007). However, Yju2 was shown to promote the first catalytic reaction of pre-mRNA splicing after Prp2-mediated structural rearrangement of the spliceosome (Liu et al., 2007). It is believed that Yju2 is recruited to the spliceosome by the Ntc90 protein (Chang, Chen & Cheng, 2009). Cwc16/Yju2 is a medium-size, highly basic protein (pI 9.41, 278 residues) that is predicted to be highly disordered and to contain five α-MoRFs (see Table 1 and Fig. 5K). Cwc16 is involved in interaction with Syf2 and is predicted to have two nuclear localization signals (NLSs). Importantly, these NLSs coincide with the two C-terminal α-MoRFs.
Pre-mRNA-splicing factor Ntr2 (UniProt ID: P36118). Ntr2 is a part of the NTR complex (NTC-related complex), which is composed of Ntr1, Ntr2 and Prp43. Ntr2 is known to interact with Clf1, Ntr1 and Prp43, and, along with Ntr1, is involved in the pre-mRNA splicing and spliceosome disassembly, promoting the release of the excised intron from the spliceosome by acting as a receptor for Prp43, possibly assisted by the Ntr1 protein (Tsai et al., 2005; Boon et al., 2006). This specific Prp43 targeting leads to the disassembly of the spliceosome with the separation of the U2, U5, U6 snRNPs and the NTC complex (Tsai et al., 2005; Boon et al., 2006). Ntr2 has two phosphoserines and no known crystal structure. This is a medium-size acidic protein (pI 5.51, 322 residues) that is predicted to be very disordered and to contain three α-MoRFs (see Table 1 and Fig. 5L).
Nucleolar protein 3 (UniProt ID: Q01560). Npl3 contains two RRMs (RNA recognition motifs) at positions 125-195 and 200-275, indicating that it interacts directly with the poly(A) regions of mRNA (Wilson et al., 1994; Burkard & Butler, 2000). It has 5 phosphoserines and an Arg/Gly-rich region at position 280-398. Npl3 can interact with the riboexonuclease Rrp6, which plays a role in 5.8S rRNA 3'-end processing and whose defective mutants suppress the growth defect associated with an mRNA polyadenylation defect (Burkard & Butler, 2000). Npl3 consists of 414 residues and has a pI of 5.38. It is predicted to be mostly disordered and is expected to contain five α-MoRFs (see Table 1 and Fig. 5M). Solution structures of the two RRM-containing domains have been determined using a novel expressed protein ligation protocol (Skrisovska & Allain, 2008). The resulting structures are shown in Figs. 6C and 6D.
Pre-mRNA-splicing factor Spp2 (UniProt ID: Q02521). Pre-mRNA processing occurs by assembly of splicing factors on the substrate pre-mRNA to form the spliceosome followed by two consecutive RNA cleavage-ligation reactions. The Spp2 protein belongs to the CWC complex (or CEF1-associated complex) and interacts with Prp2 (Silverman et al., 2004). Spp2 is important for the pre-mRNA splicing, playing a role at the final stages of the spliceosome maturation by promoting the first step of splicing (Roy et al., 1995). Although this first reaction is controlled by the Prp2 protein that hydrolyzes ATP, a model was proposed in which Spp2 binds to the spliceosome complex I (composed of mRNA, U1, U2, U4, U5, and U6 snRNPs) in the absence of Prp2p or ATP. This would be followed by Prp2p binding and subsequent ATP hydrolysis leading to the catalytic reaction resulting in the formation of complex II and the release of both proteins from the spliceosome (Roy et al., 1995). The Spp2 protein has one phosphoserine and no known crystal structure. Spp2 is a small moderately basic protein (pI 8.79, 185 residues) that possesses a G-patch domain (residues 100-149) and is predicted to have one α-MoRF and to be mostly disordered (see Table 1 and Fig. 5N).
Bud site selection protein 31, Bud31 (UniProt ID: P25337). Bud31 is one of the NTC-related proteins and is also a component of the Cef1p sub-complex. Although it is better known for its role in the bud site selection in yeast replication, Bud31 also appears to play a role in the yeast spliceosome through interaction with the protein Cef1, as well as interaction with the precatalytic B complex, and interaction with catalytically active complexes with stably bound U2, U5, and U6 snRNPs (Saha et al., 2012b). Recently, Bud31 was shown to be important for the efficient progression to the first catalytic step and to be required for the second catalytic step in reactions at higher temperatures (Saha et al., 2012b). Bud31 plays a role in both cell cycle transitions and pre-mRNA splicing. It was shown recently that Bud31 promotes transition through the G1-S regulatory point (Start) but is not needed for G2-M transition or for exit from mitosis (Saha et al., 2012a). By analyzing the splicing status of transcripts that encode proteins involved in yeast budding, Bud31 was shown to facilitate the efficient splicing of only some of these pre-mRNAs (Saha et al., 2012a). Bud31 is a small basic protein (pI 9.64, 157 residues) that contains an N-terminally located NLS (residues 2-11), has no posttranslational modification sites and no known crystal structure. This protein is predicted to be moderately disordered and to possess one α-MoRF (see Table 1 and Fig. 5O).
snRNP-associated protein B, SmB (UniProt ID: P40018). SmB protein is also referred to as snRNP-B. SmB is involved in pre-mRNA splicing, along with other Sm core proteins: SmB', SmD1, SmD2, SmD3, SmE, SmF, and SmG. It binds to U1, U2, U4, and U5 snRNAs, all containing a highly conserved region, referred to as the Sm binding site. It belongs to the SmB and SmN family, and is located in the cell nucleus. Sm core proteins have an important role during the formation of snRNPs. The SmB protein is an important part of the Sm core complex, as it is found in immunoprecipitates of U1, U2, U4, and U5 snRNAs (Camasses et al., 1998). Along with other Sm proteins, SmB contains a common sequence motif, which helps to form the globular core of the spliceosome snRNPs (U1, U2, U5, and U4/U6) (Walke et al., 2001). SmB possesses a nuclear localization signal (NLS) located in the C-terminal half of the protein (region 105-132). When this portion of the sequence is either deleted or mutated, SmB function is lost, suggesting that the C-terminal part of this Sm protein has been evolutionarily conserved, and its function determines nuclear localization (Bordonne, 2000). This protein consists of 196 residues, has a pI of 10.37, contains one α-MoRF, and shows high levels of disorder, especially in its C-terminal part (see Table 1 and Fig. 5P). When analyzed by seven disorder predictors, PONDR® FIT, PONDR® VLXT, PONDR® VL3, PONDR® VSL2B, IUPred, FoldIndex, and TopIDP, its corresponding levels of disorder are 0.643, 0.648, 0.724, 0.760, 0.571, 0.628, and 0.719, respectively.
U1 snRNP protein C, Yhc1 (UniProt ID: Q05900). Yhc1 (also known as U1-C protein) is an important component of the spliceosome subcomplex U1 snRNP (Tang et al., 1997), which is composed of the 7 core Sm proteins common to all spliceosomal snRNPs and at least 10 particle-specific proteins (see Table 1 and Fig. 4), and which is essential for recognition of the pre-mRNA 5' splice-site and the subsequent assembly of the spliceosome (Fabrizio et al., 2009). The major functional role of Yhc1 is the initial 5' splice-site recognition for both constitutive and alternative splicing. Yhc1 interacts with the U1 snRNA and the 5' splice-site region of the pre-mRNA, therefore stimulating the commitment complex formation by stabilizing the base pairing of the 5' end of the U1 snRNA and the 5' splice-site region (Tang et al., 1997; Zhang & Rosbash, 1999). It was shown that Yhc1 can recognize the 5' splice-site in the absence of base-pairing between the pre-mRNA and the U1 snRNA (Du & Rosbash, 2002). Yhc1 is a highly basic protein (pI 10.11) that consists of 231 residues and contains a matrin-type zinc finger domain (residues 4-36). Yhc1 is predicted to be moderately disordered and is expected to contain two α-MoRFs (see Table 1 and Fig. 5Q).
U2 snRNP protein Cus1 (UniProt ID: Q02554). Cus1, also known as cold sensitive U2 snRNA suppressor, is a 436-residue-long protein that is required for the U2 snRNP binding to pre-mRNA during spliceosome assembly (Pauling, McPheeters & Ares, 2000). Cus1 is a homologue of the human Sap145 protein that is present in the 17S form of the human U2 snRNP. Yeast Cus1 interacts with U2 snRNA, with Hsh49 (via the 82-amino-acid-long region located between positions 229 and 311), and with Hsh155 (Pauling, McPheeters & Ares, 2000). Based on these observations, it was proposed that Cus1, Hsh49, and Hsh155 form a stable protein complex which can exchange with a core U2 snRNP and which is necessary for U2 snRNP function in pre-spliceosome assembly (Pauling, McPheeters & Ares, 2000). Although Cus1 is a moderately basic protein (pI 8.67), one of its characteristic features is the highly acidic nature of its C-terminal tail, where nearly half of the last 59 residues are acidic (23 are E or D) (Pauling, McPheeters & Ares, 2000). Both the N-terminal and C-terminal tails of Cus1 are predicted to be highly disordered and to contain a number of potential disorder-based binding sites (see Table 1 and Fig. 5R).
U5 snRNP protein Lin1 (UniProt ID: P38852). Lin1 is a multifunctional protein involved in several different processes. Compartmentalization of Lin1 with U5 snRNP was inferred from a direct assay (Stevens et al., 2001). Based on its association with the Irr1/Scc3 component of the cohesin complex, which is involved in cohesion and separation of chromosomes during mitosis, and on its interaction with the Prp8, Slx5, Siz2, Wss1, Rfc1, and YIL149w proteins, which are known to participate in mRNA splicing, DNA replication, chromosome condensation, chromatid separation and alternative cohesion, Lin1 was proposed to serve as a functional and physical link among these processes (Bialkowska & Kurlandzka, 2002). Lin1 is an acidic protein (pI 5.01) consisting of 340 residues. Figure 5S shows that the N-terminal half of the Lin1 protein is predicted to be very disordered and is expected to have four α-MoRFs (see also Table 1), whereas the C-terminal half is expected to be ordered. The last sixty residues of Lin1 (residues 282-340) correspond to a glycine-tyrosine-phenylalanine (GYF) domain, which contains a conserved GP[YF]xxxx[MV]xxWxxx[GN]YF motif that can be involved in the recognition of proline-rich sequences (Freund et al., 1999). Since many proline-rich proteins are IDPs, Lin1 utilizes two different modes of intrinsic disorder-based protein-protein recognition: it relies on the intrinsic disorder of its N-terminal half to interact with some partners, and it uses the intrinsic disorder of other partners to interact with its ordered C-terminal region.
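The GYF consensus quoted above can be read as a pattern in which each "x" stands for any residue and each bracketed pair lists the alternatives allowed at that position. The following is a minimal sketch of how such a consensus could be scanned for in a protein sequence; it is an illustration only and is not the procedure used in this study. The function name `find_gyf_motifs` and the toy sequence are hypothetical, and the toy sequence is not the real Lin1 sequence.

```python
import re

# Regex form of the consensus quoted in the text:
# GP[YF]xxxx[MV]xxWxxx[GN]YF, where "x" stands for any residue.
GYF_MOTIF = re.compile(r"GP[YF].{4}[MV].{2}W.{3}[GN]YF")


def find_gyf_motifs(sequence):
    """Return (1-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start() + 1, m.group()) for m in GYF_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence (NOT the real Lin1 sequence), constructed so that
# one motif instance starts at residue 6.
toy_sequence = "MSTAAGPYAAAAMAAWAAAGYFKKK"
hits = find_gyf_motifs(toy_sequence)
print(hits)
```

Positions are reported with 1-based numbering to match the residue numbering used throughout the text.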
U4/U6 snRNP protein Prp3 (UniProt ID: Q03338). Prp3 is a large, moderately basic protein (pI 8.69, 469 residues), which is a component of the yeast U4/U6 snRNP and is also present in the U4/U6.U5 tri-snRNP (Anthony, Weidenhammer & Woolford, 1997). It was shown that Prp3 is necessary both for the formation of stable U4/U6 snRNPs and for the assembly of the U4/U6.U5 tri-snRNP from its component snRNPs. In fact, Prp3 inactivation diminishes spliceosome assembly from the pre-spliceosome due to the absence of intact U4/U6.U5 tri-snRNPs (Anthony, Weidenhammer & Woolford, 1997). Homology between the yeast Prp3 protein and the human protein 90K (which is a component of the human U4/U6 snRNPs) represents an illustrative example of the conservation of splicing factors between yeast and metazoans (Anthony, Weidenhammer & Woolford, 1997). Prp3 is predicted to contain a significant amount of disorder (especially in its first 350 residues) and is expected to be a promiscuous binder, since it has seven α-MoRFs (see Table 1 and Fig. 5T).
U6 snRNA-associated Sm-like Protein LSm4 (UniProt ID: P40070). The Sm-like (LSm) heptameric complex is one of the important spliceosomal components. It exists in two different forms, nuclear and cytoplasmic, each comprising different subunits (Reijns, Auchynnikava & Beggs, 2009). The nuclear form, the LSm2-8 complex, consists of subunits LSm2 to LSm8, is closely associated with the U6 snRNP, interacts with Prp24, and works together with the neighboring proteins to create a functional spliceosome. The cytoplasmic form is composed of LSm1 to LSm7, is involved in mRNA turnover, and promotes mRNA decapping and decay (Spiller et al., 2007). One of the roles of the LSm2-8 complex is to promote the U4/U6 di-snRNP assembly (Reijns, Auchynnikava & Beggs, 2009). It is also involved in the processing and stabilization of ribosomal RNAs and determines the nuclear localization of the U6 snRNP (Spiller et al., 2007). LSm4 is a component of both the LSm1-7 and LSm2-8 complexes. Among the functions ascribed to LSm4 are specific binding to the 3'-terminal U-tract of U6 snRNA, participation in the processing of pre-tRNAs, pre-rRNAs and U3 snoRNA, and involvement in the maturation of the precursor of the RNA component of RNase P (pre-P RNA) (Bouveret et al., 2000; Tharun et al., 2000; Kufel et al., 2002; Kufel et al., 2003; Kufel et al., 2004). LSm4 is a small basic protein (pI 9.45, 187 residues) with a highly disordered C-terminal domain that contains one α-MoRF and one phosphoserine at position 181 (Albuquerque et al., 2008) (see Table 1 and Fig. 5U).
Early splicing factor Prp5 (UniProt ID: P21372). Prp5 is a large, slightly basic (pI 8.22) ATP-dependent RNA helicase consisting of 850 residues (O'Day, Dalbadie-McFarland & Abelson, 1996). Prp5 is involved in spliceosome assembly, nuclear splicing, and catalysis of the ATP-dependent conformational change of U2 snRNP (Ruby, Chang & Abelson, 1993; Wells & Ares, 1994; O'Day, Dalbadie-McFarland & Abelson, 1996; Abu Dayyeh et al., 2002). It is believed that this protein might be involved in bridging U1 and U2 snRNPs and might promote stable interaction between the U2 snRNP and intron RNA. Prp5 contains a helicase domain (residues 287-661), which is divided into the helicase ATP-binding and helicase C-terminal subdomains (residues 287-467 and 502-661, respectively). There are also several functionally important motifs in Prp5, such as a nucleotide binding motif (residues 300-307), a coiled-coil (residues 13-81), an NLS (residues 90-96), a Q motif (residues 255-284) and the DEAD-box motif (residues 415-418). Despite the fact that Prp5 is an enzyme and is therefore expected to be mostly ordered, Table 1 and Fig. 5V show that this protein is predicted to have a significant amount of disorder (mostly located in the first 200 N-terminal residues) and also to possess six α-MoRFs.
CBP protein Cbc2 (UniProt ID: Q08920). Cbc2 is a component of the nuclear cap-binding complex (CBC), which is a heterodimer that co-transcriptionally interacts with the cap of pre-mRNAs and is composed of the Sto1/Cbc1 and Cbc2 proteins. The CBC complex is crucial for efficient pre-mRNA splicing through its participation in the formation of the commitment complex and spliceosome. It is involved in maturation, export and degradation of nuclear mRNAs (Lewis, Gorlich & Mattaj, 1996; Fortes et al., 1999). Cbc2 binds the m7G cap of the RNA and the large CBC subunit Sto1, which interacts with karyopherins, and is believed to be responsible for splicing control during meiosis (Qiu et al., 2012). Cbc2 is an acidic protein (pI 5.02) that is composed of 208 residues and contains an RRM domain that is involved in single-stranded RNA binding (residues 46-124) and three mRNA cap-binding regions (residues 118-122, 129-133, and 139-140). Figure 5W shows that Cbc2 is predicted to have long disordered tails and two α-MoRFs located within these intrinsically disordered N- and C-termini (see also Table 1).
Msl5 protein (UniProt ID: Q12186). Msl5 is the branch point-bridging protein, which is required for pre-spliceosome formation. It plays a role in the creation of the commitment complex 2 (CC2), where it binds to the U1 snRNP-associated protein Prp40, bridging the U1 snRNP-associated 5'-splice site and the Msl5-associated branch point 3' intron splice site (Abovich & Rosbash, 1997; Rutz & Seraphin, 1999). As a part of the CC2 complex, Msl5 is involved in the nuclear retention of pre-mRNA (Rutz & Seraphin, 2000). It interacts with Mud2 and Prp40 (Abovich & Rosbash, 1997; Rutz & Seraphin, 1999), and the proline-rich region of Msl5 (residues 363-474) binds to the GYF domains of Smy2 and Syh1 (Kofler, Motzny & Freund, 2005). Figure 5X shows that the Msl5 region responsible for the interaction with the GYF domains of Smy2 and Syh1 is a part of the long, highly disordered tail. There are two α-MoRFs in this basic (pI 9.72), 476-residue-long protein (see Table 1 and Fig. 5X).

Highly disordered spliceosomal proteins might act as important hubs
Protein-protein interaction networks contain many proteins with only a few links and a few proteins with many links. These highly connected or promiscuous proteins are known as hubs, the binding mechanisms of which can be reasonably explained based on the molecular recognition via disorder-to-order transitions upon binding (Dunker et al., 2005). With respect to timing issues, some proteins have multiple, simultaneous interactions ("party hubs") (Han et al., 2004) while others have multiple sequential interactions ("date hubs") (Han et al., 2004). Perhaps date hubs connect biological modules to each other (Hartwell et al., 1999) while party hubs form scaffolds that enable the assembly of functional modules (Silverman et al., 2004). The overall importance of intrinsic disorder for function of hub proteins was analyzed in several recent bioinformatics publications (Dosztanyi et al., 2006;Ekman et al., 2006;Haynes et al., 2006;Patil & Nakamura, 2006;Singh et al., 2006). Disorder appears to be more clearly associated with date hubs (Ekman et al., 2006;Singh et al., 2006) than with party hubs. However, some protein complexes clearly use long regions of disorder as a scaffold for assembling an interacting group of proteins (Hohenstein & Giles, 2003;Jaffe, Aspenstrom & Hall, 2004;Luo & Lin, 2004;Rui et al., 2004;Wong & Scott, 2004;Jaffe & Hall, 2005;Marinissen & Gutkind, 2005;Salahshor & Woodgett, 2005;Carpousis, 2007).
Due to their malleable nature, IDPs and IDPRs are predisposed to be hubs. In fact, they are commonly involved in one-to-many and in many-to-one binding scenarios. Both of these interaction modes are specific cases of the date hubs, which can bind different proteins, but not at the same time. In the first mechanism, one unfolded segment is used by a protein to interact with multiple unrelated binding partners. In the second mechanism, many unrelated unfolded fragments are used by unrelated proteins to interact with the same partner (Dunker et al., 1998;Oldfield et al., 2008).
To check the set of highly disordered spliceosomal proteins for "hubness", we utilized the STRING database, which acts as a 'one-stop shop' for all information on functional links between proteins (Szklarczyk et al., 2011). Version 9.0 of STRING (accessible at http://string-db.org) covers more than 1100 completely sequenced organisms, including Saccharomyces cerevisiae. Figure 7 represents the results of the STRING analysis for the 24 yeast spliceosomal proteins considered in a previous section. Here, the interactome of each of these proteins is shown as an interaction network, where proteins are represented by spheres (note that in each network, the red sphere corresponds to the query protein) and connections between two proteins are shown by lines. The fundamental unit stored in STRING is the "functional association"; i.e., the specific and biologically meaningful functional connection between two proteins (Szklarczyk et al., 2011). These functional associations are based on seven types of evidence: fusion evidence, neighborhood evidence, co-occurrence evidence, experimental evidence, text mining evidence, database evidence, and co-expression evidence (Szklarczyk et al., 2011). These different types of evidence are shown by lines of different colors. It is necessary to emphasize that Fig. 7 is used here with a strictly illustrative purpose; i.e., to show that all of the analyzed spliceosomal proteins are involved in multiple interactions and therefore can be considered as hubs. Since these 24 proteins contain a significant amount of predicted disorder, and since almost all of them interact with other spliceosomal proteins, many of which are also predicted to be mostly disordered, Fig. 7 suggests that the hubness of spliceosomal proteins is related to their intrinsically disordered nature and/or to the intrinsic disorder of their partners.
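The notion of hubness used here amounts to counting the distinct partners of each protein in an interaction network. A minimal sketch of that count is given below; it is an illustration only and does not reproduce the STRING analysis behind Fig. 7. The edge list is a small set of pairwise interactions mentioned in the protein descriptions above, not a STRING export, and the function names and the partner threshold are hypothetical choices made for this example.

```python
from collections import defaultdict


def degree_by_protein(edges):
    """Count distinct interaction partners for each protein in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(linked) for protein, linked in partners.items()}


def find_hubs(edges, min_partners):
    """Return proteins with at least `min_partners` distinct partners, sorted by name."""
    degrees = degree_by_protein(edges)
    return sorted(p for p, d in degrees.items() if d >= min_partners)


# Illustrative edge list assembled from interactions named in the text above
# (NOT a STRING export, and far from complete).
edges = [
    ("Msl5", "Mud2"), ("Msl5", "Prp40"), ("Msl5", "Smy2"), ("Msl5", "Syh1"),
    ("Cus1", "Hsh49"), ("Cus1", "Hsh155"),
    ("Lin1", "Prp8"),
]

# The threshold of 3 partners is an arbitrary assumption for this toy network.
print(find_hubs(edges, min_partners=3))
```

Such a count says nothing about timing, so it cannot by itself distinguish date hubs from party hubs.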

CONCLUDING REMARKS
In this work we have studied the prevalence of intrinsic disorder in the yeast spliceosome in order to test whether this complex ribonucleoprotein machine has an enhanced predisposition for intrinsic disorder in comparison with the average proteome. Our results showed that the prevalence of IDPs/IDPRs in the spliceosome was not significantly different from the average disorderedness of eukaryotic proteins. However, when compared with an average yeast protein, yeast spliceosomal proteins were noticeably more disordered. For example, 46.7% of the spliceosomal proteins were shown to be mostly disordered, whereas the entire yeast proteome contained a significantly smaller fraction of such proteins (13.3%). Furthermore, ∼61% of spliceosomal proteins were shown to possess α-MoRFs, while there were 21.1% of MoRF-containing proteins in the entire yeast proteome. This suggests that the spliceosomal proteins are often engaged in interactions with their protein and RNA partners via disordered regions. More detailed analysis of the most disordered spliceosomal proteins revealed that they are in fact involved in multiple interactions and therefore can be considered as disordered hubs. Our findings are in good agreement with the earlier published results on the peculiarities of intrinsic disorder distribution and functions in known human spliceosomal proteins (Korneta & Bujnicki, 2012). The authors of that study concluded that about half of the residues in the human spliceosomal proteome are expected to be intrinsically disordered. Furthermore, a correlation was found between the type of protein disorder and its function and localization within the spliceosome, with the spliceosomal components involved in earlier stages of the splicing process being more disordered than components acting at the later stages (Korneta & Bujnicki, 2012). This enrichment of early proteins in disorder was proposed to play a significant functional role, since proteins of the components of the spliceosome that act earlier in the process are crucial for establishing a network of interactions (Korneta & Bujnicki, 2012). In agreement with these conclusions, Fig. 4 and Table 1 show that yeast spliceosomal proteins related to the complex B are expected to be more disordered than proteins related to the spliceosomal components engaged at the later stages. Therefore, intrinsic disorder is abundant in the yeast spliceosome and is important for the assembly and action of this malleable ribonucleoprotein machine.

Figure 7. [...]; and X. Msl5. STRING (Search Tool for the Retrieval of Interacting Genes) is an online database resource that provides both experimental and predicted interaction information (Szklarczyk et al., 2011). For each protein, STRING produces the network of predicted associations for a particular group of proteins. The network nodes are proteins. The edges represent the predicted functional associations. An edge may be drawn with up to 7 differently colored lines; these lines represent the existence of the seven types of evidence used in predicting the associations. A red line indicates the presence of fusion evidence; a green line, neighborhood evidence; a blue line, co-occurrence evidence; a purple line, experimental evidence; a yellow line, text mining evidence; a light blue line, database evidence; a black line, co-expression evidence (Szklarczyk et al., 2011).

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was supported in part by the Programs of the Russian Academy of Sciences for the "Molecular and Cellular Biology" (to VNU). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors: Programs of the Russian Academy of Sciences for the "Molecular and Cellular Biology".

Competing Interests
Vladimir N. Uversky and Bin Xue are Academic Editors for PeerJ. We do not have other competing interests.

Author Contributions
• Maria de Lourdes Coelho Ribeiro and Julio Espinosa performed the experiments, analyzed the data, wrote the paper.
• Bin Xue performed the experiments, analyzed the data, contributed reagents/materials/analysis tools.
• Vladimir N. Uversky conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, wrote the paper.