Evaluation of the abundance of intrinsic disorder in yeast spliceosomal proteins
To test for a correlation between the yeast spliceosomal proteins and intrinsic disorder, a dataset of 109 proteins associated with the yeast spliceosome was extracted from UniProt as described in Materials and Methods. Next, this set of proteins was analyzed using a broad spectrum of computational tools for the evaluation of intrinsic disorder in proteins. Results of this analysis are discussed below.
Analysis of the compositional biases
Since the amino acid sequences and compositions of IDPs and IDPRs differ significantly from those of ordered proteins and folded domains, a simple analysis of amino acid compositional biases can provide useful information on the nature of a protein. For example, the amino acid compositions of extended IDPs (i.e., disordered proteins that have almost no residual structure and behave as native coils or native pre-molten globules (Dunker et al., 2001; Uversky, 2002a; Uversky, 2002b; Uversky, 2003; Uversky & Dunker, 2010)) are characterized by low mean hydropathy and high mean net charge. These two features define the highly unstructured and extended state of such proteins, since high net charge leads to strong electrostatic repulsion and low hydropathy prevents efficient compaction (Uversky, Gillespie & Fink, 2000). Overall, IDPs/IDPRs are known to be significantly depleted in the so-called order-promoting amino acids C, W, I, Y, F, L, H, V, and N, and substantially enriched in the disorder-promoting residues A, G, R, T, S, K, Q, E, and P (Dunker et al., 2001; Romero et al., 2001; Williams et al., 2001; Radivojac et al., 2007; Vacic et al., 2007a). Therefore, evaluating the amino acid biases of a set of proteins is a fast and informative way to assess their intrinsically disordered nature. This analysis can be done with the computational tool Composition Profiler (Vacic et al., 2007a), which calculates the normalized composition of a given protein or protein dataset as $(C_x - C_{\text{order}})/C_{\text{order}}$, where $C_x$ is the content of a given residue in the query dataset and $C_{\text{order}}$ is the corresponding value for the set of ordered proteins from PDB Select 25 (Berman et al., 2000).
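The normalization can be sketched in a few lines of Python. This is a minimal illustration, not Composition Profiler itself: it assumes the reference set is supplied as a plain list of ordered sequences (a stand-in for PDB Select 25), skips non-standard residues, and omits the statistical significance estimates that the real tool reports. The function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Return the fraction of each standard residue across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(res for res in seq.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def normalized_composition(query, reference):
    """(C_x - C_order) / C_order for each residue present in the reference set.

    Positive values mean enrichment relative to the ordered reference,
    negative values mean depletion.
    """
    c_x = composition(query)
    c_order = composition(reference)
    return {
        aa: (c_x[aa] - c_order[aa]) / c_order[aa]
        for aa in AMINO_ACIDS
        if c_order[aa] > 0  # skip residues absent from the reference (no division by zero)
    }


# Toy example with made-up sequences (not real spliceosomal or PDB data).
query_set = ["MSEKKSRSPEQ", "MDEKQSSK"]
ordered_set = ["MIVLCWFYHA", "GVLIFYCWNA"]
for residue, value in sorted(normalized_composition(query_set, ordered_set).items()):
    print(f"{residue}: {value:+.2f}")
```

In a real analysis the reference frequencies would come from the ordered reference set rather than from a handful of toy sequences.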
Results of this analysis are shown in , which illustrates that, in comparison with typical ordered proteins, yeast spliceosomal proteins are moderately depleted in some order-promoting residues (e.g., C, W, Y, F, H, and V, see orange bars in ) and moderately enriched in some major disorder-promoting residues (e.g., D, K, Q, S, and E). On the other hand, some order-promoting residues (I, L, and M) are rather common in these proteins, whereas some disorder-promoting residues (G, A, and P) are clearly underrepresented in the yeast spliceosomal proteins. Both the depletion in major order-promoting residues and the enrichment in major disorder-promoting residues suggest that the yeast spliceosomal proteins might contain multiple signatures characteristic of disordered proteins.
Evaluation of abundance of intrinsic disorder in the yeast spliceosome.
Abundance of long disordered regions in yeast spliceosomal proteins
A previous study revealed that intrinsic disorder is very abundant in signaling proteins and that this abundance can be evaluated by estimating the fraction of proteins with long disordered regions (Iakoucheva et al., 2002). In fact, the application of PONDR® VLXT (Romero et al., 2001) showed that 66% of cell-signaling proteins contain predicted regions of disorder of 30 residues or longer (Iakoucheva et al., 2002). Therefore, we applied a similar approach and systematically analyzed the intrinsic disorder tendencies in four protein datasets: (1) 109 yeast spliceosomal proteins (spliceosome); (2) 2,329 signaling proteins collected by the Alliance for Cellular Signaling (AfCS); (3) 53,630 eukaryotic proteins from UniProt (EU_UP); and (4) a set of 1,138 non-homologous protein segments with well-defined 3-D structure from the Protein Data Bank Select 25 (O_PDB_S25). illustrates that intrinsic disorder is prevalent in the yeast spliceosomal proteins, comparable to its prevalence in signaling and eukaryotic proteins. Specifically, the percentages of proteins with 30 or more consecutive residues predicted to be disordered were 53% for the spliceosomal proteins, 66% for AfCS, 47% for EU_UP, and 13% for O_PDB_S25. In other words, the fraction of yeast spliceosomal proteins with long regions of predicted disorder is about 4-fold higher than that of non-homologous ordered proteins from PDB (Iakoucheva et al., 2002) and slightly higher than the corresponding fraction in eukaryotic proteins.
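The long-disorder statistic can be sketched as follows. This is an illustrative Python sketch under stated assumptions: per-residue disorder scores (e.g., from PONDR® VLXT) are assumed to be available already as lists of floats, a residue is counted as disordered when its score is at least 0.5, and the function names are hypothetical.

```python
def longest_disordered_run(scores, threshold=0.5):
    """Length of the longest stretch of consecutive residues scored >= threshold."""
    longest = current = 0
    for score in scores:
        current = current + 1 if score >= threshold else 0
        longest = max(longest, current)
    return longest


def fraction_with_long_disorder(score_profiles, min_length=30, threshold=0.5):
    """Fraction of proteins containing at least one disordered run of min_length residues."""
    hits = sum(
        1
        for scores in score_profiles
        if longest_disordered_run(scores, threshold) >= min_length
    )
    return hits / len(score_profiles)


# Hypothetical score profiles for four proteins.
profiles = [
    [0.9] * 30,                       # one 30-residue disordered run: qualifies
    [0.9] * 29 + [0.1] + [0.9] * 29,  # two 29-residue runs: does not qualify
    [0.1] * 100,                      # fully ordered
    [0.6] * 45,                       # one 45-residue run: qualifies
]
print(fraction_with_long_disorder(profiles))  # 0.5

# Fold enrichment quoted in the text: 53% (spliceosome) vs 13% (O_PDB_S25).
print(round(53 / 13, 1))  # 4.1
```

Requiring a single uninterrupted run (rather than 30 disordered residues in total) is what distinguishes this statistic from overall disorder content.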
Disorder propensity of yeast spliceosomal proteins studied by the binary disorder predictors
Sequences of the 109 yeast spliceosomal proteins were analyzed with two binary predictors of intrinsic disorder, the charge-hydropathy plot (CH-plot) (Uversky, Gillespie & Fink, 2000; Oldfield et al., 2005b) and cumulative distribution function (CDF) analysis (Oldfield et al., 2005b), to predict whether these proteins are likely to be mostly disordered. Both methods classify whole proteins as either mostly ordered, i.e., containing more ordered than disordered residues, or mostly disordered, i.e., containing more disordered than ordered residues (Oldfield et al., 2005b).
represents the results of the combined CH-CDF analysis of the spliceosomal proteins and shows that ~50% of these proteins are mostly disordered. In this plot, the Y-coordinate of each spot is the distance of the corresponding protein from the boundary in the CH-plot, and the X-coordinate is the average distance of the protein's CDF curve from the CDF boundary (Mohan et al., 2008; Xue et al., 2009; Huang et al., 2012). The primary difference between these two binary predictors (i.e., predictors that evaluate the predisposition of a given protein to be ordered or disordered as a whole) is that the CH-plot is a linear classifier that takes into account only two parameters of the sequence (charge and hydropathy), whereas CDF analysis depends on the output of the PONDR® predictor, a nonlinear classifier trained to distinguish order and disorder on a significantly larger feature space. Because of these methodological differences, CH-plot analysis tends to discriminate proteins with a substantial amount of extended disorder (random coils and pre-molten globules) from proteins with compact conformations (molten globule-like and rigid well-structured proteins). In contrast, PONDR-based CDF analysis may discriminate all disordered conformations, including molten globules and mixed proteins containing both disordered and ordered regions, from rigid well-folded proteins. Therefore, a discrepancy between the CDF and CH-plot predictions for a protein provides a computational means to discriminate proteins with extended disorder from potential molten globules and mixed proteins.
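The CH-plot side of this classification can be illustrated with a short Python sketch. It assumes the commonly cited boundary of Uversky, Gillespie & Fink (2000), ⟨R⟩ = 2.785⟨H⟩ − 1.151, with mean hydropathy ⟨H⟩ taken from the Kyte–Doolittle scale rescaled to 0–1 and mean net charge ⟨R⟩ taken as the absolute net charge per residue at neutral pH. Two simplifications are assumptions of this sketch: hydropathy is averaged per residue rather than over a sliding window, and the "distance" is the vertical offset along the charge axis, which may differ from the distance definition used in published CH-CDF plots.

```python
# Kyte-Doolittle hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy rescaled from [-4.5, 4.5] to [0, 1]."""
    values = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values)


def mean_net_charge(seq):
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1."""
    seq = seq.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


def ch_distance(seq):
    """Signed vertical offset from the boundary <R> = 2.785 <H> - 1.151.

    Positive: natively unfolded side of the CH-plot; negative: compact side.
    """
    h = mean_hydropathy(seq)
    r = mean_net_charge(seq)
    return r - (2.785 * h - 1.151)


print(ch_distance("EEEEKEEEEDEE") > 0)  # highly charged, low hydropathy -> True
print(ch_distance("IVLLFAIVLMCW") > 0)  # hydrophobic, uncharged -> False
```

Because the classifier uses only two sequence averages, it is fast but blind to how charged and hydrophobic residues are distributed along the chain, which is the limitation the CDF analysis complements.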
CH-CDF analysis of the yeast spliceosomal proteins.
Positive and negative Y values in correspond to proteins predicted within CH-plot analysis to be natively unfolded or compact, respectively. On the other hand, positive and negative X values are attributed to proteins predicted within the CDF analysis to be ordered or intrinsically disordered, respectively. Thus, the resultant quadrants of CDF-CH phase space correspond to the following expectations: Q1, proteins predicted to be disordered by CH-plots, but ordered by CDFs; Q2, ordered proteins; Q3, proteins predicted to be disordered by CDFs, but compact by CH-plots (i.e., putative molten globules or mixed proteins); Q4, proteins predicted to be disordered by both methods (i.e., proteins with extended disorder).
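The quadrant logic can be sketched in Python. Here the CDF coordinate is computed as the mean difference between a protein's empirical CDF of per-residue disorder scores and a boundary curve evaluated at evenly spaced score thresholds, so that positive values fall on the ordered side, matching the convention above. The boundary values below are hypothetical placeholders, not the published CDF boundary, and assigning points that lie exactly on an axis to the disordered or compact side is an arbitrary choice of this sketch.

```python
def cdf_distance(scores, boundary):
    """Mean of (empirical CDF - boundary) at len(boundary) evenly spaced thresholds.

    `scores` are per-residue disorder scores in [0, 1]; a curve lying above the
    boundary (many low scores) gives a positive value, i.e., the ordered side.
    """
    n_bins = len(boundary)
    thresholds = [(i + 1) / n_bins for i in range(n_bins)]
    cdf = [sum(1 for s in scores if s <= t) / len(scores) for t in thresholds]
    return sum(c - b for c, b in zip(cdf, boundary)) / n_bins


def ch_cdf_quadrant(x, y):
    """Assign a quadrant from the CDF distance (x) and the CH-plot distance (y)."""
    if x > 0 and y > 0:
        return "Q1"  # disordered by CH-plot, ordered by CDF
    if x > 0:
        return "Q2"  # ordered by both
    if y <= 0:
        return "Q3"  # disordered by CDF, compact by CH-plot (molten globule or mixed)
    return "Q4"      # disordered by both (extended disorder)


HYPOTHETICAL_BOUNDARY = [0.2, 0.4, 0.6, 0.8, 1.0]  # placeholder, not the published curve

x_ordered = cdf_distance([0.1] * 50, HYPOTHETICAL_BOUNDARY)
x_disordered = cdf_distance([0.9] * 50, HYPOTHETICAL_BOUNDARY)
print(ch_cdf_quadrant(x_ordered, -0.2))    # Q2
print(ch_cdf_quadrant(x_disordered, 0.3))  # Q4
```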
shows that ~50% of the yeast spliceosomal proteins are predicted to be disordered as a whole, with 33% and 13.8% of them found in quadrants Q4 and Q3, respectively. In their unbound states, these proteins are therefore expected to behave as native coils or native pre-molten globules (Q4), or as native molten globules or mixed proteins (Q3). The finding that 46.7% of the spliceosomal proteins are expected to be mostly disordered (i.e., located within quadrants Q3 and Q4) is important because this value noticeably exceeds the corresponding value evaluated for yeast proteins in general (13.3%) (Mohan et al., 2008).
Combined analysis of intrinsic disorder propensity by several computational tools
It has been emphasized that combined analysis of intrinsic disorder propensity by several computational tools (especially by tools that utilize different attributes) provides additional advantages (Ferron et al., 2006; Bourhis, Canard & Longhi, 2007; He et al., 2009), allowing, for example, better visualization of the differences between various protein groups (Uversky et al., 2006). illustrates the power of this approach and represents a plot where disorder contents in the yeast spliceosomal proteins were evaluated by PONDR-FIT, a meta-predictor that provides more accurate disorder content predictions than several other recent disorder predictors (Xue et al., 2010), and by PONDR® VLXT (Romero et al., 2001), which is no longer the most accurate predictor but is very sensitive to local compositional biases and is capable of identifying potential molecular interaction motifs (Oldfield et al., 2005a; Cheng et al., 2007). In our analysis, we used two arbitrary cutoffs for the level of intrinsic disorder to classify proteins as highly ordered ([IDP score] < 10%), moderately disordered (10% < [IDP score] < 30%), and highly disordered ([IDP score] > 30%) (Rajagopalan et al., 2011). According to this separation, just 9 of the 109 proteins were predicted to be highly ordered by PONDR-FIT, with 48 and 52 proteins classified as moderately and highly disordered, respectively (see ). This grouping suggests that most of the proteins in the spliceosome are intrinsically disordered.
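The three-way grouping can be written as a small Python function. This sketch assumes that a protein's disorder score is the percentage of its residues with a predicted per-residue score of at least 0.5 (a common convention for PONDR-family predictors, assumed here), and it places values of exactly 10% and 30% in the moderately disordered class, a choice the text does not specify. The function names and example profiles are hypothetical.

```python
from collections import Counter


def percent_disordered(scores, threshold=0.5):
    """Percentage of residues whose per-residue disorder score is >= threshold."""
    return 100.0 * sum(1 for s in scores if s >= threshold) / len(scores)


def disorder_class(percent):
    """Group a protein by its predicted disorder content (in percent)."""
    if percent < 10:
        return "highly ordered"
    if percent <= 30:
        return "moderately disordered"
    return "highly disordered"


# Hypothetical per-residue score profiles for three proteins.
profiles = {
    "protein_A": [0.2] * 95 + [0.8] * 5,   # 5% disordered
    "protein_B": [0.2] * 80 + [0.8] * 20,  # 20% disordered
    "protein_C": [0.2] * 40 + [0.8] * 60,  # 60% disordered
}
groups = Counter(disorder_class(percent_disordered(p)) for p in profiles.values())
print(groups)
```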
Combined analysis of intrinsic disorder propensities of the yeast spliceosomal proteins using the outputs of different disorder prediction tools.
Since PONDR-FIT is a meta-predictor that includes PONDR® VLXT as one of its components, a linear relationship between the outputs of these two predictors was expected. Therefore, we used a more complex analysis in which the outputs of three approaches based on different principles were compared. represents the results of this analysis as a 3D disorder distribution plot, in which the outputs of PONDR-FIT, RONN, and FoldIndex are used as the three dimensions. This representation shows that the outputs of three very different computational tools (see Materials and Methods for descriptions of these tools) generally agree with each other, since the points corresponding to the different spliceosomal proteins are mostly located near the diagonal of the FIT-RONN-FoldIndex space.
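Agreement between the three predictors can be quantified as the distance of each protein's point from the main diagonal (x = y = z) of the predictor space. The Python sketch below assumes that all three outputs have already been converted to a common scale (here, the fraction of residues predicted to be disordered, from 0 to 1); for FoldIndex in particular this requires a conversion step not shown. The example values are made up.

```python
import math


def distance_from_diagonal(point):
    """Euclidean distance from a point to the line x = y = z.

    The closest point on that line has every coordinate equal to the mean of
    the point's coordinates, so the distance is the norm of the deviations.
    """
    mean = sum(point) / len(point)
    return math.sqrt(sum((c - mean) ** 2 for c in point))


# Hypothetical (PONDR-FIT, RONN, FoldIndex) disorder fractions for three proteins.
predictions = {
    "protein_A": (0.62, 0.58, 0.65),
    "protein_B": (0.15, 0.20, 0.10),
    "protein_C": (0.70, 0.30, 0.45),
}
for name, point in predictions.items():
    print(name, round(distance_from_diagonal(point), 3))
```

A small distance means the three tools assign similar disorder levels to that protein; a large one (like the made-up protein_C) flags a protein on which the tools disagree.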
Structures and functions of some highly disordered spliceosomal proteins
Spliceosome assembly is a multistep process that involves sequential binding of snRNPs to the pre-mRNA in the order U1, U2, and then U4/U6 and U5 as a preformed tri-snRNP particle. A subsequent conformational rearrangement results in dissociation of U1 and U4, accompanied by new base-pair formation between U2 and U6 and between U6 and the 5’ splice site, leading to the formation of the active spliceosome, in which the catalytic reactions take place (Chen et al., 2001). snRNAs (the central structural and functional units of spliceosomal snRNPs) play important roles in the recognition and alignment of splice sites through base-pair interactions between snRNAs and intron sequences during spliceosome assembly (Chen et al., 2001). Furthermore, the snRNAs of these snRNPs are believed to act as ribozymes responsible for catalyzing intron excision (Abelson, 2008; Pyle, 2008; Fabrizio et al., 2009). However, all the steps of spliceosome assembly and action are known to be accompanied by dramatic rearrangements of the spliceosomal protein composition, which suggests that protein-based interactions are crucial for spliceosome function.
From the 109 proteins studied in this work, 24 highly disordered spliceosomal proteins (Cwc21, Ntc20, Isy1/Ntc30, Prp45, Snu66, Cwc15, Spp381, Syf2, Cwc26, Slu7, Yju2/Cwc16, Ntr2, Npl3, Spp2, Bud31, SmB, Yhc1, Cus1, Lin1, Prp3, Lsm4, Prp5, Cbc2, and Msl5) were selected for a more focused analysis of their structures, disorder propensities, functions, post-translational modifications, and the presence or absence of 3-D structures solved for the entire proteins or for some of their parts. In addition to their levels of predicted intrinsic disorder, these proteins were chosen to represent all the major components of the yeast spliceosome.
Pre-mRNA-splicing factor Cwc21 or complexed with Cef1 protein 21 (UniProt ID: Q03375)
Cwc21 is a part of the U2-type spliceosome complex, and its putative role is to stabilize the catalytic site or the position of the RNA substrate during splicing. In S. cerevisiae, Cwc21 binds to two key splicing factors, Prp8 and Snu114, and docks directly to the U5 snRNP. SRm300, the only SR-related protein known to be at the core of human catalytic spliceosomes, was shown to be a functional ortholog of Cwc21 that also interacts directly with Prp8 and Snu114 (Grainger et al., 2009). Thus, the function of Cwc21 is likely to be conserved from yeast to humans. Cwc21 also shows affinity for Isy1, a splicing fidelity factor, indicating that, even though Cwc21 is not essential for the formation and function of the spliceosome (Hogg, McGrail & O’Keefe, 2010), it is required for correct splicing (Khanna et al., 2009). Cwc21 is a small, highly basic protein (pI 9.67, 135 residues) that interacts with Prp8 via its SCwid domain (residues 53-97) and with Snu114 via its C-terminus (Grainger et al., 2009). and show that Cwc21 is predicted to be highly disordered by PONDR-FIT and possesses two α-MoRFs, one of which partially overlaps with the experimentally established Prp8 and Snu114 binding sites.
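Isoelectric points such as those quoted for these proteins are usually computed by finding the pH at which the Henderson–Hasselbalch net charge of the sequence is zero. The Python sketch below does this by bisection. The pKa set is an assumption of this sketch (one of several in common use), and tools such as ExPASy ProtParam use their own values, so its output will not exactly reproduce the pI values given in the text.

```python
# Assumed pKa values for ionizable groups (other published sets differ slightly).
PKA_BASIC = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of a single peptide chain at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_BASIC.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_ACIDIC.items())
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH in [0, 14]; the net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("GGGGG"), 2))  # 6.1 (only the termini are ionizable)
print(isoelectric_point("MKKRKSKRRK") > 9)   # strongly basic -> True
```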
Analysis of disorder distribution in illustrative spliceosomal proteins.
Pre-mRNA-splicing factor Ntc20 or Prp19-associated complex protein 20 (UniProt ID: P38302) and pre-mRNA-splicing factor Isy1 or Ntc30 (UniProt ID: P21374)
The yeast S. cerevisiae Prp19 protein is an essential splicing factor and an important spliceosomal component. It is not tightly associated with small nuclear RNAs (snRNAs) but forms the core of a protein complex (the NTC complex) consisting of at least eight proteins. Two components of this NTC/Prp19-associated complex, Ntc30 and Ntc20, associate with the spliceosome to mediate conformational rearrangement or to stabilize the structure of the spliceosome after U4 snRNA dissociation, which leads to spliceosome maturation (Ben-Yehuda et al., 2000; Chen et al., 2001; Chen et al., 2002; Chan et al., 2003). Null NTC30 mutants do not show an obvious growth phenotype. However, simultaneous deletion of both genes impaired yeast growth and resulted in the accumulation of precursor mRNA, suggesting that Ntc30 and Ntc20 are auxiliary splicing factors whose functions may be related to modulating the NTC complex function required for the stable association of U5 and U6 with the spliceosome after U4 is dissociated (Chen et al., 2001).
Ntc20 is a small acidic protein (pI 5.93, 140 residues), whereas Ntc30 (also known as Isy1) is an average-size basic protein (pI 9.35, 235 residues). Ntc20 interacts with Cef1, Clf1, Isy1/Ntc30, Prp46, and Syf1, which are components of the NTC complex (Ben-Yehuda et al., 2000; Chen et al., 2001). The exact locations of the potential binding sites are not known, but Ntc20 was shown to be phosphorylated at Ser139 (Albuquerque et al., 2008). Ntc30 interacts with Cef1, Cwc2, Clf1, and Syf1 (Dix et al., 1999; Ben-Yehuda et al., 2000; Chen et al., 2001). Both Ntc30 and Ntc20 are predicted to contain a significant amount of disorder (see and ).
Pre-mRNA-processing protein 45, Prp45 (UniProt ID: P28004)
Prp45 is the yeast ortholog of the human Snw1/Skip transcription co-regulator, which regulates transcription elongation and alternative splicing, and was shown to genetically interact with alleles of the NTC family members Syf1, Clf1/Syf3, Ntc20, and Cef1, and with the second-step splicing factors Slu7, Prp17, Prp18, and Prp22 (Gahura et al., 2009). Prp45 was suggested to contribute to the splicing efficiency of substrates that do not conform to the consensus via its interaction with the second-step proofreading helicase Prp22 (Gahura et al., 2009). The functional equivalence of Prp45 and Skip was verified by rescuing the lethal Prp45 deletion mutant with a functional copy of the Skip gene in yeast (Figueroa & Hayman, 2004). Prp45 was shown to interact with Prp46 in vitro, and both proteins were found to be spliceosome-associated throughout the splicing process and essential for pre-mRNA splicing (Albers et al., 2003). Prp45 remains associated with the spliceosome throughout the splicing reactions, until after the second catalytic step (Martinkova et al., 2002; Albers et al., 2003). Prp45 is a basic protein (pI 9.15) of 379 residues. It is predicted to contain a significant amount of intrinsic disorder and three α-MoRFs (see and ).
66 kDa U4/U6.U5 small nuclear ribonucleoprotein component (UniProt ID: Q12420)
The yeast U4/U6.U5 tri-snRNP is a 25S snRNP particle similar in size, composition, and morphology to its counterpart in human cells (Stevens & Abelson, 1999). Stevens and Abelson purified this complex and showed that at least 24 proteins are stably associated with this particle. In addition to the seven canonical core Sm proteins, there is a set of U6 snRNP-specific Sm proteins, eight previously described U4/U6.U5 snRNP proteins, and four novel proteins. Two of the novel proteins have likely RNA-binding properties, one has been implicated in the cell cycle, and one has no identifiable sequence homologues or functional motifs. One of the proteins associated with the U4/U6.U5 tri-snRNP is Snu66, which is required for pre-mRNA splicing (van Nues & Beggs, 2001) and interacts with the pre-mRNA-splicing helicase Brr2 and the ubiquitin-like modifier Hub1 (van Nues & Beggs, 2001; Wilkinson et al., 2004). Snu66 is a relatively large, slightly acidic protein (pI 6.35) of 587 residues. and show that this protein is predicted to be highly disordered and to possess a large number of α-MoRFs, suggesting that this disordered protein evolved to engage in a large number of protein–protein interactions. In agreement with this hypothesis, a recent study showed that the N-terminal region of Snu66 contains two Hub1-binding motifs, which are highly similar HIND elements (72% identity) arranged in tandem (Mishra et al., 2011). The crystal structures of Hub1 in complex with the HIND-I (residues 1-31) and HIND-II (residues 32-62) elements of Snu66 were solved (Mishra et al., 2011). show that both HIND-I and HIND-II adopt an α-helical structure in the bound form, thereby providing experimental support for the α-MoRF computationally identified in this region.
3D-structures of fragments and domains of two highly disordered spliceosomal proteins, Snu66 (plots A and B) and Npl3 (plots C and D).
Pre-mRNA-splicing factor Cwc15 (UniProt ID: Q03772)
Cwc15 belongs to the CWC complex (or Cef1-associated complex), a spliceosome sub-complex similar to the late-stage spliceosome that is composed of the U2, U5, and U6 snRNAs and at least 43 spliceosomal proteins, namely Bud13, Brr2, Cdc40, Cef1, Clf1, Cus1, Cwc2, Cwc15, Cwc21, Cwc22, Cwc23, Cwc24, Cwc25, Cwc27, Ecm2, Hsh155, Ist3, Isy1, Lea1, Msl1, Ntc20, Prp8, Prp9, Prp11, Prp19, Prp21, Prp22, Prp45, Prp46, Slu7, Smb1, Smd1, Smd2, Smd3, Smx2, Smx3, Snt309, Snu114, Spp2, Syf1, Syf2, Rse1, and Yju2. Although the exact function of Cwc15 is still poorly understood, previous studies revealed that this protein positively contributes to Cdc5p/Cef1p function (Ohi et al., 2002), suggesting that Cwc15 is potentially associated with the U2 snRNP. Cwc15 is a small, highly basic protein (pI 9.06, 175 residues) that is predicted to be highly disordered and to contain two α-MoRFs, further supporting its potential role in protein–protein interactions (see and ).
Pre-mRNA-splicing factor Spp381 (UniProt ID: P38282)
Over-expression of Spp381 has been shown to rescue temperature-sensitive mutants of the Prp38 gene, which plays an important role in the release of U4 from the spliceosome (Lybarger et al., 1999). However, over-expressed Spp381 does not rescue a null Prp38 allele, indicating that these two proteins cooperate but are not interchangeable. Spp381 is believed to interact with both the spliceosome and the RNA to be spliced. Immunoprecipitation experiments showed that, similar to Prp38, Spp381 is present in the U4/U6.U5 tri-snRNP particle, and two-hybrid analyses support the view that the C-terminal half of Spp381 directly interacts with Prp38 (Lybarger et al., 1999). Spp381 also contains a putative PEST motif; such motifs are a hallmark of IDPs whose intracellular concentrations require tight regulation (Singh et al., 2006). shows that Spp381 (an acidic protein (pI 5.52) consisting of 291 residues) is predicted to be highly disordered and to contain six potential α-MoRFs.
Pre-mRNA-splicing factor Syf2 (UniProt ID: P53277)
This protein is involved in pre-mRNA splicing and cell cycle control. It is another component of the NTC complex (or Prp19-associated complex), which associates with the spliceosome to mediate conformational rearrangement and/or to stabilize the structure of the spliceosome after U4 snRNA dissociation, leading to spliceosome maturation (Russell et al., 2000). Cells with defective Syf2 suffer from cell cycle arrest, possibly due to inefficient splicing of the α-tubulin (Tub1) pre-mRNA (Dahan & Kupiec, 2002). Syf2 was shown to interact with other spliceosomal proteins, such as Cef1, Clf1, Ntc20, Prp19, and Syf1. No crystal structure has been determined for this protein to date, and Syf2 is known to possess four phosphoserines. Syf2 has 215 residues, a pI of 9.34, a high level of intrinsic disorder, and four α-MoRFs (see and ).
Pre-mRNA-splicing factor Cwc26 (UniProt ID: P46947)
This protein belongs to the pre-mRNA retention and splicing (RES) complex (Vincent et al., 2003), which is required for efficient splicing and prevents leakage of unspliced pre-mRNAs from the nucleus; the name stands for pre-mRNA REtention and Splicing (Dziembowski et al., 2004). In yeast, the complex consists of Ist3p, Bud13p, and Pml1p. Cwc26 has no known posttranslational modification sites and no known crystal structure. It has been shown to interact with the proteins Ist3 and Pml1 (Dziembowski et al., 2004). Cwc26 is also known as Bud13, since it may also be involved in positioning the proximal bud pole signal (Zahner, Harkins & Pringle, 1996; Ni & Snyder, 2001; Vincent et al., 2003; Dziembowski et al., 2004). It has 266 residues and is highly basic (pI 9.31). Its N-terminal half is predicted to be very disordered and is expected to contain two α-MoRFs (see and ).
Pre-mRNA-splicing factor Slu7 (UniProt ID: Q02775)
This essential protein is involved in the second catalytic step of pre-mRNA splicing, participating in the selection of 3’ splice sites. This selection could be done via a 3’-splice site-binding factor, Prp16 (Frank & Guthrie, 1992; Ansari & Schwer, 1995; James, Turner & Schwer, 2002). The order of recruitment is believed to be Slu7, Prp18, and then Prp22. All three proteins are released from the spliceosome after step 2, concomitantly with the release of the mature mRNA. Slu7 contains two functionally important domains: a zinc knuckle (starting at residue 122) and a Prp18-interaction domain (starting at residue 215) (Frank & Guthrie, 1992; Ansari & Schwer, 1995; James, Turner & Schwer, 2002). It has three phosphoserines and no determined crystal structure. Slu7 consists of 382 residues and has a pI of 8.89. shows that Slu7 is rather disordered and contains a number of α-MoRFs located in its N-terminal half. Importantly, two of the predicted α-MoRFs (residues 111-128 and 213-230) significantly overlap with the aforementioned functional domains of Slu7.
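The overlap between a predicted α-MoRF and a functional region can be quantified with simple interval arithmetic. In this Python sketch, intervals are assumed to be inclusive 1-based residue ranges; the α-MoRF coordinates 111-128 come from the text, whereas the end of the zinc knuckle used in the example (residue 140) is a hypothetical placeholder, since only its start is given.

```python
def overlap_length(a, b):
    """Number of residues shared by two inclusive (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def overlap_fraction(morf, region):
    """Fraction of the MoRF's residues that fall inside the functional region."""
    return overlap_length(morf, region) / (morf[1] - morf[0] + 1)


slu7_morf = (111, 128)                  # predicted alpha-MoRF (from the text)
hypothetical_zinc_knuckle = (122, 140)  # start from the text; end is a placeholder
print(overlap_length(slu7_morf, hypothetical_zinc_knuckle))              # 7
print(round(overlap_fraction(slu7_morf, hypothetical_zinc_knuckle), 2))  # 0.39
```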
Protein Cwc16 (UniProt ID: P28320)
Similar to Cwc15 discussed above, Cwc16 (also known as Yju2) is a part of the CWC complex. The splicing factor Yju2 was shown to participate in spliceosome assembly, to be associated with components of the Prp19-associated complex (NineTeen Complex, NTC), and to be required for pre-mRNA splicing (Liu et al., 2007). NTC is known to be essential for pre-mRNA splicing, being required for spliceosome activation by specifying the interactions of U5 and U6 with the pre-mRNA on the spliceosome after the release of U4. NTC contains at least eight protein components, including two tetratricopeptide repeat (TPR)-containing proteins, Ntc90 and Ntc77 (Chang, Chen & Cheng, 2009). Although Yju2 interacts with the spliceosome at almost the same time as NTC during spliceosome assembly, these two spliceosome components are not fully associated with each other (Liu et al., 2007). Furthermore, Yju2 is not required for NTC binding to the spliceosome or for NTC-mediated spliceosome activation (Liu et al., 2007). However, Yju2 was shown to promote the first catalytic reaction of pre-mRNA splicing after the Prp2-mediated structural rearrangement of the spliceosome (Liu et al., 2007). Yju2 is believed to be recruited to the spliceosome by Ntc90 (Chang, Chen & Cheng, 2009). Cwc16/Yju2 is a medium-size, highly basic protein (pI 9.41, 278 residues) that is predicted to be highly disordered and to contain five α-MoRFs (see and ). Cwc16 interacts with Syf2 and is predicted to have two nuclear localization signals (NLSs, residues 242-258 and 260-278). Importantly, these NLSs coincide with the two C-terminal α-MoRFs.
Pre-mRNA-splicing factor Ntr2 (UniProt ID: P36118)
Ntr2 is a part of the NTR complex (NTC-related complex), which is composed of Ntr1, Ntr2, and Prp43. Ntr2 is known to interact with Clf1, Ntr1, and Prp43 and, along with Ntr1, is involved in pre-mRNA splicing and spliceosome disassembly, promoting the release of the excised intron from the spliceosome by acting as a receptor for Prp43, possibly assisted by Ntr1 (Tsai et al., 2005; Boon et al., 2006). This specific Prp43 targeting leads to disassembly of the spliceosome, with separation of the U2, U5, and U6 snRNPs and the NTC complex (Tsai et al., 2005; Boon et al., 2006). Ntr2 has two phosphoserines and no known crystal structure. It is a medium-size acidic protein (pI 5.51, 322 residues) that is predicted to be very disordered and to contain three α-MoRFs (see and ).
Nucleolar protein 3 (UniProt ID: Q01560)
Npl3 contains two RNA recognition motifs (RRMs) at positions 125-195 and 200-275, consistent with its direct interaction with the poly(A) regions of mRNA (Wilson et al., 1994; Burkard & Butler, 2000). It has five phosphoserines and an Arg/Gly-rich region at positions 280-398. Npl3 can interact with the exoribonuclease Rrp6, which plays a role in 5.8S rRNA 3’-end processing and whose defective mutants suppress the growth defect associated with an mRNA polyadenylation defect (Burkard & Butler, 2000). Npl3 consists of 414 residues and has a pI of 5.38. It is predicted to be mostly disordered and is expected to contain five α-MoRFs (see and ). Solution structures of two RRM-containing domains (residues 114-201 and 193-282) have been determined using a novel expressed protein ligation protocol (Skrisovska & Allain, 2008). The resulting structures are shown in and .
Pre-mRNA-splicing factor Spp2 (UniProt ID: Q02521)
Pre-mRNA processing occurs through the assembly of splicing factors on the substrate pre-mRNA to form the spliceosome, followed by two consecutive RNA cleavage-ligation reactions. Spp2 belongs to the CWC complex (or CEF1-associated complex) and interacts with Prp2 (Silverman et al., 2004). Spp2 is important for pre-mRNA splicing, playing a role at the final stages of spliceosome maturation by promoting the first step of splicing (Roy et al., 1995). Although this first reaction is controlled by the ATP-hydrolyzing protein Prp2, a model was proposed in which Spp2 binds to spliceosome complex I (composed of the pre-mRNA and the U1, U2, U4, U5, and U6 snRNPs) in the absence of Prp2p or ATP. This would be followed by Prp2p binding and subsequent ATP hydrolysis, leading to the catalytic reaction that forms complex II and releases both proteins from the spliceosome (Roy et al., 1995). Spp2 has one phosphoserine and no known crystal structure. It is a small, moderately basic protein (pI 8.79, 185 residues) that possesses a G-patch domain (residues 100-149) and is predicted to be mostly disordered and to have one α-MoRF (see and ).
Bud site selection protein 31, Bud31 (UniProt ID: P25337)
Bud31 is one of the NTC-related proteins and is also a component of the Cef1p sub-complex. Although it is better known for its role in bud site selection in yeast, Bud31 also appears to play a role in the yeast spliceosome through interactions with Cef1, with the precatalytic B complex, and with catalytically active complexes containing stably bound U2, U5, and U6 snRNPs (Saha et al., 2012b). Recently, Bud31 was shown to be important for efficient progression to the first catalytic step and to be required for the second catalytic step in reactions at higher temperatures (Saha et al., 2012b). Bud31 plays a role in both cell cycle transitions and pre-mRNA splicing. It was recently shown that Bud31 promotes transition through the G1-S regulatory point (Start) but is not needed for the G2-M transition or for exit from mitosis (Saha et al., 2012a). Analysis of the splicing status of transcripts that encode proteins involved in yeast budding showed that Bud31 facilitates the efficient splicing of only some of these pre-mRNAs (Saha et al., 2012a). Bud31 is a small basic protein (pI 9.64, 157 residues) that contains an N-terminal NLS (residues 2-11) and has no known posttranslational modification sites and no known crystal structure. This protein is predicted to be moderately disordered and to possess one α-MoRF (see and ).
snRNP-associated protein B, SmB (UniProt ID: P40018)
. SmB protein is also referred to as snRNP-associated protein B, snRNP-B. SmB is involved in pre-mRNA splicing, along with other Sm core proteins: SmB’, SmD1, SmD2, SmD3, SmE, SmF, and SmG. It binds to U1, U2, U4, U5 snRNA, all containing a highly conserved region, referred to as the Sm binding site. It belongs to the SmB and SmN family, and is located in the cell nucleus. Sm core proteins have an important role during the formation of snRNPs. The SmB protein is an important part of the Sm core complex, as it is found in immunoprecipitates of U1, U2, U4, and U5 snRNAs (Camasses et al., 1998
). Along with other Sm proteins, SmB contains a common sequence motif, which helps forming the globular core of the spliceosome snRNPs (U1, U2, U5, and U4/U6) (Walke et al., 2001
). SmB possesses a nuclear localization signal (NLS) located in the C-terminal half of the protein (region 105-132). When this portion of the sequence is either deleted or mutated, SmB function is lost, suggesting that the C-terminal part of this Sm protein has been evolutionary conserved, and its function determines nuclear localization (Bordonne, 2000
). This protein consists of 196 residues, has a pI of 10.37, contains one α-MoRF, and shows high levels of disorder, especially in its C-terminal part (see and ). When analyzed by seven disorder predictors, PONDR®
VSL2B, IUPred, FoldIndex, and TopIDP, the corresponding levels of disorder are 0.643, 0.648, 0.724, 0.760, 0.571, 0.628, and 0.719, respectively.
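One simple way to condense the seven per-predictor values for SmB into a single figure is an unweighted mean. This averaging is offered as an illustration only; the text does not state that a consensus value was computed, and the variable names below are hypothetical.

```python
from statistics import mean

# SmB (P40018) disorder levels from the seven predictors, in the reported order.
smb_scores = [0.643, 0.648, 0.724, 0.760, 0.571, 0.628, 0.719]

consensus = mean(smb_scores)  # unweighted average across predictors
print(f"mean = {consensus:.3f}, range = {min(smb_scores):.3f}-{max(smb_scores):.3f}")
# mean = 0.670, range = 0.571-0.760
```

An unweighted mean treats every predictor as equally reliable; weighting predictors differently would require a justification that the text does not provide.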
U1 snRNP protein C, Yhc1 (UniProt ID: Q05900)
. Yhc1 (also known as U1-C protein) is an important component of the spliceosome subcomplex U1 snRNP (Tang et al., 1997
), which is composed of the seven core Sm proteins common to all spliceosomal snRNPs and at least 10 particle-specific proteins (see and ), and which is essential for recognition of the pre-mRNA 5’ splice site and the subsequent assembly of the spliceosome (Fabrizio et al., 2009
). The major functional role of Yhc1 is in the initial recognition of the 5’ splice site in both constitutive and alternative splicing. Yhc1 interacts with the U1 snRNA and the 5’ splice-site region of the pre-mRNA, thereby stimulating commitment complex formation by stabilizing the base pairing between the 5’ end of the U1 snRNA and the 5’ splice-site region (Tang et al., 1997
; Zhang & Rosbash, 1999
). It was shown that Yhc1 can recognize the 5’ splice-site in the absence of base-pairing between the pre-mRNA and the U1 snRNA (Du & Rosbash, 2002
). Yhc1 is a highly basic protein (pI 10.11) that consists of 231 residues and contains a matrin-type zinc finger domain (residues 4-36). Yhc1 is predicted to be moderately disordered and is expected to contain two α-MoRFs (see and ).
U2 snRNP protein Cus1 (UniProt ID: Q02554)
. Cus1, also known as cold-sensitive U2 snRNA suppressor, is a 436-residue protein required for binding of the U2 snRNP to pre-mRNA during spliceosome assembly (Pauling, McPheeters & Ares, 2000
). Cus1 is a homologue of the human Sap145 protein, which is present in the 17S form of the human U2 snRNP. Yeast Cus1 interacts with U2 snRNA, with Hsh155, and with Hsh49, the latter via an 82-amino-acid-long region located between positions 229 and 311 (Pauling, McPheeters & Ares, 2000
). Based on these observations, it was proposed that Cus1, Hsh49, and Hsh155 form a stable protein complex that can exchange with a core U2 snRNP and is necessary for U2 snRNP function in pre-spliceosome assembly (Pauling, McPheeters & Ares, 2000
). Although Cus1 is a moderately basic protein (pI 8.67), one of its characteristic features is the highly acidic nature of its C-terminal tail, where 23 of the last 59 residues (about 40%) are acidic (E or D) (Pauling, McPheeters & Ares, 2000
). Both N-terminal and C-terminal tails of Cus1 are predicted to be highly disordered and contain a number of potential disorder-based binding sites (see and ).
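The acidic-tail observation for Cus1 amounts to counting D and E residues in a C-terminal window. The helper below is a hypothetical sketch of that count; the actual Cus1 sequence (UniProt Q02554) is not reproduced here, so the usage line relies on a made-up toy sequence, and the final line simply evaluates the reported figures.

```python
def acidic_fraction_of_tail(seq: str, tail_length: int) -> tuple[int, float]:
    """Count D and E residues among the last `tail_length` residues of `seq`.

    Returns (number of acidic residues, fraction of the window that is acidic).
    """
    if not 0 < tail_length <= len(seq):
        raise ValueError("tail_length must be between 1 and len(seq)")
    tail = seq.upper()[-tail_length:]
    n_acidic = sum(aa in "DE" for aa in tail)
    return n_acidic, n_acidic / tail_length


# Hypothetical toy sequence: the last four residues are E, E, D, A.
print(acidic_fraction_of_tail("MKKDEEDA", 4))  # (3, 0.75)

# Figures reported for Cus1: 23 acidic residues among the last 59.
print(f"{23 / 59:.1%}")  # 39.0%
```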
U5 snRNP protein Lin1 (UniProt ID: P38852)
. Lin1 is a multifunctional protein involved in several different processes. Its association with the U5 snRNP was inferred from a direct assay (Stevens et al., 2001
). Lin1 associates with Irr1/Scc3, a component of the cohesin complex involved in the cohesion and separation of chromosomes during mitosis, and interacts with Prp8, Slx5, Siz2, Wss1, Rfc1, and YIL149w, proteins known to participate in mRNA splicing, DNA replication, chromosome condensation, chromatid separation, and alternative cohesion. On this basis, Lin1 was proposed to serve as a functional and physical link among these processes (Bialkowska & Kurlandzka, 2002
). Lin1 is an acidic protein (pI 5.01) consisting of 340 residues. show that the N-terminal half of the Lin1 protein is predicted to be highly disordered and to have four α-MoRFs (see also ), whereas the C-terminal half is expected to be ordered. The C-terminal region of Lin1 (residues 282-340) corresponds to a glycine-tyrosine-phenylalanine (GYF) domain, which contains a conserved GP[YF]xxxx[MV]xxWxxx[GN]YF motif that can be involved in the recognition of proline-rich sequences (Freund et al., 1999
). Since many proline-rich proteins are IDPs, Lin1 likely utilizes two different modes of intrinsic disorder-based protein–protein recognition: it relies on the intrinsic disorder of its N-terminal half to interact with some partners, and on the intrinsic disorder of other partners to bind them through its ordered C-terminal region.
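The GYF-domain motif quoted for Lin1 translates directly into a regular expression by mapping each wildcard x to ".". The sketch below scans a sequence for that motif; the function name and the toy sequence are hypothetical, the Lin1 sequence itself is not reproduced, and a motif hit on its own does not establish a functional GYF domain.

```python
import re

# Motif as given in the text; lowercase "x" stands for any residue.
GYF_MOTIF = "GP[YF]xxxx[MV]xxWxxx[GN]YF"
# Wrap the translated motif in a lookahead so overlapping hits are also reported.
GYF_PATTERN = re.compile("(?=(" + GYF_MOTIF.replace("x", ".") + "))")


def find_gyf_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in GYF_PATTERN.finditer(seq.upper())]


# Hypothetical toy sequence with one motif instance starting at residue 4.
print(find_gyf_motifs("MSSGPYAAAAMAAWAAAGYFKK"))
# [(4, 'GPYAAAAMAAWAAAGYF')]
```

The lookahead costs nothing for a single hit but avoids silently missing overlapping instances when scanning longer sequences.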
U4/U6 snRNP protein Prp3 (UniProt ID: Q03338)
. Prp3 is a large, moderately basic protein (pI 8.69, 469 residues) that is a component of the yeast U4/U6 snRNP and is also present in the U4/U6.U5 tri-snRNP (Anthony, Weidenhammer & Woolford, 1997
). Prp3 was shown to be necessary both for the formation of stable U4/U6 snRNPs and for the assembly of the U4/U6.U5 tri-snRNP from its component snRNPs. Indeed, Prp3 inactivation reduces spliceosome assembly from the pre-spliceosome because intact U4/U6.U5 tri-snRNPs are absent (Anthony, Weidenhammer & Woolford, 1997
). Homology between the yeast Prp3 protein and the human protein 90K (which is a component of the human U4/U6 snRNPs) represents an illustrative example of the conservation of splicing factors between yeast and metazoans (Anthony, Weidenhammer & Woolford, 1997
). Prp3 is predicted to contain a significant amount of disorder (especially in its first 350 residues) and is expected to be a promiscuous binder, since it contains seven predicted α-MoRFs (see and ).
U6 snRNA-associated Sm-like Protein LSm4 (UniProt ID: P40070)
. The Sm-like (LSm) heptameric complex is an important spliceosomal component that exists in two forms, a nuclear and a cytoplasmic one, each comprising a different set of subunits (Reijns, Auchynnikava & Beggs, 2009
). The nuclear form, the LSm2-8 complex, consists of subunits LSm2 to LSm8, is closely associated with the U6 snRNP, interacts with Prp24, and works together with neighboring proteins to create a functional spliceosome. The cytoplasmic form is composed of LSm1 to LSm7 and is involved in mRNA turnover, promoting mRNA decapping and decay (Spiller et al., 2007
). One of the roles of the LSm2-8 complex is to promote the U4/U6 di-snRNP assembly (Reijns, Auchynnikava & Beggs, 2009
). It is also involved in the processing and stabilization of ribosomal RNAs and determines the nuclear localization of the U6 snRNP (Spiller et al., 2007
). LSm4 is a component of both the LSm1-7 and LSm2-8 complexes. Functions ascribed to LSm4 include specific binding to the 3’-terminal U-tract of U6 snRNA, participation in the processing of pre-tRNAs, pre-rRNAs, and U3 snoRNA, and involvement in the maturation of the precursor of the RNA component of RNase P (pre-P RNA) (Bouveret et al., 2000
; Tharun et al., 2000
; Kufel et al., 2002
; Kufel et al., 2003
; Kufel et al., 2004
). LSm4 is a small basic protein (pI 9.45, 187 residues) with a highly disordered C-terminal domain that contains one α-MoRF and a phosphoserine at position 181 (Albuquerque et al., 2008
) (see and ).
Early splicing factor Prp5 (UniProt ID: P21372)
. Prp5 is a large, slightly basic (pI 8.22) ATP-dependent RNA helicase consisting of 850 residues (O’Day, Dalbadie-McFarland & Abelson, 1996
). Prp5 is involved in spliceosome assembly, nuclear splicing, and catalysis of the ATP-dependent conformational change of U2 snRNP (Ruby, Chang & Abelson, 1993
; Wells & Ares, 1994
; O’Day, Dalbadie-McFarland & Abelson, 1996
; Abu Dayyeh et al., 2002
). This protein has been proposed to bridge the U1 and U2 snRNPs and to promote a stable interaction between the U2 snRNP and the intron RNA (Xu et al., 2004
). Prp5 contains a helicase domain (residues 287-661), which is divided into the helicase ATP-binding and helicase C-terminal subdomains (residues 287-467 and 502-661, respectively). Prp5 also contains several functionally important motifs, such as a nucleotide-binding motif (residues 300-307), a coiled coil (residues 13-81), an NLS (residues 90-96), a Q motif (residues 255-284), and the DEAD-box motif (residues 415-418). Although Prp5 is an enzyme and would therefore be expected to be mostly ordered, and show that this protein is predicted to have a significant amount of disorder (mostly within its N-terminal 200 residues) and to possess six α-MoRFs.
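Prp5 carries several nested annotations, and a simple interval lookup makes their relationships explicit, for instance which annotated elements fall inside the N-terminal 200 residues predicted to be disordered. The `Feature` type and the two helper functions are hypothetical illustrations; the coordinates are those given in the text, treated as 1-based and inclusive.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, inclusive


# Prp5 (P21372) annotations as listed in the text.
PRP5_FEATURES = [
    Feature("coiled coil", 13, 81),
    Feature("NLS", 90, 96),
    Feature("Q motif", 255, 284),
    Feature("helicase domain", 287, 661),
    Feature("helicase ATP-binding subdomain", 287, 467),
    Feature("nucleotide-binding motif", 300, 307),
    Feature("DEAD-box motif", 415, 418),
    Feature("helicase C-terminal subdomain", 502, 661),
]


def features_at(position: int, features=PRP5_FEATURES) -> list[str]:
    """Names of all features that contain the given residue."""
    return [f.name for f in features if f.start <= position <= f.end]


def features_overlapping(start: int, end: int, features=PRP5_FEATURES) -> list[str]:
    """Names of all features sharing at least one residue with [start, end]."""
    return [f.name for f in features if f.start <= end and start <= f.end]


print(features_at(416))
# ['helicase domain', 'helicase ATP-binding subdomain', 'DEAD-box motif']
print(features_overlapping(1, 200))  # N-terminal region predicted to be disordered
# ['coiled coil', 'NLS']
```

In this reading, the predicted disordered N-terminal region of Prp5 harbors the coiled coil and the NLS, while the catalytic helicase elements lie further downstream.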
CBP protein Cbc2 (UniProt ID: Q08920)
. Cbc2 is a component of the nuclear cap-binding complex (CBC), a heterodimer composed of the Sto1/Cbc1 and Cbc2 proteins that interacts co-transcriptionally with the cap of pre-mRNAs. The CBC is crucial for efficient pre-mRNA splicing through its participation in the formation of the commitment complex and the spliceosome. It is also involved in the maturation, export, and degradation of nuclear mRNAs (Lewis, Gorlich & Mattaj, 1996
; Fortes et al., 1999
). Cbc2 binds the m7G cap of the RNA and the large CBC subunit Sto1, which interacts with karyopherins, and Cbc2 is believed to be responsible for splicing control during meiosis (Qiu et al., 2012
). Cbc2 is an acidic protein (pI 5.02) that is composed of 208 residues and contains an RRM domain involved in single-stranded RNA binding (residues 46-124) and three mRNA cap-binding regions (residues 118-122, 129-133, and 139-140). shows that Cbc2 is predicted to have long disordered tails and two α-MoRFs located within these intrinsically disordered N- and C-termini (see also ).
Msl5 protein (UniProt ID: Q12186)
. Msl5 is the branch point-bridging protein, which is required for pre-spliceosome formation. It plays a role in the formation of commitment complex 2 (CC2), where it binds the U1 snRNP-associated protein Prp40, thereby bridging the U1 snRNP-associated 5’ splice site and the Msl5-associated branch point near the 3’ splice site of the intron (Abovich & Rosbash, 1997
; Rutz & Seraphin, 1999
). As a part of the CC2 complex, Msl5 is involved in the nuclear retention of pre-mRNA (Rutz & Seraphin, 2000
). It interacts with Mud2 and Prp40 (Abovich & Rosbash, 1997
; Rutz & Seraphin, 1999
), and the proline-rich region of Msl5 (residues 363-474) binds to the GYF domains of Smy2 and Syh1 (Kofler, Motzny & Freund, 2005
). shows that the Msl5 region responsible for the interaction with the GYF domains of Smy2 and Syh1 is part of the long, highly disordered tail. There are two α-MoRFs in this basic (pI 9.72), 476-residue protein (see and ).