In eukaryotic cells and certain viruses that infect them, the coding sequences (exons) of most protein-coding genes are interrupted by noncoding regions (introns). Following the transcription of an entire gene into a precursor messenger RNA (pre-mRNA), the introns are excised and the exons are spliced together to form a functional mRNA. The splicing reaction is catalyzed by a large ribonucleoprotein (RNP) machine termed the spliceosome. The most common form of the spliceosome is composed primarily of five small nuclear RNA (snRNA) molecules (U1, U2, U4, U5, and U6) and 45 proteins, arranged into snRNP particles. Seven mutually related Sm proteins are common to all spliceosomal snRNPs apart from the U6 snRNP, which contains a set of related “like-Sm” (Lsm) proteins
[1]. The Sm or Lsm proteins form a ring structure that acts as a platform to support the snRNA
[2]. Apart from the Sm and Lsm heptamers, the proteins of the human snRNP subunits are unique (review:
[3]).
Apart from the snRNP proteins, approximately 80 proteins are abundant in the human spliceosome and reported to be essential to the process of spliceosome-dependent splicing
[4], while proteomics analyses
[4]–
[7] have identified more than 200 proteins
in total. Non-snRNP splicing factors are divided into independent protein splicing factors and proteins that combine into multiprotein complexes auxiliary to the spliceosome: the hPrp19/CDC5L (NTC) complex, the exon-junction complex (EJC), the cap-binding complex (CBP), the retention-and-splicing complex (RES), and the transcription-export complex (TREX). Spliceosomal proteins are extensively phosphorylated and also undergo other types of post-translational modification (review:
[8]).
A rare class of introns (<1% of all human introns) is excised by the so-called minor spliceosome
[9]. This low-abundance spliceosome variant contains a U5 snRNP identical to the one from the major spliceosome and four snRNPs whose snRNAs (U11, U12, U4atac, and U6atac) are distinct from, but structurally and functionally analogous to, the U1, U2, U4, and U6 snRNAs, respectively. Some proteins specific to the minor spliceosome have also been identified
[10].
The primary activity of the spliceosome, i.e. the excision of introns and ligation of exons, depends on several additional functions of the spliceosomal machinery: recognition of the 5′ and 3′ splice sites (intron/exon definition), mutual recognition of spliceosome subunits and correct spliceosome assembly, and spliceosome remodeling and regulation (review:
[11]). In the course of the splicing reaction, the snRNP subunits associate with and dissociate from one another and from the pre-mRNA, forming in turn the so-called E (early), A, B, B* (B-activated), and C complexes. For the major spliceosome, the U1 and U2 snRNPs perform the initial scanning of the pre-mRNA for intron sites, while the actual two-step splicing reaction occurs after the addition of the U4/U6.U5 tri-snRNP and the release of the U1 and U4 snRNPs from the complex, at the assembled interface of the pre-mRNA substrate and the U2, U5, and U6 snRNAs (complex C). For the minor spliceosome, the U11/U12 di-snRNP performs the role of the U1 and U2 snRNPs, while the U4atac/U6atac di-snRNP performs the role of the U4/U6 di-snRNP (review:
[12]). The early recognition and assembly steps of the splicing reaction (E/A complex formation) rely on multiple weak binary interactions to ensure flexibility. By contrast, the later stages of the splicing reaction (B, B*, and C complexes) involve enzymatic catalysis
[11]. Each of the stages of the splicing reaction has its own set of associated non-snRNP proteins
[4].
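The stage order and the major/minor snRNA correspondence described above can be written down as a small lookup structure. The following is only an illustrative sketch: the names `COMPLEX_ORDER`, `MINOR_COUNTERPART`, `to_minor`, and `is_later_stage` are hypothetical, and translating the catalytic snRNA set to the minor spliceosome simply applies the stated analogy rather than an independently sourced fact.

```python
# Complexes in the order in which the text says they form.
COMPLEX_ORDER = ["E", "A", "B", "B*", "C"]

# Minor-spliceosome counterparts of the major snRNAs, as stated in the text
# (U5 is shared between the two spliceosomes).
MINOR_COUNTERPART = {
    "U1": "U11",
    "U2": "U12",
    "U4": "U4atac",
    "U5": "U5",
    "U6": "U6atac",
}

# snRNAs at the catalytic interface in complex C of the major spliceosome.
MAJOR_CATALYTIC_SNRNAS = ["U2", "U5", "U6"]


def to_minor(snrnas):
    """Map major-spliceosome snRNA names to their minor analogs (by analogy only)."""
    return [MINOR_COUNTERPART[name] for name in snrnas]


def is_later_stage(complex_name):
    """True for the complexes the text associates with catalysis (B, B*, C)."""
    return COMPLEX_ORDER.index(complex_name) >= COMPLEX_ORDER.index("B")


if __name__ == "__main__":
    print(to_minor(MAJOR_CATALYTIC_SNRNAS))
    print([c for c in COMPLEX_ORDER if not is_later_stage(c)])
```

Keeping the stages in an ordered list makes the early (E/A) versus later (B, B*, C) distinction a simple index comparison.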
Splicing has been associated with intrinsic protein disorder
[13]. Intrinsically disordered regions (IDRs) lack stable, well-defined three-dimensional structure (review:
[14]). IDRs frequently contain low-complexity regions and repeats, although they may also contain conserved linear motifs embedded in less conserved regions (eukaryotic linear motifs, ELMs;
[15]). IDRs are not necessarily completely unfolded. In particular, some IDRs may contain stable preformed secondary structure elements in isolation
[16], while others may switch from disorder to order (i.e. exhibit “dual personality”) depending on the environment, for instance upon binding to other proteins
[17],
[18].
As they lack tertiary structure under many or all conditions, IDRs are more flexible and plastic than the rigid structures of globular domains. Disorder may increase the speed of intermolecular binding and unbinding and make interactions weaker
[14]. As a result of these properties, IDRs serve a variety of molecular functions: they form linkers between structured domains, act as sites of post-translational modification, and mediate protein-protein and protein-RNA recognition
[19]. The large interaction capacity of IDRs predisposes them to organizing the assembly of complexes; disorder is a characteristic feature of “hub” proteins that interact with many partners, and, notably for spliceosome research, disordered proteins are common in large complexes
[20]. Among RNP complexes, the ribosome in particular illustrates an RNA-related structural function for disordered proteins. Many ribosomal proteins contain long disordered extensions attached to ordered globular bodies
[21] that, upon the formation of the ribosome complex, become ordered and penetrate into the macromolecule core formed by the rRNA
[22],
[23]. In other words, the long disordered extensions become the “mortar” of the macromolecule, filling gaps in the rRNA and stabilizing it.
Intrinsic disorder has not yet been systematically analyzed across the entire spliceosomal proteome. As an essential step towards broadening our understanding of how the spliceosome functions, we have carried out a bioinformatics analysis of intrinsic disorder within the human spliceosomal proteome. We found that almost half of the residues within the human spliceosomal proteins are disordered, and that intrinsic disorder is unevenly distributed across the spliceosome. With respect to disorder, the spliceosome can be divided into three layers: a rigid inner core that performs the precise operations required to effect splicing catalysis, a middle layer of disorder that acquires structure in spliceosome-bound proteins, and a fluid outer layer of disordered regions that do not acquire structure and that are responsible for establishing a matrix of weak interactions in the initial stages of the splicing process.
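To make the notion of a proteome-wide disordered fraction concrete, the sketch below pools per-residue disorder scores over a set of proteins and reports the fraction of residues at or above a cutoff. It is a minimal illustration under stated assumptions, not the procedure used in this study: the function name `disordered_fraction`, the 0.5 cutoff, and the toy scores are all hypothetical, and real scores would come from a disorder predictor of the reader's choice.

```python
def disordered_fraction(scores_by_protein, threshold=0.5):
    """Fraction of all residues whose disorder score is >= threshold.

    scores_by_protein maps a protein name to a list of per-residue scores
    in [0, 1]. The 0.5 cutoff is an assumed convention, not a value taken
    from the text.
    """
    total = 0
    disordered = 0
    for scores in scores_by_protein.values():
        total += len(scores)
        disordered += sum(1 for score in scores if score >= threshold)
    if total == 0:
        raise ValueError("no residues supplied")
    return disordered / total


if __name__ == "__main__":
    # Toy, made-up scores for two hypothetical proteins.
    toy = {
        "protein_a": [0.9, 0.8, 0.2, 0.1],
        "protein_b": [0.6, 0.4],
    }
    print(disordered_fraction(toy))
```

Pooling residues across proteins weights long proteins more heavily than averaging per-protein fractions would; which of the two is appropriate depends on the question being asked.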