Protein Cell. Author manuscript; available in PMC 2014 January 15.
Published in final edited form as:
PMCID: PMC3893067
NIHMSID: NIHMS492393

Spliceosomal genes in the D. discoideum genome: a comparison with those in H. sapiens, D. melanogaster, A. thaliana and S. cerevisiae

Abstract

Little is known about pre-mRNA splicing in Dictyostelium discoideum although its genome has been completely sequenced. Our analysis suggests that pre-mRNA splicing plays an important role in D. discoideum gene expression as two thirds of its genes contain at least one intron. Ongoing curation of the genome to date has revealed 40 genes in D. discoideum with clear evidence of alternative splicing, supporting the existence of alternative splicing in this unicellular organism. We identified 160 candidate U2-type spliceosomal proteins and related factors in D. discoideum based on 264 known human genes involved in splicing. Spliceosomal small ribonucleoproteins (snRNPs), PRP19 complex proteins and late-acting proteins are highly conserved in D. discoideum and throughout the metazoa. In non-snRNP and hnRNP families, D. discoideum orthologs are closer to those in A. thaliana, D. melanogaster and H. sapiens than to their counterparts in S. cerevisiae. Several splicing regulators, including SR proteins and CUG-binding proteins, were found in D. discoideum, but not in yeast. Our comprehensive catalog of spliceosomal proteins provides useful information for future studies of splicing in D. discoideum where the efficient genetic and biochemical manipulation will also further our general understanding of pre-mRNA splicing.

Keywords: pre-mRNA splicing, spliceosomal genes, Dictyostelium discoideum, comparative genomics, splicing regulators

INTRODUCTION

The amoeboid protozoan Dictyostelium discoideum is a eukaryotic model organism that has been extensively used in studying signal transduction, cell motility and cell differentiation. It occupies a unique phylogenetic position and belongs to the group of mycetozoans that branches out after plants but before metazoans and fungi (Baldauf et al., 2000). Little is known about the RNA processing machinery in D. discoideum.

Pre-mRNA splicing is the process that removes intervening sequences (introns) from nascent pre-mRNA transcripts to form functional mRNAs. This process is a critical step in eukaryotic gene expression and occurs in a multi-component macromolecular machine named the spliceosome (e.g., Calarco et al., 2011; Hoskins et al., 2011; Ramani et al., 2011 and references within). This large RNA-protein complex contains, in addition to the pre-mRNA substrate, several uridine-rich small nuclear ribonucleoprotein (snRNP) particles as well as a number of associated proteins. To process the majority of introns (the major class, also called the U2-type introns), the spliceosome contains U1, U2, U4/U6 and U5 snRNPs. The splicing of the minor class of introns (also called the U12-type) occurs in a spliceosome containing U11 and U12 in addition to U4atac, U6atac and U5 snRNPs (for review, see (Patel and Steitz, 2003; Will and Lührmann, 2005)). Biochemical and molecular studies have revealed major components of the splicing machinery, especially the U2-type spliceosome.

The completion of the D. discoideum genome (Eichinger et al., 2005) provides an opportunity for us to systematically examine pre-mRNA splicing and the splicing machinery in this model organism. We queried the D. discoideum genome available at dictyBase (http://dictybase.org; (Chisholm et al., 2006)) to determine the presence of introns in the coding sequences of the primary protein sequence set at dictyBase. The analysis revealed that among 13,527 predicted and known protein-coding genes in D. discoideum, 9232 (68%) contain at least one intron. This indicates that pre-mRNA splicing plays an important role in the expression of a majority of D. discoideum genes. Furthermore, in our comparison of genomic and expressed sequence tag (EST) sequences, we found that a number of D. discoideum genes undergo alternative pre-mRNA splicing, suggesting that alternative splicing regulation may play a role in the biology of this unicellular organism.

To identify genes encoding D. discoideum spliceosomal components, we searched dictyBase using sequences of spliceosomal proteins present in Homo sapiens (human). Our search criteria for D. discoideum orthologs included sequence similarity, reciprocal matches, the presence of the relevant domain(s), manual review and independent phylogenetic analysis. In general, we found that spliceosomal proteins and related factors in D. discoideum have higher similarity to those in the plant (Arabidopsis thaliana), fly (Drosophila melanogaster) and human (Homo sapiens) genomes than to their yeast (Saccharomyces cerevisiae) orthologs.

RESULTS AND DISCUSSION

D. discoideum, human, fly, plant and yeast genomes and their splicing features

D. discoideum has a genome size of 34 Mb, which is smaller than the human (2851 Mb), fly (180 Mb) and plant (157 Mb) genomes, but about 2.6 times the size of the yeast genome (13 Mb). The D. discoideum genome contains 13,527 predicted protein-coding genes, a number similar to those in fly (13,676) and plant (13,029), but substantially higher than that in yeast (5538) (Eichinger et al., 2005). We queried the D. discoideum genome sequence at dictyBase and found that 9210 (68%) of these genes contain at least one intron. By comparison, 77% of fly genes (Crosby et al., 2007) and only 5% of yeast genes contain intron(s). The mean numbers of introns in spliced genes are 1.0, 1.9, 4.0 and 8.1 in yeast, D. discoideum, fly and human, respectively (Eichinger et al., 2005). We examined gene models and genomic sequences in comparison with ESTs and cDNA sequences. To date, this has led to the identification of 40 genes that have clear evidence of alternative splicing (Table 1). This is contrary to the previous belief that no regulated alternative splicing exists in any unicellular organism (Barbosa-Morais et al., 2006).
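The proportions quoted in this paragraph can be rechecked with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are our own and purely illustrative.

```python
# Figures quoted in the paragraph above; no new data are introduced.
TOTAL_GENES = 13_527          # predicted protein-coding genes in D. discoideum
GENES_WITH_INTRON = 9_210     # genes containing at least one intron
GENOME_MB = {"D. discoideum": 34, "S. cerevisiae": 13}


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


intron_pct = percent(GENES_WITH_INTRON, TOTAL_GENES)
size_ratio = GENOME_MB["D. discoideum"] / GENOME_MB["S. cerevisiae"]

print(f"Intron-containing genes: {intron_pct:.1f}%")
print(f"D. discoideum / yeast genome size: {size_ratio:.1f}x")
```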

Table 1. Alternatively spliced genes in D. discoideum

Based on published studies, we inspected protein sequences that have been reported as spliceosomal proteins or proteins with experimental evidence for their roles in pre-mRNA splicing. A collection of 264 human sequences for spliceosome-associated proteins (Hartmuth et al., 2002; Zhou et al., 2002; Wu et al., 2004; Collins and Penny, 2005; Barbosa-Morais et al., 2006; Matlin and Moore, 2007; Bessonov et al., 2008) was retrieved from the RefSeq database and used to query the dictyBase database. Figure 1 shows a flow chart for our general search procedure. As a result, the vast majority (154) of the non-redundant homologs of human spliceosomal proteins and related factors were identified in the D. discoideum genome. Furthermore, we identified several putative homologs by second-pass individual analyses (see METHODS). This increased the total number of putative spliceosomal proteins to 160, which demonstrates that 61% (160/264) of the human spliceosomal proteins have predicted orthologs in D. discoideum. The D. discoideum spliceosomal proteins and related factors are described below in several groups: the snRNP proteins, non-snRNP proteins, hnRNP and associated proteins, and alternative splicing regulators (Tables 2–5). “No hit” in Tables 2–5 indicates that the identified D. discoideum spliceosomal proteins did not hit their corresponding orthologs in fly, plant and yeast in the RefSeq database using our search criteria. In some cases, the fly, plant or yeast orthologs do exist, but are not identified using D. discoideum proteins because of the sequence divergence between D. discoideum and fly, plant or yeast.

Figure 1. A schematic diagram of the search strategies used

Table 2. Spliceosomal snRNP proteins

Table 5. Additional proteins involved in alternative splicing regulation

Spliceosomal snRNP genes of D. discoideum are highly similar to their human orthologs

The snRNP proteins are further classified into Sm/LSm core proteins, U1, U2, U5, U4/U6-specific proteins and tri-snRNP specific proteins (Table 2). Among snRNP proteins, all orthologs to 49 human proteins were identified in D. discoideum, plant, and fly genomes, but only 43 yeast orthologs were available (Table 2). Spliceosomal snRNP proteins in D. discoideum are highly conserved, and similar to those in higher eukaryotes. The Sm/LSm core proteins in the D. discoideum genome have almost one-to-one correspondences to their human counterparts. Such a close relationship is illustrated by LSm6 (LSM6) and LSm7 (LSM7) in the phylogenetic tree (Fig. 2A).

Figure 2. Phylogenetic analysis of the orthologs of spliceosomal genes in D. discoideum, H. sapiens, D. melanogaster, A. thaliana and S. cerevisiae genomes

When there are two or more closely related proteins in human, D. discoideum often has fewer, or just one ortholog. For example, searches with SmB/B’ (SNRPB) or SmN (SNRPN) led to the same hit, DDB0233178 in D. discoideum. Similarly, only one gene with sequence similarity to both SNRPB and SNRPN has been identified in the D. melanogaster, A. thaliana and S. cerevisiae genomes. This relationship has been demonstrated in the phylogenetic analysis (Fig. 2B). These orthologs in the fly, plant, D. discoideum and yeast genomes are arranged in the expected evolutionary position.

There is only one gene (DDB0233135) identified in D. discoideum with significant sequence similarity to both human U1-specific protein A (U1A, SNRPA) and U2-specific protein B2 (U2B”, SNRPB2). This finding is similar to what is found in the fly and plant. Interestingly, this D. discoideum ortholog only identifies U2B” (Msl1p), but not U1A (Mud1p) in yeast. It is possible that other U1A-like genes exist in D. discoideum, but with more divergent sequences. Three proteins, SmE (SNRPE), the U5-specific 52 kDa (CD2BP2) and tri-snRNP 27 kDa (SNRNP27), were identified by second-pass BLAST and domain analyses (see METHODS). The small (92 amino acids) SNRPE homolog (DDB0302415) was not present in D. discoideum gene predictions but was identified using the human SNRPE protein sequence and the tBlastn program. The putative Dictyostelium CD2BP2 homolog (DDB0233538) contains a polyasparagine stretch of more than 50 residues. Homopolymers, especially polyglutamine and polyasparagine stretches, are abundant in D. discoideum (Eichinger et al., 2005). Such proteins can be classified by customized BLAST searches (see METHODS). The possible SNRNP27 homolog (DDB0238807) in D. discoideum, which is present in fly but absent in yeast, was also identified using the tBlastn program. This search revealed a gene whose original gene structure was incorrect. After curation at dictyBase, the predicted homolog aligns well with the human SNRNP27 at the C-terminus, where both proteins contain a DUF1777 domain whose function remains unclear. The D. discoideum protein is almost twice as long as its human counterpart (296 versus 155 amino acids). However, this difference in length occurs in the repetitive arginine-rich N-terminal sequences that both proteins share to a different degree.

Non-snRNP proteins associated with spliceosomal assembly and splicing

Non-snRNP proteins associated with spliceosomal assembly and pre-mRNA splicing are classified into several groups: SR and SR-related proteins, PRP19 complex proteins, catalytic step II and late-acting proteins, exon junction complex (EJC) proteins and other splicing factors. We searched the D. discoideum proteome using corresponding human proteins with the same criteria as described above. The D. discoideum non-snRNP proteins are more similar to their A. thaliana, D. melanogaster and H. sapiens orthologs than to those of S. cerevisiae. Among non-snRNP spliceosomal proteins, the majority of the human proteins have orthologs in D. discoideum, A. thaliana and D. melanogaster but not in S. cerevisiae. For convenience of description, we list them in groups as shown in Table 3.

Table 3. Non-snRNP spliceosomal proteins

SR and SR-related proteins are characterized by two structural motifs, RNA recognition motif (RRM) of RNP type and RS domain containing arginine-serine rich sequences. Sixteen members of SR and SR-related proteins have been identified in human. These proteins play important roles in both constitutive splicing and alternative splicing regulation (reviewed in (Blencowe, 2000; Black, 2003; Wu et al., 2004; Sanford et al., 2005; Lin and Fu, 2007; Matlin and Moore, 2007)). Interestingly, three distinct SR protein orthologs with RRMs and an RS domain were identified in the D. discoideum genome. These proteins are DDB0233327, DDB0233352 and DDB0233351, corresponding to human 9G8 (SFRS7), Tra2-beta (SFRS10) and Tra2-alpha (TRA2A), respectively (Table 3A). Several classical SR proteins in mammals do not have orthologs in the D. discoideum genome, including SC35 and ASF/SF2 (Table S1). On the other hand, some SR protein genes in D. discoideum seem to have expanded in numbers. For example, two genes were identified as possible homologs of human SRp75 (SFRS4): DDB0233308 and DDB0233309. In such cases, only those with the highest level of sequence homology were included in Table 3A. It is also interesting to note that the RS domains in D. discoideum SR proteins appear to be more enriched in the RDR/RDRS motif rather than in the typical RS/SR sequences found in mammalian SR proteins. For example, in DDB0233308 and DDB0233327, there are long stretches of RDR/RDRS peptides, whose functional significance remains to be investigated.

All seven well-documented PRP19 complex associated proteins have orthologs in the Dictyostelium, human, fly, plant and yeast genomes (Table 3B). Several proteins known to act during the late stage of spliceosomal assembly and splicing were also found to be highly conserved in Dictyostelium, human, fly, plant and yeast, including Prp22 (DHX8), Prp43 (DHX15), Prp16 (DHX38), Slu7 (SLU7), Prp17 (CDC40) and Prp18 (PRPF18) (Table 3C).

The EJC assembly is a splicing-dependent process and serves to mark the RNA for downstream processing steps such as export, translation and nonsense-mediated decay (Tange et al., 2004; Lejeune and Maquat, 2005). The conservation of EJC proteins is high in the D. discoideum genome (Table 3D). Five EJC proteins corresponding to human SRRM1, BAT1, RNPS1, RBM8A and MAGOH are found in D. discoideum, whereas yeast has only one ortholog to human BAT1. This suggests that RNA processing could be more complex in D. discoideum than in yeast, although further experimental data are required for this generalization.

We identified a number of other spliceosomal proteins in D. discoideum that contain various motifs present in known splicing factors, including DExD/H motifs, cyclophilins, WD40s, cap binding proteins, polyadenylation machinery proteins, zinc finger motifs and other uncharacterized motifs (Tables 3E and 3F). DExD/H containing proteins play important roles in pre-mRNA splicing (Staley and Guthrie, 1998; Cordin et al., 2006). It is interesting to note that almost all of the human spliceosomal proteins with the DExD/H motif have D. discoideum orthologs. Cyclophilins catalyze cis-trans prolyl bond isomerization and facilitate protein conformational changes. All five orthologs of human splicing-related cyclophilins are present in D. discoideum. Searching S. cerevisiae with D. discoideum proteins identified two positive hits (NP_013633 and NP_013317; Table 3E).

hnRNP and related proteins

Forty-six heterogeneous nuclear ribonucleoproteins (hnRNPs) and other H complex associated proteins in the human genome were used to query dictyBase. Of these 46 sequences, we identified 9 non-redundant orthologs in D. discoideum (Table 4). The human hnRNP L (HNRNPL) hits one D. discoideum gene (DDB0233648), and both human hnRNP R (HNRNPR) and hnRNP Q (SYNCRIP) hit a single D. discoideum protein (DDB0214833). None of these three hnRNPs has a yeast ortholog. This relationship was confirmed in the phylogenetic analysis (Fig. 2C). Among the heat shock proteins, the human query sequences (HSPA1A and HSPA8) identified two groups of orthologs in the fly genome, but only one cluster of orthologs from A. thaliana, D. discoideum and S. cerevisiae (Fig. 2D). These clusters are not specific to either HSPA1A or HSPA8 and differ from the one-to-one relationship found in snRNPs (Fig. 2A and 2B). When these 9 non-redundant proteins were used to search the RefSeq database, all of them corresponded to the initial 16 human query sequences. The search with these putative D. discoideum hnRNP and related proteins also led to the identification of 15, 16 and 7 non-redundant proteins in the fly, plant and yeast genomes, respectively. In comparison with their yeast counterparts, Dictyostelium hnRNP protein orthologs are again more similar to those in the fly, plant and human genomes.

Table 4. hnRNP proteins associated with the spliceosome

Alternative splicing regulators

Alternative splicing is a powerful mechanism for generating genetic diversity (Moore and Silver, 2008; Nilsen and Graveley, 2010).

Several groups of alternative splicing regulators have been reported in mammalian and fly genomes. These include hnRNP proteins, the SR protein super-family (SR proteins and SR-related proteins), CUGBP and ETR-like factors (CELF), DExD/H box containing proteins, RNA-binding proteins containing the heterogeneous nuclear ribonucleoprotein K-type homology (KH) or RRM domains, and other RNA binding proteins. A number of proteins are involved in both spliceosomal assembly and alternative splicing regulation.

Alternative splicing regulators of the hnRNP protein family often bind to exonic or intronic splicing regulatory sequences and influence splice site selection (reviewed in (Blencowe, 2000; Black, 2003; Wu et al., 2004; Sanford et al., 2005; Lin and Fu, 2007; Matlin and Moore, 2007)). SR proteins play important roles in both constitutive and alternative splicing (Blencowe, 2000; Cartegni et al., 2002; Wu et al., 2004; Sanford et al., 2005; Lin and Fu, 2007). The D. discoideum orthologs of both hnRNP and SR proteins have been described in the previous sections (Tables 3A and 4).

DExD/H box-containing proteins and other RNA-binding proteins also play a role in alternative splicing regulation (e.g., Wu et al., 2006; Fushimi et al., 2008; Kar et al., 2011 and references within). Two orthologs of DExD/H box containing regulators, p68 (DDX5) and p72 (DDX17), were found in D. discoideum (Table 3E).

The CELF family of splicing regulators interacts with CUG-containing splicing regulatory elements and controls alternative splicing of a number of genes (Ladd et al., 2001). RNA transcripts containing expanded CUG/CCUG repeats can bind and sequester CUG-binding proteins and cause aberrant splicing (Ebralidze et al., 2004). Altered expression of CUG-binding proteins has been associated with myotonic dystrophy ((Kanadia et al., 2003) and reviewed in (Wang and Cooper, 2007)). Two putative CELF family members were identified in D. discoideum (DDB0233674 and DDB0233675), which correspond to six human CELF family members, CELF1–6 (Table 5). Three fly proteins (NP_788039, NP_609559 and NP_723739) and three plant proteins (NP_171845, NP_567249 and NP_973752) are similar to the two D. discoideum proteins, which are related to the above six human CELF family members (Table 5). As with the heat shock proteins, these CELF orthologs do not have one-to-one relationships to the human CELF proteins. No CELF proteins were found in the yeast genome.

Our sequence analyses of genomic and EST databases strongly support earlier findings (Grant and Tsang, 1990; Bain et al., 1991; Greenwood and Tsang, 1991; Escalante et al., 2003) that D. discoideum has bona fide alternative splicing. To date, we have examined nearly all 13,527 genes individually and compared them with the available ESTs and cDNAs. This led to the identification of 40 genes that clearly show alternatively spliced isoforms (Table 1). With only 50% of the 13,527 estimated genes in D. discoideum having at least some EST coverage, the actual number of alternatively spliced genes may be much higher than the 40 genes in this study. These results strongly suggest that alternative splicing could be important in the biology of this unicellular model organism. Consistent with this notion, a number of alternative splicing regulators have been identified by our sequence searches. Interestingly, all of the major families of alternative splicing regulators reported in mammals and D. melanogaster have been identified in D. discoideum. These include the SR protein super-family, CELF family, hnRNP protein family, DExD/H box containing proteins and other RNA binding proteins (see individual descriptions in the sections above). SR proteins are among the earliest acting proteins in spliceosome assembly. These proteins can interact with the exonic splicing regulatory elements and are related to the increased protein complexity. The CUG-binding proteins play a role in RNA processing and can regulate alternative splicing of different transcripts (Ladd et al., 2001). The expanded CUG/CCUG-containing transcripts can bind and sequester CUG-binding proteins and cause aberrant splicing (Wang and Cooper, 2007). Altered expression of CUG-binding proteins has been associated with myotonic dystrophy (Kanadia et al., 2003; Wang and Cooper, 2007). The presence of alternatively spliced genes and splicing regulators in the D. discoideum genome provides opportunities for studying alternative splicing in this simple model organism.

Spliceosomal snRNAs

D. discoideum snRNA genes were identified using a motif-search algorithm written in Perl (see METHODS section). There are five genes coding for U1 snRNAs, seven for U2 snRNAs, three for U4 snRNAs, two for U5 snRNAs, and one for U6 snRNA. Searches for U11, U12, U4atac and U6atac did not reveal convincing homologs with significant sequence similarity (data not shown), suggesting that D. discoideum may not have the U12-type minor class of spliceosomes. Our results for D. discoideum spliceosomal snRNAs are similar to the findings published by Aspegren and colleagues (Aspegren et al., 2004; Hinas et al., 2006). Taken together, these results suggest that our approach can be applied to different genomes for the identification of snRNA genes.

In this study we identified 160 candidate spliceosomal proteins in the model organism D. discoideum. 68% of the predicted and known protein-coding genes in D. discoideum contain one or more introns and these genes have to undergo pre-mRNA splicing to generate functional mRNA transcripts. Therefore, pre-mRNA splicing is critical for gene expression in D. discoideum. In addition to all spliceosomal snRNAs (U1, U2, U4, U5 and U6), we identified 100 non-redundant sequences in the D. discoideum genome that are likely functional homologs of human non-snRNP spliceosomal proteins. D. discoideum can be used as a model system for studying the spliceosome and its components. The identification of this comprehensive set of spliceosomal proteins in D. discoideum should facilitate studies of pre-mRNA splicing in this model system.

The entire set of spliceosomal snRNP core proteins, the PRP19 complex proteins and late-acting splicing proteins are highly conserved in yeast, Dictyostelium, plant, fly and human. Such widespread conservation suggests that these proteins play critical roles in the fundamental process of pre-mRNA splicing. D. discoideum branches from the metazoan lineage before yeast. Our analyses show that many metazoan splicing factors that are missing in yeast are present in D. discoideum, indicating that these splicing-associated proteins are more ancient than previously thought. Further study will shed light on the early evolution of the metazoan splicing machinery.

Mutations in several spliceosomal protein genes, PRPF3, PRPF8 and PRPF31, cause human retinal degeneration (reviewed in (Pacione et al., 2003; Mordes et al., 2006)). It is interesting to note that all these disease-associated spliceosomal proteins are conserved in D. discoideum. Our comprehensive catalog of Dictyostelium discoideum spliceosomal proteins and related factors presented here will be useful for future experiments to elucidate splicing mechanisms and the underlying molecular pathways leading to human disease.

METHODS

Human spliceosomal proteins were collected from published studies (Hartmuth et al., 2002; Zhou et al., 2002; Wu et al., 2004; Barbosa-Morais et al., 2006; Matlin and Moore, 2007; Bessonov et al., 2008) and were used as the primary source to query dictyBase (http://www.dictybase.org/; (Chisholm et al., 2006)) using BLASTp with the BLOSUM62 matrix and the SEG filter (for filtering low-complexity subsequences) (Altschul et al., 1990). The D. discoideum protein primary features database and an E-value < 10^-6 were used. The local alignments between the human and D. discoideum genes were manually reviewed to identify the structural motif regions present in the human spliceosomal proteins. Peptide sequences with only regional sequence homology but without the known motif(s) characteristic of the corresponding splicing proteins were excluded. Finally, a reciprocal BLAST search was performed using the identified D. discoideum hits as queries to search the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/). If a putative D. discoideum sequence matched the corresponding spliceosomal related gene in the human, fly, plant and yeast proteomes with an E-value < 10^-5, it was accepted as the D. discoideum ortholog.
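The two E-value thresholds and the reciprocal check described above can be sketched as follows. This is a simplified illustration, not the pipeline used in this study: it models only the numerical cutoffs and the reciprocal match against the human query, and it omits the manual motif review and the fly, plant and yeast comparisons. The `Hit` record, the function names and the E-values in the example are hypothetical; the gene identifiers are taken from the Results.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One BLAST hit (hypothetical minimal record)."""
    query: str
    subject: str
    evalue: float


FORWARD_CUTOFF = 1e-6      # human query against dictyBase
RECIPROCAL_CUTOFF = 1e-5   # D. discoideum hit against RefSeq


def passing_pairs(hits: Iterable[Hit], cutoff: float) -> Set[Tuple[str, str]]:
    """Return (query, subject) pairs whose E-value is below the cutoff."""
    return {(h.query, h.subject) for h in hits if h.evalue < cutoff}


def reciprocal_matches(forward: Iterable[Hit],
                       reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Keep (human, dicty) pairs that pass in both search directions."""
    fwd = passing_pairs(forward, FORWARD_CUTOFF)
    rev = passing_pairs(reverse, RECIPROCAL_CUTOFF)
    return sorted((human, dicty) for human, dicty in fwd if (dicty, human) in rev)


# Illustrative input only: the E-values are invented for the example.
forward_hits = [
    Hit("SNRPB", "DDB0233178", 1e-30),
    Hit("SNRPN", "DDB0233178", 1e-28),
    Hit("HYPOTHETICAL_HUMAN", "HYPOTHETICAL_DDB", 1e-3),  # fails forward cutoff
]
reverse_hits = [
    Hit("DDB0233178", "SNRPB", 1e-29),
    Hit("DDB0233178", "SNRPN", 1e-27),
]
matches = reciprocal_matches(forward_hits, reverse_hits)
print(matches)
```

Because the check asks only whether the D. discoideum protein hits the corresponding human query again, two closely related human proteins can both map to a single D. discoideum gene, as described for SNRPB and SNRPN.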

When we did not identify orthologs with the above described method, dictyBase curators performed individual tblastn and blastp searches at dictyBase combined with domain analyses. As a general rule, blastp results with ≥ 25% identity over ≥ 70% of the overall length were considered as orthologs. This second-pass approach often identified orthologs whose automatic gene prediction had been either incorrect or absent, with the result that the correctly annotated genes were not present in the dictyBase primary sequence dataset. These genes were then added manually and are now publicly available. In some cases, similarity was masked by highly repetitive sequences in the D. discoideum gene, which are common in D. discoideum. In these cases, blast searches and domain analyses were performed with partial deletions of repetitive strings, and the results were compared with those obtained with the full-length protein. Phylogenetic analysis was performed to identify and confirm the orthologous proteins in different species. The protein sequences in each group were aligned using Clustal W version 2.0 (Larkin et al., 2007) to generate a character matrix in NEXUS file format. Phylogenetic analysis was then performed on each of the protein alignments with MrBayes (Ronquist and Huelsenbeck, 2003), using Markov chain Monte Carlo to approximate the posterior probabilities of each tree. The .con file generated from MrBayes includes two consensus trees, which have been used to generate a graphical representation in the program TreeView (Page, 2002).
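The second-pass rule of thumb and the deletion of repetitive strings can be illustrated with a short sketch. It rests on two assumptions that the text does not specify: coverage is measured against the query length, and a repetitive string is a run of ten or more identical residues, shortened to a single residue. The function names, parameters and the toy sequence are hypothetical.

```python
import re

MIN_IDENTITY = 25.0   # percent identity, from the rule stated above
MIN_COVERAGE = 70.0   # percent of overall length, from the rule stated above


def meets_second_pass_rule(identical: int, aligned_length: int,
                           query_length: int) -> bool:
    """Apply the >= 25% identity over >= 70% length rule.

    Assumption: coverage is aligned length divided by query length.
    """
    if aligned_length <= 0 or query_length <= 0:
        return False
    identity = 100.0 * identical / aligned_length
    coverage = 100.0 * aligned_length / query_length
    return identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE


def trim_homopolymers(protein: str, min_run: int = 10, keep: int = 1) -> str:
    """Shorten each run of >= min_run identical residues to `keep` residues.

    Assumption: this stands in for the partial deletion of repetitive
    strings; the thresholds are illustrative, not taken from the study.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: m.group(1) * keep, protein)


# Toy sequence with a 50-residue polyasparagine stretch.
toy = "MK" + "N" * 50 + "LV"
print(trim_homopolymers(toy))
print(meets_second_pass_rule(identical=30, aligned_length=100, query_length=120))
```

Searching with both the trimmed and the full-length sequence, as described above, shows whether a match depends on the repeat rather than on the conserved domain.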

To identify spliceosomal snRNAs in the D. discoideum genome, we applied a motif-search algorithm that was specially designed for this task and written in Perl. To obtain highly conserved short sequence segments (motifs) within snRNAs, known snRNA genes were compared among human, fly and plant (A. thaliana) using the ClustalW program. For every spliceosomal snRNA, sequence motifs were identified that contain nucleotide sequences identical among the three species studied. These motifs and the observed distances between them were used as input for the Perl program that scanned the D. discoideum genome. An additional 10%–20% sequence variation was permitted within motifs, and 10%–20% length variation was permitted in the distances between motifs, because some of the D. discoideum snRNA genes are very divergent from their animal and plant orthologs. The secondary structures of the predicted D. discoideum genes were examined using the M-fold program.
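The idea of scanning for ordered motifs with tolerated mismatches and flexible spacing can be sketched as below. This is not the Perl program used in this study; it is a minimal Python illustration in which the motif sequences, the expected gap and the toy genome are invented, and a single tolerance of 20% (the upper bound quoted above) is applied to both mismatches and spacing.

```python
from typing import Iterator, List, Sequence, Tuple


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_motif(genome: str, motif: str, start: int, end: int,
               tolerance: float) -> Iterator[int]:
    """Yield positions in [start, end] where the motif matches within tolerance."""
    allowed = int(len(motif) * tolerance)
    last = min(end, len(genome) - len(motif))
    for pos in range(max(start, 0), last + 1):
        if mismatches(genome[pos:pos + len(motif)], motif) <= allowed:
            yield pos


def scan(genome: str, motifs: Sequence[str], gaps: Sequence[int],
         tolerance: float = 0.2) -> List[Tuple[int, ...]]:
    """Return motif start positions for every chain of ordered motif matches.

    gaps[i] is the expected distance between the end of motif i and the
    start of motif i + 1; each gap may vary by +/- tolerance.
    """
    results: List[Tuple[int, ...]] = []

    def extend(index: int, positions: List[int]) -> None:
        if index == len(motifs):
            results.append(tuple(positions))
            return
        prev_end = positions[-1] + len(motifs[index - 1])
        gap = gaps[index - 1]
        slack = int(gap * tolerance)
        window = (prev_end + gap - slack, prev_end + gap + slack)
        for pos in find_motif(genome, motifs[index], window[0], window[1], tolerance):
            extend(index + 1, positions + [pos])

    for pos in find_motif(genome, motifs[0], 0, len(genome), tolerance):
        extend(1, [pos])
    return results


# Invented example: the second motif carries one substitution and sits
# 11 nt downstream of the first, against an expected gap of 10 nt.
toy_motifs = ["ACGTACGTAC", "TTGGCCAATT"]
toy_genome = "CC" + "ACGTACGTAC" + "G" * 11 + "TTGGCCTATT" + "CC"
candidates = scan(toy_genome, toy_motifs, gaps=[10])
print(candidates)
```

Candidates returned by such a scan would still need the secondary-structure check described above before being accepted as snRNA genes.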

Alternatively spliced genes are identified through an ongoing effort at dictyBase, in which each gene model is individually inspected and compared with all available EST and cDNA data.
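The comparison of gene models with EST and cDNA data is a manual curation step, but one simple automated pre-screen can be sketched. The code below flags a gene when introns inferred from different transcripts overlap without being identical, a pattern expected for exon skipping or alternative splice sites. It is a hypothetical aid, not the curators' procedure; the coordinates are invented and intron retention is not detected.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Intron = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def overlaps(a: Intron, b: Intron) -> bool:
    """Return True if two introns share at least one genomic position."""
    return a[0] <= b[1] and b[0] <= a[1]


def conflicting_introns(transcripts: Dict[str, Set[Intron]]) -> List[Tuple[Intron, Intron]]:
    """Return pairs of distinct introns, pooled across transcripts, that overlap."""
    pooled: Set[Intron] = set()
    for introns in transcripts.values():
        pooled |= introns
    return [(a, b) for a, b in combinations(sorted(pooled), 2) if overlaps(a, b)]


# Invented example: the second EST skips the exon between two model introns.
evidence = {
    "gene_model": {(100, 200), (300, 400)},
    "est_1": {(100, 200), (300, 400)},
    "est_2": {(100, 400)},
}
conflicts = conflicting_introns(evidence)
print(conflicts)
```

A gene with a non-empty conflict list would be a candidate for the individual inspection described above, not a confirmed case of alternative splicing.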

Acknowledgments

We thank members of the Wu lab for helpful suggestions and critical reading of the manuscript. This work was supported by grants to J.Y.W. from NIH (EY014576 and GM070967), to A.F. from NSF (Career award MCB-0643542) and to R.L.C. from NIH (GM64426 and HG02273).

ABBREVIATIONS

CELF: CUG binding protein and ETR-like factors
EJC: exon junction complex
hnRNP: heterogeneous nuclear ribonucleoprotein
RRM: RNA recognition motif
snRNA: uridine-rich small nuclear ribonucleic acid
snRNP: small nuclear ribonucleoprotein
SR protein: arginine-serine rich protein

References

  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410. [PubMed]
  • Aspegren A, Hinas A, Larsson P, Larsson A, Söderbom F. Novel non-coding RNAs in Dictyostelium discoideum and their expression during development. Nucleic Acids Res. 2004;32:4646–4656. [PMC free article] [PubMed]
  • Bain G, Grant CE, Tsang A. Isolation and characterization of cDNA clones encoding polypeptides related to a Dictyostelium discoideum cyclic AMP binding protein. J Gen Microbiol. 1991;137:501–508. [PubMed]
  • Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF. A kingdom-level phylogeny of eukaryotes based on combined protein data. Science. 2000;290:972–977. [PubMed]
  • Barbosa-Morais NL, Carmo-Fonseca M, Aparício S. Systematic genome-wide annotation of spliceosomal proteins reveals differential gene family expansion. Genome Res. 2006;16:66–77. [PubMed]
  • Bessonov S, Anokhina M, Will CL, Urlaub H, Lührmann R. Isolation of an active step I spliceosome and composition of its RNP core. Nature. 2008;452:846–850. [PubMed]
  • Black DL. Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003;72:291–336. [PubMed]
  • Blencowe BJ. Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci. 2000;25:106–110. [PubMed]
  • Calarco JA, Zhen M, Blencowe BJ. Networking in a global world: Establishing functional connections between neural splicing regulators and their target transcripts. RNA. 2011;17:775–791. [PubMed]
  • Cartegni L, Chew SL, Krainer AR. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. 2002;3:285–298. [PubMed]
  • Chisholm RL, Gaudet P, Just EM, Pilcher KE, Fey P, Merchant SN, Kibbe WA. dictyBase, the model organism database for Dictyostelium discoideum. Nucleic Acids Res. 2006;34:D423–D427. [PMC free article] [PubMed]
  • Collins L, Penny D. Complex spliceosomal organization ancestral to extant eukaryotes. Mol Biol Evol. 2005;22:1053–1066. [PubMed]
  • Cordin O, Banroques J, Tanner NK, Linder P. The DEAD-box protein family of RNA helicases. Gene. 2006;367:17–37. [PubMed]
  • Crosby MA, Goodman JL, Strelets VB, Zhang P, Gelbart WM. FlyBase Consortium. FlyBase: genomes by the dozen. Nucleic Acids Res. 2007;35:D486–D491. [PubMed]
  • Ebralidze A, Wang Y, Petkova V, Ebralidse K, Junghans RP. RNA leaching of transcription factors disrupts transcription in myotonic dystrophy. Science. 2004;303:383–387. [PubMed]
  • Eichinger L, Pachebat JA, Glöckner G, Rajandream MA, Sucgang R, Berriman M, Song J, Olsen R, Szafranski K, Xu Q, et al. The genome of the social amoeba Dictyostelium discoideum. Nature. 2005;435:43–57. [PMC free article] [PubMed]
  • Escalante R, Moreno N, Sastre L. Dictyostelium discoideum developmentally regulated genes whose expression is dependent on MADS box transcription factor SrfA. Eukaryot Cell. 2003;2:1327–1335. [PMC free article] [PubMed]
  • Fushimi K, Ray P, Kar A, Wang L, Sutherland LC, Wu JY. Up-regulation of the proapoptotic caspase 2 splicing isoform by a candidate tumor suppressor, RBM5. Proc Natl Acad Sci U S A. 2008;105:15708–15713. [PubMed]
  • Grant CE, Tsang A. Cloning and characterization of cDNAs encoding a novel cyclic AMP-binding protein in Dictyostelium discoideum. Gene. 1990;96:213–218. [PubMed]
  • Greenwood M, Tsang A. Sequence and expression of annexin VII of Dictyostelium discoideum. Biochim Biophys Acta. 1991;1088:429–432. [PubMed]
  • Hartmuth K, Urlaub H, Vornlocher HP, Will CL, Gentzel M, Wilm M, Lührmann R. Protein composition of human prespliceosomes isolated by a tobramycin affinity-selection method. Proc Natl Acad Sci U S A. 2002;99:16719–16724. [PubMed]
  • Hinas A, Larsson P, Avesson L, Kirsebom LA, Virtanen A, Söderbom F. Identification of the major spliceosomal RNAs in Dictyostelium discoideum reveals developmentally regulated U2 variants and polyadenylated snRNAs. Eukaryot Cell. 2006;5:924–934. [PMC free article] [PubMed]
  • Hoskins AA, Friedman LJ, Gallagher SS, Crawford DJ, Anderson EG, Wombacher R, Ramirez N, Cornish VW, Gelles J, Moore MJ. Ordered and dynamic assembly of single spliceosomes. Science. 2011;331:1289–1295. [PMC free article] [PubMed]
  • Kanadia RN, Johnstone KA, Mankodi A, Lungu C, Thornton CA, Esson D, Timmers AM, Hauswirth WW, Swanson MS. A muscleblind knockout model for myotonic dystrophy. Science. 2003;302:1978–1980. [PubMed]
  • Kar A, Fushimi K, Zhou X, Ray P, Shi C, Chen X, Liu Z, Chen S, Wu JY. RNA helicase p68 (DDX5) regulates tau exon 10 splicing by modulating a stem-loop structure at the 5′ splice site. Mol Cell Biol. 2011;31:1812–1821. [PMC free article] [PubMed]
  • Ladd AN, Charlet N, Cooper TA. The CELF family of RNA binding proteins is implicated in cell-specific and developmentally regulated alternative splicing. Mol Cell Biol. 2001;21:1285–1296. [PMC free article] [PubMed]
  • Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al. Clustal W and Clustal X version 2.0. Bioinformatics. 2007;23:2947–2948. [PubMed]
  • Lejeune F, Maquat LE. Mechanistic links between nonsense-mediated mRNA decay and pre-mRNA splicing in mammalian cells. Curr Opin Cell Biol. 2005;17:309–315. [PubMed]
  • Lin S, Fu XD. SR proteins and related factors in alternative splicing. Adv Exp Med Biol. 2007;623:107–122. [PubMed]
  • Matlin AJ, Moore MJ. Spliceosome assembly and composition. Adv Exp Med Biol. 2007;623:14–35. [PubMed]
  • Moore MJ, Silver PA. Global analysis of mRNA splicing. RNA. 2008;14:197–203. [PubMed]
  • Mordes D, Luo X, Kar A, Kuo D, Xu L, Fushimi K, Yu G, Sternberg P Jr, Wu JY. Pre-mRNA splicing and retinitis pigmentosa. Mol Vis. 2006;12:1259–1271. [PMC free article] [PubMed]
  • Nilsen TW, Graveley BR. Expansion of the eukaryotic proteome by alternative splicing. Nature. 2010;463:457–463. [PMC free article] [PubMed]
  • Pacione LR, Szego MJ, Ikeda S, Nishina PM, McInnes RR. Progress toward understanding the genetic and biochemical mechanisms of inherited photoreceptor degenerations. Annu Rev Neurosci. 2003;26:657–700. [PubMed]
  • Page RD. Visualizing phylogenetic trees using TreeView. Curr Protoc Bioinformatics. 2002;Chapter 6:Unit 6.2. [PubMed]
  • Patel AA, Steitz JA. Splicing double: insights from the second spliceosome. Nat Rev Mol Cell Biol. 2003;4:960–970. [PubMed]
  • Ramani AK, Calarco JA, Pan Q, Mavandadi S, Wang Y, Nelson AC, Lee LJ, Morris Q, Blencowe BJ, Zhen M, Fraser AG. Genome-wide analysis of alternative splicing in Caenorhabditis elegans. Genome Res. 2011;21:342–348. [PubMed]
  • Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. [PubMed]
  • Sanford JR, Ellis J, Cáceres JF. Multiple roles of arginine/serine-rich splicing factors in RNA processing. Biochem Soc Trans. 2005;33:443–446. [PubMed]
  • Staley JP, Guthrie C. Mechanical devices of the spliceosome: motors, clocks, springs, and things. Cell. 1998;92:315–326. [PubMed]
  • Tange TO, Nott A, Moore MJ. The ever-increasing complexities of the exon junction complex. Curr Opin Cell Biol. 2004;16:279–284. [PubMed]
  • Wang GS, Cooper TA. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet. 2007;8:749–761. [PubMed]
  • Will CL, Lührmann R. Splicing of a rare class of introns by the U12-dependent spliceosome. Biol Chem. 2005;386:713–724. [PubMed]
  • Wu JY, Havlioglu N, Yuan L. Alternatively spliced genes. In: Meyers RA, editor. Encyclopedia of Molecular Cell Biology and Molecular Medicine. 2nd ed. Vol. 1. New York: Wiley-VCH; 2004.
  • Wu JY, Kar A, Kuo D, Yu B, Havlioglu N. SRp54 (SFRS11), a regulator for tau exon 10 alternative splicing identified by an expression cloning strategy. Mol Cell Biol. 2006;26:6739–6747. [PMC free article] [PubMed]
  • Zhou Z, Licklider LJ, Gygi SP, Reed R. Comprehensive proteomic analysis of the human spliceosome. Nature. 2002;419:182–185. [PubMed]