Related Articles

1.  GC content around splice sites affects splicing through pre-mRNA secondary structures 
BMC Genomics  2011;12:90.
Background
Alternative splicing increases protein diversity by generating multiple transcript isoforms from a single gene through different combinations of exons or through different selections of splice sites. It has been reported that RNA secondary structures are involved in alternative splicing. Here we perform a genomic study of RNA secondary structures around splice sites in humans (Homo sapiens), mice (Mus musculus), fruit flies (Drosophila melanogaster), and nematodes (Caenorhabditis elegans) to further investigate this phenomenon.
Results
We observe that GC content around splice sites is closely associated with the splice site usage in multiple species. RNA secondary structure is the possible explanation, because the structural stability difference among alternative splice sites, constitutive splice sites, and skipped splice sites can be explained by the GC content difference. Alternative splice sites tend to be GC-enriched and exhibit more stable RNA secondary structures in all of the considered species. In humans and mice, splice sites of first exons and long exons tend to be GC-enriched and hence form more stable structures, indicating the special role of RNA secondary structures in promoter proximal splicing events and the splicing of long exons. In addition, GC-enriched exon-intron junctions tend to be overrepresented in tissue-specific alternative splice sites, indicating the functional consequence of the GC effect. Compared with regions far from splice sites and decoy splice sites, real splice sites are GC-enriched. We also found that the GC-content effect is much stronger than the nucleotide-order effect to form stable secondary structures.
Conclusion
All of these results indicate that GC content is related to splice site usage and it may mediate the splicing process through RNA secondary structures.
doi:10.1186/1471-2164-12-90
PMCID: PMC3041747  PMID: 21281513
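The GC-content measurement behind entry 1 can be illustrated with a minimal sketch. The window size, the 0-based site convention, the toy sequence, and the function names `gc_content` and `gc_around_site` are assumptions made for illustration; the abstract does not state the authors' actual window or pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (case-insensitive).

    An empty sequence yields 0.0 rather than raising.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_around_site(sequence: str, site: int, flank: int = 20) -> float:
    """GC fraction in a window of `flank` bases on each side of `site`.

    `site` is a 0-based index (a hypothetical convention chosen here);
    the window is clipped at the sequence ends.
    """
    start = max(0, site - flank)
    end = min(len(sequence), site + flank)
    return gc_content(sequence[start:end])


if __name__ == "__main__":
    # Toy pre-mRNA fragment, not real data.
    pre_mrna = "ATGGCGCCGGTAAGTATTTTAAATTTCAGGCGC"
    donor = pre_mrna.find("GT")  # first GT dinucleotide, used as a stand-in donor site
    print(gc_around_site(pre_mrna, donor, flank=5))
```

Comparing this value between alternative, constitutive, and skipped sites would mirror the kind of contrast the abstract describes, although a real analysis would also need annotated splice-site coordinates and a structure-stability estimate.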
2.  Intron-exon structures of eukaryotic model organisms. 
Nucleic Acids Research  1999;27(15):3219-3228.
To investigate the distribution of intron-exon structures of eukaryotic genes, we have constructed a general exon database comprising all available intron-containing genes and exon databases from 10 eukaryotic model organisms: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana, Zea mays, Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila. We purged redundant genes to avoid the possible bias brought about by redundancy in the databases. After discarding those questionable introns that do not contain correct splice sites, the final database contained 17 102 introns, 21 019 exons and 2903 independent or quasi-independent genes. On average, a eukaryotic gene contains 3.7 introns per kb protein coding region. The exon distribution peaks around 30-40 residues and most introns are 40-125 nt long. The variable intron-exon structures of the 10 model organisms reveal two interesting statistical phenomena, which cast light on some previous speculations. (i) Genome size seems to be correlated with total intron length per gene. For example, invertebrate introns are smaller than those of human genes, while yeast introns are shorter than invertebrate introns. However, this correlation is weak, suggesting that other factors besides genome size may also affect intron size. (ii) Introns smaller than 50 nt are significantly less frequent than longer introns, possibly resulting from a minimum intron size requirement for intron splicing.
PMCID: PMC148551  PMID: 10454621
3.  Accurate splice site prediction using support vector machines 
BMC Bioinformatics  2007;8(Suppl 10):S7.
Background
For splice site recognition, one has to solve two classification problems: discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems typically rely on Markov Chains to solve these tasks.
Results
In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel which turns out well suited for this task, as we will illustrate in several experiments where we compare its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods including Markov Chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready to be used for incorporation in a gene finder.
Availability
Data, splits, additional information on the model selection, the whole genome predictions, as well as the stand-alone prediction tool are available for download at .
doi:10.1186/1471-2105-8-S10-S7
PMCID: PMC2230508  PMID: 18269701
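As a rough illustration of the weighted degree kernel named in entry 3, the sketch below uses the commonly cited form, in which matching substrings of length d at the same position in two equal-length sequences contribute a weight beta_d = 2(D - d + 1) / (D(D + 1)). The abstract itself gives no formula, so this weighting and the function name `weighted_degree_kernel` are assumptions, and this is not the authors' implementation.

```python
def weighted_degree_kernel(x: str, y: str, degree: int = 3) -> float:
    """Weighted degree kernel between two equal-length sequences.

    For each substring length d = 1..degree, count the positions where
    x and y share the same d-mer, and weight the count by
    beta_d = 2 * (degree - d + 1) / (degree * (degree + 1)).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    length = len(x)
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(
            1 for i in range(length - d + 1) if x[i:i + d] == y[i:i + d]
        )
        total += beta * matches
    return total


if __name__ == "__main__":
    # Toy windows around hypothetical candidate splice sites.
    print(weighted_degree_kernel("CAGGTAAGT", "CAGGTGAGT", degree=3))
```

A kernel value like this would be passed to a support vector machine as the similarity between two candidate-site windows; shorter matches receive larger weights because they occur at more positions.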
4.  The role of exon shuffling in shaping protein-protein interaction networks 
BMC Genomics  2010;11(Suppl 5):S11.
Background
Physical protein-protein interaction (PPI) is a critical phenomenon for the function of most proteins in living organisms and a significant fraction of PPIs are the result of domain-domain interactions. Exon shuffling, intron-mediated recombination of exons from existing genes, is known to have been a major mechanism of domain shuffling in metazoans. Thus, we hypothesized that exon shuffling could have a significant influence in shaping the topology of PPI networks.
Results
We tested our hypothesis by compiling exon shuffling and PPI data from six eukaryotic species: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Cryptococcus neoformans and Arabidopsis thaliana. For all four metazoan species, genes enriched in exon shuffling events presented, on average, a higher vertex degree (number of interacting partners) in PPI networks. Furthermore, we verified that a set of protein domains that are simultaneously promiscuous (known to interact with multiple types of other domains), self-interacting (able to interact with another copy of themselves) and abundant in the genomes presents a stronger signal for exon shuffling.
Conclusions
Exon shuffling appears to have been a recurrent mechanism for the emergence of new PPIs along metazoan evolution. In metazoan genomes, exon shuffling also promoted the expansion of some protein domains. We speculate that their promiscuous and self-interacting properties may have been decisive for that expansion.
doi:10.1186/1471-2164-11-S5-S11
PMCID: PMC3045794  PMID: 21210967
5.  Splice site prediction with quadratic discriminant analysis using diversity measure 
Nucleic Acids Research  2003;31(21):6214-6220.
Based on the conservation of nucleotides at splicing sites and the features of base composition and base correlation around these sites we use the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to study the dependence structure of splicing sites and predict the exons/introns and their boundaries for four model genomes: Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and human. The comparison of compositional features between two sequences and the comparison of base dependencies at adjacent or non-adjacent positions of two sequences can be integrated automatically in the increment of diversity (ID). Eight feature variables around a potential splice site are defined in terms of ID. They are integrated in a single formal framework given by IDQD. In our calculations 7 (8) base region around the donor (acceptor) sites have been considered in studying the conservation of nucleotides and sequences of 48 bp on either side of splice sites have been used in studying the compositional and base-correlating features. The windows are enlarged to 16 (donor), 29 (acceptor) and 80 bp (either side) to improve the prediction for human splice sites. The prediction capability of the present method is comparable with the leading splice site detector—GeneSplicer.
doi:10.1093/nar/gkg805
PMCID: PMC275452  PMID: 14576308
6.  Conserved RNA structures in the non-canonical Hac1/Xbp1 intron 
RNA Biology  2011;8(4):552-556.
The unconventional splicing of Hac1 by the ribonuclease Ire1 is a key event in the activation of the unfolded protein response (UPR) in Saccharomyces cerevisiae. This splicing is independent of the spliceosome and is mediated by a secondary structure at the intron-exon boundaries of the mRNA. Similar unconventional splicing was also described for the gene Xbp1 in human, mouse, Caenorhabditis elegans and Drosophila melanogaster, and for Hac1 in five other fungi. We used reported RNA structures to build a multiple sequence alignment and the Infernal package to search for homologous structures. We identified homologous non-canonical intron structures in 128 out of 156 searched eukaryotic genomes. Our results show that the sequence of the Hac1/Xbp1 intron is highly conserved only around the splice sites recognized by Ire1. The consensus structure of the Hac1/Xbp1 mRNA is well conserved in Fungi and Metazoa and resembles structures previously described. We show that a typical Hac1/Xbp1 intron is very short, only 20–26 bases, whereas yeast species have a long intron (>100 bases). We identified six species with unambiguous Hac1/Xbp1 homologs that have lost the non-canonical intron structure. We propose that these species use a different mechanism to regulate the UPR.
doi:10.4161/rna.8.4.15396
PMCID: PMC3225973  PMID: 21593604
unfolded protein response; splicing; RNA structure; intron; HAC1; XBP1
7.  What Was the Set of Ubiquitin and Ubiquitin-Like Conjugating Enzymes in the Eukaryote Common Ancestor? 
Journal of Molecular Evolution  2009;68(6):616-628.
Ubiquitin (Ub)-conjugating enzymes (E2) are key enzymes in ubiquitination or Ub-like modifications of proteins. We searched for all proteins belonging to the E2 enzyme super-family in seven species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Schizosaccharomyces pombe, Saccharomyces cerevisiae, and Arabidopsis thaliana) to identify families and to reconstruct each family’s phylogeny. Our phylogenetic analysis of 207 genes led us to define 17 E2 families, with 37 E2 genes, in the human genome. The subdivision of E2 into four classes did not correspond to the phylogenetic tree. The sequence signature HPN (histidine–proline–asparagine), followed by a tryptophan residue at 16 (up to 29) amino acids, was highly conserved. When present, the active cysteine was found 7 to 8 amino acids from the C-terminal end of HPN. The secondary structures were characterized by a canonical alpha/beta fold. Only family 10 deviated from the common organization because the proteins were devoid of enzymatic activity. Family 7 had an insertion between beta strands 1 and 2; families 3, 5 and 14 had an insertion between the active cysteine and the conserved tryptophan. The three-dimensional data of these proteins highlight a strong structural conservation of the core domain. Our analysis shows that the primitive eukaryote ancestor possessed a diversified set of E2 enzymes, thus emphasizing the importance of the Ub pathway. This comprehensive overview of E2 enzymes emphasizes the diversity and evolution of this superfamily and helps clarify the nomenclature and true orthologies. A better understanding of the functions of these enzymes is necessary to decipher several human diseases.
doi:10.1007/s00239-009-9225-6
PMCID: PMC2691932  PMID: 19452197
Ubiquitin-conjugating enzymes; Protein superfamily; Phylogeny; Homo sapiens; Molecular evolution
8.  AU-rich intronic elements affect pre-mRNA 5' splice site selection in Drosophila melanogaster. 
Molecular and Cellular Biology  1993;13(12):7689-7697.
cis-spliced nuclear pre-mRNA introns found in a variety of organisms, including Tetrahymena thermophila, Drosophila melanogaster, Caenorhabditis elegans, and plants, are significantly richer in adenosine and uridine residues than their flanking exons are. The functional significance of this intronic AU richness, however, has been demonstrated only in plant nuclei. In these nuclei, 5' and 3' splice sites are selected in part by their positions relative to AU-rich elements spread throughout the length of an intron. Because of this position-dependent selection scheme, a 5' splice site at the normal (+1) exon-intron boundary having only three contiguous consensus nucleotides can compete effectively with an enhanced exonic site (-57E) having nine consensus nucleotides and outcompete an enhanced site (+106E) embedded within the AU-rich intron. To determine whether transitions from AU-poor exonic sequences to AU-rich intronic sequences influence 5' splice site selection in other organisms, alleles of the pea rbcS3A1 intron were expressed in Drosophila Schneider 2 cells, and their splicing patterns were compared with those in tobacco nuclei. We demonstrate that this heterologous transcript can be accurately spliced in transfected Drosophila nuclei and that a +1 G-to-A knockout mutation at the normal splice site activates the same three cryptic 5' splice sites as in tobacco. Enhancement of the exonic (-57) and intronic (+106) sites to consensus splice sites indicates that potential splice sites located in the upstream exon or at the 5' exon-intron boundary are preferred in Drosophila cells over those embedded within AU-rich intronic sequences. In contrast to tobacco, in which the activities of two competing 5' splice sites upstream of the AU-rich intron are modulated by their proximity to the AU transition point, D. melanogaster utilizes the upstream site which has a higher proportion of consensus nucleotides. 
The enhanced version of the cryptic intronic site is efficiently selected in D. melanogaster when the normal +1 site is weakened or discrete AU-rich elements upstream of the +106E site are disrupted. Selection of this internal site in tobacco requires more drastic disruption of these motifs. We conclude that 5' splice site selection in Drosophila nuclei is influenced by the intrinsic strengths of competing sites and by the presence of AU-rich intronic elements but to a different extent than in tobacco.
PMCID: PMC364840  PMID: 8246985
9.  Inparanoid: a comprehensive database of eukaryotic orthologs 
Nucleic Acids Research  2004;33(Database Issue):D476-D480.
The Inparanoid eukaryotic ortholog database (http://inparanoid.cgb.ki.se/) is a collection of pairwise ortholog groups between 17 whole genomes; Anopheles gambiae, Caenorhabditis briggsae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Takifugu rubripes, Gallus gallus, Homo sapiens, Mus musculus, Pan troglodytes, Rattus norvegicus, Oryza sativa, Plasmodium falciparum, Arabidopsis thaliana, Escherichia coli, Saccharomyces cerevisiae and Schizosaccharomyces pombe. Complete proteomes for these genomes were derived from Ensembl and UniProt and compared pairwise using Blast, followed by a clustering step using the Inparanoid program. An Inparanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs (should they exist) are gathered independently, while outparalogs are excluded. The ortholog clusters can be searched on the website using Ensembl gene/protein or UniProt identifiers, annotation text or by Blast alignment against our protein datasets. The entire dataset can be downloaded, as can the Inparanoid program itself.
doi:10.1093/nar/gki107
PMCID: PMC540061  PMID: 15608241
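Entry 9 says that an Inparanoid cluster is seeded by a reciprocally best-matching ortholog pair. The sketch below shows only that seeding idea on hypothetical pairwise scores; the score dictionaries, protein identifiers, and function names are invented for illustration, and the inparalog-gathering step of the real program is not attempted.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores between proteomes A and B.
    a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 150.0,
              ("a2", "b2"): 90.0, ("a3", "b1"): 50.0}
    b_vs_a = {("b1", "a1"): 198.0, ("b2", "a1"): 140.0, ("b2", "a2"): 95.0}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only one pair is mutually best, so `a2` and `a3` would not seed clusters of their own even though each has a best hit in the other proteome.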
10.  Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples 
BMC Genomics  2008;9:509.
Background
Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes.
Results
We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy.
Conclusion
Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.
doi:10.1186/1471-2164-9-509
PMCID: PMC2628393  PMID: 18973670
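The sequence-space coverage figures in entry 10 (the share of all possible k bp oligomers that occur in a genome) can be approximated with a short sketch. It counts the forward strand only and skips windows containing ambiguous bases; both are simplifying assumptions made here, and the function name `kmer_coverage` is illustrative rather than taken from the paper.

```python
def kmer_coverage(genome: str, k: int) -> float:
    """Fraction of the 4**k possible k-mers that occur in `genome`.

    Only the forward strand is scanned, and any window containing a
    character other than A, C, G, or T is ignored.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    genome = genome.upper()
    alphabet = set("ACGT")
    seen = set()
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if set(kmer) <= alphabet:
            seen.add(kmer)
    return len(seen) / 4 ** k


if __name__ == "__main__":
    # Toy sequence, not a real genome.
    toy = "ACGTACGTTTGACCA"
    for k in (1, 2, 3):
        print(k, kmer_coverage(toy, k))
```

For real genomes, storing every observed k-mer in a Python set becomes memory-hungry for larger k, so a dedicated k-mer counting tool would be the practical choice.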
11.  Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns 
BMC Genomics  2006;7:311.
Background
The signals that determine the specificity and efficiency of splicing are multiple and complex, and are not fully understood. Among other factors, the relative contributions of different mechanisms appear to depend on intron size inasmuch as long introns might hinder the activity of the spliceosome through interference with the proper positioning of the intron-exon junctions. Indeed, it has been shown that the information content of splice sites positively correlates with intron length in the nematode, Drosophila, and fungi. We explored the connections between the length of vertebrate introns, the strength of splice sites, exonic splicing signals, and evolution of flanking exons.
Results
A compensatory relationship is shown to exist between different types of signals, namely, the splice sites and the exonic splicing enhancers (ESEs). In the range of relatively short introns (approximately, < 1.5 kilobases in length), the enhancement of the splicing signals for longer introns was manifest in the increased concentration of ESEs. In contrast, for longer introns, this effect was not detectable, and instead, an increase in the strength of the donor and acceptor splice sites was observed. Conceivably, accumulation of A-rich ESE motifs beyond a certain limit is incompatible with functional constraints operating at the level of protein sequence evolution, which leads to compensation in the form of evolution of the splice sites themselves toward greater strength. In addition, however, a correlation between sequence conservation in the exon ends and intron length, particularly, in synonymous positions, was observed throughout the entire length range of introns. Thus, splicing signals other than the currently defined ESEs, i.e., potential new classes of ESEs, might exist in exon sequences, particularly, those that flank long introns.
Conclusion
Several weak but statistically significant correlations were observed between vertebrate intron length, splice site strength, and potential exonic splicing signals. Taken together, these findings attest to a compensatory relationship between splice sites and exonic splicing signals, depending on intron length.
doi:10.1186/1471-2164-7-311
PMCID: PMC1713244  PMID: 17156453
12.  Large-Scale Trends in the Evolution of Gene Structures within 11 Animal Genomes 
PLoS Computational Biology  2006;2(3):e15.
We have used the annotations of six animal genomes (Homo sapiens, Mus musculus, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans) together with the sequences of five unannotated Drosophila genomes to survey changes in protein sequence and gene structure over a variety of timescales—from the less than 5 million years since the divergence of D. simulans and D. melanogaster to the more than 500 million years that have elapsed since the Cambrian explosion. To do so, we have developed a new open-source software library called CGL (for “Comparative Genomics Library”). Our results demonstrate that change in intron–exon structure is gradual, clock-like, and largely independent of coding-sequence evolution. This means that genome annotations can be used in new ways to inform, corroborate, and test conclusions drawn from comparative genomics analyses that are based upon protein and nucleotide sequence similarities.
Synopsis
Just as protein sequences change over time, so do gene structures. Over comparatively short evolutionary timescales, introns lengthen and shorten; and over longer timescales the number and positions of introns in homologous genes can change. These facts suggest that the intron–exon structures of genes may provide a source of evolutionary information. The utility of gene structures as materials for phylogenetic analyses, however, depends upon their independence from the forces driving protein evolution. If, for example, intron–exon structures are strongly influenced by selection at the amino acid level, then using them for phylogenetic investigations is largely pointless, as the same information could have been more easily gained from protein analyses. Using 11 animal genomes, Yandell et al. show that evolution of intron lengths and positions is largely—though not completely—independent of protein sequence evolution. This means that gene structures provide a source of information about the evolutionary past independent of protein sequence similarities—a finding the authors employ to investigate the accuracy of the protein clock and to explore the utility of gene structures as a means to resolve deep phylogenetic relationships within the animals.
doi:10.1371/journal.pcbi.0020015
PMCID: PMC1386723  PMID: 16518452
13.  Longer First Introns Are a General Property of Eukaryotic Gene Structure 
PLoS ONE  2008;3(8):e3093.
While many properties of eukaryotic gene structure are well characterized, differences in the form and function of introns that occur at different positions within a transcript are less well understood. In particular, the dynamics of intron length variation with respect to intron position has received relatively little attention. This study analyzes all available data on intron lengths in GenBank and finds a significant trend of increased length in first introns throughout a wide range of species. This trend was found to be even stronger when using high-confidence gene annotation data for three model organisms (Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster) which show that the first intron in the 5′ UTR is - on average - significantly longer than all downstream introns within a gene. A partial explanation for increased first intron length in A. thaliana is suggested by the increased frequency of certain motifs that are present in first introns. The phenomenon of longer first introns can potentially be used to improve gene prediction software and also to detect errors in existing gene annotations.
doi:10.1371/journal.pone.0003093
PMCID: PMC2518113  PMID: 18769727
14.  Intronic and exonic sequences modulate 5' splice site selection in plant nuclei. 
Nucleic Acids Research  1997;25(5):1071-1077.
Pre-mRNA transcripts in a variety of organisms, including plants, Drosophila and Caenorhabditis elegans, contain introns which are significantly richer in adenosine and uridine residues than their flanking exons. Previous analyses using exonic and intronic replacements between two nonequivalent 5'splice sites in the 469 nt long rbcS3A intron 1 provided the first evidence indicating that, in both tobacco and Drosophila nuclei, 5'splice site selection is strongly influenced by the position of that site relative to the AU transition point between exon and intron. To differentiate between two potential models for 5'splice site recognition, we have expressed a completely different set of intronic and exonic replacement constructs containing identical 5'splice sites upstream of beta-conglycinin intron 4 (115 nt). Mutagenesis and deletion of the upstream 5'splice site demonstrate that intronic AU-rich sequences function by promoting recognition of the most upstream 5'splice site rather than by masking the downstream 5'splice site. Sequence insertions define a role for AG-rich exonic sequences in plant pre-mRNA splicing by demonstrating that an AG-rich element is capable of promoting downstream 5'splice site recognition. We conclude that AU-rich intronic sequences, AG-rich exonic sequences and the 5'splice site itself collectively define 5'intron boundaries in dicot nuclei.
PMCID: PMC146543  PMID: 9023120
15.  YOGY: a web-based, integrated database to retrieve protein orthologs and associated Gene Ontology terms 
Nucleic Acids Research  2006;34(Web Server issue):W330-W334.
We present YOGY, a web-based resource for orthologous proteins from nine eukaryotic organisms: Homo sapiens, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans, Plasmodium falciparum, Schizosaccharomyces pombe and Saccharomyces cerevisiae. Using a gene name from any of these organisms as a query, this database provides comprehensive, combined information on orthologs in other species using data from five independent resources: KOGs, Inparanoid, HomoloGene, OrthoMCL and a table of curated fission and budding yeast orthologs. Associated Gene Ontology (GO) terms of orthologs can also be retrieved for functional inference. Integrating these different and complementary datasets provides a straightforward tool to identify known and predicted orthologs of proteins from a variety of species. This resource should be useful for bench scientists looking for functional clues for their genes of interest as well as for curators looking for information that can be transferred based on orthology and for rapidly identifying the relevant GO terms as an aid to literature curation. YOGY is accessible online at .
doi:10.1093/nar/gkl311
PMCID: PMC1538793  PMID: 16845020
16.  Involvement of the nuclear cap-binding protein complex in alternative splicing in Arabidopsis thaliana 
Nucleic Acids Research  2009;38(1):265-278.
The nuclear cap-binding protein complex (CBC) participates in 5′ splice site selection of introns that are proximal to the mRNA cap. However, it is not known whether CBC has a role in alternative splicing. Using an RT–PCR alternative splicing panel, we analysed 435 alternative splicing events in Arabidopsis thaliana genes, encoding mainly transcription factors, splicing factors and stress-related proteins. Splicing profiles were determined in wild type plants, the cbp20 and cbp80(abh1) single mutants and the cbp20/80 double mutant. The alternative splicing events included alternative 5′ and 3′ splice site selection, exon skipping and intron retention. Significant changes in the ratios of alternative splicing isoforms were found in 101 genes. Of these, 41% were common to all three CBC mutants and 15% were observed only in the double mutant. The cbp80(abh1) and cbp20/80 mutants had many more changes in alternative splicing in common than did cbp20 and cbp20/80 suggesting that CBP80 plays a more significant role in alternative splicing than CBP20, probably being a platform for interactions with other splicing factors. Cap-binding proteins and the CBC are therefore directly involved in alternative splicing of some Arabidopsis genes and in most cases influenced alternative splicing of the first intron, particularly at the 5′ splice site.
doi:10.1093/nar/gkp869
PMCID: PMC2800227  PMID: 19864257
17.  Modulation of alternative splicing by long-range RNA structures in Drosophila 
Nucleic Acids Research  2009;37(14):4533-4544.
Accurate and efficient recognition of splice sites during pre-mRNA splicing is essential for proper transcriptome expression. Splice site usage can be modulated by secondary structures, but it is unclear if this type of modulation is commonly used or occurs to a significant degree with secondary structures forming over long distances. Using phylogenetic comparisons of intronic sequences among 12 Drosophila genomes, we elucidated a group of 202 highly conserved pairs of sequences, each at least nine nucleotides long, capable of forming stable stem structures. This set was highly enriched in alternatively spliced introns and introns with weak acceptor sites and long introns, and most occurred over long distances (>150 nucleotides). Experimentally, we analyzed the splicing of several of these introns using mini-genes in Drosophila S2 cells. Wild-type splicing patterns were changed by mutations that opened the stem structure, and restored by compensatory mutations that re-established the base-pairing potential, demonstrating that these secondary structures were indeed implicated in the splice site choice. Mechanistically, the RNA structures masked splice sites, brought together distant splice sites and/or looped out introns. Thus, base-pairing interactions within introns, even those occurring over long distances, are more frequent modulators of alternative splicing than is currently assumed.
doi:10.1093/nar/gkp407
PMCID: PMC2724269  PMID: 19465384
18.  Intron definition in splicing of small Drosophila introns. 
Molecular and Cellular Biology  1994;14(5):3434-3445.
Approximately half of the introns in Drosophila melanogaster are too small to function in a vertebrate and often lack the pyrimidine tract associated with vertebrate 3' splice sites. Here, we report the splicing and spliceosome assembly properties of two such introns: one with a pyrimidine-poor 3' splice site and one with a pyrimidine-rich 3' splice site. The pyrimidine-poor intron was absolutely dependent on its small size for in vivo and in vitro splicing and assembly. As such, it had properties reminiscent of those of yeast introns. The pyrimidine-rich intron had properties intermediate between those of yeasts and vertebrates. This 3' splice site directed assembly of ATP-dependent complexes when present as either an intron or exon and supported low levels of in vivo splicing of a moderate-length intron. We propose that splice sites can be recognized as pairs across either exons or introns, depending on which distance is shorter, and that a pyrimidine-rich region upstream of the 3' splice site facilitates the exon mode.
PMCID: PMC358708  PMID: 8164690
19.  Identification and characterization of NAGNAG alternative splicing in the moss Physcomitrella patens 
BMC Plant Biology  2010;10:76.
Background
Alternative splicing (AS) involving tandem acceptors that are separated by three nucleotides (NAGNAG) is an evolutionarily widespread class of AS, which is well studied in Homo sapiens (human) and Mus musculus (mouse). It has also been shown to be common in the model seed plants Arabidopsis thaliana and Oryza sativa (rice). In one of the first studies involving sequence-based prediction of AS in plants, we performed a genome-wide identification and characterization of NAGNAG AS in the model plant Physcomitrella patens, a moss.
Results
Using Sanger data, we found 295 alternatively used NAGNAG acceptors in P. patens. Using 31 features and training and test datasets of constitutive and alternative NAGNAGs, we trained a classifier to predict the splicing outcome at NAGNAG tandem splice sites (alternative splicing, constitutive at the first acceptor, or constitutive at the second acceptor). Our classifier achieved a balanced specificity and sensitivity of ≥ 89%. Subsequently, a classifier trained exclusively on data well supported by transcript evidence was used to make genome-wide predictions of NAGNAG splicing outcomes. By generation of more transcript evidence from a next-generation sequencing platform (Roche 454), we found additional evidence for NAGNAG AS, with altogether 664 alternative NAGNAGs being detected in P. patens using all currently available transcript evidence. The 454 data also enabled us to validate the predictions of the classifier, with 64% (80/125) of the well-supported cases of AS being predicted correctly.
Conclusion
NAGNAG AS is just as common in the moss P. patens as it is in the seed plants A. thaliana and O. sativa (but not conserved on the level of orthologous introns), and can be predicted with high accuracy. The most informative features are the nucleotides in the NAGNAG and in its immediate vicinity, along with the splice sites scores, as found earlier for NAGNAG AS in animals. Our results suggest that the mechanism behind NAGNAG AS in plants is similar to that in animals and is largely dependent on the splice site and its immediate neighborhood.
doi:10.1186/1471-2229-10-76
PMCID: PMC3095350  PMID: 20426810
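The performance figures quoted in the abstract above (sensitivity, specificity, and the 80/125 validation rate) follow from simple ratios, sketched here in Python. Only the counts 80 and 125 come from the source; the confusion-matrix counts passed to `sensitivity` and `specificity` are hypothetical values chosen for illustration, and the function names are assumptions.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real positives that the classifier recovers."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of real negatives that the classifier rejects."""
    return true_neg / (true_neg + false_pos)


def fraction_correct(correct: int, total: int) -> float:
    """Share of cases predicted correctly."""
    return correct / total


# Reported in the abstract: 80 of 125 well-supported AS cases were
# predicted correctly, i.e. 64%.
validation_rate = fraction_correct(80, 125)

# Hypothetical counts, for illustration only (not from the source).
example_sensitivity = sensitivity(true_pos=89, false_neg=11)
example_specificity = specificity(true_neg=90, false_pos=10)
```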
20.  Conservation and Sex-Specific Splicing of the transformer Gene in the Calliphorids Cochliomyia hominivorax, Cochliomyia macellaria and Lucilia sericata 
PLoS ONE  2013;8(2):e56303.
Transformer (TRA) promotes female development in several dipteran species including the Australian sheep blowfly Lucilia cuprina, the Mediterranean fruit fly, housefly and Drosophila melanogaster. tra transcripts are sex-specifically spliced such that only the female form encodes full length functional protein. The presence of six predicted TRA/TRA2 binding sites in the sex-specific female intron of the L. cuprina gene suggested that tra splicing is auto-regulated as in medfly and housefly. With the aim of identifying conserved motifs that may play a role in tra sex-specific splicing, here we have isolated and characterized the tra gene from three additional blowfly species, L. sericata, Cochliomyia hominivorax and C. macellaria. The blowfly adult male and female transcripts differ in the choice of splice donor site in the first intron, with males using a site downstream of the site used in females. The tra genes all contain a single TRA/TRA2 site in the male exon and a cluster of four to five sites in the male intron. However, overall the sex-specific intron sequences are poorly conserved in closely related blowflies. The most conserved regions are around the exon/intron junctions, the 3′ end of the intron and near the cluster of TRA/TRA2 sites. We propose a model for sex specific regulation of tra splicing that incorporates the conserved features identified in this study. In L. sericata embryos, the male tra transcript was first detected at around the time of cellular blastoderm formation. RNAi experiments showed that tra is required for female development in L. sericata and C. macellaria. The isolation of the tra gene from the New World screwworm fly C. hominivorax, a major livestock pest, will facilitate the development of a “male-only” strain for genetic control programs.
doi:10.1371/journal.pone.0056303
PMCID: PMC3567074  PMID: 23409170
21.  Pyrimidine tracts between the 5' splice site and branch point facilitate splicing and recognition of a small Drosophila intron. 
Molecular and Cellular Biology  1997;17(5):2774-2780.
The minimum size for splicing of a vertebrate intron is approximately 70 nucleotides. In Drosophila melanogaster, more than half of the introns are significantly below this minimum yet function well. Such short introns often lack the pyrimidine tract located between the branch point and 3' splice site common to metazoan introns. To investigate if small introns contain special sequences that facilitate their recognition, the sequences and factors required for the splicing of a 59-nucleotide intron from the D. melanogaster mle gene have been examined. This intron contains only a minimal region of interrupted pyrimidines downstream of the branch point. Instead, two longer, uninterrupted C-rich tracts are located between the 5' splice site and branch point. Both of these sequences are required for maximal in vivo and in vitro splicing. The upstream sequences are also required for maximal binding of factors to the 5' splice site, cross-linking of U2AF to precursor RNA, and assembly of the active spliceosome, suggesting that sequences upstream of the branch point influence events at both ends of the small mle intron. Thus, a very short intron lacking a classical pyrimidine tract between the branch point and 3' splice site requires accessory pyrimidine sequences in the short region between the 5' splice site and branch point.
PMCID: PMC232128  PMID: 9111348
22.  An Intronic Enhancer Regulates Splicing of the Twintron of Drosophila melanogaster prospero Pre-mRNA by Two Different Spliceosomes 
Molecular and Cellular Biology  2004;24(5):1855-1869.
We have examined the alternative splicing of the Drosophila melanogaster prospero twintron, which contains splice sites for both the U2- and U12-type spliceosome and generates two forms of mRNA, pros-L (U2-type product) and pros-S (U12-type product). We find that twintron splicing is developmentally regulated: pros-L is abundant in early embryogenesis while pros-S displays the opposite pattern. We have established a Kc cell in vitro splicing system that accurately splices a minimal pros substrate containing the twintron and have examined the sequence requirements for pros twintron splicing. Systematic deletion and mutation analysis of intron sequences established that twintron splicing requires a 46-nucleotide purine-rich element located 32 nucleotides downstream of the U2-type 5′ splice site. While this element regulates both splicing pathways, its alteration showed the severest effects on the U2-type splicing pathway. Addition of an RNA competitor containing the wild-type purine-rich element to the Kc extract abolished U2-type splicing and slightly repressed U12-type splicing, suggesting that a trans-acting factor(s) binds the enhancer element to stimulate twintron splicing. Thus, we have identified an intron region critical for prospero twintron splicing as a first step towards elucidating the molecular mechanism of splicing regulation involving competition between the two kinds of spliceosomes.
doi:10.1128/MCB.24.5.1855-1869.2004
PMCID: PMC350559  PMID: 14966268
23.  Intronic Alternative Splicing Regulators Identified by Comparative Genomics in Nematodes 
PLoS Computational Biology  2006;2(7):e86.
Many alternative splicing events are regulated by pentameric and hexameric intronic sequences that serve as binding sites for splicing regulatory factors. We hypothesized that intronic elements that regulate alternative splicing are under selective pressure for evolutionary conservation. Using a Wobble Aware Bulk Aligner genomic alignment of Caenorhabditis elegans and Caenorhabditis briggsae, we identified 147 alternatively spliced cassette exons that exhibit short regions of high nucleotide conservation in the introns flanking the alternative exon. In vivo experiments on the alternatively spliced let-2 gene confirm that these conserved regions can be important for alternative splicing regulation. Conserved intronic element sequences were collected into a dataset and the occurrence of each pentamer and hexamer motif was counted. We compared the frequency of pentamers and hexamers in the conserved intronic elements to a dataset of all C. elegans intron sequences in order to identify short intronic motifs that are more likely to be associated with alternative splicing. High-scoring motifs were examined for upstream or downstream preferences in introns surrounding alternative exons. Many of the high-scoring nematode pentamer and hexamer motifs correspond to known mammalian splicing regulatory sequences, such as (T)GCATG, indicating that the mechanism of alternative splicing regulation is well conserved in metazoans. A comparison of the analysis of the conserved intronic elements, and analysis of the entire introns flanking these same exons, reveals that focusing on intronic conservation can increase the sensitivity of detecting putative splicing regulatory motifs. This approach also identified novel sequences whose role in splicing is under investigation and has allowed us to take a step forward in defining a catalog of splicing regulatory elements for an organism.
In vivo experiments confirm that one novel high-scoring sequence from our analysis, (T)CTATC, is important for alternative splicing regulation of the unc-52 gene.
Synopsis
Alternative splicing of precursor messenger RNA is a process by which multiple protein isoforms are generated from a single gene. As many as 60% of human genes are processed in this manner, creating tissue-specific isoforms of proteins that may be a key factor in regulating the complexity of our physiology. One of the major challenges to understanding this process is to identify the sequences on the precursor messenger RNA responsible for splicing regulation. Some of these regulatory sequences occur in regions that are spliced out (called introns). This study tested the hypothesis that there should be evolutionary pressure to maintain these intronic regulatory sequences, even though intron sequence is non-coding and rapidly diverges between species. The authors employed a genomic alignment of two roundworms, Caenorhabditis elegans and Caenorhabditis briggsae, to investigate the regulation of alternative splicing. By examining evolutionarily conserved stretches of introns flanking alternatively spliced exons, the authors identified and functionally confirmed splicing regulatory sequences. Many of the top scoring sequences match known mammalian regulators, suggesting the alternative splicing regulatory mechanism is conserved across all metazoans. Other sequences were not previously identified in mammals and may represent new alternative splicing regulatory elements in higher organisms or ones that may be specific to worms.
doi:10.1371/journal.pcbi.0020086
PMCID: PMC1500816  PMID: 16839192
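The motif-counting step described in the abstract above (counting pentamers and hexamers in conserved intronic elements and comparing their frequency with all intron sequences) can be sketched as follows. The abstract does not state which scoring statistic was used, so the log2 frequency ratio with a pseudocount below is an assumption, as are the function names and the toy sequences.

```python
import math
from collections import Counter


def count_kmers(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k, pseudocount=1.0):
    """Log2 ratio of k-mer frequency in foreground vs. background.

    Assumed scoring scheme for illustration; the pseudocount keeps
    k-mers that are absent from the background from dividing by zero.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    n_possible = 4 ** k  # number of distinct k-mers over A, C, G, T
    fg_total = sum(fg.values()) + pseudocount * n_possible
    bg_total = sum(bg.values()) + pseudocount * n_possible
    scores = {}
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total
        scores[kmer] = math.log2(fg_freq / bg_freq)
    return scores


# Toy data (hypothetical): conserved elements rich in GCATG versus
# unrelated "all intron" sequences.
conserved_elements = ["TGCATGTGCATG", "AATGCATGAA"]
all_introns = ["ACGTACGTACGTACGT", "TTTTGGGGCCCCAAAA"]
pentamer_scores = kmer_enrichment(conserved_elements, all_introns, k=5)
```

Sorting `pentamer_scores` by value gives a ranked list of candidate motifs; in a real analysis the background would be the full intron set and a significance test would replace the raw ratio.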
24.  Molecular evolution of eukaryotic genomes: hemiascomycetous yeast spliceosomal introns 
Nucleic Acids Research  2003;31(4):1121-1135.
As part of the exploratory sequencing program Génolevures, visual scrutinisation and bioinformatic tools were used to detect spliceosomal introns in seven hemiascomycetous yeast species. A total of 153 putative novel introns were identified. Introns are rare in yeast nuclear genes (<5% have an intron), mainly located at the 5′ end of ORFs, and not highly conserved in sequence. They all share a clear non-random vocabulary: conserved splice sites and conserved nucleotide contexts around splice sites. Homologues of metazoan snRNAs and putative homologues of SR splicing factors were identified, confirming that the spliceosomal machinery is highly conserved in eukaryotes. Several introns’ features were tested as possible markers for phylogenetic analysis. We found that intron sizes vary widely within each genome, and according to the phylogenetic position of the yeast species. The evolutionary origin of spliceosomal introns was examined by analysing the degree of conservation of intron positions in homologous yeast genes. Most introns appeared to exist in the last common ancestor of present day yeast species, and then to have been differentially lost during speciation. However, in some cases, it is difficult to exclude a possible sliding event affecting a pre-existing intron or a gain of a novel intron. Taken together, our results indicate that the origin of spliceosomal introns is complex within a given genome, and that present day introns may have resulted from a dynamic flux between intron conservation, intron loss and intron gain during the evolution of hemiascomycetous yeasts.
PMCID: PMC150231  PMID: 12582231
25.  NLSdb: database of nuclear localization signals 
Nucleic Acids Research  2003;31(1):397-399.
NLSdb is a database of nuclear localization signals (NLSs) and of nuclear proteins. NLSs are short stretches of residues mediating transport of nuclear proteins into the nucleus. The database contains 114 experimentally determined NLSs that were obtained through an extensive literature search. Using ‘in silico mutagenesis’ this set was extended to 308 experimental and potential NLSs. This final set matched over 43% of all known nuclear proteins and matched no currently known non-nuclear protein. NLSdb contains over 6000 predicted nuclear proteins and their targeting signals from the PDB and SWISS-PROT/TrEMBL databases. The database also contains over 12 500 predicted nuclear proteins from six entirely sequenced eukaryotic proteomes (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Saccharomyces cerevisiae). NLS motifs often co-localize with DNA-binding regions. This observation was also used to annotate over 1500 DNA-binding proteins. NLSdb can be accessed via the web site: http://cubic.bioc.columbia.edu/db/NLSdb/.
PMCID: PMC165448  PMID: 12520032
