Related Articles

1.  Alu Exonization Events Reveal Features Required for Precise Recognition of Exons by the Splicing Machinery 
PLoS Computational Biology  2009;5(3):e1000300.
Despite decades of research, the question of how the mRNA splicing machinery precisely identifies short exonic islands within the vast intronic oceans remains to a large extent obscure. In this study, we analyzed Alu exonization events, aiming to understand the requirements for correct selection of exons. Comparison of exonizing Alus to their non-exonizing counterparts is informative because Alus in these two groups have retained high sequence similarity but are perceived differently by the splicing machinery. We identified and characterized numerous features used by the splicing machinery to discriminate between Alu exons and their non-exonizing counterparts. Of these, the most novel is secondary structure: Alu exons in general and their 5′ splice sites (5′ss) in particular are characterized by decreased stability of local secondary structures with respect to their non-exonizing counterparts. We detected numerous further differences between Alu exons and their non-exonizing counterparts, among others in terms of exon–intron architecture and strength of splicing signals, enhancers, and silencers. Support vector machine analysis revealed that these features allow a high level of discrimination (AUC = 0.91) between exonizing and non-exonizing Alus. Moreover, the computationally derived probabilities of exonization significantly correlated with the biological inclusion level of the Alu exons, and the model could also be extended to general datasets of constitutive and alternative exons. This indicates that the features detected and explored in this study provide the basis not only for precise exon selection but also for the fine-tuned regulation thereof, manifested in cases of alternative splicing.
Author Summary
A typical human gene consists of 9 exons around 150 nucleotides in length, separated by introns that are ∼3,000 nucleotides long. The challenge of the splicing machinery is to precisely identify and ligate the exons, while removing the introns. We aimed to understand how the splicing machinery meets this momentous challenge, based on Alu exonization events. Alus are transposable elements, of which approximately one million copies exist in the human genome, a large portion of which lie within introns. Throughout evolution, some intronic Alus accumulated mutations and became recognized by the splicing machinery as exons, a process termed exonization. Such Alus remain highly similar to their non-exonizing counterparts but are perceived as different by the splicing machinery. By comparing exonizing Alus to their non-exonizing counterparts, we were able to identify numerous features in which they differ and which presumably lead to the recognition only of the former by the splicing machinery. Our findings reveal insights regarding the role of local RNA secondary structures, exon–intron architecture constraints, and splicing regulatory signals. We integrated these features in a computational model, which was able to successfully mimic the function of the splicing machinery and discriminate between true Alu exons and their intronic counterparts, highlighting the functional importance of these features.
doi:10.1371/journal.pcbi.1000300
PMCID: PMC2639721  PMID: 19266014
2.  Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution 
PLoS Biology  2012;10(1):e1001229.
Inclusion or exclusion of single codons at the splice acceptor site of mammalian genes is regulated in a tissue-specific manner, is strongly conserved, and is associated with local accelerated protein evolution.
Thousands of human genes contain introns ending in NAGNAG (N any nucleotide), where both NAGs can function as 3′ splice sites, yielding isoforms that differ by inclusion/exclusion of three bases. However, few models exist for how such splicing might be regulated, and some studies have concluded that NAGNAG splicing is purely stochastic and nonfunctional. Here, we used deep RNA-Seq data from 16 human and eight mouse tissues to analyze the regulation and evolution of NAGNAG splicing. Using both biological and technical replicates to estimate false discovery rates, we estimate that at least 25% of alternatively spliced NAGNAGs undergo tissue-specific regulation in mammals, and alternative splicing of strongly tissue-specific NAGNAGs was 10 times as likely to be conserved between species as was splicing of non-tissue-specific events, implying selective maintenance. Preferential use of the distal NAG was associated with distinct sequence features, including a more distal location of the branch point and presence of a pyrimidine immediately before the first NAG, and alteration of these features in a splicing reporter shifted splicing away from the distal site. Strikingly, alignments of orthologous exons revealed a ∼15-fold increase in the frequency of three base pair gaps at 3′ splice sites relative to nearby exon positions in both mammals and in Drosophila. Alternative splicing of NAGNAGs in human was associated with dramatically increased frequency of exon length changes at orthologous exon boundaries in rodents, and a model involving point mutations that create, destroy, or alter NAGNAGs can explain both the increased frequency and biased codon composition of gained/lost sequence observed at the beginnings of exons. 
This study shows that NAGNAG alternative splicing generates widespread differences between the proteomes of mammalian tissues, and suggests that the evolutionary trajectories of mammalian proteins are strongly biased by the locations and phases of the introns that interrupt coding sequences.
Author Summary
In order to translate a gene into protein, all of the non-coding regions (introns) need to be removed from the transcript and the coding regions (exons) stitched back together to make an mRNA. Most human genes are alternatively spliced, allowing the selection of different combinations of exons to produce multiple distinct mRNAs and proteins. Many types of alternative splicing are known to play crucial roles in biological processes including cell fate determination, tumor metabolism, and apoptosis. In this study, we investigated a form of alternative splicing in which competing adjacent 3′ splice sites (or splice acceptor sites) generate mRNAs differing by just an RNA triplet, the size of a single codon. This mode of alternative splicing, known as NAGNAG splicing, affects thousands of human genes and has been known for a decade, but its potential regulation, physiological importance, and conservation across species have been disputed. Using high-throughput sequencing of cDNA (“RNA-Seq”) from human and mouse tissues, we found that single-codon splicing often shows strong tissue specificity. Regulated NAGNAG alternative splice sites are selectively conserved between human and mouse genes, suggesting that they are important for organismal fitness. We identified features of the competing splice sites that influence NAGNAG splicing, and validated their effects in cultured cells. Furthermore, we found that this mode of splicing is associated with accelerated and highly biased protein evolution at exon boundaries. Taken together, our analyses demonstrate that the inclusion or exclusion of RNA triplets at exon boundaries can be effectively regulated by the splicing machinery, and highlight an unexpected connection between RNA processing and protein evolution.
doi:10.1371/journal.pbio.1001229
PMCID: PMC3250501  PMID: 22235189
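The NAGNAG definition in entry 2 can be made concrete with a minimal Python sketch. It is not taken from the paper: the function names `is_nagnag` and `acceptor_isoforms` and the toy sequences are hypothetical, and the sketch assumes introns and exons are given 5′ to 3′ as plain nucleotide strings.

```python
import re
from typing import Tuple

# Tandem acceptor motif: two consecutive NAG triplets at the 3' end of an intron.
# Assumption: N is any of A, C, G, T (or U for RNA input).
NAGNAG = re.compile(r"[ACGTU]AG[ACGTU]AG$")


def is_nagnag(intron: str) -> bool:
    """Return True if the intron sequence ends in NAGNAG."""
    return NAGNAG.search(intron.upper()) is not None


def acceptor_isoforms(intron: str, exon: str) -> Tuple[str, str]:
    """Return the downstream exon as spliced from the proximal and the distal NAG.

    Using the proximal (upstream) NAG as the 3' splice site leaves the second
    NAG in the exon; using the distal NAG removes it with the intron. The two
    isoforms therefore differ by exactly three bases.
    """
    if not is_nagnag(intron):
        raise ValueError("intron does not end in NAGNAG")
    proximal = intron[-3:].upper() + exon.upper()
    distal = exon.upper()
    return proximal, distal


if __name__ == "__main__":
    intron = "GTAAGTCTTCAGCAG"  # toy intron ending in CAGCAG
    exon = "GTTGAA"             # toy downstream exon
    print(is_nagnag(intron))
    print(acceptor_isoforms(intron, exon))
```

Anchoring the pattern with `$` restricts the match to the intron's last six bases, which is where the two competing acceptor sites sit.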
3.  Evolutionary Convergence on Highly-Conserved 3′ Intron Structures in Intron-Poor Eukaryotes and Insights into the Ancestral Eukaryotic Genome 
PLoS Genetics  2008;4(8):e1000148.
The presence of spliceosomal introns in eukaryotes raises a range of questions about genomic evolution. Along with the fundamental mysteries of introns' initial proliferation and persistence, the evolutionary forces acting on intron sequences remain largely mysterious. Intron number varies across species from a few introns per genome to several introns per gene, and the elements of intron sequences directly implicated in splicing vary from degenerate to strict consensus motifs. We report a 50-species comparative genomic study of intron sequences across most eukaryotic groups. We find two broad and striking patterns. First, we find that some highly intron-poor lineages have undergone evolutionary convergence to strong 3′ consensus intron structures. This finding holds for both branch point sequence and distance between the branch point and the 3′ splice site. Interestingly, this difference appears to exist within the genomes of green algae of the genus Ostreococcus, which exhibit highly constrained intron sequences through most of the intron-poor genome, but not in one much more intron-dense genomic region. Second, we find evidence that ancestral genomes contained highly variable branch point sequences, similar to more complex modern intron-rich eukaryotic lineages. In addition, ancestral structures are likely to have included polyT tails similar to those in metazoans and plants, which we found in a variety of protist lineages. Intriguingly, intron structure evolution appears to be quite different across lineages experiencing different types of genome reduction: whereas lineages with very few introns tend towards highly regular intronic sequences, lineages with very short introns tend towards highly degenerate sequences. 
Together, these results attest to the complex nature of ancestral eukaryotic splicing, the qualitatively different evolutionary forces acting on intron structures across modern lineages, and the impressive evolutionary malleability of eukaryotic gene structures.
Author Summary
The spliceosomal introns that interrupt eukaryotic genes show great number and sequence variation across species, from the rare, highly uniform yeast introns to the ubiquitous and highly variable vertebrate intron sequences. The causes of these differences remain mysterious. We studied sequences of intron branch points and 3′ termini in 50 eukaryotic species. All intron-rich species exhibit variable 3′ sequences. However, intron-poor species range from variable sequences, to uniform branch point motifs, to uniform branch point motifs in uniform positions along the intronic sequence. This is a more complex pattern than the clear relationship between intron number and 5′ intron sequence uniformity found previously. The correspondence of sequence uniformity and intron number extends to species of the green algal genus Ostreococcus, in which the single intron-rich genomic region shows far more variable intron sequences than in the otherwise intron-poor genome. We suggest that different concentrations of spliceosomal complexes may explain these differences. In addition, we report the existence of 3′ polyT tails in diverse eukaryotic protists, suggesting that this structure is ancestral. Together, these results underscore the complexity of ancestral eukaryotic splicing, the qualitatively different evolutionary forces acting on intron sequences in modern eukaryotes, and the impressive evolutionary malleability of eukaryotic genes.
doi:10.1371/journal.pgen.1000148
PMCID: PMC2483917  PMID: 18688272
4.  Splicing and the Evolution of Proteins in Mammals 
PLoS Biology  2007;5(2):e14.
It is often supposed that a protein's rate of evolution and its amino acid content are determined by the function and anatomy of the protein. Here we examine an alternative possibility, namely that the requirement to specify in the unprocessed RNA, in the vicinity of intron–exon boundaries, information necessary for removal of introns (e.g., exonic splice enhancers) affects both amino acid usage and rates of protein evolution. We find that the majority of amino acids show skewed usage near intron–exon boundaries, and that differences in the trends for the 2-fold and 4-fold blocks of both arginine and leucine show this to be owing to effects mediated at the nucleotide level. More specifically, there is a robust relationship between the extent to which an amino acid is preferred/avoided near boundaries and its enrichment/paucity in splice enhancers. As might then be expected, the rate of evolution is lowest near intron–exon boundaries, at least in part owing to splice enhancers, such that domains flanking intron–exon junctions evolve on average at under half the rate of exon centres from the same gene. In contrast, the rate of evolution of intronless retrogenes is highest near the domains where intron–exon junctions previously resided. The proportion of sequence near intron–exon boundaries is one of the stronger predictors of a protein's rate of evolution in mammals yet described. We conclude that after intron insertion selection favours modification of amino acid content near intron–exon junctions, so as to enable efficient intron removal, these changes then being subject to strong purifying selection even if nonoptimal for protein function. Thus there exists a strong force operating on protein evolution in mammals that is not explained directly in terms of the biology of the protein.
Intron-exon boundaries, once fixed in proteins, are found to be subject to purifying selection, even if they are not optimal for protein function.
Author Summary
Most of the DNA in our genes is actually not involved in the specification of proteins. Rather, the bits with the protein-coding information (exons) are separated from each other by noncoding bits, introns. Before a gene can be translated into protein these introns are removed and the exons are spliced back together to be translated into protein. While information about which DNA to remove is largely in the introns themselves, parts of the exons near the intron–exon boundary can, for example, function as splice enhancer elements. In principle, then, these parts of exons have two functions: to specify the amino acids of the resulting protein and to enable the correct removal of introns. What impact might this have on a gene's evolution? We show that near intron–exon boundaries, amino acid usage is biased towards nucleotides involved in splice control. Moreover, these parts of genes evolve especially slowly. Indeed, we estimate that a gene with many exons would evolve at under half the rate of the same gene with no introns, simply owing to the need to specify where to remove introns. Likewise, genes that have lost their introns evolve especially fast near the former intron's location. Thus, human proteins may not be as optimised as they could be, as their sequence is serving two conflicting roles.
doi:10.1371/journal.pbio.0050014
PMCID: PMC1790955  PMID: 17298171
5.  Comparative analysis of information contents relevant to recognition of introns in many species 
BMC Genomics  2011;12:45.
Background
The basic process of RNA splicing is conserved among eukaryotic species. Three signals (5' and 3' splice sites and branch site) are commonly used to directly conduct splicing, while other features are also related to the recognition of an intron. Although there is experimental evidence pointing to the significant species specificities in the features of intron recognition, a quantitative evaluation of the divergence of these features among a wide variety of eukaryotes has yet to be conducted.
Results
To better understand the splicing process from the viewpoints of evolution and information theory, we collected introns from 61 diverse species of eukaryotes and analyzed the properties of the nucleotide sequences relevant to splicing. We found that trees individually constructed from the five features (the three signals, intron length, and nucleotide composition within an intron) roughly reflect the phylogenetic relationships among the species but sometimes extensively deviate from the species classification. The degree of topological deviation of each feature tree from the reference trees indicates the lowest discordance for the 5' splicing signal, followed by that for the 3' splicing signal, and a considerably greater discordance for the other three features. We also estimated the relative contributions of the five features to short intron recognition in each species. Again, moderate correlation was observed between the similarities in pattern of short intron recognition and the genealogical relationships among the species. When mammalian introns were categorized into three subtypes according to their terminal dinucleotide sequences, each subtype segregated into a nearly monophyletic group, regardless of the host species, with respect to the 5' and 3' splicing signals. It was also found that GC-AG introns are extraordinarily abundant in some species with high genomic G + C contents, and that the U12-type spliceosome might make a greater contribution than currently estimated in most species.
Conclusions
Overall, the present study indicates that both splicing signals themselves and their relative contributions to short intron recognition are rather susceptible to evolutionary changes, while some poorly characterized properties seem to be preserved within the mammalian intron subtypes. Our findings may afford additional clues to understanding of evolution of splicing mechanisms.
doi:10.1186/1471-2164-12-45
PMCID: PMC3033335  PMID: 21247441
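Entry 5 groups mammalian introns into three subtypes by their terminal dinucleotides. The minimal Python sketch below shows one way such a grouping could be coded. The abstract names only GC-AG explicitly, so using GT-AG, GC-AG, and AT-AC as the three subtypes is an assumption here, and the function name `intron_subtype` and the toy sequences are hypothetical.

```python
from collections import Counter

# Assumed subtype table, keyed by (first two bases, last two bases) of the intron.
# Only GC-AG is named in the abstract; GT-AG and AT-AC are assumptions.
SUBTYPES = {
    ("GT", "AG"): "GT-AG",
    ("GC", "AG"): "GC-AG",
    ("AT", "AC"): "AT-AC",
}


def intron_subtype(intron: str) -> str:
    """Classify an intron (given 5'->3') by its terminal dinucleotides."""
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) < 4:
        return "other"
    return SUBTYPES.get((seq[:2], seq[-2:]), "other")


if __name__ == "__main__":
    introns = ["GTAAGTCTTCAG", "GCAAGTTTTCAG", "ATATCCTTTTAC", "GGAAAACC"]
    # Tally how many toy introns fall into each subtype.
    print(Counter(intron_subtype(s) for s in introns))
```

Anything that does not match one of the three assumed pairs is reported as `other` rather than forced into a subtype.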
6.  The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? 
Biology Direct  2006;1:22.
Background
Ever since the discovery of 'genes in pieces' and mRNA splicing in eukaryotes, origin and evolution of spliceosomal introns have been considered within the conceptual framework of the 'introns early' versus 'introns late' debate. The 'introns early' hypothesis, which is closely linked to the so-called exon theory of gene evolution, posits that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. Under this scenario, the absence of spliceosomal introns in prokaryotes is considered to be a result of "genome streamlining". The 'introns late' hypothesis counters that spliceosomal introns emerged only in eukaryotes, and moreover, have been inserted into protein-coding genes continuously throughout the evolution of eukaryotes. Beyond the formal dilemma, the more substantial side of this debate has to do with possible roles of introns in the evolution of eukaryotes.
Results
I argue that several lines of evidence now suggest a coherent solution to the introns-early versus introns-late debate, and the emerging picture of intron evolution integrates aspects of both views although, formally, there seems to be no support for the original version of introns-early. Firstly, there is growing evidence that spliceosomal introns evolved from group II self-splicing introns which are present, usually, in small numbers, in many bacteria, and probably, moved into the evolving eukaryotic genome from the α-proteobacterial progenitor of the mitochondria. Secondly, the concept of a primordial pool of 'virus-like' genetic elements implies that self-splicing introns are among the most ancient genetic entities. Thirdly, reconstructions of the ancestral state of eukaryotic genes suggest that the last common ancestor of extant eukaryotes had an intron-rich genome. Thus, it appears that ancestors of spliceosomal introns, indeed, have existed since the earliest stages of life's evolution, in a formal agreement with the introns-early scenario. However, there is no evidence that these ancient introns ever became widespread before the emergence of eukaryotes, hence, the central tenet of introns-early, the role of introns in early evolution of proteins, has no support. However, the demonstration that numerous introns invaded eukaryotic genes at the outset of eukaryotic evolution and that subsequent intron gain has been limited in many eukaryotic lineages implicates introns as an ancestral feature of eukaryotic genomes and refutes radical versions of introns-late. Perhaps, most importantly, I argue that the intron invasion triggered other pivotal events of eukaryogenesis, including the emergence of the spliceosome, the nucleus, the linear chromosomes, the telomerase, and the ubiquitin signaling system. This concept of eukaryogenesis, in a sense, revives some tenets of the exon hypothesis, by assigning to introns crucial roles in eukaryotic evolutionary innovation.
Conclusion
The scenario of the origin and evolution of introns that is best compatible with the results of comparative genomics and theoretical considerations goes as follows: self-splicing introns since the earliest stages of life's evolution – numerous spliceosomal introns invading genes of the emerging eukaryote during eukaryogenesis – subsequent lineage-specific loss and gain of introns. The intron invasion, probably, spawned by the mitochondrial endosymbiont, might have critically contributed to the emergence of the principal features of the eukaryotic cell. This scenario combines aspects of the introns-early and introns-late views.
Reviewers
This article was reviewed by W. Ford Doolittle, James Darnell (nominated by W. Ford Doolittle), William Martin, and Anthony Poole.
doi:10.1186/1745-6150-1-22
PMCID: PMC1570339  PMID: 16907971
7.  Discovery and Analysis of Evolutionarily Conserved Intronic Splicing Regulatory Elements 
PLoS Genetics  2007;3(5):e85.
Knowledge of the functional cis-regulatory elements that regulate constitutive and alternative pre-mRNA splicing is fundamental for biology and medicine. Here we undertook a genome-wide comparative genomics approach using available mammalian genomes to identify conserved intronic splicing regulatory elements (ISREs). Our approach yielded 314 ISREs, and insertions of ~70 ISREs between competing splice sites demonstrated that 84% of ISREs altered 5′ and 94% altered 3′ splice site choice in human cells. Consistent with our experiments, comparisons of ISREs to known splicing regulatory elements revealed that 40%–45% of ISREs might have dual roles as exonic splicing silencers. Supporting a role for ISREs in alternative splicing, we found that 30%–50% of ISREs were enriched near alternatively spliced (AS) exons, and included almost all known binding sites of tissue-specific alternative splicing factors. Further, we observed that genes harboring ISRE-proximal exons have biases for tissue expression and molecular functions that are ISRE-specific. Finally, we discovered that for Nova1, neuronal PTB, hnRNP C, and FOX1, the most frequently occurring ISRE proximal to an alternative conserved exon in the splicing factor strongly resembled its own known RNA binding site, suggesting a novel application of ISRE density and the propensity for splicing factors to auto-regulate to associate RNA binding sites to splicing factors. Our results demonstrate that ISREs are crucial building blocks in understanding general and tissue-specific AS regulation and the biological pathways and functions regulated by these AS events.
Author Summary
During RNA splicing, sequences (introns) in a pre-mRNA are excised and discarded, and the remaining sequences (exons) are joined to form the mature RNA. Splicing is regulated not only by the binding of the basic splicing machinery to splice sites located at the exon–intron boundaries, but also by the combined effects of various other splicing factors that bind to a multitude of sequence elements located both in the exons as well as the flanking introns. Instances of alternative splicing, where usage of splice site(s) is incomplete or different between tissues, cell types, or lineages, can be created by the interaction of sequence elements and tissue, cell type, and stage-specific splicing factors. To better understand constitutive and alternative pre-mRNA splicing, the authors describe a comparative genomics approach, using available mammalian genomes, to systematically identify splicing regulatory elements located in the introns proximal to exons. A quarter of the elements were tested experimentally, and most of them altered splicing in human cells. The authors also showed that the intronic elements are close to tissue-specific alternative exons and are more likely to be located in specific positions in the introns, suggestive of potential regulatory function. These elements are also frequently found in tissue-specific genes, suggesting a coupling between expression and alternative splicing of these genes. Finally, the authors propose a strategy using the elements to identify the binding sites of several splicing factors.
doi:10.1371/journal.pgen.0030085
PMCID: PMC1877881  PMID: 17530930
8.  AU-rich intronic elements affect pre-mRNA 5' splice site selection in Drosophila melanogaster. 
Molecular and Cellular Biology  1993;13(12):7689-7697.
cis-spliced nuclear pre-mRNA introns found in a variety of organisms, including Tetrahymena thermophila, Drosophila melanogaster, Caenorhabditis elegans, and plants, are significantly richer in adenosine and uridine residues than their flanking exons are. The functional significance of this intronic AU richness, however, has been demonstrated only in plant nuclei. In these nuclei, 5' and 3' splice sites are selected in part by their positions relative to AU-rich elements spread throughout the length of an intron. Because of this position-dependent selection scheme, a 5' splice site at the normal (+1) exon-intron boundary having only three contiguous consensus nucleotides can compete effectively with an enhanced exonic site (-57E) having nine consensus nucleotides and outcompete an enhanced site (+106E) embedded within the AU-rich intron. To determine whether transitions from AU-poor exonic sequences to AU-rich intronic sequences influence 5' splice site selection in other organisms, alleles of the pea rbcS3A1 intron were expressed in Drosophila Schneider 2 cells, and their splicing patterns were compared with those in tobacco nuclei. We demonstrate that this heterologous transcript can be accurately spliced in transfected Drosophila nuclei and that a +1 G-to-A knockout mutation at the normal splice site activates the same three cryptic 5' splice sites as in tobacco. Enhancement of the exonic (-57) and intronic (+106) sites to consensus splice sites indicates that potential splice sites located in the upstream exon or at the 5' exon-intron boundary are preferred in Drosophila cells over those embedded within AU-rich intronic sequences. In contrast to tobacco, in which the activities of two competing 5' splice sites upstream of the AU-rich intron are modulated by their proximity to the AU transition point, D. melanogaster utilizes the upstream site which has a higher proportion of consensus nucleotides. 
The enhanced version of the cryptic intronic site is efficiently selected in D. melanogaster when the normal +1 site is weakened or discrete AU-rich elements upstream of the +106E site are disrupted. Selection of this internal site in tobacco requires more drastic disruption of these motifs. We conclude that 5' splice site selection in Drosophila nuclei is influenced by the intrinsic strengths of competing sites and by the presence of AU-rich intronic elements but to a different extent than in tobacco.
PMCID: PMC364840  PMID: 8246985
9.  Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays 
Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families.
Synopsis
Alternative splicing expands the protein-coding potential of genes and genomes. RNAs copied from a gene can be spliced differently to produce distinct proteins under regulatory influences that arise during development or upon environmental change. These authors present a global analysis of alternative splicing in the mouse, using microarray measurements of splicing from 22 adult tissues. The ability to measure thousands of splicing events across the genome in many tissues has allowed the capture of co-regulated sets of exons whose inclusion in mRNA occurs preferentially in a given set of tissues. An examination of the sequences associated with exons whose expression is regulated in brain or muscle as compared to other tissues reveals extreme conservation of intron sequences nearby the regulated exon. These conserved regions contain sequence motifs likely to contribute to the regulation of alternative splicing in brain and muscle cells. The availability of global gene expression data with splicing level resolution should spur the development of computational methods for detecting and predicting alternative splicing and its regulation. In addition, the authors make strong predictions for biological experiments leading to the identification of components and their mechanisms of action in the regulation of splicing during mammalian development.
doi:10.1371/journal.pcbi.0020004
PMCID: PMC1331982  PMID: 16424921
10.  Large introns in relation to alternative splicing and gene evolution: a case study of Drosophila bruno-3 
BMC Genetics  2009;10:67.
Background
Alternative splicing (AS) of maturing mRNA can generate structurally and functionally distinct transcripts from the same gene. Recent bioinformatic analyses of available genome databases inferred a positive correlation between intron length and AS. To study the interplay between intron length and AS empirically and in more detail, we analyzed the diversity of alternatively spliced transcripts (ASTs) in the Drosophila RNA-binding Bruno-3 (Bru-3) gene. This gene was known to encode thirteen exons separated by introns of diverse sizes, ranging from 71 to 41,973 nucleotides in D. melanogaster. Although Bru-3's structure is expected to be conducive to AS, only two ASTs of this gene were previously described.
Results
Cloning of RT-PCR products of the entire ORF from four species representing three diverged Drosophila lineages provided an evolutionary perspective, high sensitivity, and long-range contiguity of splice choices currently unattainable by high-throughput methods. Consequently, we identified three new exons, a new exon fragment and thirty-three previously unknown ASTs of Bru-3. All exon-skipping events in the gene were mapped to the exons surrounded by introns of at least 800 nucleotides, whereas exons split by introns of less than 250 nucleotides were always spliced contiguously in mRNA. Cases of exon loss and creation during Bru-3 evolution in Drosophila were also localized within large introns. Notably, we identified a true de novo exon gain: exon 8 was created along the lineage of the obscura group from intronic sequence between cryptic splice sites conserved among all Drosophila species surveyed. Exon 8 was included in mature mRNA by the species representing all the major branches of the obscura group. To our knowledge, the origin of exon 8 is the first documented case of exonization of intronic sequence outside vertebrates.
Conclusion
We found that large introns can promote AS via exon-skipping and exon turnover during evolution likely due to frequent errors in their removal from maturing mRNA. Large introns could be a reservoir of genetic diversity, because they have a greater number of mutable sites than short introns. Taken together, gene structure can constrain and/or promote gene evolution.
doi:10.1186/1471-2156-10-67
PMCID: PMC2767349  PMID: 19840385
11.  Human GC-AG alternative intron isoforms with weak donor sites show enhanced consensus at acceptor exon positions 
Nucleic Acids Research  2001;29(12):2581-2593.
It has been previously observed that the intrinsically weak variant GC donor sites, in order to be recognized by the U2-type spliceosome, possess strong consensus sequences maximized for base pair formation with U1 and U5/U6 snRNAs. However, variability in signal strength is a fundamental mechanism for splice site selection in alternative splicing. Here we report human alternative GC-AG introns (for the first time from any species), and show that while constitutive GC-AG introns do possess strong signals at their donor sites, a large subset of alternative GC-AG introns possess weak consensus sequences at their donor sites. Surprisingly, this subset of alternative isoforms shows strong consensus at acceptor exon positions 1 and 2. The improved consensus at the acceptor exon can facilitate a strong interaction with U5 snRNA, which tethers the two exons for ligation during the second step of splicing. Further, these isoforms nearly always possess alternative acceptor sites and exhibit particularly weak polypyrimidine tracts characteristic of AG-dependent introns. The acceptor exon nucleotides are part of the consensus required for the U2AF35-mediated recognition of AG in such introns. Such improved consensus at acceptor exons is not found in either normal or alternative GT-AG introns having weak donor sites or weak polypyrimidine tracts. The changes probably reflect mechanisms that allow GC-AG alternative intron isoforms to cope with two conflicting requirements, namely an apparent need for differential splice strength to direct the choice of alternative sites and a need for improved donor signals to compensate for the central mismatch base pair (C-A) in the RNA duplex of U1 snRNA and the pre-mRNA. The other important findings include (i) one in every twenty alternative introns is a GC-AG intron, and (ii) three of every five observed GC-AG introns are alternative isoforms.
PMCID: PMC55748  PMID: 11410667
12.  The Emergence of Alternative 3′ and 5′ Splice Site Exons from Constitutive Exons 
PLoS Computational Biology  2007;3(5):e95.
Alternative 3′ and 5′ splice site (ss) events constitute a significant part of all alternative splicing events. These events were also found to be related to several aberrant splicing diseases. However, only a few of the characteristics that distinguish these events from alternative cassette exons are currently known. In this study, we compared the characteristics of constitutive exons, alternative cassette exons, and alternative 3′ss and 5′ss exons. The results revealed that alternative 3′ss and 5′ss exons are an intermediate state between constitutive and alternative cassette exons, where the constitutive side resembles constitutive exons, and the alternative side resembles alternative cassette exons. The results also show that alternative 3′ss and 5′ss exons exhibit low levels of symmetry (frame-preserving), similar to constitutive exons, whereas the sequence between the two alternative splice sites shows high symmetry levels, similar to alternative cassette exons. In addition, flanking intronic conservation analysis revealed that exons whose alternative splice sites are at least nine nucleotides apart show a high conservation level, indicating intronic participation in the regulation of their splicing, whereas exons whose alternative splice sites are fewer than nine nucleotides apart show a low conservation level. Further examination of these exons, spanning seven vertebrate species, suggests an evolutionary model in which the alternative state is a derivative of an ancestral constitutive exon, where a mutation inside the exon or along the flanking intron resulted in the creation of a new splice site that competes with the original one, leading to alternative splice site selection. This model was validated experimentally on four exons, showing that they indeed originated from constitutive exons that acquired a new competing splice site during evolution.
Author Summary
Alternative splicing is the mechanism that is responsible for the creation of multiple mRNA products from a single gene. It is considered a key contributor to genomic complexity. Alternative 3′ and 5′ splicing events, in which part of the exon is alternatively included or excluded in the mRNA, constitute a significant part of all alternative splicing events, and yet little is known regarding their regulation mechanism and the evolutionary background that led to their creation. We show that alternative 3′ and 5′ splice site exons resemble constitutive exons. However, their alternative sequence resembles alternative cassette exons. Comparative genomics spanning seven vertebrate species suggests an evolutionary model in which the alternative state is a derivative of an ancestral constitutive exon, where a mutation inside the exon or along the flanking intron resulted in the creation of a new splice site that competes with the original one, leading to alternative splice site selection. This model was validated experimentally, showing that during evolution mutations shifted constitutive exons to undergo alternative 3′ and 5′ splicing.
doi:10.1371/journal.pcbi.0030095
PMCID: PMC1876488  PMID: 17530917
13.  Modeling the evolution dynamics of exon-intron structure with a general random fragmentation process 
Background
Most eukaryotic genes are interrupted by spliceosomal introns. The evolution of exon-intron structure remains mysterious despite rapid advances in genome sequencing techniques. In this work, a novel approach is taken based on the assumptions that the evolution of exon-intron structure is a stochastic process, and that the characteristics of this process can be understood by examining its historical outcome, the present-day size distribution of internal translated exons (hereafter, exons). By combining simulation with modeling of the size distribution of exons in different species, we propose a general random fragmentation process (GRFP) to characterize the evolution dynamics of exon-intron structure. This model accurately predicts the probability that an exon will be split by a new intron and the distribution of novel insertions along the length of the exon.
Results
As the first observation from this model, we show that the chance for an exon to obtain an intron is proportional to the third power of its size. We also show that this size dependence is nearly constant across a gene, with the exception of the exons adjacent to the 5′ UTR. As the second conclusion from the model, we show that intron insertion loci follow a normal distribution with a mean of 0.5 (center of the exon) and a standard deviation of 0.11. Finally, we show that intron insertions within a gene are independent of each other for vertebrates, but are more negatively correlated for non-vertebrates. We use simulation to demonstrate that the negative correlation might result from significant intron loss during evolution, which could be explained by selection against multi-intron genes in these organisms.
Conclusions
The GRFP model suggests that intron gain is dynamic with a higher chance for longer exons; introns are inserted into exons randomly with the highest probability at the center of the exon. GRFP estimates that there are 78 introns per 10 kb of coding sequence in vertebrate genomes, agreeing with empirical observations. GRFP also estimates that there are significant intron losses in the evolution of non-vertebrate genomes, with extreme cases of around 57% intron loss in Drosophila melanogaster, 28% in Caenorhabditis elegans, and 24% in Oryza sativa.
doi:10.1186/1471-2148-13-57
PMCID: PMC3732091  PMID: 23448166
Evolution of exon-intron structure; General random fragmentation process; Simulation
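The two quantitative claims in entry 13 (split chance proportional to the third power of exon size, and insertion loci following a normal distribution with mean 0.5 and standard deviation 0.11) can be illustrated with a minimal simulation sketch. This is not the authors' GRFP implementation: the function names, the rejection step that keeps the cut inside the exon, and the starting exon length of 1200 nt are all assumptions made for illustration.

```python
import random


def split_weights(exon_lengths, exponent=3):
    """Relative chance of each exon gaining an intron, proportional to length**exponent.

    Exons shorter than 2 nt cannot be split and get weight 0 (an assumption of this sketch).
    """
    raw = [length ** exponent if length >= 2 else 0 for length in exon_lengths]
    total = sum(raw)
    if total == 0:
        raise ValueError("no exon is long enough to be split")
    return [w / total for w in raw]


def sample_cut(exon_length, rng, mean=0.5, sd=0.11):
    """Draw a relative insertion locus from N(mean, sd); redraw until it falls inside the exon."""
    while True:
        pos = int(round(rng.gauss(mean, sd) * exon_length))
        if 0 < pos < exon_length:
            return pos


def fragment_once(exon_lengths, rng):
    """Split one exon, chosen by the size-cubed rule, into two exons."""
    weights = split_weights(exon_lengths)
    idx = rng.choices(range(len(exon_lengths)), weights=weights, k=1)[0]
    length = exon_lengths[idx]
    cut = sample_cut(length, rng)
    return exon_lengths[:idx] + [cut, length - cut] + exon_lengths[idx + 1:]


rng = random.Random(0)   # fixed seed so the run is repeatable
exons = [1200]           # hypothetical ancestral exon length in nucleotides
for _ in range(5):
    exons = fragment_once(exons, rng)
print(exons)
```

Under the cubic rule, an exon twice as long as its neighbour is eight times as likely to be the next one split, which is why long exons fragment first in this sketch.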
14.  Genome-Wide Association between Branch Point Properties and Alternative Splicing 
PLoS Computational Biology  2010;6(11):e1001016.
The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif combined with the lack of a large set of experimentally verified BPs complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias we obtained a set of motifs with good agreement with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves the prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3′ end of introns, with distance to the 3′ splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in human. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this, so far disregarded but yet crucial, element buries information that can complement current acceptor site models.
Author Summary
From transcription to translation, the events underlying protein production from DNA sequence are paramount to all aspects of cellular function. Pre-mRNAs in eukaryotes undergo several processing steps prior to their export to the cytoplasm. Among these, splicing – the process of intron removal and exon ligation – has been shown to play a central role in the regulation of gene expression. It has been estimated that more than half of the disease-causing mutations in humans do so by interfering with splicing. The difficulty in describing these disease mechanisms often lies in the low accuracy of the methods for prediction of functional splicing signals in the pre-mRNA. This is especially the case of the branch point, mainly due to its high sequence variability. We have developed a methodology for mammalian branch point prediction based on a machine-learning algorithm, which shows improved accuracy over previous published methods. Moreover, using a combination of experimental and bioinformatics approaches, we uncovered important positional properties of the branch point and shed new light on how some of its features may contribute to the final splicing outcome. These findings might prove useful for a better understanding of how splicing-associated mutations can lead to disease.
doi:10.1371/journal.pcbi.1001016
PMCID: PMC2991248  PMID: 21124863
15.  Efficient internal exon recognition depends on near equal contributions from the 3′ and 5′ splice sites 
Nucleic Acids Research  2011;39(20):8928-8937.
Pre-mRNA splicing is carried out by the spliceosome, which identifies exons and removes intervening introns. In vertebrates, most splice sites are initially recognized by the spliceosome across the exon, because most exons are small and surrounded by large introns. This gene architecture predicts that efficient exon recognition depends largely on the strength of the flanking 3′ and 5′ splice sites. However, it is unknown if the 3′ or the 5′ splice site dominates the exon recognition process. Here, we test the 3′ and 5′ splice site contributions towards efficient exon recognition by systematically replacing the splice sites of an internal exon with sequences of different splice site strengths. We show that the presence of an optimal splice site does not guarantee exon inclusion and that the best predictor for exon recognition is the sum of both splice site scores. Using a genome-wide approach, we demonstrate that the combined 3′ and 5′ splice site strengths of internal exons provide a much more significant separator between constitutive and alternative exons than either the 3′ or the 5′ splice site strength alone.
doi:10.1093/nar/gkr481
PMCID: PMC3203598  PMID: 21795381
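Entry 15 reports that the sum of the 3′ and 5′ splice site scores predicts exon recognition better than either score alone. A minimal sketch of that idea follows. The exon names, the scores, and the threshold are hypothetical values chosen for illustration; the abstract does not specify a scoring scheme or a cutoff.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    name: str
    score_3ss: float  # hypothetical 3' splice site score
    score_5ss: float  # hypothetical 5' splice site score


def combined_strength(exon):
    """Sum of both splice site scores, the predictor highlighted in the abstract."""
    return exon.score_3ss + exon.score_5ss


def classify(exon, threshold):
    """Label an exon by combined strength; the threshold is an assumption of this sketch."""
    if combined_strength(exon) >= threshold:
        return "constitutive-like"
    return "alternative-like"


exons = [
    Exon("exon_a", 9.5, 4.0),
    Exon("exon_b", 7.5, 8.0),
    Exon("exon_c", 3.0, 11.0),
]
ranked = sorted(exons, key=combined_strength, reverse=True)
for exon in ranked:
    print(exon.name, combined_strength(exon), classify(exon, threshold=14.0))
```

In this toy data, exon_c carries the single strongest site (11.0) yet ranks below exon_b, whose two moderate sites sum higher, mirroring the observation that one optimal splice site does not guarantee inclusion.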
16.  Evolution of Alternative Splicing Regulation: Changes in Predicted Exonic Splicing Regulators Are Not Associated with Changes in Alternative Splicing Levels in Primates 
PLoS ONE  2009;4(6):e5800.
Alternative splicing is tightly regulated in a spatio-temporal and quantitative manner. This regulation is achieved by a complex interplay between spliceosomal (trans) factors that bind to different sequence (cis) elements. cis-elements reside in both introns and exons and may either enhance or silence splicing. Differential combinations of cis-elements allows for a huge diversity of overall splicing signals, together comprising a complex ‘splicing code’. Many cis-elements have been identified, and their effects on exon inclusion levels demonstrated in reporter systems. However, the impact of interspecific differences in these elements on the evolution of alternative splicing levels has not yet been investigated at genomic level. Here we study the effect of interspecific differences in predicted exonic splicing regulators (ESRs) on exon inclusion levels in human and chimpanzee. For this purpose, we compiled and studied comprehensive datasets of predicted ESRs, identified by several computational and experimental approaches, as well as microarray data for changes in alternative splicing levels between human and chimpanzee. Surprisingly, we found no association between changes in predicted ESRs and changes in alternative splicing levels. This observation holds across different ESR exon positions, exon lengths, and 5′ splice site strengths. We suggest that this lack of association is mainly due to the great importance of context for ESR functionality: many ESR-like motifs in primates may have little or no effect on splicing, and thus interspecific changes at short-time scales may primarily occur in these effectively neutral ESRs. These results underscore the difficulties of using current computational ESR prediction algorithms to identify truly functionally important motifs, and provide a cautionary tale for studies of the effect of SNPs on splicing in human disease.
doi:10.1371/journal.pone.0005800
PMCID: PMC2686173  PMID: 19495418
17.  Theory on the Coupled Stochastic Dynamics of Transcription and Splice-Site Recognition 
PLoS Computational Biology  2012;8(11):e1002747.
Eukaryotic genes are typically split into exons that need to be spliced together to form the mature mRNA. The splicing process depends on the dynamics and interactions among transcription by the RNA polymerase II complex (RNAPII) and the spliceosomal complex consisting of multiple small nuclear ribonucleoproteins (snRNPs). Here we propose a biophysically plausible initial theory of splicing that aims to explain the effects of the stochastic dynamics of snRNPs on the splicing patterns of eukaryotic genes. We consider two different ways to model the dynamics of snRNPs: pure three-dimensional diffusion and a combination of three- and one-dimensional diffusion along the emerging pre-mRNA. Our theoretical analysis shows that there exists an optimum position of the splice sites on the growing pre-mRNA at which the time required for snRNPs to find the 5′ donor site is minimized. The minimization of the overall search time is achieved mainly via the increase in non-specific interactions between the snRNPs and the growing pre-mRNA. The theory further predicts that there exists an optimum transcript length that maximizes the probabilities for exons to interact with the snRNPs. We evaluate these theoretical predictions by considering human and mouse exon microarray data as well as RNAseq data from multiple different tissues. We observe that there is a broad optimum position of splice sites on the growing pre-mRNA and an optimum transcript length, which are roughly consistent with the theoretical predictions. The theoretical and experimental analyses suggest that there is a strong interaction between the dynamics of RNAPII and the stochastic nature of snRNP search for 5′ donor splicing sites.
Author Summary
The DNA encoding most eukaryotic genes is interrupted by long sequences called introns. These introns need to be removed through the process of splicing to produce the mature messenger RNA. The process of splicing plays a critical role in determining the exact amino acid content of the ensuing protein. Several molecules called small nuclear ribonucleoproteins (snRNPs) are involved in finding the appropriate 5′ donor splicing sites for splicing. Transcription and splicing occur simultaneously and the ultimate product depends on the relative speed of transcription and the stochastic dynamics underlying splicing. Here we propose a biophysically plausible theory that describes the ongoing interactions between transcription and splicing. We show that the theoretical predictions are consistent with experimental measurements of the abundance patterns of different exons and transcripts across tissues.
doi:10.1371/journal.pcbi.1002747
PMCID: PMC3486868  PMID: 23133354
18.  Depolarization and CaM Kinase IV Modulate NMDA Receptor Splicing through Two Essential RNA Elements 
PLoS Biology  2007;5(2):e40.
Alternative splicing controls the activity of many proteins important for neuronal excitation, but the signal-transduction pathways that affect spliced isoform expression are not well understood. One particularly interesting system of alternative splicing is exon 21 (E21) of the NMDA receptor 1 (NMDAR1 E21), which controls the trafficking of NMDA receptors to the plasma membrane and is repressed by Ca++/calmodulin-dependent protein kinase (CaMK) IV signaling. Here, we characterize the splicing of NMDAR1 E21. We find that E21 splicing is reversibly repressed by neuronal depolarization, and we identify two RNA elements within the exon that function together to mediate the inducible repression. One of these exonic elements is similar to an intronic CaMK IV–responsive RNA element (CaRRE) originally identified in the 3′ splice site of the BK channel STREX exon, but not previously observed within an exon. The other element is a new RNA motif. Introduction of either of these two motifs, called CaRRE type 1 and CaRRE type 2, into a heterologous constitutive exon can confer CaMK IV–dependent repression on the new exon. Thus, either exonic CaRRE can be sufficient for CaMK IV–induced repression. Single nucleotide scanning mutagenesis defined consensus sequences for these two CaRRE motifs. A genome-wide motif search and subsequent RT-PCR validation identified a group of depolarization-regulated alternative exons carrying CaRRE consensus sequences. Many of these exons are likely to alter neuronal function. Thus, these two RNA elements define a group of co-regulated splicing events that respond to a common stimulus in neurons to alter their activity.
Alternative splicing of NMDA receptor 1 exon 21 is reversibly repressed by depolarization in a CaMK IV-dependent manner in neurons. This suggests splicing is finely tuned by dynamic activity inputs.
Author Summary
Multiple mechanisms direct changes in neuronal activity in response to external stimuli, ranging from short-acting modifications of membrane proteins to longer-acting changes in gene expression. A frequently regulated step in gene expression is the pre-mRNA splicing reaction in which the inclusion of exons (protein-coding sequences) or the position of splice sites produces alternatively spliced mRNA isoforms encoding functionally different proteins. Here, we study splicing of the NMDA receptor, which responds to the neurotransmitter glutamate to modify neuronal activity. We show that the splicing of an important exon (E21) in the NMDA receptor subunit NR1 mRNA is repressed by cell depolarization and activation of the intracellular signaling molecule, CaMK IV. We find that this splicing repression is mediated by two regulatory sequences within the exon itself. One sequence is similar to a previously described regulatory element that had not been known to function in an exon. The other is a new element. The characterization of these elements as a family of degenerate sequences allowed the identification of a group of exons sharing responsiveness to cell depolarization and CaMK IV. These results define a new set of gene expression changes that may occur in modulating neuronal activity.
doi:10.1371/journal.pbio.0050040
PMCID: PMC1790950  PMID: 17298178
19.  Challenging the spliceosome machine 
Genome Biology  2006;7(1):R3.
Analysis of a set of almost 25,000 donor and acceptor splice sites in Drosophila shows that information content increases near splice sites flanking very long or very short introns and exons.
Background
Using cDNA copies of transcripts and corresponding genomic sequences from the Berkeley Drosophila Genome Project, a set of 24,753 donor and acceptor splice sites was computed with a scanning algorithm that tested for single nucleotide insertion, deletion and substitution polymorphisms. Using this dataset, we developed a progressive partitioning approach to examining the effects of challenging the spliceosome system.
Results
Our analysis shows that information content increases near splice sites flanking progressively longer introns and exons, suggesting that longer splice elements require stronger binding of spliceosome components. Information also increases at splice sites near very short introns and exons, suggesting that short splice elements have crowding problems. We observe that the information found at individual splice sites depends upon a balance of splice element lengths in the vicinity, including both flanking and non-adjacent introns and exons.
Conclusion
These results suggest an interdependence of multiple splicing events along the pre-mRNA, which may have implications for how the macromolecular spliceosome machine processes sets of neighboring splice sites.
doi:10.1186/gb-2006-7-1-r3
PMCID: PMC1431713  PMID: 16507135
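Entry 19 compares the information content found at splice sites. A common way to quantify this is the per-position information in bits, computed as the maximum entropy of the alphabet minus the observed Shannon entropy. The sketch below shows that calculation under stated assumptions: a uniform background over A, C, G and T, no small-sample correction, and four made-up donor site sequences. The abstract does not say which formulation the authors used.

```python
import math
from collections import Counter


def information_content(aligned_sites, alphabet="ACGT"):
    """Per-position information in bits: log2(alphabet size) minus Shannon entropy.

    Assumes a uniform background and applies no small-sample correction.
    """
    if not aligned_sites:
        raise ValueError("need at least one sequence")
    width = len(aligned_sites[0])
    if any(len(site) != width for site in aligned_sites):
        raise ValueError("all sequences must have the same length")
    max_bits = math.log2(len(alphabet))
    bits = []
    for i in range(width):
        counts = Counter(site[i] for site in aligned_sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(max_bits - entropy)
    return bits


# Hypothetical aligned donor-site sequences, for illustration only.
donors = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAAGT"]
per_position = information_content(donors)
total_bits = sum(per_position)
print([round(b, 3) for b in per_position], round(total_bits, 3))
```

Invariant positions reach the 2-bit maximum, while positions with mixed nucleotides contribute less; summing across positions gives a single figure per site set that can be compared between partitions such as long and short introns.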
20.  The strength of the HIV-1 3' splice sites affects Rev function 
Retrovirology  2006;3:89.
Background
The HIV-1 Rev protein is a key component in the early to late switch in HIV-1 splicing from early intronless (e.g. tat, rev) to late intron-containing Rev-dependent (e.g. gag, vif, env) transcripts. Previous results suggested that cis-acting sequences and inefficient 5' and 3' splice sites are a prerequisite for Rev function. However, we and other groups have shown that two of the HIV-1 5' splice sites, D1 and D4, are efficiently used in vitro and in vivo. Here, we focus on the efficiency of the HIV-1 3' splice sites taking into consideration to what extent their intrinsic efficiencies are modulated by their downstream cis-acting exonic sequences. Furthermore, we delineate their role in RNA stabilization and Rev function.
Results
In the presence of an efficient upstream 5' splice site the integrity of the 3' splice site is not essential for Rev function whereas an efficient 3' splice site impairs Rev function. The detrimental effect of a strong 3' splice site on the amount of Rev-dependent intron-containing HIV-1 glycoprotein coding (env) mRNA is not compensatable by weakening the strength of the upstream 5' splice site. Swapping the HIV-1 3' splice sites in an RRE-containing minigene, we found a 3' splice site usage which was variably dependent on the presence of the usual downstream exonic sequence. The most evident activation of 3' splice site usage by its usual downstream exonic sequence was observed for 3' splice site A1 which was turned from an intrinsic very weak 3' splice site into the most active 3' splice site, even abolishing Rev activity. Performing pull-down experiments with nuclear extracts of HeLa cells we identified a novel ASF/SF2-dependent exonic splicing enhancer (ESE) within HIV-1 exon 2 consisting of a heptameric sequence motif occurring twice (M1 and M2) within this short non-coding leader exon. Single point mutation of M1 within an infectious molecular clone is detrimental for HIV-1 exon 2 recognition without affecting Rev-dependent vif expression.
Conclusion
Under the conditions of our assay, the rate limiting step of retroviral splicing, competing with Rev function, seems to be exclusively determined by the functional strength of the 3' splice site. The bipartite ASF/SF2-dependent ESE within HIV-1 exon 2 supports cross-talk between splice site pairs across exon 2 (exon definition) which is incompatible with processing of the intron-containing vif mRNA. We propose that Rev mediates a switch from exon to intron definition necessary for the expression of all intron-containing mRNAs.
doi:10.1186/1742-4690-3-89
PMCID: PMC1697824  PMID: 17144911
21.  Origin and evolution of spliceosomal introns 
Biology Direct  2012;7:11.
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’, held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes.
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
doi:10.1186/1745-6150-7-11
PMCID: PMC3488318  PMID: 22507701
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
22.  The Pivotal Roles of TIA Proteins in 5′ Splice-Site Selection of Alu Exons and Across Evolution 
PLoS Genetics  2009;5(11):e1000717.
More than 5% of alternatively spliced internal exons in the human genome are derived from Alu elements in a process termed exonization. Alus are comprised of two homologous arms separated by an internal polypyrimidine tract (PPT). In most exonizations, splice sites are selected from within the same arm. We hypothesized that the internal PPT may prevent selection of a splice site further downstream. Here, we demonstrate that this PPT enhanced the selection of an upstream 5′ splice site (5′ss), even in the presence of a stronger 5′ss downstream. Deletion of this PPT shifted selection to the stronger downstream 5′ss. This enhancing effect depended on the strength of the downstream 5′ss, on the efficiency of base-pairing to U1 snRNA, and on the length of the PPT. This effect of the PPT was mediated by the binding of TIA proteins and was dependent on the distance between the PPT and the upstream 5′ss. A wide-scale evolutionary analysis of introns across 22 eukaryotes revealed an enrichment in PPTs within ∼20 nt downstream of the 5′ss. For most metazoans, the strength of the 5′ss inversely correlated with the presence of a downstream PPT, indicative of the functional role of the PPT. Finally, we found that the proteins that mediate this effect, TIA and U1C, and in particular their functional domains, are highly conserved across evolution. Overall, these findings expand our understanding of the role of TIA1/TIAR proteins in enhancing recognition of exons, in general, and Alu exons, in particular.
Author Summary
Human genes are composed of functional regions, termed exons, separated by non-functional regions, termed introns. Intronic sequences may gradually accumulate mutations and subsequently become recognized by the splicing machinery as exons, a process termed exonization. Alu elements are prone to undergo exonization: more than 5% of alternatively spliced internal exons in the human genome originate from Alu elements. A typical Alu element is ∼300 nucleotides long, consisting of two arms separated by a polypyrimidine tract (PPT). Interestingly, in most cases, exonization occurs almost exclusively within either the right arm or the left, not both. Here we found that the PPT between the two arms serves as a binding site for TIA proteins and prevents the exon selection process from expanding into downstream regions. To obtain a wider overview of TIA function, we performed a cross-evolutionary analysis within 22 eukaryotes of this protein and of U1C, a protein known to interact with it, and found that functional regions of both these proteins were highly conserved. These findings highlight the pivotal role of TIA proteins in 5′ splice-site selection of Alu exons and exon recognition in general.
doi:10.1371/journal.pgen.1000717
PMCID: PMC2766253  PMID: 19911040
23.  Splicing signals in Drosophila: intron size, information content, and consensus sequences. 
Nucleic Acids Research  1992;20(16):4255-4262.
A database of 209 Drosophila introns was extracted from Genbank (release number 64.0) and examined by a number of methods in order to characterize features that might serve as signals for messenger RNA splicing. A tight distribution of sizes was observed: while the smallest introns in the database are 51 nucleotides, more than half are less than 80 nucleotides in length, and most of these have lengths in the range of 59-67 nucleotides. Drosophila splice sites found in large and small introns differ in only minor ways from each other and from those found in vertebrate introns. However, larger introns have greater pyrimidine-richness in the region between 11 and 21 nucleotides upstream of 3' splice sites. The Drosophila branchpoint consensus matrix resembles C T A A T (in which branch formation occurs at the underlined A), and differs from the corresponding mammalian signal in the absence of G at the position immediately preceding the branchpoint. The distribution of occurrences of this sequence suggests a minimum distance between 5' splice sites and branchpoints of about 38 nucleotides, and a minimum distance between 3' splice sites and branchpoints of 15 nucleotides. The methods we have used detect no information in exon sequences other than in the few nucleotides immediately adjacent to the splice sites. However, Drosophila resembles many other species in that there is a discontinuity in A + T content between exons and introns, which are A + T rich.
PMCID: PMC334133  PMID: 1508718
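The distance constraints reported in this abstract (about 38 nucleotides from the 5' splice site and 15 nucleotides from the 3' splice site) can be illustrated with a minimal Python sketch. It is not the authors' method: the function name `candidate_branchpoints` and its parameters are hypothetical, the consensus is treated as an exact string rather than a weight matrix, and distances are measured from the edges of the motif rather than from the branch nucleotide itself.

```python
def candidate_branchpoints(intron, motif="CTAAT", min_from_5ss=38, min_from_3ss=15):
    """Return 0-based start positions of motif occurrences that respect
    the minimum distances to both ends of the intron.

    Assumptions (not from the abstract): exact string matching, and
    distances counted from the motif edges to the intron ends.
    """
    intron = intron.upper()
    hits = []
    start = intron.find(motif)
    while start != -1:
        upstream = start                                # nt between 5' end and motif
        downstream = len(intron) - (start + len(motif))  # nt between motif and 3' end
        if upstream >= min_from_5ss and downstream >= min_from_3ss:
            hits.append(start)
        start = intron.find(motif, start + 1)           # allow overlapping matches
    return hits


# A toy 69-nt intron: GT, 40 filler bases, the motif, 20 filler bases, AG.
toy_intron = "GT" + "A" * 40 + "CTAAT" + "T" * 20 + "AG"
print(candidate_branchpoints(toy_intron))
```

A motif placed too close to either end is filtered out, which mirrors the minimum spacing the abstract infers from the distribution of the consensus sequence.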
24.  Violating the splicing rules: TG dinucleotides function as alternative 3' splice sites in U2-dependent introns 
Genome Biology  2007;8(8):R154.
TG dinucleotides functioning as alternative 3' splice sites were identified and experimentally verified in 36 human genes.
Background
Despite some degeneracy of sequence signals that govern splicing of eukaryotic pre-mRNAs, it is an accepted rule that U2-dependent introns exhibit the 3' terminal dinucleotide AG. Intrigued by anecdotal evidence for functional non-AG 3' splice sites, we carried out a human genome-wide screen.
Results
We identified TG dinucleotides functioning as alternative 3' splice sites in 36 human genes. The TG-derived splice variants were experimentally validated with a success rate of 92%. Interestingly, ratios of alternative splice variants are tissue-specific for several introns. TG splice sites and their flanking intron sequences are substantially conserved between orthologous vertebrate genes, even between human and frog, indicating functional relevance. Remarkably, TG splice sites are exclusively found as alternative 3' splice sites, never as the sole 3' splice site for an intron, and we observed a distance constraint for TG-AG splice site tandems.
Conclusion
Since TG splice sites are exclusively found as alternative 3' splice sites, the U2 spliceosome apparently accomplishes perfect specificity for 3' AGs at an early splicing step, but may choose 3' TGs during later steps. Given the tiny fraction of TG 3' splice sites compared to the vast number of non-viable TGs, cis-acting sequence signals must significantly contribute to splice site definition. Thus, we consider TG-AG 3' splice site tandems as promising subjects for studies on the mechanisms of 3' splice site selection.
doi:10.1186/gb-2007-8-8-r154
PMCID: PMC2374985  PMID: 17672918
25.  Patterns of exon-intron architecture variation of genes in eukaryotic genomes 
BMC Genomics  2009;10:47.
Background
The origin and importance of exon-intron architecture remain one of the unresolved mysteries of gene evolution. Several studies have investigated the variations of intron length, GC content, ordinal position in a gene and divergence. However, few studies have examined the structural variation of exons and introns together.
Results
We investigated the length, GC content, ordinal position and divergence of both exons and introns in 13 eukaryotic genomes, representing plants and animals. Our analyses revealed that three basic patterns of exon-intron variation were present in nearly all analyzed genomes (P < 0.001 in most cases): an ordinal reduction of length and divergence in both exons and introns, a co-variation between an exon and its flanking introns in their length, GC content and divergence, and a decrease of average exon (or intron) length, GC content and divergence as the total number of exons in a gene increased. In addition, we observed that the shorter introns had either low or high GC content, and the GC content of long introns was intermediate.
Conclusion
Although the factors contributing to these patterns have not been identified, our results provide three important clues: common factor(s) exist and may shape both exons and introns; the ordinal reduction patterns may reflect a time-orderly evolution; and the larger first and last exons may be splicing-required. These clues provide a framework for elucidating mechanisms involved in the organization of eukaryotic genomes and particularly in building exon-intron structures.
doi:10.1186/1471-2164-10-47
PMCID: PMC2636830  PMID: 19166620
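The per-position measurements described in this abstract (length and GC content by ordinal position) can be sketched in Python as follows. This is an illustrative sketch, not the authors' pipeline: the function names `gc_content` and `ordinal_profile` and the input layout (each gene as a list of exon sequences in transcript order) are assumptions, and divergence is not computed.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def ordinal_profile(genes):
    """Average length and GC content per ordinal position.

    genes: list of genes, each a list of exon (or intron) sequences
    in transcript order. This layout is an assumption for illustration.
    Returns {position: (mean_length, mean_gc)} with 1-based positions.
    """
    by_position = {}
    for segments in genes:
        for position, seq in enumerate(segments, start=1):
            by_position.setdefault(position, []).append(seq)
    profile = {}
    for position, seqs in by_position.items():
        mean_length = sum(len(s) for s in seqs) / len(seqs)
        mean_gc = sum(gc_content(s) for s in seqs) / len(seqs)
        profile[position] = (mean_length, mean_gc)
    return profile


# Two toy genes: one with two exons, one with a single exon.
toy_genes = [["GGCC", "AT"], ["GCAT"]]
print(ordinal_profile(toy_genes))
```

Comparing the resulting profile across positions is one simple way to look for the ordinal reduction pattern the abstract reports; a real analysis would also need annotated gene models and a significance test.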
