Related Articles

1.  A probabilistic framework for aligning paired-end RNA-seq data 
Bioinformatics  2010;26(16):1950-1957.
Motivation: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment.
Methods: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment.
Results: The method was applied to 2 × 35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the expectation maximization (EM) algorithm in the presence of alternative paths in the splice graph was validated by qRT–PCR experiments on eight exon skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study by Maher et al. (2009).
Availability: Software available at http://www.netlab.uky.edu/p/bioinfo/MapSplice/PER
Contact: liuj@cs.uky.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq336
PMCID: PMC2916723  PMID: 20576625
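The expectation maximization step summarized in entry 1 can be illustrated with a generic mixture model. The following is a minimal sketch, not the authors' method: it assumes each fragment comes with a set of candidate splicing paths it is compatible with, and it estimates path proportions only. The function names and the toy path labels (`include`, `skip`) are hypothetical.

```python
def em_path_proportions(fragment_candidates, n_iter=100):
    """Estimate relative path abundances from ambiguous fragment assignments.

    fragment_candidates: list of sets; each set holds the candidate paths
    one fragment is compatible with (an assumption of this sketch).
    """
    paths = sorted({p for cands in fragment_candidates for p in cands})
    theta = {p: 1.0 / len(paths) for p in paths}  # uniform starting point
    for _ in range(n_iter):
        counts = {p: 0.0 for p in paths}
        # E-step: split each fragment across its candidates in proportion to theta.
        for cands in fragment_candidates:
            norm = sum(theta[p] for p in cands)
            if norm == 0:
                continue  # fragment with no usable candidate
            for p in cands:
                counts[p] += theta[p] / norm
        # M-step: renormalize the expected counts into proportions.
        total = sum(counts.values())
        theta = {p: c / total for p, c in counts.items()}
    return theta


def most_probable_path(candidates, theta):
    """Pick the candidate path with the highest estimated proportion."""
    return max(sorted(candidates), key=lambda p: theta[p])


# Toy data: 6 fragments support only "include", 2 only "skip", 4 are ambiguous.
fragments = [{"include"}] * 6 + [{"skip"}] * 2 + [{"include", "skip"}] * 4
theta = em_path_proportions(fragments)
best = most_probable_path({"include", "skip"}, theta)
```

In this toy case the ambiguous fragments are resolved toward the path that the unambiguous fragments already favor, which is the general behavior the abstract relies on when alternative paths exist in the splice graph.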
2.  Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution 
PLoS Biology  2012;10(1):e1001229.
Inclusion or exclusion of single codons at the splice acceptor site of mammalian genes is regulated in a tissue-specific manner, is strongly conserved, and is associated with local accelerated protein evolution.
Thousands of human genes contain introns ending in NAGNAG (N any nucleotide), where both NAGs can function as 3′ splice sites, yielding isoforms that differ by inclusion/exclusion of three bases. However, few models exist for how such splicing might be regulated, and some studies have concluded that NAGNAG splicing is purely stochastic and nonfunctional. Here, we used deep RNA-Seq data from 16 human and eight mouse tissues to analyze the regulation and evolution of NAGNAG splicing. Using both biological and technical replicates to estimate false discovery rates, we estimate that at least 25% of alternatively spliced NAGNAGs undergo tissue-specific regulation in mammals, and alternative splicing of strongly tissue-specific NAGNAGs was 10 times as likely to be conserved between species as was splicing of non-tissue-specific events, implying selective maintenance. Preferential use of the distal NAG was associated with distinct sequence features, including a more distal location of the branch point and presence of a pyrimidine immediately before the first NAG, and alteration of these features in a splicing reporter shifted splicing away from the distal site. Strikingly, alignments of orthologous exons revealed a ∼15-fold increase in the frequency of three base pair gaps at 3′ splice sites relative to nearby exon positions in both mammals and in Drosophila. Alternative splicing of NAGNAGs in human was associated with dramatically increased frequency of exon length changes at orthologous exon boundaries in rodents, and a model involving point mutations that create, destroy, or alter NAGNAGs can explain both the increased frequency and biased codon composition of gained/lost sequence observed at the beginnings of exons. 
This study shows that NAGNAG alternative splicing generates widespread differences between the proteomes of mammalian tissues, and suggests that the evolutionary trajectories of mammalian proteins are strongly biased by the locations and phases of the introns that interrupt coding sequences.
Author Summary
In order to translate a gene into protein, all of the non-coding regions (introns) need to be removed from the transcript and the coding regions (exons) stitched back together to make an mRNA. Most human genes are alternatively spliced, allowing the selection of different combinations of exons to produce multiple distinct mRNAs and proteins. Many types of alternative splicing are known to play crucial roles in biological processes including cell fate determination, tumor metabolism, and apoptosis. In this study, we investigated a form of alternative splicing in which competing adjacent 3′ splice sites (or splice acceptor sites) generate mRNAs differing by just an RNA triplet, the size of a single codon. This mode of alternative splicing, known as NAGNAG splicing, affects thousands of human genes and has been known for a decade, but its potential regulation, physiological importance, and conservation across species have been disputed. Using high-throughput sequencing of cDNA (“RNA-Seq”) from human and mouse tissues, we found that single-codon splicing often shows strong tissue specificity. Regulated NAGNAG alternative splice sites are selectively conserved between human and mouse genes, suggesting that they are important for organismal fitness. We identified features of the competing splice sites that influence NAGNAG splicing, and validated their effects in cultured cells. Furthermore, we found that this mode of splicing is associated with accelerated and highly biased protein evolution at exon boundaries. Taken together, our analyses demonstrate that the inclusion or exclusion of RNA triplets at exon boundaries can be effectively regulated by the splicing machinery, and highlight an unexpected connection between RNA processing and protein evolution.
doi:10.1371/journal.pbio.1001229
PMCID: PMC3250501  PMID: 22235189
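The NAGNAG motif described in entry 2 can be made concrete with a short check. This is a minimal sketch, assuming the intron and the downstream exon are given as plain DNA strings in 5′ to 3′ orientation; the function names are hypothetical and are not taken from the study.

```python
import re

# Two consecutive NAG triplets at the very end of the intron (N = any base).
NAGNAG_AT_INTRON_END = re.compile(r"[ACGT]AG[ACGT]AG$")


def ends_in_nagnag(intron_seq):
    """Return True if the last six bases of the intron match NAGNAG."""
    return NAGNAG_AT_INTRON_END.search(intron_seq.upper()) is not None


def acceptor_variants(intron_seq, exon_seq):
    """Return the downstream exon sequence for each of the two 3' splice sites.

    Splicing at the first NAG leaves the second NAG in the mRNA, so the
    two isoforms differ by exactly three bases.
    """
    if not ends_in_nagnag(intron_seq):
        raise ValueError("intron does not end in NAGNAG")
    intron = intron_seq.upper()
    exon = exon_seq.upper()
    return {"first_NAG": intron[-3:] + exon, "second_NAG": exon}


variants = acceptor_variants("GTAAGTTTTCAGCAG", "GCTGAA")
```

Because the two products differ by one triplet, the reading frame downstream is preserved, which is consistent with the abstract's description of single-codon inclusion or exclusion.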
3.  Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM) 
Bioinformatics  2011;27(18):2518-2528.
Motivation: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously.
Results: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription–polymerase chain reaction (RT–PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.
Availability: The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine (http://cbil.upenn.edu/RUM).
Contact: ggrant@pcbi.upenn.edu; epierce@mail.med.upenn.edu
Supplementary Information: The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.
doi:10.1093/bioinformatics/btr427
PMCID: PMC3167048  PMID: 21775302
4.  Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki) 
BMC Bioinformatics  2013;14:320.
Background
The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment.
Results
We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki.
Conclusions
Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.
doi:10.1186/1471-2105-14-320
PMCID: PMC3827500  PMID: 24209455
5.  MADS+: discovery of differential splicing events from Affymetrix exon junction array data 
Bioinformatics  2009;26(2):268-269.
Motivation: The Affymetrix Human Exon Junction Array is a newly designed high-density exon-sensitive microarray for global analysis of alternative splicing. Contrary to the Affymetrix exon 1.0 array, which only contains four probes per exon and no probes for exon–exon junctions, this new junction array averages eight probes per probeset targeting all exons and exon–exon junctions observed in the human mRNA/EST transcripts, representing a significant increase in the probe density for alternative splicing events. Here, we present MADS+, a computational pipeline to detect differential splicing events from the Affymetrix exon junction array data. For each alternative splicing event, MADS+ evaluates the signals of probes targeting competing transcript isoforms to identify exons or splice sites with different levels of transcript inclusion between two sample groups. MADS+ is used routinely in our analysis of Affymetrix exon junction arrays and has a high accuracy in detecting differential splicing events. For example, in a study of the novel epithelial-specific splicing regulator ESRP1, MADS+ detects hundreds of exons whose inclusion levels are dependent on ESRP1, with a RT-PCR validation rate of 88.5% (153 validated out of 173 tested).
Availability: MADS+ scripts, documentation and annotation files are available at http://www.medicine.uiowa.edu/Labs/Xing/MADSplus/.
Contact: yi-xing@uiowa.edu
doi:10.1093/bioinformatics/btp643
PMCID: PMC2804303  PMID: 19933160
6.  Genome-wide analysis of alternative splicing in Chlamydomonas reinhardtii 
BMC Genomics  2010;11:114.
Background
Genome-wide computational analysis of alternative splicing (AS) in several flowering plants has revealed that pre-mRNAs from about 30% of genes undergo AS. Chlamydomonas, a simple unicellular green alga, is part of the lineage that includes land plants. However, it diverged from land plants about one billion years ago. Hence, it serves as a good model system to study alternative splicing in early photosynthetic eukaryotes, to obtain insights into the evolution of this process in plants, and to compare splicing in simple unicellular photosynthetic and non-photosynthetic eukaryotes. We performed a global analysis of alternative splicing in Chlamydomonas reinhardtii using its recently completed genome sequence and all available ESTs and cDNAs.
Results
Our analysis of AS using BLAT and a modified version of the Sircah tool revealed AS of 498 transcriptional units with 611 events, representing about 3% of the total number of genes. As in land plants, intron retention is the most prevalent form of AS. Retained introns and skipped exons tend to be shorter than their counterparts in constitutively spliced genes. The splice site signals in all types of AS events are weaker than those in constitutively spliced genes. Furthermore, in alternatively spliced genes, the prevalent splice form has a stronger splice site signal than the non-prevalent form. Analysis of constitutively spliced introns revealed an over-abundance of motifs with simple repetitive elements in comparison to introns involved in intron retention. In almost all cases, AS results in a truncated ORF, leading to a coding sequence that is around 50% shorter than the prevalent splice form. Using RT-PCR we verified AS of two genes and show that they produce more isoforms than indicated by EST data. All cDNA/EST alignments and splice graphs are provided in a website at http://combi.cs.colostate.edu/as/chlamy.
Conclusions
The extent of AS in Chlamydomonas that we observed is much smaller than observed in land plants, but is much higher than in simple unicellular heterotrophic eukaryotes. The percentage of different alternative splicing events is similar to flowering plants. Prevalence of constitutive and alternative splicing in Chlamydomonas, together with its simplicity, many available public resources, and well developed genetic and molecular tools for this organism make it an excellent model system to elucidate the mechanisms involved in regulated splicing in photosynthetic eukaryotes.
doi:10.1186/1471-2164-11-114
PMCID: PMC2830987  PMID: 20163725
7.  Identifiability of isoform deconvolution from junction arrays and RNA-Seq 
Bioinformatics  2009;25(23):3056-3059.
Motivation: Splice junction microarrays and RNA-seq are two popular ways of quantifying splice variants within a cell. Unfortunately, isoform expressions cannot always be determined from the expressions of individual exons and splice junctions. While this issue has been noted before, the extent of the problem on various platforms has not yet been explored, nor have potential remedies been presented.
Results: We propose criteria that will guarantee identifiability of an isoform deconvolution model on exon and splice junction arrays and in RNA-Seq. We show that up to 97% of 2256 alternatively spliced human genes selected from the RefSeq database lead to identifiable gene models in RNA-seq, with similar results in mouse. However, in the Human Exon array only 26% of these genes lead to identifiable models, and even in the most comprehensive splice junction array only 69% lead to identifiable models.
Contact: whwong@stanford.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp544
PMCID: PMC3167695  PMID: 19762346
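The identifiability question raised in entry 7 can be illustrated with a simplified linear view. This sketch assumes that the expected signal of each exon or junction is the sum of the abundances of the isoforms containing it, so isoform abundances are uniquely determined only when the feature-by-isoform indicator matrix has full column rank. That criterion is an assumption made for illustration and may differ from the criteria proposed in the paper; the function names are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_identifiable(feature_by_isoform):
    """True if every isoform abundance is determined by the feature signals."""
    n_isoforms = len(feature_by_isoform[0])
    return matrix_rank(feature_by_isoform) == n_isoforms


# Toy gene with exons A-D, where B and C are each optional.
# Columns (isoforms): ABCD, ACD, ABD, AD.
exons_only = [
    [1, 1, 1, 1],  # exon A
    [1, 0, 1, 0],  # exon B
    [1, 1, 0, 0],  # exon C
    [1, 1, 1, 1],  # exon D
]
# Adding one junction feature (B-C, present only in ABCD) resolves the ambiguity.
exons_plus_junction = exons_only + [[1, 0, 0, 0]]
```

In the toy gene, exon signals alone give rank 3 for four isoforms, while a single junction measurement raises the rank to 4. This mirrors the abstract's point that platforms differ in how many gene models they can resolve.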
8.  PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data 
Bioinformatics  2012;28(4):479-486.
Motivation: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge.
Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ∼ 137 000 and 173 000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples.
Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion
Contact: y.zhang@lumc.nl; k.ye@lumc.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr712
PMCID: PMC3278765  PMID: 22219203
9.  spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data 
BMC Bioinformatics  2014;15:81.
Background
RNA-seq data is currently underutilized, in part because it is difficult to predict the functional impact of alternate transcription events. Recent software improvements in full-length transcript deconvolution prompted us to develop spliceR, an R package for classification of alternative splicing and prediction of coding potential.
Results
spliceR uses the full-length transcript output from RNA-seq assemblers to detect single or multiple exon skipping, alternative donor and acceptor sites, intron retention, alternative first or last exon usage, and mutually exclusive exon events. For each of these events spliceR also annotates the genomic coordinates of the differentially spliced elements, facilitating downstream sequence analysis. For each transcript, isoform fraction values are calculated to identify transcript switching between conditions. Lastly, spliceR predicts the coding potential, as well as the potential nonsense mediated decay (NMD) sensitivity of each transcript.
Conclusions
spliceR is an easy-to-use tool that extends the usability of RNA-seq and assembly technologies by allowing greater depth of annotation of RNA-seq data. spliceR is implemented as an R package and is freely available from the Bioconductor repository (http://www.bioconductor.org/packages/2.13/bioc/html/spliceR.html).
doi:10.1186/1471-2105-15-81
PMCID: PMC3998036  PMID: 24655717
spliceR; RNA-Seq; Alternative splicing; Nonsense mediated decay (NMD); Isoform switch
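The isoform fraction values mentioned in entry 9 can be sketched as follows. This example assumes the isoform fraction is an isoform's expression divided by the summed expression of all isoforms of its gene, and that switching is read from the difference in that fraction between two conditions. It is written in Python for illustration and does not reproduce the spliceR R interface; all names and values are hypothetical.

```python
def isoform_fractions(isoform_expr):
    """Map each isoform to its share of the gene's total expression."""
    total = sum(isoform_expr.values())
    if total == 0:
        # Gene not expressed: the fraction is undefined; 0.0 is a choice made here.
        return {iso: 0.0 for iso in isoform_expr}
    return {iso: value / total for iso, value in isoform_expr.items()}


def isoform_fraction_change(cond1, cond2):
    """Difference in isoform fraction (condition 2 minus condition 1)."""
    if1 = isoform_fractions(cond1)
    if2 = isoform_fractions(cond2)
    isoforms = sorted(set(if1) | set(if2))
    return {iso: if2.get(iso, 0.0) - if1.get(iso, 0.0) for iso in isoforms}


# Toy gene with two isoforms whose dominance flips between conditions.
control = {"iso_a": 30.0, "iso_b": 10.0}
treated = {"iso_a": 10.0, "iso_b": 30.0}
delta = isoform_fraction_change(control, treated)
```

Here total gene expression is identical in both conditions, so a gene-level comparison would show no change even though the dominant isoform has switched.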
10.  iCLIP Predicts the Dual Splicing Effects of TIA-RNA Interactions 
PLoS Biology  2010;8(10):e1000530.
Transcriptome-wide analysis of protein-RNA interactions predicts the dual splicing effects of TIA proteins, showing that their local enhancing function is associated with diverse distal splicing silencing effects.
The regulation of alternative splicing involves interactions between RNA-binding proteins and pre-mRNA positions close to the splice sites. T-cell intracellular antigen 1 (TIA1) and TIA1-like 1 (TIAL1) locally enhance exon inclusion by recruiting U1 snRNP to 5′ splice sites. However, effects of TIA proteins on splicing of distal exons have not yet been explored. We used UV-crosslinking and immunoprecipitation (iCLIP) to find that TIA1 and TIAL1 bind at the same positions on human RNAs. Binding downstream of 5′ splice sites was used to predict the effects of TIA proteins in enhancing inclusion of proximal exons and silencing inclusion of distal exons. The predictions were validated in an unbiased manner using splice-junction microarrays, RT-PCR, and minigene constructs, which showed that TIA proteins maintain splicing fidelity and regulate alternative splicing by binding exclusively downstream of 5′ splice sites. Surprisingly, TIA binding at 5′ splice sites silenced distal cassette and variable-length exons without binding in proximity to the regulated alternative 3′ splice sites. Using transcriptome-wide high-resolution mapping of TIA-RNA interactions we evaluated the distal splicing effects of TIA proteins. These data are consistent with a model where TIA proteins shorten the time available for definition of an alternative exon by enhancing recognition of the preceding 5′ splice site. Thus, our findings indicate that changes in splicing kinetics could mediate the distal regulation of alternative splicing.
Author Summary
Studies of splicing regulation have generally focused on RNA elements located close to alternative exons. Recently, it has been suggested that splicing of alternative exons can also be regulated by distal regulatory sites, but the underlying mechanism is not clear. The TIA proteins are key splicing regulators that enhance the recognition of 5′ splice sites, and their distal effects have remained unexplored so far. Here, we use a new method to map the positions of TIA-RNA interactions with high resolution on a transcriptome-wide scale. The identified binding positions successfully predict the local enhancing and distal silencing effects of TIA proteins. In particular, we show that TIA proteins can regulate distal alternative 3′ splice sites by binding at the 5′ splice site of the preceding exon. This result suggests that alternative splicing is affected by the timing of alternative exon definition relative to the recognition of the preceding 5′ splice site. These findings highlight the importance of analysing distal regulatory sites in order to fully understand the regulation of alternative splicing.
doi:10.1371/journal.pbio.1000530
PMCID: PMC2964331  PMID: 21048981
11.  FDM: a graph-based statistical method to detect differential transcription using RNA-seq data 
Bioinformatics  2011;27(19):2633-2640.
Motivation: In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts and plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription between samples. Our focus is on samples observed using short-read RNA sequencing (RNA-seq).
Methods: We characterize differential transcription between two samples as the difference in the relative abundance of the transcript isoforms present in the samples. The magnitude of differential transcription of a gene between two samples can be measured by the square root of the Jensen Shannon Divergence (JSD*) between the gene's transcript abundance vectors in each sample. We define a weighted splice-graph representation of RNA-seq data, summarizing in compact form the alignment of RNA-seq reads to a reference genome. The flow difference metric (FDM) identifies regions of differential RNA transcript expression between pairs of splice graphs, without need for an underlying gene model or catalog of transcripts. We present a novel non-parametric statistical test between splice graphs to assess the significance of differential transcription, and extend it to group-wise comparison incorporating sample replicates.
Results: Using simulated RNA-seq data consisting of four technical replicates of two samples with varying transcription between genes, we show that (i) the FDM is highly correlated with JSD* (r=0.82) when average RNA-seq coverage of the transcripts is sufficiently deep; and (ii) the FDM is able to identify 90% of genes with differential transcription when JSD* >0.28 and coverage >7. This represents higher sensitivity than Cufflinks (without annotations) and rDiff (MMD), which respectively identified 69 and 49% of the genes in this region as differentially transcribed. Using annotations identifying the transcripts, Cufflinks was able to identify 86% of the genes in this region as differentially transcribed. Using experimental data consisting of four replicates each for two cancer cell lines (MCF7 and SUM102), FDM identified 1425 genes as significantly different in transcription. Subsequent study of the samples using quantitative real time polymerase chain reaction (qRT-PCR) of several differential transcription sites identified by FDM, confirmed significant differences at these sites.
Availability: http://csbio-linux001.cs.unc.edu/nextgen/software/FDM
Contact: darshan@email.unc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr458
PMCID: PMC3179659  PMID: 21824971
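The JSD* measure named in entry 11, the square root of the Jensen Shannon Divergence between two transcript abundance vectors, can be computed directly. This is a minimal sketch; the choice of base-2 logarithms (which bounds the value between 0 and 1) and the function names are assumptions, not details taken from the abstract.

```python
import math


def normalize(abundances):
    """Convert raw isoform abundances to relative abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundance vector must have a positive sum")
    return [a / total for a in abundances]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sqrt_jsd(abundances_1, abundances_2):
    """Square root of the Jensen-Shannon divergence between two samples."""
    if len(abundances_1) != len(abundances_2):
        raise ValueError("both samples must list the same isoforms")
    p = normalize(abundances_1)
    q = normalize(abundances_2)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


same_mix = sqrt_jsd([10, 10], [5, 5])     # same relative abundances
full_switch = sqrt_jsd([1, 0], [0, 1])    # complete isoform switch
```

Because the vectors are normalized first, the measure responds only to changes in relative isoform abundance and not to changes in overall gene expression, which matches the abstract's definition of differential transcription.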
12.  Alternative Splicing Regulation During C. elegans Development: Splicing Factors as Regulated Targets 
PLoS Genetics  2008;4(2):e1000001.
Alternative splicing generates protein diversity and allows for post-transcriptional gene regulation. Estimates suggest that 10% of the genes in Caenorhabditis elegans undergo alternative splicing. We constructed a splicing-sensitive microarray to detect alternative splicing for 352 cassette exons and tested for changes in alternative splicing of these genes during development. We found that the microarray data predicted that 62/352 (∼18%) of the alternative splicing events studied show a strong change in the relative levels of the spliced isoforms (>4-fold) during development. Confirmation of the microarray data by RT-PCR was obtained for 70% of randomly selected genes tested. Among the genes with the most developmentally regulated alternative splicing was the hnRNP F/H splicing factor homolog, W02D3.11, now named hrpf-1. For the cassette exon of hrpf-1, the inclusion isoform comprises 65% of hrpf-1 steady state messages in embryos but only 0.1% in the first larval stage. This dramatic change in the alternative splicing of an alternative splicing factor suggests a complex cascade of splicing regulation during development. We analyzed splicing in embryos from a strain with a mutation in the splicing factor sym-2, another hnRNP F/H homolog. We found that approximately half of the genes with large alternative splicing changes between the embryo and L1 stages are regulated by sym-2 in embryos. An analysis of the role of nonsense-mediated decay in regulating steady-state alternative mRNA isoforms was performed. We found that 8% of the 352 events studied have alternative isoforms whose relative steady-state levels in embryos change more than 4-fold in a nonsense-mediated decay mutant, including hrpf-1. Strikingly, 53% of these alternative splicing events that are affected by NMD in our experiment are not obvious substrates for NMD based on the presence of premature termination codons. 
This suggests that the targeting of splicing factors by NMD may have downstream effects on alternative splicing regulation.
Author Summary
Alternative splicing is a mechanism for generating more than one messenger RNA from a given gene. The alternative transcripts can encode different proteins that share some regions in common but have modified functions, thus increasing the number of proteins encoded by the genome. Alternative splicing can also lead to the production of mRNA isoforms that are then subject to degradation by the nonsense-mediated decay pathway, thus providing a mechanism to down-regulate gene expression without decreasing transcription. Examples of cell type-specific, hormone-responsive, and developmentally-regulated alternative splicing have been described. We decided to measure the extent of developmentally regulated alternative splicing in the nematode model organism Caenorhabditis elegans. We developed a DNA microarray that can measure the alternative splicing of 352 cassette exons simultaneously and used it to probe alternative splicing in RNA extracted from embryos, the four larval stages, and adults. We show that 18% of the alternatively spliced genes tested show >4-fold changes in alternative splicing during development. In addition, we show that one of the most regulated genes is itself a splicing factor, providing support for a model in which a cascade of alternative splicing regulation occurs during development.
doi:10.1371/journal.pgen.1000001
PMCID: PMC2265522  PMID: 18454200
13.  DiffSplice: the genome-wide detection of differential splicing events with RNA-seq 
Nucleic Acids Research  2012;41(2):e39.
The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. 
Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at http://www.netlab.uky.edu/p/bioinfo/DiffSplice.
doi:10.1093/nar/gks1026
PMCID: PMC3553996  PMID: 23155066
14.  A protocol for visual analysis of alternative splicing in RNA-Seq data using Integrated Genome Browser 
Summary
Ultra-high throughput sequencing of cDNA (RNA-Seq) is an invaluable resource for investigating alternative splicing in an organism. Alternative splicing is a form of post-transcriptional regulation in which primary RNA transcripts from a single gene can be spliced in multiple ways leading to different RNA and protein products. In plants and other species, it has been shown that many genes involved in circadian regulation are alternatively spliced. As new RNA-Seq data sets become available, these data will lead to new insights into links between regulation of RNA splicing and the circadian system. Analyzing RNA-Seq data sets requires software tools that can display RNA-Seq read alignments alongside gene models, enabling assessment of how treatments or developmental stages affect splicing patterns and production of novel variants. The Integrated Genome Browser software program (IGB) is a free and flexible desktop tool that enables discovery and quantification of alternative splicing. In this protocol, we use IGB and a cold-stress RNA-Seq data set to examine alternative splicing of Arabidopsis thaliana LHY, a circadian clock regulator. Integrated Genome Browser is freely available from http://www.bioviz.org.
doi:10.1007/978-1-4939-0700-7_8
PMCID: PMC4070736  PMID: 24792048
genome browser; visualization; visual analytics; alternative splicing; A. thaliana; LHY; LHY1; circadian clock
15.  Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs 
Bioinformatics  2013;29(18):2300-2310.
Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues.
Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate.
Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer.
Contact: cdewey@biostat.wisc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt396
PMCID: PMC3753571  PMID: 23846746
16.  FineSplice, enhanced splice junction detection and quantification: a novel pipeline based on the assessment of diverse RNA-Seq alignment solutions 
Nucleic Acids Research  2014;42(8):e71.
Alternative splicing is the main mechanism governing protein diversity. The recent developments in RNA-Seq technology have enabled the study of the global impact and regulation of this biological process. However, the lack of standardized protocols constitutes a major bottleneck in the analysis of alternative splicing. This is particularly important for the identification of exon–exon junctions, which is a critical step in any analysis workflow. Here we performed a systematic benchmarking of alignment tools to dissect the impact of design and method on the mapping, detection and quantification of splice junctions from multi-exon reads. Accordingly, we devised a novel pipeline based on TopHat2 combined with a splice junction detection algorithm, which we have named FineSplice. FineSplice allows effective elimination of spurious junction hits arising from artefactual alignments, achieving up to 99% precision in both real and simulated data sets and yielding superior F1 scores under most tested conditions. The proposed strategy combines an efficient mapping solution with a semi-supervised anomaly detection scheme to filter out false positives and allows reliable estimation of expressed junctions from the alignment output. Ultimately this provides more accurate information to identify meaningful splicing patterns. FineSplice is freely available at https://sourceforge.net/p/finesplice/.
doi:10.1093/nar/gku166
PMCID: PMC4005686  PMID: 24574529
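The FineSplice abstract above reports precision and F1 scores for splice junction detection. As a reminder of how those metrics are defined, the sketch below scores a set of predicted junctions against a reference set. Representing a junction as a `(chromosome, donor, acceptor)` tuple, the function name `junction_scores`, and the example coordinates are all assumptions made for illustration; none of this is FineSplice code.

```python
def junction_scores(predicted, truth):
    """Return (precision, recall, F1) for predicted vs. reference junctions.

    Junctions are assumed to be hashable records, here hypothetical
    (chromosome, donor_position, acceptor_position) tuples.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three calls, two of which are in the reference set.
calls = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)}
reference = {("chr1", 100, 200), ("chr1", 300, 400),
             ("chr1", 500, 600), ("chr1", 700, 800)}
precision, recall, f1 = junction_scores(calls, reference)
```

Filtering out spurious junctions raises precision; F1 shows whether that gain comes at an unacceptable cost in recall.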
17.  Protein Modularity of Alternatively Spliced Exons Is Associated with Tissue-Specific Regulation of Alternative Splicing 
PLoS Genetics  2005;1(3):e34.
Recent comparative genomic analysis of alternative splicing has shown that protein modularity is an important criterion for functional alternative splicing events. Exons that are alternatively spliced in multiple organisms are much more likely to be an exact multiple of 3 nt in length, representing a class of “modular” exons that can be inserted or removed from the transcripts without affecting the rest of the protein. To understand the precise roles of these modular exons, in this paper we have analyzed microarray data for 3,126 alternatively spliced exons across ten mouse tissues generated by Pan and coworkers. We show that modular exons are strongly associated with tissue-specific regulation of alternative splicing. Exons that are alternatively spliced at uniformly high transcript inclusion levels or uniformly low levels show no preference for protein modularity. In contrast, alternatively spliced exons with dramatic changes of inclusion levels across mouse tissues (referred to as “tissue-switched” exons) are both strikingly biased to be modular and are strongly conserved between human and mouse. The analysis of different subsets of tissue-switched exons shows that the increased protein modularity cannot be explained by the overall exon inclusion level, but is specifically associated with tissue-switched alternative splicing.
Synopsis
Alternative splicing is a biological process that generates multiple mRNA and protein variants through alternative combinations of protein-coding exons. It is a widespread mechanism of gene regulation in higher eukaryotes. In recent years, scientists have found that when an exon is observed to be alternatively spliced in multiple species, its length is much more likely to be an exact multiple of three nucleotides. Since each amino acid is encoded by three nucleotides, these exons can be inserted or removed from the transcript as a “modular” protein-coding unit, without affecting the downstream protein translation. However, the precise roles of these modular exons in gene regulation and genome evolution remain unclear.
Xing and Lee have now investigated these modular exons using high-throughput genomics data. They analyzed the mouse splicing microarray data from the research group of Dr. Benjamin Blencowe at University of Toronto. Exons whose alternative splicing levels vary dramatically across multiple tissues are much more likely to be modular exons and are highly conserved during human and mouse evolution. This study establishes a strong link between protein modularity of alternatively spliced exons and tissue-specific regulation of alternative splicing. It provides new insights into the function and regulation of alternative splicing and how it evolves.
doi:10.1371/journal.pgen.0010034
PMCID: PMC1201369  PMID: 16170410
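The modularity criterion described in the entry above is simple arithmetic: an exon is "modular" when its length is an exact multiple of 3 nt, so that including or skipping it preserves the downstream reading frame. A minimal sketch follows, with hypothetical function names and made-up exon lengths.

```python
def is_modular(exon_length):
    """True if the exon length (in nt) is an exact multiple of three."""
    if exon_length <= 0:
        raise ValueError("exon length must be a positive number of nucleotides")
    return exon_length % 3 == 0


def modular_fraction(exon_lengths):
    """Fraction of exons in a collection whose length is a multiple of three."""
    lengths = list(exon_lengths)
    if not lengths:
        return 0.0
    return sum(is_modular(n) for n in lengths) / len(lengths)


# Made-up exon lengths for illustration; 96 and 63 are multiples of three.
example_lengths = [96, 100, 63, 7]
fraction = modular_fraction(example_lengths)
```

Comparing this fraction between tissue-switched exons and other alternatively spliced exons is the kind of contrast the abstract describes.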
18.  Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data 
BMC Bioinformatics  2012;13(Suppl 6):S11.
Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large amounts of data. However, much of the signal present in this data is corrupted or obscured by biases resulting in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases with various task-specific approaches, which makes direct comparison between them difficult. However, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches for isoform quantification. We conduct the investigation by building models of increasing sophistication to account for noise introduced by the biases and compare their accuracy to the established approaches.
We focus on methods that estimate the expression of alternatively-spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks. On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches.
doi:10.1186/1471-2105-13-S6-S11
PMCID: PMC3330053  PMID: 22537040
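The entry above estimates percent-spliced-in (PSI) from reads that span exon-exon junctions. The abstract does not give the authors' models, so the sketch below shows only a commonly used naive junction-count estimator for a cassette exon: the two inclusion junctions are averaged and compared with the single skipping junction. The function name and the read counts are assumptions for illustration, and the estimator ignores the position-specific biases that the paper models.

```python
def psi_from_junctions(upstream_inclusion, downstream_inclusion, skipping):
    """Naive PSI estimate for a cassette exon from junction read counts.

    upstream_inclusion:   reads spanning upstream exon -> cassette exon
    downstream_inclusion: reads spanning cassette exon -> downstream exon
    skipping:             reads spanning upstream exon -> downstream exon
    Returns None when no junction reads are available.
    """
    # Inclusion is supported by two junctions, skipping by one, so the
    # inclusion counts are averaged to keep the two sides comparable.
    inclusion = (upstream_inclusion + downstream_inclusion) / 2.0
    total = inclusion + skipping
    if total == 0:
        return None
    return inclusion / total


# Hypothetical counts: 30 and 50 inclusion reads, 10 skipping reads.
psi = psi_from_junctions(30, 50, 10)
```

Because it relies only on junction-spanning reads, an estimator of this kind does not depend on how the exon bodies are defined, which is the simplification the abstract highlights.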
19.  Cell-type specific analysis of translating RNAs in developing flowers reveals new levels of control 
Combining translating ribosome affinity purification with RNA-seq for cell-specific profiling of translating RNAs in developing flowers. Cell type comparisons of cell type-specific hormone responses, promoter motifs, coexpressed cognate binding factor candidates, and splicing isoforms. Widespread post-transcriptional regulation at both the intron splicing and translational stages. A new class of noncoding RNAs associated with polysomes.
What constitutes a differentiated cell type? How much do cell types differ in their transcription of genes? The development and functions of tissues rely on constant interactions among distinct and nonequivalent cell types. Answering these questions will require quantitative information on transcriptomes, proteomes, protein–protein interactions, protein–nucleic acid interactions, and metabolomes at cellular resolution. The systems approaches emerging in biology promise to explain properties of biological systems based on genome-wide measurements of expression, interaction, regulation, and metabolism. To facilitate a systems approach, it is essential first to capture such components in a global manner, ideally at cellular resolution.
Recently, microarray analysis of transcriptomes has been extended to a cellular level of resolution by using laser microdissection or fluorescence-activated sorting (for review, see Nelson et al, 2008). These methods have been limited by stresses associated with cellular separation and isolation procedures, and biases associated with mandatory RNA amplification steps. A newly developed method, translating ribosome affinity purification (TRAP; Zanetti et al, 2005; Heiman et al, 2008; Mustroph et al, 2009), circumvents these problems by epitope-tagging a ribosomal protein in specific cellular domains to selectively purify polysomes. We combined TRAP with deep sequencing, which we term TRAP-seq, to provide cell-level spatiotemporal maps for Arabidopsis early floral development at single-base resolution.
Flower development in Arabidopsis has been studied extensively and is one of the best understood aspects of plant development (for review, see Krizek and Fletcher, 2005). Genetic analysis of homeotic mutants established the ABC model, in which three classes of regulatory genes, A, B and C, work in a combinatorial manner to confer organ identities of four whorls (Coen and Meyerowitz, 1991). Each class of regulatory gene is expressed in a specific and evolutionarily conserved domain, and the action of the class A, B and C genes is necessary for specification of organ identity (Figure 1A).
Using TRAP-seq, we purified cell-specific translating mRNA populations, which we and others call the translatome, from the A, B and C domains of early developing flowers, in which floral patterning and the specification of floral organs is established. To achieve temporal specificity, we used a floral induction system to facilitate collection of early stage flowers (Wellmer et al, 2006). The combination of TRAP-seq with domain-specific promoters and this floral induction system enabled fine spatiotemporal isolation of translating mRNA in specific cellular domains, and at specific developmental stages.
Multiple lines of evidence confirmed the specificity of this approach, including detecting the expression in expected domains but not in other domains for well-studied flower marker genes and known physiological functions (Figures 1B–D and 2A–C). Furthermore, we provide numerous examples from flower development in which a spatiotemporal map of rigorously comparable cell-specific translatomes makes possible new views of the properties of cell domains not evident in data obtained from whole organs or tissues, including patterns of transcription and cis-regulation, new physiological differences among cell domains and between flower stages, putative hormone-active centers, and splicing events specific for flower domains (Figure 2A–D). Such findings may provide new targets for reverse genetics studies and may aid in the formulation and validation of interaction and pathway networks.
Besides cellular heterogeneity, the transcriptome is regulated at several steps through the life of mRNA molecules, and these steps are not directly accessible through traditional transcriptome profiling of total mRNA abundance. By comparing the translatome and transcriptome, we integratively profiled two key posttranscriptional control points, intron splicing and translation state. From our translatome-wide profiling, we (i) confirmed that both posttranscriptional regulation control points were used by a large portion of the transcriptome; (ii) identified a number of cis-acting features within the coding or noncoding sequences that correlate with splicing or translation state; and (iii) revealed correlation between each regulation mechanism and gene function. Our transcriptome-wide surveys have highlighted target genes whose transcripts are probably under extensive posttranscriptional regulation during flower development.
Finally, we reported the finding of a large number of polysome-associated ncRNAs. About one-third of all annotated ncRNA in the Arabidopsis genome were observed co-purified with polysomes. Coding capacity analysis confirmed that most of them are real ncRNA without conserved ORFs. The group of polysome-associated ncRNA reported in this study is a potential new addition to the expanding riboregulator catalog; they could have roles in translational regulation during early flower development.
Determining both the expression levels of mRNA and the regulation of its translation is important in understanding specialized cell functions. In this study, we describe both the expression profiles of cells within spatiotemporal domains of the Arabidopsis thaliana flower and the post-transcriptional regulation of these mRNAs, at nucleotide resolution. We express a tagged ribosomal protein under the promoters of three master regulators of flower development. By precipitating tagged polysomes, we isolated cell type-specific mRNAs that are probably translating, and quantified those mRNAs through deep sequencing. Cell type comparisons identified known cell-specific transcripts and uncovered many new ones, from which we inferred cell type-specific hormone responses, promoter motifs and coexpressed cognate binding factor candidates, and splicing isoforms. By comparing translating mRNAs with steady-state overall transcripts, we found evidence for widespread post-transcriptional regulation at both the intron splicing and translational stages. Sequence analyses identified structural features associated with each step. Finally, we identified a new class of noncoding RNAs associated with polysomes. Findings from our profiling lead to new hypotheses in the understanding of flower development.
doi:10.1038/msb.2010.76
PMCID: PMC2990639  PMID: 20924354
Arabidopsis; flower; intron; transcriptome; translation
20.  Spliced Leader Trapping Reveals Widespread Alternative Splicing Patterns in the Highly Dynamic Transcriptome of Trypanosoma brucei 
PLoS Pathogens  2010;6(8):e1001037.
Trans-splicing of leader sequences onto the 5′ ends of mRNAs is a widespread phenomenon in protozoa, nematodes and some chordates. Using parallel sequencing, we have developed a method to simultaneously map 5′ splice sites and analyze the corresponding gene expression profile, which we term spliced leader trapping (SLT). The method can be applied to any organism with a sequenced genome and trans-splicing of a conserved leader sequence. We analyzed the expression profiles and splicing patterns of bloodstream and insect forms of the parasite Trypanosoma brucei. We detected the 5′ splice sites of 85% of the annotated protein-coding genes and, contrary to previous reports, found up to 40% of transcripts to be differentially expressed. Furthermore, we discovered more than 2500 alternative splicing events, many of which appear to be stage-regulated. Based on our findings we hypothesize that alternatively spliced transcripts present a new means of regulating gene expression and could potentially contribute to protein diversity in the parasite. The entire dataset can be accessed online at TriTrypDB or through: http://splicer.unibe.ch/.
Author Summary
Some organisms like the human and animal parasite Trypanosoma brucei add a leader sequence to their mRNAs through a reaction called trans-splicing. Until now the splice sites for most mRNAs were unknown in T. brucei. Using high throughput sequencing we have developed a method to identify the splice sites and at the same time measure the abundance of the corresponding mRNAs. Analyzing three different life cycle stages of the parasite we identified the vast majority of splice sites in the organism and, to our great surprise, uncovered more than 2500 alternative splicing events, many of which appeared to be specific for one of the life cycle stages. Alternative splicing is a result of the addition of the leader sequence to different positions on the mRNA, leading to mixed mRNA populations that can encode for proteins with varying properties. One of the most obvious changes caused by alternative splicing is the gain or loss of targeting signals, leading to differential localization of the corresponding proteins. Based on our findings we hypothesize that alternative splicing is a major mechanism to regulate gene expression in T. brucei and could contribute to protein diversity in the parasite.
doi:10.1371/journal.ppat.1001037
PMCID: PMC2916883  PMID: 20700444
21.  An EMT–Driven Alternative Splicing Program Occurs in Human Breast Cancer and Modulates Cellular Phenotype 
PLoS Genetics  2011;7(8):e1002218.
Epithelial-mesenchymal transition (EMT), a mechanism important for embryonic development, plays a critical role during malignant transformation. While much is known about transcriptional regulation of EMT, alternative splicing of several genes has also been correlated with EMT progression, but the extent of splicing changes and their contributions to the morphological conversion accompanying EMT have not been investigated comprehensively. Using an established cell culture model and RNA–Seq analyses, we determined an alternative splicing signature for EMT. Genes encoding key drivers of EMT–dependent changes in cell phenotype, such as actin cytoskeleton remodeling, regulation of cell–cell junction formation, and regulation of cell migration, were enriched among EMT–associated alternative splicing events. Our analysis suggested that most EMT–associated alternative splicing events are regulated by one or more members of the RBFOX, MBNL, CELF, hnRNP, or ESRP classes of splicing factors. The EMT alternative splicing signature was confirmed in human breast cancer cell lines, which could be classified into basal and luminal subtypes based exclusively on their EMT–associated splicing pattern. Expression of EMT–associated alternative mRNA transcripts was also observed in primary breast cancer samples, indicating that EMT–dependent splicing changes occur commonly in human tumors. The functional significance of EMT–associated alternative splicing was tested by expression of the epithelial-specific splicing factor ESRP1 or by depletion of RBFOX2 in mesenchymal cells, both of which elicited significant changes in cell morphology and motility towards an epithelial phenotype, suggesting that splicing regulation alone can drive critical aspects of EMT–associated phenotypic changes. The molecular description obtained here may aid in the development of new diagnostic and prognostic markers for analysis of breast cancer progression.
Author Summary
Epithelial-to-mesenchymal transition (EMT) is the process by which cancer cells lose their epithelial characteristics and obtain a mesenchymal phenotype that is thought to allow them to migrate away from the primary tumor. A better understanding of how EMT is controlled would be valuable in predicting the likelihood of metastasis and in designing targeted therapies to block metastatic progression. While there have been many studies on the contribution of changes in gene expression to EMT, much less is known regarding the role of alternative splicing of mRNA during EMT. Alternative splicing can produce different protein isoforms from the same gene that often have distinct activities and functions. Here, we used a recently developed method to characterize changes in alternative splicing during EMT and found that thousands of multi-exon genes underwent alternative splicing. Alternative isoform expression was confirmed in human breast cancer cell lines and in primary human breast cancer samples, indicating that EMT–dependent splicing changes occur commonly in human tumors. Since EMT is considered an early step in metastatic progression, novel markers of EMT that we identified in human breast cancer samples might become valuable prognostic and diagnostic tools if confirmed in a larger cohort of patients.
doi:10.1371/journal.pgen.1002218
PMCID: PMC3158048  PMID: 21876675
22.  Genome-wide transcriptome analysis shows extensive alternative RNA splicing in the zoonotic parasite Schistosoma japonicum 
BMC Genomics  2014;15(1):715.
Background
Schistosoma japonicum is a pathogen of the phylum Platyhelminthes that causes zoonotic schistosomiasis in China and Southeast Asian countries where a lack of efficient measures has hampered disease control. The development of tools for diagnosis of acute and chronic infection and for novel antiparasite reagents relies on understanding the biological mechanisms that the parasite exploits.
Results
In this study, the polyadenylated transcripts from male and female S. japonicum were sequenced using a high-throughput RNA-seq technique. Bioinformatic and experimental analyses focused on post-transcriptional RNA processing, which revealed extensive alternative splicing events in the adult stage of the parasite. The numbers of protein-coding sequences identified in the transcriptomes of the female and male S. japonicum were 15,939 and 19,501, respectively, which is more than predicted from the annotated genome sequence. Further, we identified four types of post-transcriptional processing, or alternative splicing, in both female and male worms of S. japonicum: exon skipping, intron retention, and alternative donor and acceptor sites. Unlike in mammals, alternative donor and acceptor sites were more common in S. japonicum than the other two types of post-transcriptional processing. In total, 13,438 and 16,507 alternative splicing events were predicted in the transcriptomes of female and male S. japonicum, respectively.
Conclusions
By using RNA-seq technology, we obtained the global transcriptomes of male and female S. japonicum. These results further provide a comprehensive view of the global transcriptome of S. japonicum. The findings of a substantial level of alternative splicing events dynamically occurring in S. japonicum parasitization of mammalian hosts suggest complicated transcriptional and post-transcriptional regulation mechanisms employed by the parasite. These data should not only significantly improve the re-annotation of the genome sequences but also should provide new information about the biology of the parasite.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-715) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-715
PMCID: PMC4203478  PMID: 25156522
23.  Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays 
Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families.
Synopsis
Alternative splicing expands the protein-coding potential of genes and genomes. RNAs copied from a gene can be spliced differently to produce distinct proteins under regulatory influences that arise during development or upon environmental change. These authors present a global analysis of alternative splicing in the mouse, using microarray measurements of splicing from 22 adult tissues. The ability to measure thousands of splicing events across the genome in many tissues has allowed the capture of co-regulated sets of exons whose inclusion in mRNA occurs preferentially in a given set of tissues. An examination of the sequences associated with exons whose expression is regulated in brain or muscle as compared to other tissues reveals extreme conservation of intron sequences near the regulated exon. These conserved regions contain sequence motifs likely to contribute to the regulation of alternative splicing in brain and muscle cells. The availability of global gene expression data with splicing level resolution should spur the development of computational methods for detecting and predicting alternative splicing and its regulation. In addition, the authors make strong predictions for biological experiments leading to the identification of components and their mechanisms of action in the regulation of splicing during mammalian development.
doi:10.1371/journal.pcbi.0020004
PMCID: PMC1331982  PMID: 16424921
24.  SAGE2Splice: Unmapped SAGE Tags Reveal Novel Splice Junctions 
PLoS Computational Biology  2006;2(4):e34.
Serial analysis of gene expression (SAGE) not only is a method for profiling the global expression of genes, but also offers the opportunity for the discovery of novel transcripts. SAGE tags are mapped to known transcripts to determine the gene of origin. Tags that map neither to a known transcript nor to the genome were hypothesized to span a splice junction, for which the exon combination or exon(s) are unknown. To test this hypothesis, we have developed an algorithm, SAGE2Splice, to efficiently map SAGE tags to potential splice junctions in a genome. The algorithm consists of three search levels. A scoring scheme was designed based on position weight matrices to assess the quality of candidates. Using optimized parameters for SAGE2Splice analysis and two sets of SAGE data, candidate junctions were discovered for 5%–6% of unmapped tags. Candidates were classified into three categories, reflecting the previous annotations of the putative splice junctions. Analysis of predicted tags extracted from EST sequences demonstrated that candidate junctions having the splice junction located closer to the center of the tags are more reliable. Of 12 candidates selected for experimental testing, nine were validated by RT-PCR and sequencing, and among these, four revealed previously uncharacterized exons. Thus, SAGE2Splice provides a new functionality for the identification of novel transcripts and exons. SAGE2Splice is available online at http://www.cisreg.ca.
Synopsis
Serial analysis of gene expression (SAGE) analysis is used to profile the RNA transcripts present in a cell or tissue sample. In SAGE experiments, short portions of transcripts are sequenced in proportion to their abundance. These sequence tags must be mapped back to sequence databases to determine from which gene they were derived. Although the present genome annotation efforts have greatly facilitated this mapping process, a significant fraction of tags remain unassigned. The authors describe a computational algorithm, SAGE2Splice, that effectively and efficiently maps a subset of these unmapped tags to candidate splice junctions (the edges of two exons). In two test cases, 7%–8% of analyzed tags matched potential splice junctions. Based on the availability of RNA, sufficient information to design polymerase chain reaction (PCR) primers, and the confidence score associated with the predictions, 12 candidate splice junctions were selected for experimental tests. Nine of the tested predictions were validated by PCR and sequencing, confirming the capacity of the SAGE2Splice method to reveal previously unknown exons. Using recommended high specificity parameters, 5%–6% of high-quality unmapped SAGE tags were found to map to candidate splice junctions. An Internet interface to the SAGE2Splice system is described at http://www.cisreg.ca.
doi:10.1371/journal.pcbi.0020034
PMCID: PMC1447652  PMID: 16683015
25.  The multiplicity of alternative splicing decisions in Caenorhabditis elegans is linked to specific intronic regulatory motifs and minisatellites 
BMC Genomics  2014;15(1):364.
Background
Alternative splicing diversifies the pool of messenger RNA molecules encoded by individual genes. This diversity is particularly high when multiple splicing decisions cause a combinatorial arrangement of several alternate exons. We know very little about how the multiple decisions occurring during the maturation of single transcripts are coordinated and whether specific sequence elements might be involved.
Results
Here, the Caenorhabditis elegans genome was surveyed in order to identify sequence elements that might play a specific role in the regulation of multiple splicing decisions. The introns flanking alternate exons in transcripts whose maturation involves multiple alternative splicing decisions were compared to those whose maturation involves a single decision. Fifty-eight penta-, hexa-, and hepta-meric elements, clustered in 17 groups, were significantly over-represented in genes subject to multiple alternative splicing decisions. Most of these motifs relate to known splicing regulatory elements and appear to be well conserved in the related species Caenorhabditis briggsae. The usage of specific motifs is not linked to the gene product function, but rather depends on the gene structure, since it is influenced by the distance separating the multiple splicing decision sites. Two of these motifs are part of the CeRep25B minisatellite, which is also over-represented in the vicinity of alternative splicing regions. Most of the remaining motifs are not part of repeated sequence elements, but tend to occur in specific heterologous pairs in genes subject to multiple alternative splicing decisions.
Conclusions
The existence of specific intronic sequence elements linked to multiple alternative splicing decisions is intriguing and suggests that these elements might have some specialized regulatory role during splicing.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-364) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-364
PMCID: PMC4039745  PMID: 24884695
Alternate splice sites; Coordination of multiple choices; Regulatory elements; Worm; IMMAD; MASS; SASS
