1.  RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application 
BMC Genomics  2015;16(Suppl 6):S3.
The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms, which allow massive and cheap sequencing of selected RNA fractions while also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS application, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.).
Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges to efficiently analyze and store sequence data and results.
In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and to call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq).
Through a user-friendly web interface, the RAP workflow can be customized by the user and is automatically executed on our cloud computing environment. This strategy gives access to bioinformatics tools and computational resources without requiring specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that help the user browse, filter and export the analyzed data according to their needs.
PMCID: PMC4461013  PMID: 26046471
RNA-Seq; Alternative splicing; transcriptome regulation; workflow
2.  Alternative Splicing of RNA Triplets Is Often Regulated and Accelerates Proteome Evolution 
PLoS Biology  2012;10(1):e1001229.
Inclusion or exclusion of single codons at the splice acceptor site of mammalian genes is regulated in a tissue-specific manner, is strongly conserved, and is associated with local accelerated protein evolution.
Thousands of human genes contain introns ending in NAGNAG (N any nucleotide), where both NAGs can function as 3′ splice sites, yielding isoforms that differ by inclusion/exclusion of three bases. However, few models exist for how such splicing might be regulated, and some studies have concluded that NAGNAG splicing is purely stochastic and nonfunctional. Here, we used deep RNA-Seq data from 16 human and eight mouse tissues to analyze the regulation and evolution of NAGNAG splicing. Using both biological and technical replicates to estimate false discovery rates, we estimate that at least 25% of alternatively spliced NAGNAGs undergo tissue-specific regulation in mammals, and alternative splicing of strongly tissue-specific NAGNAGs was 10 times as likely to be conserved between species as was splicing of non-tissue-specific events, implying selective maintenance. Preferential use of the distal NAG was associated with distinct sequence features, including a more distal location of the branch point and presence of a pyrimidine immediately before the first NAG, and alteration of these features in a splicing reporter shifted splicing away from the distal site. Strikingly, alignments of orthologous exons revealed a ∼15-fold increase in the frequency of three base pair gaps at 3′ splice sites relative to nearby exon positions in both mammals and in Drosophila. Alternative splicing of NAGNAGs in human was associated with dramatically increased frequency of exon length changes at orthologous exon boundaries in rodents, and a model involving point mutations that create, destroy, or alter NAGNAGs can explain both the increased frequency and biased codon composition of gained/lost sequence observed at the beginnings of exons. 
This study shows that NAGNAG alternative splicing generates widespread differences between the proteomes of mammalian tissues, and suggests that the evolutionary trajectories of mammalian proteins are strongly biased by the locations and phases of the introns that interrupt coding sequences.
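As a rough illustration of the motif described in this abstract, the following Python sketch checks whether an intron sequence ends in NAGNAG and returns the two candidate 3′ splice site triplets. The function name and the toy sequences are hypothetical and are not taken from the study; the sketch only recognises the motif and says nothing about which site is actually used.

```python
import re

# An intron ending in NAGNAG (N = any nucleotide) offers two candidate
# 3' splice sites: the first NAG and the second NAG.
NAGNAG_END = re.compile(r"[ACGT]AG[ACGT]AG$")

def nagnag_acceptors(intron_seq):
    """Return (first_nag, second_nag) if the intron ends in NAGNAG, else None."""
    # Accept DNA or RNA input, in either case.
    seq = intron_seq.upper().replace("U", "T")
    if NAGNAG_END.search(seq) is None:
        return None
    tail = seq[-6:]
    return tail[:3], tail[3:]

# Hypothetical toy introns, for illustration only.
with_motif = nagnag_acceptors("GTAAGTTTTCTTTTAGCAG")
without_motif = nagnag_acceptors("GTAAGTTTTCTTTCCCCAG")
```

Splicing at the first NAG keeps the second triplet in the mRNA, while splicing at the second NAG removes it, which is why the two isoforms differ by exactly three bases.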
Author Summary
In order to translate a gene into protein, all of the non-coding regions (introns) need to be removed from the transcript and the coding regions (exons) stitched back together to make an mRNA. Most human genes are alternatively spliced, allowing the selection of different combinations of exons to produce multiple distinct mRNAs and proteins. Many types of alternative splicing are known to play crucial roles in biological processes including cell fate determination, tumor metabolism, and apoptosis. In this study, we investigated a form of alternative splicing in which competing adjacent 3′ splice sites (or splice acceptor sites) generate mRNAs differing by just an RNA triplet, the size of a single codon. This mode of alternative splicing, known as NAGNAG splicing, affects thousands of human genes and has been known for a decade, but its potential regulation, physiological importance, and conservation across species have been disputed. Using high-throughput sequencing of cDNA (“RNA-Seq”) from human and mouse tissues, we found that single-codon splicing often shows strong tissue specificity. Regulated NAGNAG alternative splice sites are selectively conserved between human and mouse genes, suggesting that they are important for organismal fitness. We identified features of the competing splice sites that influence NAGNAG splicing, and validated their effects in cultured cells. Furthermore, we found that this mode of splicing is associated with accelerated and highly biased protein evolution at exon boundaries. Taken together, our analyses demonstrate that the inclusion or exclusion of RNA triplets at exon boundaries can be effectively regulated by the splicing machinery, and highlight an unexpected connection between RNA processing and protein evolution.
PMCID: PMC3250501  PMID: 22235189
3.  Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM) 
Bioinformatics  2011;27(18):2518-2528.
Motivation: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously.
Results: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription–polymerase chain reaction (RT–PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.
Availability: The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine (
Supplementary Information: The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.
PMCID: PMC3167048  PMID: 21775302
4.  Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki) 
BMC Bioinformatics  2013;14:320.
The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment.
We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat-induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method outperformed commonly used RNA-Seq methods in identifying known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at
Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.
PMCID: PMC3827500  PMID: 24209455
5.  Genome-wide analysis of alternative splicing in Chlamydomonas reinhardtii 
BMC Genomics  2010;11:114.
Genome-wide computational analysis of alternative splicing (AS) in several flowering plants has revealed that pre-mRNAs from about 30% of genes undergo AS. Chlamydomonas, a simple unicellular green alga, is part of the lineage that includes land plants. However, it diverged from land plants about one billion years ago. Hence, it serves as a good model system to study alternative splicing in early photosynthetic eukaryotes, to obtain insights into the evolution of this process in plants, and to compare splicing in simple unicellular photosynthetic and non-photosynthetic eukaryotes. We performed a global analysis of alternative splicing in Chlamydomonas reinhardtii using its recently completed genome sequence and all available ESTs and cDNAs.
Our analysis of AS using BLAT and a modified version of the Sircah tool revealed AS of 498 transcriptional units with 611 events, representing about 3% of the total number of genes. As in land plants, intron retention is the most prevalent form of AS. Retained introns and skipped exons tend to be shorter than their counterparts in constitutively spliced genes. The splice site signals in all types of AS events are weaker than those in constitutively spliced genes. Furthermore, in alternatively spliced genes, the prevalent splice form has a stronger splice site signal than the non-prevalent form. Analysis of constitutively spliced introns revealed an over-abundance of motifs with simple repetitive elements in comparison to introns involved in intron retention. In almost all cases, AS results in a truncated ORF, leading to a coding sequence that is around 50% shorter than the prevalent splice form. Using RT-PCR we verified AS of two genes and show that they produce more isoforms than indicated by EST data. All cDNA/EST alignments and splice graphs are provided in a website at
The extent of AS in Chlamydomonas that we observed is much smaller than observed in land plants, but is much higher than in simple unicellular heterotrophic eukaryotes. The percentage of different alternative splicing events is similar to flowering plants. Prevalence of constitutive and alternative splicing in Chlamydomonas, together with its simplicity, many available public resources, and well developed genetic and molecular tools for this organism make it an excellent model system to elucidate the mechanisms involved in regulated splicing in photosynthetic eukaryotes.
PMCID: PMC2830987  PMID: 20163725
6.  MADS+: discovery of differential splicing events from Affymetrix exon junction array data 
Bioinformatics  2009;26(2):268-269.
Motivation: The Affymetrix Human Exon Junction Array is a newly designed high-density exon-sensitive microarray for global analysis of alternative splicing. Contrary to the Affymetrix exon 1.0 array, which only contains four probes per exon and no probes for exon–exon junctions, this new junction array averages eight probes per probeset targeting all exons and exon–exon junctions observed in the human mRNA/EST transcripts, representing a significant increase in the probe density for alternative splicing events. Here, we present MADS+, a computational pipeline to detect differential splicing events from the Affymetrix exon junction array data. For each alternative splicing event, MADS+ evaluates the signals of probes targeting competing transcript isoforms to identify exons or splice sites with different levels of transcript inclusion between two sample groups. MADS+ is used routinely in our analysis of Affymetrix exon junction arrays and has a high accuracy in detecting differential splicing events. For example, in a study of the novel epithelial-specific splicing regulator ESRP1, MADS+ detects hundreds of exons whose inclusion levels are dependent on ESRP1, with a RT-PCR validation rate of 88.5% (153 validated out of 173 tested).
Availability: MADS+ scripts, documentation and annotation files are available at
PMCID: PMC2804303  PMID: 19933160
7.  A probabilistic framework for aligning paired-end RNA-seq data 
Bioinformatics  2010;26(16):1950-1957.
Motivation: The RNA-seq paired-end read (PER) protocol samples transcript fragments longer than the sequencing capability of today's technology by sequencing just the two ends of each fragment. Deep sampling of the transcriptome using the PER protocol presents the opportunity to reconstruct the unsequenced portion of each transcript fragment using end reads from overlapping PERs, guided by the expected length of the fragment.
Methods: A probabilistic framework is described to predict the alignment to the genome of all PER transcript fragments in a PER dataset. Starting from possible exonic and spliced alignments of all end reads, our method constructs potential splicing paths connecting paired ends. An expectation maximization method assigns likelihood values to all splice junctions and assigns the most probable alignment for each transcript fragment.
Results: The method was applied to 2 × 35 bp PER datasets from cancer cell lines MCF-7 and SUM-102. PER fragment alignment increased the coverage 3-fold compared to the alignment of the end reads alone, and increased the accuracy of splice detection. The accuracy of the expectation maximization (EM) algorithm in the presence of alternative paths in the splice graph was validated by qRT–PCR experiments on eight exon skipping alternative splicing events. PER fragment alignment with long-range splicing confirmed 8 out of 10 fusion events identified in the MCF-7 cell line in an earlier study (Maher et al., 2009).
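The expectation maximization idea mentioned in this abstract can be sketched in a much simplified form. The Python code below is not the authors' framework: it ignores the expected fragment length and splice junction likelihoods, and it only estimates relative weights for candidate splicing paths from fragments that are compatible with one or more of them. All names and the toy input are hypothetical.

```python
def em_path_weights(compat, n_paths, n_iter=50):
    """Estimate relative path weights by expectation maximization.

    compat[i] lists the indices of the candidate paths that fragment i
    is compatible with.
    """
    weights = [1.0 / n_paths] * n_paths
    for _ in range(n_iter):
        counts = [0.0] * n_paths
        # E-step: share each fragment among its compatible paths in
        # proportion to the current weights.
        for paths in compat:
            total = sum(weights[p] for p in paths)
            for p in paths:
                counts[p] += weights[p] / total
        # M-step: renormalise the expected counts into new weights.
        n = sum(counts)
        weights = [c / n for c in counts]
    return weights

def most_probable_path(paths, weights):
    """Pick the compatible path with the highest estimated weight."""
    return max(paths, key=lambda p: weights[p])

# Toy input: two fragments fit only path 0, one fits only path 1,
# and one is ambiguous between the two.
toy_compat = [[0], [0], [0, 1], [1]]
toy_weights = em_path_weights(toy_compat, n_paths=2)
ambiguous_choice = most_probable_path([0, 1], toy_weights)
```

In this toy case the ambiguous fragment is assigned to path 0, because the unambiguous fragments give that path the larger weight.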
Availability: Software available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2916723  PMID: 20576625
8.  Alternative Splicing Regulation During C. elegans Development: Splicing Factors as Regulated Targets 
PLoS Genetics  2008;4(2):e1000001.
Alternative splicing generates protein diversity and allows for post-transcriptional gene regulation. Estimates suggest that 10% of the genes in Caenorhabditis elegans undergo alternative splicing. We constructed a splicing-sensitive microarray to detect alternative splicing for 352 cassette exons and tested for changes in alternative splicing of these genes during development. We found that the microarray data predicted that 62/352 (∼18%) of the alternative splicing events studied show a strong change in the relative levels of the spliced isoforms (>4-fold) during development. Confirmation of the microarray data by RT-PCR was obtained for 70% of randomly selected genes tested. Among the genes with the most developmentally regulated alternative splicing was the hnRNP F/H splicing factor homolog, W02D3.11, now named hrpf-1. For the cassette exon of hrpf-1, the inclusion isoform comprises 65% of hrpf-1 steady state messages in embryos but only 0.1% in the first larval stage. This dramatic change in the alternative splicing of an alternative splicing factor suggests a complex cascade of splicing regulation during development. We analyzed splicing in embryos from a strain with a mutation in the splicing factor sym-2, another hnRNP F/H homolog. We found that approximately half of the genes with large alternative splicing changes between the embryo and L1 stages are regulated by sym-2 in embryos. An analysis of the role of nonsense-mediated decay in regulating steady-state alternative mRNA isoforms was performed. We found that 8% of the 352 events studied have alternative isoforms whose relative steady-state levels in embryos change more than 4-fold in a nonsense-mediated decay mutant, including hrpf-1. Strikingly, 53% of these alternative splicing events that are affected by NMD in our experiment are not obvious substrates for NMD based on the presence of premature termination codons. This suggests that the targeting of splicing factors by NMD may have downstream effects on alternative splicing regulation.
Author Summary
Alternative splicing is a mechanism for generating more than one messenger RNA from a given gene. The alternative transcripts can encode different proteins that share some regions in common but have modified functions, thus increasing the number of proteins encoded by the genome. Alternative splicing can also lead to the production of mRNA isoforms that are then subject to degradation by the nonsense-mediated decay pathway, thus providing a mechanism to down-regulate gene expression without decreasing transcription. Examples of cell type-specific, hormone-responsive, and developmentally-regulated alternative splicing have been described. We decided to measure the extent of developmentally regulated alternative splicing in the nematode model organism Caenorhabditis elegans. We developed a DNA microarray that can measure the alternative splicing of 352 cassette exons simultaneously and used it to probe alternative splicing in RNA extracted from embryos, the four larval stages, and adults. We show that 18% of the alternatively spliced genes tested show >4-fold changes in alternative splicing during development. In addition, we show that one of the most regulated genes is itself a splicing factor, providing support for a model in which a cascade of alternative splicing regulation occurs during development.
PMCID: PMC2265522  PMID: 18454200
9.  Alternative splicing detection workflow needs a careful combination of sample prep and bioinformatics analysis 
BMC Bioinformatics  2015;16(Suppl 9):S2.
RNA-Seq provides remarkable power in the area of biomarker discovery and disease characterization. Two crucial steps that affect RNA-Seq experiment results are Library Sample Preparation (LSP) and Bioinformatics Analysis (BA). This work describes an evaluation of the combined effect of LSP methods and BA tools in the detection of splice variants.
Different LSPs (TruSeq unstranded/stranded, ScriptSeq, NuGEN) allowed the detection of a large common set of splice variants. However, each LSP also detected a small set of unique transcripts that are characterized by a low coverage and/or FPKM. This effect was particularly evident using the low input RNA NuGEN v2 protocol.
A benchmark dataset, in which synthetic reads as well as reads generated from standard (Illumina TruSeq 100) and low input (NuGEN) LSPs were spiked in, was used to evaluate the effect of LSP on the statistical detection of alternative splicing events (AltDE). Cuffdiff2 and RSEM-EBSeq served as prototypes for splice variant quantification, and DEXSeq as the prototype for exon-level analysis. Exon-level analysis performed slightly better than the splice variant quantification approaches, although at most 50% of the spiked-in transcripts were detected. The performance of both splice variant quantification and exon-level analysis improved as the number of input reads increased.
Data derived from NuGEN v2 were not the ideal input for AltDE, especially when the exon-level approach was used. We observed that the performance of both splice variant quantification and exon-level analysis was strongly dependent on the number of input reads. Moreover, the ribosomal RNA depletion protocol was less sensitive in detecting splicing variants, possibly due to the significant percentage of reads mapping to non-coding transcripts.
PMCID: PMC4464605  PMID: 26050971
10.  Histone modifications involved in cassette exon inclusions: a quantitative and interpretable analysis 
BMC Genomics  2014;15(1):1148.
Chromatin structure and epigenetic modifications have been shown to be involved in the co-transcriptional splicing of RNA precursors. In particular, some studies have suggested that some types of histone modifications (HMs) may participate in alternative splicing and function as exon marks. However, most existing studies focus on the qualitative relationship between epigenetic modifications and exon inclusion. A quantitative analysis that reveals to what extent each type of epigenetic modification is responsible for exon inclusion would help us understand the splicing process.
In this paper, we focus on the quantitative analysis of HMs’ influence on the inclusion of cassette exons (CEs) into mature RNAs. With the high-throughput ChIP-seq and RNA-seq data obtained from the ENCODE website, we modeled the association of HMs with CE inclusion by logistic regression, whose coefficients are meaningful and interpretable and reveal the effect of each type of HM. Three types of HMs, H3K36me3, H3K9me3 and H4K20me1, were found to play a major role in CE inclusion. The effect of HMs on CE inclusion is conserved across cell types and does not depend on the expression levels of the genes hosting the CEs. HMs located in the flanking regions of CEs were also taken into account in our analysis, and HMs within bounded flanking regions were shown to moderately affect CE inclusion. Moreover, we found that HMs on CEs whose length is close to the nucleosomal-DNA length greatly affect CE inclusion.
We suggest that a few types of HMs correlate closely with alternative splicing and perhaps function jointly with the splicing machinery to regulate the inclusion level of exons. Our findings help to explain the effect of HMs on exon definition, as well as the mechanism of co-transcriptional splicing.
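To make the modelling idea concrete, here is a minimal sketch of a logistic regression whose coefficients can be read as the direction and strength of each feature's association with exon inclusion. It is a plain gradient descent fit written for illustration, not the authors' model. The two feature columns (`mark_A`, `mark_B`) and all values are invented toy data, not ChIP-seq signals from ENCODE.

```python
import math

def predict(w, x):
    """Predicted inclusion probability for feature vector x (w[0] is the bias)."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights (bias first) by batch gradient descent on the mean log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Invented toy data: columns are [mark_A signal, mark_B signal];
# y = 1 if the cassette exon is included, 0 if skipped.
X = [[2.0, 0.1], [1.5, 0.3], [1.8, 0.2], [0.2, 0.4],
     [0.1, 0.1], [0.3, 0.3], [1.6, 0.2], [0.4, 0.2]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

w = fit_logistic(X, y)
bias, coef_mark_A, coef_mark_B = w
```

A positive coefficient means that a higher signal for that mark raises the predicted odds of inclusion, which is the sense in which such a model is interpretable.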
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1148) contains supplementary material, which is available to authorized users.
PMCID: PMC4378014  PMID: 25526687
Histone modifications; Alternative splicing; Quantitative analysis
11.  Protein Modularity of Alternatively Spliced Exons Is Associated with Tissue-Specific Regulation of Alternative Splicing 
PLoS Genetics  2005;1(3):e34.
Recent comparative genomic analysis of alternative splicing has shown that protein modularity is an important criterion for functional alternative splicing events. Exons that are alternatively spliced in multiple organisms are much more likely to be an exact multiple of 3 nt in length, representing a class of “modular” exons that can be inserted or removed from the transcripts without affecting the rest of the protein. To understand the precise roles of these modular exons, in this paper we have analyzed microarray data for 3,126 alternatively spliced exons across ten mouse tissues generated by Pan and coworkers. We show that modular exons are strongly associated with tissue-specific regulation of alternative splicing. Exons that are alternatively spliced at uniformly high transcript inclusion levels or uniformly low levels show no preference for protein modularity. In contrast, alternatively spliced exons with dramatic changes of inclusion levels across mouse tissues (referred to as “tissue-switched” exons) are both strikingly biased to be modular and are strongly conserved between human and mouse. The analysis of different subsets of tissue-switched exons shows that the increased protein modularity cannot be explained by the overall exon inclusion level, but is specifically associated with tissue-switched alternative splicing.
Alternative splicing is a biological process that generates multiple mRNA and protein variants through alternative combinations of protein-coding exons. It is a widespread mechanism of gene regulation in higher eukaryotes. In recent years, scientists have found that when an exon is observed to be alternatively spliced in multiple species, its length is much more likely to be an exact multiple of three nucleotides. Since each amino acid is encoded by three nucleotides, these exons can be inserted or removed from the transcript as a “modular” protein-coding unit, without affecting the downstream protein translation. However, the precise roles of these modular exons in gene regulation and genome evolution remain unclear.
Xing and Lee have now investigated these modular exons using high-throughput genomics data. They analyzed the mouse splicing microarray data from the research group of Dr. Benjamin Blencowe at University of Toronto. Exons whose alternative splicing levels vary dramatically across multiple tissues are much more likely to be modular exons and are highly conserved during human and mouse evolution. This study establishes a strong link between protein modularity of alternatively spliced exons and tissue-specific regulation of alternative splicing. It provides new insights into the function and regulation of alternative splicing and how it evolves.
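The modularity criterion used in this study reduces to a simple arithmetic check: an exon is "modular" when its length is an exact multiple of 3 nt, so that including or skipping it leaves the downstream reading frame unchanged. The sketch below illustrates this with hypothetical helper names and made-up exon lengths.

```python
def is_modular(exon_length_nt):
    """True if the exon length is an exact multiple of 3 nucleotides."""
    return exon_length_nt % 3 == 0

def frame_shift(exon_length_nt):
    """Shift (0, 1 or 2 nt) imposed on the downstream frame when the exon is skipped."""
    return exon_length_nt % 3

def modular_fraction(exon_lengths):
    """Fraction of exons in a set that are modular."""
    return sum(is_modular(n) for n in exon_lengths) / len(exon_lengths)

# Made-up exon lengths, for illustration only.
toy_lengths = [99, 100, 150, 75]
toy_fraction = modular_fraction(toy_lengths)
```

Comparing such a fraction between tissue-switched exons and other alternatively spliced exons is the kind of contrast the abstract describes.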
PMCID: PMC1201369  PMID: 16170410
12.  A chromatin code for alternative splicing involving a putative association between CTCF and HP1α proteins 
BMC Biology  2015;13:31.
Alternative splicing is primarily controlled by the activity of splicing factors and by the elongation of the RNA polymerase II (RNAPII). Recent experiments have suggested a new complex network of splicing regulation involving chromatin, transcription and multiple protein factors. In particular, the CCCTC-binding factor (CTCF), the Argonaute protein AGO1, and members of the heterochromatin protein 1 (HP1) family have been implicated in the regulation of splicing associated with chromatin and the elongation of RNAPII. These results raise the question of whether these proteins may associate at the chromatin level to modulate alternative splicing.
Using chromatin immunoprecipitation sequencing (ChIP-Seq) data for CTCF, AGO1, HP1α, H3K27me3, H3K9me2, H3K36me3, RNAPII, total H3 and 5metC and alternative splicing arrays from two cell lines, we have analyzed the combinatorial code of their binding to chromatin in relation to the alternative splicing patterns between two cell lines, MCF7 and MCF10. Using Machine Learning techniques, we identified the changes in chromatin signals that are most significantly associated with splicing regulation between these two cell lines. Moreover, we have built a map of the chromatin signals on the pre-mRNA, that is, a chromatin-based RNA-map, which can explain 606 (68.55%) of the regulated events between MCF7 and MCF10. This chromatin code involves the presence of HP1α, CTCF, AGO1, RNAPII and histone marks around regulated exons and can differentiate patterns of skipping and inclusion. Additionally, we found a significant association of HP1α and CTCF activities around the regulated exons and a putative DNA binding site for HP1α.
Our results show that a considerable number of alternative splicing events could have a chromatin-dependent regulation involving the association of HP1α and CTCF near regulated exons. Additionally, we find further evidence for the involvement of HP1α and AGO1 in chromatin-related splicing regulation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-015-0141-5) contains supplementary material, which is available to authorized users.
PMCID: PMC4446157  PMID: 25934638
Chromatin; Splicing; Histones; Splicing code
13.  Assessing long-distance RNA sequence connectivity via RNA-templated DNA–DNA ligation 
eLife  2015;4:e03700.
Many RNAs, including pre-mRNAs and long non-coding RNAs, can be thousands of nucleotides long and undergo complex post-transcriptional processing. Multiple sites of alternative splicing within a single gene exponentially increase the number of possible spliced isoforms, with most human genes currently estimated to express at least ten. To understand the mechanisms underlying these complex isoform expression patterns, methods are needed that faithfully maintain long-range exon connectivity information in individual RNA molecules. In this study, we describe SeqZip, a methodology that uses RNA-templated DNA–DNA ligation to retain and compress connectivity between distant sequences within single RNA molecules. Using this assay, we test proposed coordination between distant sites of alternative exon utilization in mouse Fn1, and we characterize the extraordinary exon diversity of Drosophila melanogaster Dscam1.
eLife digest
A flow chart can show how an outcome can be achieved from a particular start point by breaking down an activity into a list of possible steps. Often, a flow chart contains several alternative steps, not all of which are taken every time the flow chart is used. The same can be said of genes, which are biological instructions that often contain many options within their DNA sequences.
Proteins—which perform many roles in cells—are built following the instructions contained in genes. First, the DNA sequence of the gene is copied. This produces a molecule of ribonucleic acid (RNA), which is able to move around the cell to find the machinery that can use the genetic information to make a protein. Genes and their RNA copies contain instructions with more steps—called exons—than are necessary to make a working protein, so extra exons are removed (‘spliced’) from the RNA copies. Different combinations of exons can be removed, so splicing can make different versions of the RNA called isoforms. These allow a single gene to build many different proteins. In fruit flies, for example, the different exons of the gene Dscam1 can be spliced into one of 38,016 unique RNA isoforms.
Current technology only allows researchers to deduce the sequence of RNA molecules by combining sequences recorded from short fragments of the molecule. However, before splicing, RNA molecules tend to be much longer than this, so this restricts our understanding of the RNA isoforms found in cells. Here, Roy et al. devised and tested a new method called SeqZip to solve this problem.
SeqZip uses short fragments of DNA called ligamers that can only stick to the sections of RNA that will remain after the molecule has been spliced. After splicing, the ligamers can be stuck together to make a DNA replica of the spliced RNA. The end product is at least 49 times shorter than the original RNA, so it is easier to sequence. In addition, the combinations of the ligamers in the DNA replica show which exons of a specific gene are kept and which ones are spliced out.
To test the method, Roy et al. studied a mouse gene that has six RNA isoforms. SeqZip reduced the length of the RNA by five times and made it possible to measure how frequently the different isoforms naturally arise. Roy et al. also used SeqZip to work out which isoforms of the Dscam1 gene are used at different stages in the life of fruit fly larvae. SeqZip can provide insights into how complex organisms like flies, mice, and humans have evolved with relatively few—a little over 20,000—genes in their genomes.
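The figure of 38,016 Dscam1 isoforms quoted in this digest can be reproduced by simple multiplication. The sketch below assumes the commonly cited cluster sizes (12, 48, 33 and 2 mutually exclusive alternatives for exons 4, 6, 9 and 17); these numbers are background knowledge rather than values stated in the abstract, and the names used are purely illustrative.

```python
from math import prod

# Commonly cited numbers of mutually exclusive alternatives in the four
# variable exon clusters of Drosophila Dscam1 (an assumption here: these
# values are background knowledge, not stated in the abstract above).
DSCAM1_CLUSTER_SIZES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def count_isoforms(cluster_sizes):
    """Number of isoforms when exactly one alternative is chosen per cluster."""
    return prod(cluster_sizes.values())


print(count_isoforms(DSCAM1_CLUSTER_SIZES))  # 38016
```

Because the clusters are treated as independent choices, the count grows multiplicatively, which is why connectivity between distant exons has to be read from single molecules.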
PMCID: PMC4442144  PMID: 25866926
ligation; Dscam1; RNA-templated; isoform; alternative splicing; fibronectin; D. melanogaster; mouse
14.  spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data 
BMC Bioinformatics  2014;15:81.
RNA-seq data is currently underutilized, in part because it is difficult to predict the functional impact of alternate transcription events. Recent software improvements in full-length transcript deconvolution prompted us to develop spliceR, an R package for classification of alternative splicing and prediction of coding potential.
spliceR uses the full-length transcript output from RNA-seq assemblers to detect single or multiple exon skipping, alternative donor and acceptor sites, intron retention, alternative first or last exon usage, and mutually exclusive exon events. For each of these events, spliceR also annotates the genomic coordinates of the differentially spliced elements, facilitating downstream sequence analysis. For each transcript, isoform fraction values are calculated to identify transcript switching between conditions. Lastly, spliceR predicts the coding potential, as well as the potential nonsense-mediated decay (NMD) sensitivity, of each transcript.
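To make the notion of isoform fraction concrete, here is a minimal Python sketch (not spliceR code, which is an R package). It assumes the fraction is an isoform's expression divided by the summed expression of all isoforms of the same gene; the function names and the example values are hypothetical.

```python
def isoform_fractions(isoform_expression):
    """Fraction of a gene's total expression contributed by each isoform.

    isoform_expression maps isoform name -> expression value (e.g. FPKM)
    for the isoforms of a single gene.
    """
    total = sum(isoform_expression.values())
    if total == 0:
        # Gene not expressed: fractions are undefined.
        return {name: None for name in isoform_expression}
    return {name: value / total for name, value in isoform_expression.items()}


def fraction_shift(condition_a, condition_b):
    """Change in isoform fraction (B minus A) for isoforms defined in both."""
    frac_a = isoform_fractions(condition_a)
    frac_b = isoform_fractions(condition_b)
    return {
        name: frac_b[name] - frac_a[name]
        for name in frac_a
        if frac_a[name] is not None and frac_b.get(name) is not None
    }


# Hypothetical gene with two isoforms whose dominance swaps between conditions.
shift = fraction_shift({"iso1": 8.0, "iso2": 2.0}, {"iso1": 2.0, "iso2": 8.0})
```

A large positive shift for one isoform paired with a large negative shift for another is the kind of pattern a transcript switch would produce.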
spliceR is an easy-to-use tool that extends the usability of RNA-seq and assembly technologies by allowing greater depth of annotation of RNA-seq data. spliceR is implemented as an R package and is freely available from the Bioconductor repository (
PMCID: PMC3998036  PMID: 24655717
spliceR; RNA-Seq; Alternative splicing; Nonsense mediated decay (NMD); Isoform switch
15.  Challenges in estimating percent inclusion of alternatively spliced junctions from RNA-seq data 
BMC Bioinformatics  2012;13(Suppl 6):S11.
Transcript quantification is a long-standing problem in genomics and estimating the relative abundance of alternatively-spliced isoforms from the same transcript is an important special case. Both problems have recently been illuminated by high-throughput RNA sequencing experiments which are quickly generating large amounts of data. However, much of the signal present in this data is corrupted or obscured by biases resulting in non-uniform and non-proportional representation of sequences from different transcripts. Many existing analyses attempt to deal with these and other biases with various task-specific approaches, which makes direct comparison between them difficult. However, two popular tools for isoform quantification, MISO and Cufflinks, have adopted a general probabilistic framework to model and mitigate these biases in a more general fashion. These advances motivate the need to investigate the effects of RNA-seq biases on the accuracy of different approaches for isoform quantification. We conduct the investigation by building models of increasing sophistication to account for noise introduced by the biases and compare their accuracy to the established approaches.
We focus on methods that estimate the expression of alternatively-spliced isoforms with the percent-spliced-in (PSI) metric for each exon skipping event. To improve their estimates, many methods use evidence from RNA-seq reads that align to exon bodies. However, the methods we propose focus on reads that span only exon-exon junctions. As a result, our approaches are simpler and less sensitive to exon definitions than existing methods, which enables us to distinguish their strengths and weaknesses more easily. We present several probabilistic models of position-specific read counts with increasing complexity and compare them to each other and to the current state-of-the-art methods in isoform quantification, MISO and Cufflinks. On a validation set with RT-PCR measurements for 26 cassette events, some of our methods are more accurate and some are significantly more consistent than these two popular tools. This comparison demonstrates the challenges in estimating the percent inclusion of alternatively spliced junctions and illuminates the tradeoffs between different approaches.
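As a rough illustration of the junction-only idea, the simplest PSI estimate for a cassette exon compares reads spanning the two inclusion junctions with reads spanning the skipping junction. The sketch below is a naive count-ratio baseline, not one of the probabilistic models from the paper; the function name and the averaging of the two inclusion junctions are assumptions made for illustration.

```python
def junction_psi(upstream_inc, downstream_inc, skip):
    """Naive percent-spliced-in estimate from junction-spanning read counts.

    upstream_inc   -- reads spanning the upstream exon / cassette exon junction
    downstream_inc -- reads spanning the cassette exon / downstream exon junction
    skip           -- reads spanning the upstream exon / downstream exon junction
    """
    # The inclusion isoform contributes two junctions, so average them to keep
    # inclusion and exclusion evidence on the same scale (an assumption).
    inclusion = (upstream_inc + downstream_inc) / 2.0
    total = inclusion + skip
    if total == 0:
        return None  # no junction evidence, PSI is undefined
    return inclusion / total


print(junction_psi(30, 30, 10))  # 0.75
```

With few junction reads this ratio is very noisy, which is one reason models of position-specific counts are worth comparing against it.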
PMCID: PMC3330053  PMID: 22537040
16.  An EMT–Driven Alternative Splicing Program Occurs in Human Breast Cancer and Modulates Cellular Phenotype 
PLoS Genetics  2011;7(8):e1002218.
Epithelial-mesenchymal transition (EMT), a mechanism important for embryonic development, plays a critical role during malignant transformation. While much is known about transcriptional regulation of EMT, alternative splicing of several genes has also been correlated with EMT progression, but the extent of splicing changes and their contributions to the morphological conversion accompanying EMT have not been investigated comprehensively. Using an established cell culture model and RNA–Seq analyses, we determined an alternative splicing signature for EMT. Genes encoding key drivers of EMT–dependent changes in cell phenotype, such as actin cytoskeleton remodeling, regulation of cell–cell junction formation, and regulation of cell migration, were enriched among EMT–associated alternative splicing events. Our analysis suggested that most EMT–associated alternative splicing events are regulated by one or more members of the RBFOX, MBNL, CELF, hnRNP, or ESRP classes of splicing factors. The EMT alternative splicing signature was confirmed in human breast cancer cell lines, which could be classified into basal and luminal subtypes based exclusively on their EMT–associated splicing pattern. Expression of EMT–associated alternative mRNA transcripts was also observed in primary breast cancer samples, indicating that EMT–dependent splicing changes occur commonly in human tumors. The functional significance of EMT–associated alternative splicing was tested by expression of the epithelial-specific splicing factor ESRP1 or by depletion of RBFOX2 in mesenchymal cells, both of which elicited significant changes in cell morphology and motility towards an epithelial phenotype, suggesting that splicing regulation alone can drive critical aspects of EMT–associated phenotypic changes. The molecular description obtained here may aid in the development of new diagnostic and prognostic markers for analysis of breast cancer progression.
Author Summary
Epithelial-to-mesenchymal transition (EMT) is the process by which cancer cells lose their epithelial characteristics and obtain a mesenchymal phenotype that is thought to allow them to migrate away from the primary tumor. A better understanding of how EMT is controlled would be valuable in predicting the likelihood of metastasis and in designing targeted therapies to block metastatic progression. While there have been many studies on the contribution of changes in gene expression to EMT, much less is known regarding the role of alternative splicing of mRNA during EMT. Alternative splicing can produce different protein isoforms from the same gene that often have distinct activities and functions. Here, we used a recently developed method to characterize changes in alternative splicing during EMT and found that thousands of multi-exon genes underwent alternative splicing. Alternative isoform expression was confirmed in human breast cancer cell lines and in primary human breast cancer samples, indicating that EMT–dependent splicing changes occur commonly in human tumors. Since EMT is considered an early step in metastatic progression, novel markers of EMT that we identified in human breast cancer samples might become valuable prognostic and diagnostic tools if confirmed in a larger cohort of patients.
PMCID: PMC3158048  PMID: 21876675
17.  Unproductive alternative splicing and nonsense mRNAs: A widespread phenomenon among plant circadian clock genes 
Biology Direct  2012;7:20.
Recent mapping of eukaryotic transcriptomes and spliceomes using massively parallel RNA sequencing (RNA-seq) has revealed that the extent of alternative splicing has been considerably underestimated. Evidence also suggests that many pre-mRNAs undergo unproductive alternative splicing resulting in incorporation of in-frame premature termination codons (PTCs). The destinies and potential functions of the PTC-harboring mRNAs remain poorly understood. Unproductive alternative splicing in circadian clock genes presents a special case study because the daily oscillations of protein expression levels require rapid and steep adjustments in mRNA levels.
We conducted a systematic survey of alternative splicing of plant circadian clock genes using RNA-seq and found that many Arabidopsis thaliana circadian clock-associated genes are alternatively spliced. Results were confirmed using reverse transcription polymerase chain reaction (RT-PCR), quantitative RT-PCR (qRT-PCR), and/or Sanger sequencing. Intron retention events were frequently observed in mRNAs of the CCA1/LHY-like subfamily of MYB transcription factors. In contrast, the REVEILLE2 (RVE2) transcript was alternatively spliced via inclusion of a "poison cassette exon" (PCE). The PCE type events introducing in-frame PTCs are conserved in some mammalian and plant serine/arginine-rich splicing factors. For some circadian genes such as CCA1 the ratio of the productive isoform (i.e., a representative splice variant encoding the full-length protein) to its PTC counterpart shifted sharply under specific environmental stress conditions.
Our results demonstrate that unproductive alternative splicing is a widespread phenomenon among plant circadian clock genes that frequently generates mRNA isoforms harboring in-frame PTCs. Because LHY and CCA1 are core components of the plant central circadian oscillator, the conservation of alternatively spliced variants between CCA1 and LHY and for CCA1 across phyla [2] indicates a potential role of nonsense transcripts in regulation of circadian rhythms. Most of the alternatively spliced isoforms harbor in-frame PTCs that arise from full or partial intron retention events. However, a PTC in the RVE2 transcript is introduced through a PCE event. The conservation of AS events and modulation of the relative abundance of nonsense isoforms by environmental and diurnal conditions suggests possible regulatory roles for these alternatively spliced transcripts in circadian clock function. The temperature-dependent expression of the PTC transcripts among members of CCA1/LHY subfamily indicates that alternative splicing may be involved in regulation of the clock temperature compensation mechanism.
This article was reviewed by Dr. Eugene Koonin, Dr. Chungoo Park (nominated by Dr. Kateryna Makova), and Dr. Marcelo Yanovsky (nominated by Dr. Valerian Dolja).
PMCID: PMC3403997  PMID: 22747664
Arabidopsis thaliana; Alternative splicing; Circadian clock; RNA-seq; Intron retention; Cassette exon; Nonsense mRNAs; Premature termination codon; CIRCADIAN CLOCK ASSOCIATED 1 (CCA1); LATE ELONGATED HYPOCOTYL (LHY); REVEILLE 2 (RVE2).
18.  iCLIP Predicts the Dual Splicing Effects of TIA-RNA Interactions 
PLoS Biology  2010;8(10):e1000530.
Transcriptome-wide analysis of protein-RNA interactions predicts the dual splicing effects of TIA proteins, showing that their local enhancing function is associated with diverse distal splicing silencing effects.
The regulation of alternative splicing involves interactions between RNA-binding proteins and pre-mRNA positions close to the splice sites. T-cell intracellular antigen 1 (TIA1) and TIA1-like 1 (TIAL1) locally enhance exon inclusion by recruiting U1 snRNP to 5′ splice sites. However, effects of TIA proteins on splicing of distal exons have not yet been explored. We used UV-crosslinking and immunoprecipitation (iCLIP) to find that TIA1 and TIAL1 bind at the same positions on human RNAs. Binding downstream of 5′ splice sites was used to predict the effects of TIA proteins in enhancing inclusion of proximal exons and silencing inclusion of distal exons. The predictions were validated in an unbiased manner using splice-junction microarrays, RT-PCR, and minigene constructs, which showed that TIA proteins maintain splicing fidelity and regulate alternative splicing by binding exclusively downstream of 5′ splice sites. Surprisingly, TIA binding at 5′ splice sites silenced distal cassette and variable-length exons without binding in proximity to the regulated alternative 3′ splice sites. Using transcriptome-wide high-resolution mapping of TIA-RNA interactions we evaluated the distal splicing effects of TIA proteins. These data are consistent with a model where TIA proteins shorten the time available for definition of an alternative exon by enhancing recognition of the preceding 5′ splice site. Thus, our findings indicate that changes in splicing kinetics could mediate the distal regulation of alternative splicing.
Author Summary
Studies of splicing regulation have generally focused on RNA elements located close to alternative exons. Recently, it has been suggested that splicing of alternative exons can also be regulated by distal regulatory sites, but the underlying mechanism is not clear. The TIA proteins are key splicing regulators that enhance the recognition of 5′ splice sites, and their distal effects have remained unexplored so far. Here, we use a new method to map the positions of TIA-RNA interactions with high resolution on a transcriptome-wide scale. The identified binding positions successfully predict the local enhancing and distal silencing effects of TIA proteins. In particular, we show that TIA proteins can regulate distal alternative 3′ splice sites by binding at the 5′ splice site of the preceding exon. This result suggests that alternative splicing is affected by the timing of alternative exon definition relative to the recognition of the preceding 5′ splice site. These findings highlight the importance of analysing distal regulatory sites in order to fully understand the regulation of alternative splicing.
PMCID: PMC2964331  PMID: 21048981
19.  Identifiability of isoform deconvolution from junction arrays and RNA-Seq 
Bioinformatics  2009;25(23):3056-3059.
Motivation: Splice junction microarrays and RNA-seq are two popular ways of quantifying splice variants within a cell. Unfortunately, isoform expressions cannot always be determined from the expressions of individual exons and splice junctions. While this issue has been noted before, the extent of the problem on various platforms has not yet been explored, nor have potential remedies been presented.
Results: We propose criteria that will guarantee identifiability of an isoform deconvolution model on exon and splice junction arrays and in RNA-Seq. We show that up to 97% of 2256 alternatively spliced human genes selected from the RefSeq database lead to identifiable gene models in RNA-seq, with similar results in mouse. However, in the Human Exon array only 26% of these genes lead to identifiable models, and even in the most comprehensive splice junction array only 69% lead to identifiable models.
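One way to picture identifiability is as a rank condition: if each measured feature (exon or junction) is modelled as the sum of the isoforms containing it, isoform expressions are uniquely determined only when the feature-by-isoform matrix has full column rank. The sketch below assumes this linear model for illustration and is not the paper's exact criteria; the toy gene with two independent cassette exons is hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_identifiable(feature_by_isoform):
    """True if the matrix (rows = features, columns = isoforms) has full column rank."""
    return matrix_rank(feature_by_isoform) == len(feature_by_isoform[0])


# Toy gene: constitutive exons E1 and E2, cassette exons A and B.
# Columns are isoforms: A+B, A only, B only, neither.
exons_only = [
    [1, 1, 1, 1],  # E1
    [1, 1, 0, 0],  # A
    [1, 0, 1, 0],  # B
    [1, 1, 1, 1],  # E2
]
with_junction = exons_only + [[1, 0, 0, 0]]  # add the A-B junction feature

print(is_identifiable(exons_only))     # False
print(is_identifiable(with_junction))  # True
```

In this toy case exon-level measurements alone cannot separate the four isoforms, while one informative junction restores identifiability, in line with the contrast between exon arrays and junction-aware platforms reported above.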
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3167695  PMID: 19762346
20.  Genome-wide association between DNA methylation and alternative splicing in an invertebrate 
BMC Genomics  2012;13:480.
Gene bodies are the most evolutionarily conserved targets of DNA methylation in eukaryotes. However, the regulatory functions of gene body DNA methylation remain largely unknown. DNA methylation in insects appears to be primarily confined to exons. Two recent studies in Apis mellifera (honeybee) and Nasonia vitripennis (jewel wasp) analyzed transcription and DNA methylation data for one gene in each species to demonstrate that exon-specific DNA methylation may be associated with alternative splicing events. In this study we investigated the relationship between DNA methylation, alternative splicing, and cross-species gene conservation on a genome-wide scale using genome-wide transcription and DNA methylation data.
We generated RNA deep sequencing data (RNA-seq) to measure genome-wide mRNA expression at the exon- and gene-level. We produced a de novo transcriptome from this RNA-seq data and computationally predicted splice variants for the honeybee genome. We found that exons that are included in transcription are more highly methylated than exons that are skipped during transcription. We detected enrichment for alternative splicing among methylated genes compared to unmethylated genes using Fisher’s exact test. A statistical analysis revealed that the presence of DNA methylation and the presence of alternative splicing are each associated with a longer gene length and a greater number of exons in genes. In concordance with this observation, a conservation analysis using BLAST revealed that each of these factors is also associated with higher cross-species gene conservation.
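The enrichment test mentioned above can be sketched with a one-sided Fisher's exact test on a 2x2 table of genes (methylated or not, alternatively spliced or not). The implementation below uses only the standard library; the function name is illustrative, the one-sided alternative is an assumption, and no counts from the study are used.

```python
from math import comb


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact test p-value for a 2x2 table.

                      alt. spliced   not alt. spliced
    methylated             a                b
    unmethylated           c                d

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # methylated genes
    col1 = a + c          # alternatively spliced genes
    n = a + b + c + d     # all genes
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, row1)


# Small made-up table (not data from the study).
print(fisher_enrichment_p(3, 1, 1, 3))  # 17/70, about 0.243
```

For genome-scale counts the same calculation is usually delegated to a statistics library, but the exact sum makes the underlying null model explicit.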
This study constitutes the first genome-wide analysis exhibiting a positive relationship between exon-level DNA methylation and mRNA expression in the honeybee. Our finding that methylated genes are enriched for alternative splicing suggests that, in invertebrates, exon-level DNA methylation may play a role in the construction of splice variants by positively influencing exon inclusion during transcription. The results from our cross-species homology analysis suggest that DNA methylation and alternative splicing are genetic mechanisms whose utilization could contribute to a longer gene length and a slower rate of gene evolution.
PMCID: PMC3526459  PMID: 22978521
21.  DiffSplice: the genome-wide detection of differential splicing events with RNA-seq 
Nucleic Acids Research  2012;41(2):e39.
The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. 
Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at
PMCID: PMC3553996  PMID: 23155066
22.  Drosha Promotes Splicing of a Pre-microRNA-like Alternative Exon 
PLoS Genetics  2014;10(5):e1004312.
The ribonuclease III enzyme Drosha has a central role in the biogenesis of microRNA (miRNA) by binding and cleaving hairpin structures in primary RNA transcripts into precursor miRNAs (pre-miRNAs). Many miRNA genes are located within protein-coding host genes and cleaved by Drosha in a manner that is coincident with splicing of introns by the spliceosome. The close proximity of splicing and pre-miRNA biogenesis suggests a potential for co-regulation of miRNA and host gene expression, though this relationship is not completely understood. Here, we describe a cleavage-independent role for Drosha in the splicing of an exon that has a predicted hairpin structure resembling a Drosha substrate. We find that Drosha can cleave the alternatively spliced exon 5 of the eIF4H gene into a pre-miRNA both in vitro and in cells. However, the primary role of Drosha in eIF4H gene expression is to promote the splicing of exon 5. Drosha binds to the exon and enhances splicing in a manner that depends on RNA structure but not on cleavage by Drosha. We conclude that Drosha can function like a splicing enhancer and promote exon inclusion. Our results reveal a new mechanism of alternative splicing regulation involving a cleavage-independent role for Drosha in splicing.
Author Summary
MicroRNAs (miRNAs) are short non-coding RNAs that function in gene silencing and are produced by cleavage from a larger primary RNA transcript through a reaction that is carried out by the Microprocessor. Primary miRNA transcripts are often located within the introns of genes. Thus, both the Microprocessor and the spliceosome, which is responsible for pre-mRNA splicing, interact with the same sequences, though little is known about how these two processes influence each other. In this study, we discovered that the alternatively spliced eIF4H exon 5 is predicted to form an RNA hairpin that resembles a Microprocessor substrate. We found that the Microprocessor can bind and cleave exon 5, which precludes inclusion of the exon in the mRNA. However, we find that Drosha, a component of the Microprocessor, primarily functions to enhance exon 5 splicing both in vitro and in cells, rather than to cleave the RNA. Our results suggest that the Microprocessor has a role in splicing that is distinct from its role in miRNA biogenesis. This Microprocessor activity represents a new function for the complex that may be an important mechanism for regulating alternative splicing.
PMCID: PMC4006729  PMID: 24786770
23.  Kinetic competition during the transcription cycle results in stochastic RNA processing 
eLife  2014;3:e03939.
Synthesis of mRNA in eukaryotes involves the coordinated action of many enzymatic processes, including initiation, elongation, splicing, and cleavage. Kinetic competition between these processes has been proposed to determine RNA fate, yet such coupling has never been observed in vivo on single transcripts. In this study, we use dual-color single-molecule RNA imaging in living human cells to construct a complete kinetic profile of transcription and splicing of the β-globin gene. We find that kinetic competition results in multiple competing pathways for pre-mRNA splicing. Splicing of the terminal intron occurs stochastically both before and after transcript release, indicating there is not a strict quality control checkpoint. The majority of pre-mRNAs are spliced after release, while diffusing away from the site of transcription. A single missense point mutation (S34F) in the essential splicing factor U2AF1 which occurs in human cancers perturbs this kinetic balance and defers splicing to occur entirely post-release.
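Kinetic competition can be illustrated with a toy race between two processes. The sketch below assumes, purely for illustration, that splicing and release have independent exponentially distributed waiting times; the rates are made up and this is not the kinetic model fitted in the study.

```python
import random


def prob_splice_first(k_splice, k_release):
    """Probability that splicing precedes release for two independent
    exponential waiting times with the given rates."""
    return k_splice / (k_splice + k_release)


def simulate_splice_first(k_splice, k_release, n=20000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    wins = sum(
        rng.expovariate(k_splice) < rng.expovariate(k_release)
        for _ in range(n)
    )
    return wins / n


# Hypothetical rates: release three times faster than splicing.
print(prob_splice_first(1.0, 3.0))  # 0.25
```

Under this toy model neither order is forbidden: the slower process still wins some of the time, which matches the stochastic mix of pre- and post-release splicing described above, and lowering the splicing rate shifts the balance towards post-release splicing.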
eLife digest
To make a protein, part of a DNA sequence is copied to make a messenger RNA (or mRNA) molecule in a process known as transcription. The enzyme that builds an mRNA molecule first binds to a start point on a DNA strand, and then uses the DNA sequence to build a ‘pre-mRNA’ molecule until a stop signal is reached.
To make the final mRNA molecule, sections called introns are removed from the pre-mRNA molecules, and the parts left behind—known as exons—are then joined together. This process is called splicing. However, it is not fully understood how the splicing process is coordinated with the other stages of transcription. For example, does splicing occur after the pre-mRNA molecule is completed or while it is still being built? And what controls the order in which these processes occur?
One theory about how the different mRNA-making processes are coordinated is called kinetic competition. This theory states that the fastest process is the most likely to occur, even if the other processes use less energy and so might be expected to be preferred. Alternatively, the different steps may be started and stopped by ‘checkpoints’ that cause the different processes to follow on from each other in a set order.
Coulon et al. used fluorescence microscopy to investigate how mRNA molecules are made during the transcription of a human gene that makes a hemoglobin protein. To make the RNA visible, two different fluorescent markers were introduced into the pre-mRNA that cause different regions of the mRNA to glow in different colors. Coulon et al. made the introns fluoresce red and the exons glow green. Unspliced pre-mRNA molecules contain both introns and exons and so fluoresce in both colors, whereas spliced mRNA molecules contain only exons and so only glow with a green color.
By looking at both the red and green fluorescence signals at the same time, Coulon et al. could see when an intron was spliced out of the pre-mRNA. This revealed that in normal cells, splicing can occur either before or after the RNA is released from where it is transcribed. Thus, splicing and transcription do not follow a set pattern, suggesting that checkpoints do not control the sequence of events. Instead, the fact that a spliced mRNA molecule can be formed in different ways suggests kinetic competition controls the process.
In some cancer cells, there are defects in the cellular machinery that controls splicing. When looking at cells with such a defect, Coulon et al. found that splicing only occurred after transcription was completed. This study thus provides insight into the complex workings of mRNA synthesis and establishes a blueprint for understanding how splicing is impaired in diseases such as cancer.
PMCID: PMC4210818  PMID: 25271374
transcription; RNA processing; splicing; single-molecule imaging; fluctuation analysis; human
24.  FineSplice, enhanced splice junction detection and quantification: a novel pipeline based on the assessment of diverse RNA-Seq alignment solutions 
Nucleic Acids Research  2014;42(8):e71.
Alternative splicing is the main mechanism governing protein diversity. The recent developments in RNA-Seq technology have enabled the study of the global impact and regulation of this biological process. However, the lack of standardized protocols constitutes a major bottleneck in the analysis of alternative splicing. This is particularly important for the identification of exon–exon junctions, which is a critical step in any analysis workflow. Here we performed a systematic benchmarking of alignment tools to dissect the impact of design and method on the mapping, detection and quantification of splice junctions from multi-exon reads. Accordingly, we devised a novel pipeline based on TopHat2 combined with a splice junction detection algorithm, which we have named FineSplice. FineSplice allows effective elimination of spurious junction hits arising from artefactual alignments, achieving up to 99% precision in both real and simulated data sets and yielding superior F1 scores under most tested conditions. The proposed strategy conjugates an efficient mapping solution with a semi-supervised anomaly detection scheme to filter out false positives and allows reliable estimation of expressed junctions from the alignment output. Ultimately this provides more accurate information to identify meaningful splicing patterns. FineSplice is freely available at
PMCID: PMC4005686  PMID: 24574529
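For readers unfamiliar with the metrics quoted in the FineSplice abstract, precision, recall and the F1 score for junction detection can be computed from counts of true positive, false positive and false negative junction calls. The sketch below uses made-up counts and an illustrative function name; it is not taken from the FineSplice code.

```python
def junction_detection_scores(true_pos, false_pos, false_neg):
    """Precision, recall and F1 for a set of predicted splice junctions."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 correct junctions, 10 spurious, 30 missed.
precision, recall, f1 = junction_detection_scores(90, 10, 30)
```

Filtering spurious junctions raises precision, and the F1 score shows whether that gain comes at too high a cost in recall.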
25.  The Germ Cell Nuclear Proteins hnRNP G-T and RBMY Activate a Testis-Specific Exon 
PLoS Genetics  2009;5(11):e1000707.
The human testis has almost as high a frequency of alternative splicing events as brain. While not as extensively studied as brain, a few candidate testis-specific splicing regulator proteins have been identified, including the nuclear RNA binding proteins RBMY and hnRNP G-T, which are germ cell-specific versions of the somatically expressed hnRNP G protein and are highly conserved in mammals. The splicing activator protein Tra2β is also highly expressed in the testis and physically interacts with these hnRNP G family proteins. In this study, we identified a novel testis-specific cassette exon TLE4-T within intron 6 of the human transducin-like enhancer of split 4 (TLE4) gene which makes a more transcriptionally repressive TLE4 protein isoform. TLE4-T splicing is normally repressed in somatic cells because of a weak 5′ splice site and surrounding splicing-repressive intronic regions. TLE4-T RNA pulls down Tra2β and hnRNP G proteins which activate its inclusion. The germ cell-specific RBMY and hnRNP G-T proteins were more efficient in stimulating TLE4-T incorporation than somatically expressed hnRNP G protein. Tra2β bound moderately to TLE4-T RNA, but more strongly to upstream sites to potently activate an alternative 3′ splice site normally weakly selected in the testis. Co-expression of Tra2β with either hnRNP G-T or RBMY re-established the normal testis physiological splicing pattern of this exon. Although they can directly bind pre-mRNA sequences around the TLE4-T exon, RBMY and hnRNP G-T function as efficient germ cell-specific splicing co-activators of TLE4-T. Our study indicates a delicate balance between the activity of positive and negative splicing regulators combinatorially controls physiological splicing inclusion of exon TLE4-T and leads to modulation of signalling pathways in the testis. In addition, we identified a high-affinity binding site for hnRNP G-T protein, showing it is also a sequence-specific RNA binding protein.
Author Summary
This study investigates tissue-specific alternative splicing, which plays a key role in generating diversity in animal cells. We found a new testis-specific exon in a human homologue of the important Drosophila developmental regulator Groucho, which is activated by germ cell RNA binding proteins. By analyzing splicing control of this exon, we elucidated how variations in the activity and expression of splicing regulators together counterbalance splicing activation, and achieve more tightly regulated physiological splicing patterns. We find that although this new human testis-specific exon is not conserved in mice, it is functionally important in that it encodes a peptide which increases the activity of this developmental regulator as a transcriptional repressor. This study provides new insights into how signalling pathways are evolving in human germ cells and the possible molecular defects that might be occurring in infertile men who have genetic deletions of germ cell-specific RNA binding proteins.
PMCID: PMC2762042  PMID: 19893608

