mRNA-Seq is a precise and highly reproducible technique for measuring transcript levels. It yields sequence information for a transcriptome at single-nucleotide resolution, enabling splice junctions and alternative splicing events to be determined with high confidence. Analyses of mRNA-Seq data often do not attempt to quantify expression at the isoform level. In this paper our objective is to use mRNA-Seq data to infer expression at the isoform level, where the splicing patterns of a gene are assumed to be known. A Bayesian latent variable modeling framework is proposed, in which the parameterization enables inference at several levels. For example, it allows us to assess the expression variability of an isoform across different conditions and to carry out two-sample comparisons, e.g., using a Bayesian t-test; in addition, the simple presence or absence of an isoform can be estimated through the latent variables in the model. We carry out inference on isoform expression under different normalization techniques, since it has recently been shown that one of the most prominent sources of variation in differential calls from mRNA-Seq data is the normalization method used. The statistical framework is developed for multiple isoforms and extends easily to reads mapping to multiple genes; this can be achieved by slight conceptual modifications to the definitions of what we consider a gene and what we consider an exon. Additionally, the proposed framework can be extended, by appropriate modeling of the design matrix, to infer as-yet-unknown novel transcripts. However, such attempts should be made judiciously, since the input data used in the proposed model do not include reads from splice junctions.
mRNA-Seq; isoform expression; Bayesian latent variable modeling; multi-sample comparison; Bayesian t-test; spike-n-slab method
Due to alternative splicing events in eukaryotic species, the identification of mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time consuming and not cost-effective. The emerging RNA-Seq technology provides a potentially effective way to address this problem. Although the advantages of RNA-Seq over traditional methods in transcriptome analysis have been confirmed by many studies, the inference of isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) has remained computationally challenging. In this work, we propose a method to calculate the expression levels of isoforms and infer isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS) and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately if all the isoforms are known, and it can also infer novel isoforms from scratch. Our experimental tests on known mouse isoforms with both simulated expression levels and reads demonstrate that IsoInfer calculates the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method while running 60 times faster. Moreover, our tests on both simulated and real reads show that it achieves good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS, and PAS information, especially for isoforms whose expression levels are significantly high. The software is publicly available for free at http://www.cs.ucr.edu/∼jianxing/IsoInfer.html.
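To make the idea of a convex quadratic program over exons and isoforms concrete, the following is a minimal sketch, not IsoInfer's actual formulation (which the abstract does not spell out). It assumes a simplified model in which the length-normalized read density on each exon is the sum of the expression levels of the isoforms containing that exon, and it solves the resulting non-negative least-squares problem by projected gradient descent. The function name, the indicator matrix A, and the toy data are all hypothetical.

```python
def estimate_isoform_expression(A, d, steps=2000):
    """Minimize 0.5 * ||A x - d||^2 subject to x >= 0 (a convex QP).

    A[j][i] is 1 if exon j belongs to isoform i, else 0 (assumed known).
    d[j] is the observed, length-normalized read density on exon j.
    Returns the estimated expression level x[i] of each isoform.
    """
    n_exons, n_iso = len(A), len(A[0])
    # The squared Frobenius norm of A bounds the largest eigenvalue of A^T A,
    # so its reciprocal is a safe (if conservative) gradient step size.
    lr = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n_iso
    for _ in range(steps):
        residual = [sum(A[j][i] * x[i] for i in range(n_iso)) - d[j]
                    for j in range(n_exons)]
        grad = [sum(A[j][i] * residual[j] for j in range(n_exons))
                for i in range(n_iso)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[i] - lr * grad[i]) for i in range(n_iso)]
    return x


# Toy gene: isoform 0 uses exons 1-2-3, isoform 1 skips exon 2.
A = [[1, 1],
     [1, 0],
     [1, 1]]
d = [7.0, 2.0, 7.0]  # densities generated from expression levels (2, 5)
x = estimate_isoform_expression(A, d)
```

In this toy case the exon densities are noise-free, so the estimate converges to the generating expression levels; with real reads the residual would not vanish and a dedicated QP solver would normally replace the hand-written loop.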
alternative splicing; convex quadratic programming; deep sequencing; isoform inference; RNA-Seq
Motivation: Splice junction microarrays and RNA-seq are two popular ways of quantifying splice variants within a cell. Unfortunately, isoform expressions cannot always be determined from the expressions of individual exons and splice junctions. While this issue has been noted before, the extent of the problem on various platforms has not yet been explored, nor have potential remedies been presented.
Results: We propose criteria that will guarantee identifiability of an isoform deconvolution model on exon and splice junction arrays and in RNA-Seq. We show that up to 97% of 2256 alternatively spliced human genes selected from the RefSeq database lead to identifiable gene models in RNA-seq, with similar results in mouse. However, in the Human Exon array only 26% of these genes lead to identifiable models, and even in the most comprehensive splice junction array only 69% lead to identifiable models.
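One way to picture identifiability in a linear deconvolution setting is a rank check on the feature-by-isoform indicator matrix: if its columns are linearly independent, distinct isoform expression vectors cannot produce the same expected feature signals. The sketch below illustrates that standard linear-algebra condition only; the abstract does not state the paper's own criteria, and the helper names and the toy gene are assumptions made for illustration.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of a small matrix via Gauss-Jordan elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_identifiable(design):
    """design[j][i] = 1 if feature j (exon or junction) occurs in isoform i.

    The linear model is identifiable when the isoform columns are
    linearly independent, i.e. the rank equals the number of isoforms.
    """
    return matrix_rank(design) == len(design[0])


# Hypothetical gene with exons 1-4 and isoforms
# {1,2,3,4}, {1,4}, {1,2,4}, {1,3,4} (columns, in that order).
exon_only = [
    [1, 1, 1, 1],  # exon 1
    [1, 0, 1, 0],  # exon 2
    [1, 0, 0, 1],  # exon 3
    [1, 1, 1, 1],  # exon 4
]
# Adding the exon 2 - exon 3 junction, present only in the first isoform.
exon_plus_junction = exon_only + [[1, 0, 0, 0]]
```

Here the exon-only matrix has rank 3 for four isoforms, so exon signals alone cannot separate them, whereas adding a single junction feature restores full column rank. This mirrors, in miniature, why platforms that measure junctions tend to yield more identifiable gene models.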
Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: The Affymetrix Human Exon Junction Array is a newly designed high-density exon-sensitive microarray for global analysis of alternative splicing. In contrast to the Affymetrix exon 1.0 array, which only contains four probes per exon and no probes for exon–exon junctions, this new junction array averages eight probes per probeset targeting all exons and exon–exon junctions observed in the human mRNA/EST transcripts, representing a significant increase in the probe density for alternative splicing events. Here, we present MADS+, a computational pipeline to detect differential splicing events from the Affymetrix exon junction array data. For each alternative splicing event, MADS+ evaluates the signals of probes targeting competing transcript isoforms to identify exons or splice sites with different levels of transcript inclusion between two sample groups. MADS+ is used routinely in our analysis of Affymetrix exon junction arrays and has a high accuracy in detecting differential splicing events. For example, in a study of the novel epithelial-specific splicing regulator ESRP1, MADS+ detects hundreds of exons whose inclusion levels are dependent on ESRP1, with an RT-PCR validation rate of 88.5% (153 validated out of 173 tested).
Availability: MADS+ scripts, documentation and annotation files are available at http://www.medicine.uiowa.edu/Labs/Xing/MADSplus/.
SPACE is an algorithm developed to predict and quantify the pre-mRNA splicing structure of transcripts using exon and ‘exon plus junction’ microarray data.
Exon and exon+junction microarrays are promising tools for studying alternative splicing. Current analytical tools applied to these arrays lack two relevant features: the ability to predict unknown spliced forms and the ability to quantify the concentration of known and unknown isoforms. SPACE is an algorithm that has been developed to (1) estimate the number of different transcripts expressed under several conditions, (2) predict the precursor mRNA splicing structure and (3) quantify the transcript concentrations including unknown forms. The results presented here show its robustness and accuracy for real and simulated data.
Through alternative splicing, most human genes express multiple isoforms that often differ in function. To infer isoform regulation from high-throughput sequencing of cDNA fragments (RNA-seq), we developed the mixture-of-isoforms (MISO) model, a statistical model that estimates expression of alternatively spliced exons and isoforms and assesses confidence in these estimates. Incorporation of mRNA fragment length distribution in paired-end RNA-seq greatly improved estimation of alternative-splicing levels. MISO also detects differentially regulated exons or isoforms. Application of MISO implicated the RNA splicing factor hnRNP H1 in the regulation of alternative cleavage and polyadenylation, a role that was supported by UV cross-linking–immunoprecipitation sequencing (CLIP-seq) analysis in human cells. Our results provide a probabilistic framework for RNA-seq analysis, give functional insights into pre-mRNA processing and yield guidelines for the optimal design of RNA-seq experiments for studies of gene and isoform expression.
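As a rough illustration of estimating an exon inclusion level together with a measure of confidence, the sketch below computes a Beta posterior for the fraction of reads supporting inclusion. This is a deliberately simplified stand-in and not the MISO model, which according to the abstract is a mixture-of-isoforms model that also uses the fragment length distribution in paired-end data. The function name, the uniform prior, and the read counts are assumptions.

```python
from math import sqrt


def psi_posterior(inclusion_reads, exclusion_reads, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of an exon inclusion level.

    Treats each informative read as a Bernoulli draw (inclusion vs. exclusion)
    with a Beta(prior_a, prior_b) prior, giving a Beta posterior. Isoform
    length and fragment-length effects are ignored in this sketch.
    """
    a = prior_a + inclusion_reads
    b = prior_b + exclusion_reads
    mean = a / (a + b)
    sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# Hypothetical counts: same inclusion ratio at two sequencing depths.
shallow_mean, shallow_sd = psi_posterior(30, 10)
deep_mean, deep_sd = psi_posterior(300, 100)
```

With the ratio of inclusion to exclusion reads held fixed, the deeper sample yields a narrower posterior, which is the kind of depth-versus-confidence trade-off that informs experimental design.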
Splicing is a cellular mechanism that dictates eukaryotic gene expression by removing the noncoding introns and ligating the coding exons in the form of a messenger RNA molecule. Alternative splicing (AS) adds a major level of complexity to this mechanism and thus to the regulation of gene expression. This widespread cellular phenomenon generates multiple messenger RNA isoforms from a single gene, by utilizing alternative splice sites and promoting different exon–intron inclusions and exclusions. AS greatly increases the coding potential of eukaryotic genomes and hence contributes to the diversity of eukaryotic proteomes. Mutations that lead to disruptions of either constitutive splicing or AS cause several diseases, among which are myotonic dystrophy and cystic fibrosis. Aberrant splicing is also well established in cancer states. Identification of rare novel mutations associated with splice-site recognition, and splicing regulation in general, could provide further insight into genetic mechanisms of rare diseases. Here, disease relevance of aberrant splicing is reviewed, and the new methodological approach of starting from disease phenotype, employing exome sequencing and identifying rare mutations affecting splicing regulation is described. Exome sequencing has emerged as a reliable method for finding sequence variations associated with various disease states. To date, genetic studies using exome sequencing to find disease-causing mutations have focused on the discovery of nonsynonymous single nucleotide polymorphisms that alter amino acids or introduce early stop codons, or on the use of exome sequencing as a means to genotype known single nucleotide polymorphisms. The involvement of splicing mutations in inherited diseases has received little attention, and such mutations therefore likely occur more frequently than currently estimated.
Studies of exome sequencing followed by molecular and bioinformatic analyses have great potential to reveal the high impact of splicing mutations underlying human disease.
Motivation: Transcripts from ∼95% of human multi-exon genes are subject to alternative splicing (AS). The growing interest in AS is propelled by its prominent contribution to transcriptome and proteome complexity and the role of aberrant AS in numerous diseases. Recent technological advances enable thousands of exons to be simultaneously profiled across diverse cell types and cellular conditions, but require accurate identification of condition-specific splicing changes. It is necessary to accurately identify such splicing changes to elucidate the underlying regulatory programs or link the splicing changes to specific diseases.
Results: We present a probabilistic model tailored for high-throughput AS data, where observed isoform levels are explained as combinations of condition-specific AS signals. According to our formulation, given an AS dataset our tasks are to detect common signals in the data and identify the exons relevant to each signal. Our model can incorporate prior knowledge about underlying AS signals, measurement quality and gene expression level effects. Using a large-scale multi-tissue AS dataset, we demonstrate the advantage of our method over standard alternative approaches. In addition, we describe newly found tissue-specific AS signals which were verified experimentally, and discuss associated regulatory features.
Supplementary information: Supplementary data are available at Bioinformatics online.
The complexity of mammalian transcriptomes is compounded by alternative splicing which allows one gene to produce multiple transcript isoforms. However, transcriptome comparison has been limited to differential analysis at the gene level instead of the individual transcript isoform level. High-throughput sequencing technologies and high-resolution tiling arrays provide an unprecedented opportunity to compare transcriptomes at the level of individual splice variants. However, sequence read coverage or probe intensity at each position may represent a family of splice variants instead of one single isoform. Here we propose a hierarchical Bayesian model, BASIS (Bayesian Analysis of Splicing IsoformS), to infer the differential expression level of each transcript isoform in response to two conditions. A latent variable was introduced to perform direct statistical selection of differentially expressed isoforms. Model parameters were inferred based on an ergodic Markov chain generated by our Gibbs sampler. BASIS has the ability to borrow information across different probes (or positions) from the same genes and different genes. BASIS can handle the heteroskedasticity of probe intensity or sequence read coverage. We applied BASIS to a human tiling-array data set and a mouse RNA-seq data set. Some of the predictions were validated by quantitative real-time RT–PCR experiments.
Both transcription and post-transcriptional processes, such as alternative splicing, play crucial roles in controlling developmental programs in metazoans. The recently emerged RNA-seq method has brought our understanding of eukaryotic transcriptomes to a new level, because it can resolve both gene expression levels and alternative splicing events simultaneously.
To gain a better understanding of cellular differentiation in gonads, we analyzed mRNA profiles from Drosophila testes and ovaries using RNA-seq. We identified a set of genes that have sex-specific isoforms in wild-type (wt) gonads, including several transcription factors. We found that differentiation of sperm from undifferentiated germ cells induced a dramatic down-regulation of RNA splicing factors. Our data confirmed that RNA splicing events are significantly more frequent in the undifferentiated-cell enriched bag of marbles (bam) mutant testis, but down-regulated upon differentiation in wt testis. Consistent with this, we showed that genes required for meiosis and terminal differentiation in wt testis were mainly regulated at the transcriptional level, but not by alternative splicing. Unexpectedly, we observed an increase in expression of all families of chromatin remodeling factors and histone modifying enzymes in the undifferentiated cell-enriched bam testis. More interestingly, chromatin regulators and histone modifying enzymes with opposite enzymatic activities are co-enriched in undifferentiated cells in testis, suggesting these cells may possess dynamic chromatin architecture. Finally, our data revealed many new features of the Drosophila gonadal transcriptomes, and will lead to a more comprehensive understanding of how differential gene expression and splicing regulate gametogenesis in Drosophila. Our data provide a foundation for the systematic study of gene expression and alternative splicing in many interesting areas of germ cell biology in Drosophila, such as the molecular basis for sexual dimorphism and the regulation of the proliferation vs. terminal differentiation programs in germline stem cell lineages. The GEO accession number for the raw and analyzed RNA-seq data is GSE16960.
Transcription; alternative splicing; differentiation; testis; ovary; Drosophila
Differential splice site pairing establishes alternative splicing patterns resulting in the generation of multiple mRNA isoforms. This process is carried out by the spliceosome, which is activated by a series of sequential structural rearrangements of its five core snRNPs. To determine when splice sites become functionally paired, we carried out a series of kinetic trap experiments using pre-mRNAs that undergo alternative 5′ splice site selection or alternative exon inclusion. We show that commitment to splice site pairing in both cases occurs in the A complex, which is characterized by the ATP-dependent association of the U2 snRNP with the branch point. Interestingly, the timing of splice site pairing is independent of the intron or exon definition modes of splice site recognition. Using the ATP analog ATPγS, we showed that ATP hydrolysis is required for splice site pairing independent from U2 snRNP binding to the pre-mRNA. These results identify the A complex as the spliceosomal assembly step dedicated to splice site pairing and suggest that ATP hydrolysis locks splice sites into a splicing pattern after stable U2 snRNP association to the branch point.
Over 40 different human immunodeficiency virus type 1 (HIV-1) mRNAs are produced by alternative splicing of the primary HIV-1 RNA transcripts. In addition, approximately half of the viral RNA remains unspliced and is used as genomic RNA and as mRNA for the Gag and Pol gene products. Regulation of splicing at the HIV-1 3′ splice sites (3′ss) requires suboptimal polypyrimidine tracts, and positive or negative regulation occurs through the binding of cellular factors to cis-acting splicing regulatory elements. We have previously shown that splicing at HIV-1 3′ss A1, which produces single-spliced vif mRNA and promotes the inclusion of HIV exon 2 into both completely and incompletely spliced viral mRNAs, is increased by optimizing the 5′ splice site (5′ss) downstream of exon 2 (5′ss D2). Here we show that the mutations within 5′ss D2 that are predicted to lower or increase the affinity of the 5′ss for U1 snRNP result in reduced or increased Vif expression, respectively. Splicing at 5′ss D2 was not necessary for the effect of 5′ss D2 on Vif expression. In addition, we have found that mutations of the GGGG motif proximal to the 5′ss D2 increase exon 2 inclusion and Vif expression. Finally, we report the presence of a novel exonic splicing enhancer (ESE) element within the 5′-proximal region of exon 2 that facilitates both exon inclusion and Vif expression. This ESE binds specifically to the cellular SR protein SRp75. Our results suggest that the 5′ss D2, the proximal GGGG silencer, and the ESE act competitively to determine the level of vif mRNA splicing and Vif expression. We propose that these positive and negative splicing elements act together to allow the accumulation of vif mRNA and unspliced HIV-1 mRNA, compatible with optimal virus replication.
Alternative splicing of transcripts in a signal-dependent manner has emerged as an important concept to ensure appropriate expression of splice variants under different conditions. Binding of the general splicing factor U2AF to splice sites preceding alternatively spliced exons has been suggested to be an important step for splice site recognition. For splicing to proceed, U2AF has to be replaced by other factors. We show here that U2AF interacts with the signal-dependent splice regulator Sam68 and that forced expression of Sam68 results in enhanced binding of the U2AF65 subunit to an alternatively spliced pre-mRNA sequence in vivo. Conversely, the rapid signal-induced and phosphorylation-dependent interference with Sam68 binding to RNA was accompanied by reduced pre-mRNA occupancy of U2AF in vivo. Our data suggest that Sam68 can affect splice site occupancy by U2AF in signal-dependent splicing. We propose that the induced release of U2AF from pre-mRNA provides a regulatory step to control alternative splicing.
Although the splicing of transcripts from most eukaryotic genes occurs in a constitutive fashion, some genes can undergo a process of alternative splicing. This is a genetically economical process which allows a single gene to give rise to several protein isoforms by the inclusion or exclusion of sequences into or from the mature mRNA. CD44 provides a unique example; more than 1,000 possible isoforms can be produced by the inclusion or exclusion of a central tandem array of 10 alternatively spliced exons. Certain alternatively spliced exons have been ascribed specific functions; however, independent regulation of the inclusion or skipping of each of these exons would clearly demand an extremely complex regulatory network. Such a network would involve the interaction of many exon-specific trans-acting factors with the pre-mRNA. Therefore, to assess whether the exons are indeed independently regulated, we have examined the alternative exon content of a large number of individual CD44 cDNA isoforms. This analysis shows that the downstream alternatively spliced exons are favored over those lying upstream and that alternative exons are often included in blocks rather than singly. Using a novel in vivo alternative splicing assay, we show that intron length has a major influence upon the alternative splicing of CD44. We propose a kinetic model in which short introns may overcome the poor recognition of alternatively spliced exons. These observations suggest that for CD44, intron length has been exploited in the evolution of the genomic structure to enable tissue-specific patterns of splicing to be maintained.
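The figure of more than 1,000 possible isoforms follows from simple counting: if each of the 10 variable exons can independently be included or skipped, there are 2^10 = 1,024 combinations. The short sketch below enumerates them; the function name is hypothetical, and the independence of exon choices is exactly the assumption that the abstract goes on to question.

```python
from itertools import product


def enumerate_inclusion_patterns(n_variable_exons):
    """All include/skip patterns for a tandem array of variable exons.

    Each pattern is a tuple of 0/1 flags, one per exon, assuming every
    exon is chosen independently of the others.
    """
    return list(product((0, 1), repeat=n_variable_exons))


patterns = enumerate_inclusion_patterns(10)
n_patterns = len(patterns)  # 2 ** 10
```

The observation that alternative exons tend to be included in blocks rather than singly means the set of isoforms actually produced is expected to be far smaller than this combinatorial upper bound.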
RNA splicing is required to remove introns from pre-mRNA and alternative splicing generates protein diversity. Topoisomerase I (Top1) has been shown to be coupled with splicing by regulating SR splicing proteins. Prior studies on isolated genes also showed that Top1 poisoning by camptothecin (CPT), which traps Top1 cleavage complexes (Top1cc), can alter RNA splicing. Here we tested the impact of Top1 inhibition on splicing at the genome-wide level in human colon carcinoma HCT116 and breast carcinoma MCF7 cells. The RNA of HCT116 cells treated with CPT for various times was analyzed with the ExonHit Human Splice Array. Unlike other exon array platforms, the ExonHit arrays include junction probes that allow the detection of splice variants with high sensitivity and specificity. We report that CPT treatment preferentially affects the splicing of splicing-related factors, such as RBM8A, and generates transcripts coding for inactive proteins lacking key functional domains. The splicing alterations induced by CPT are not observed with cisplatin or vinblastine, and are not simply due to reduced Top1 activity, as TOP1 downregulation by siRNA did not alter splicing as CPT treatment did. Inhibition of RNA polymerase II (Pol II) hyperphosphorylation by DRB blocked the splicing alteration induced by CPT, which suggests that the rapid Pol II hyperphosphorylation induced by CPT interferes with normal splicing. The preferential effect of CPT on genes encoding splicing factors may explain the abnormal splicing of a large number of genes in response to Top1cc.
Topoisomerase I; splicing; camptothecin; transcription; RNA polymerase II
Bovine papillomavirus type 1 (BPV-1) late pre-mRNAs are spliced in keratinocytes in a differentiation-specific manner: the late leader 5′ splice site alternatively splices to a proximal 3′ splice site (at nucleotide 3225) to express L2 or to a distal 3′ splice site (at nucleotide 3605) to express L1. Two exonic splicing enhancers, each containing two ASF/SF2 (alternative splicing factor/splicing factor 2) binding sites, are located between the two 3′ splice sites and have been identified as regulating alternative 3′ splice site usage. The present report demonstrates for the first time that ASF/SF2 is required under physiological conditions for the expression of BPV-1 late RNAs and for selection of the proximal 3′ splice site for BPV-1 RNA splicing in DT40-ASF cells, a genetically engineered chicken B-cell line that expresses only human ASF/SF2 controlled by a tetracycline-repressible promoter. Depletion of ASF/SF2 from the cells by tetracycline greatly decreased viral RNA expression and RNA splicing at the proximal 3′ splice site while increasing use of the distal 3′ splice site in the remaining viral RNAs. Activation of cells lacking ASF/SF2 through anti-immunoglobulin M-B-cell receptor cross-linking rescued viral RNA expression and splicing at the proximal 3′ splice site and enhanced Akt phosphorylation and expression of the phosphorylated serine/arginine-rich (SR) proteins SRp30s (especially SC35) and SRp40. Treatment with wortmannin, a specific phosphatidylinositol 3-kinase/Akt kinase inhibitor, completely blocked the activation-induced activities. ASF/SF2 thus plays an important role in viral RNA expression and splicing at the proximal 3′ splice site, but activation-rescued viral RNA expression and splicing in ASF/SF2-depleted cells is mediated through the phosphatidylinositol 3-kinase/Akt pathway and is associated with the enhanced expression of other SR proteins.
RNA splicing is an essential step in gene expression, and is often variable, giving rise to multiple alternatively spliced mRNA and protein isoforms from a single gene locus. The design of effective databases to support experimental and computational investigations of alternative splicing (AS) is a significant challenge. In an effort to integrate accurate exon and splice site annotation with current knowledge about splicing regulatory elements and predicted AS events, and to link information about the splicing of orthologous genes in different species, we have developed the Hollywood system. This database was built upon genomic annotation of splicing patterns of known genes derived from spliced alignment of complementary DNAs (cDNAs) and expressed sequence tags, and links features such as splice site sequence and strength, exonic splicing enhancers and silencers, conserved and non-conserved patterns of splicing, and cDNA library information for inferred alternative exons. Hollywood was implemented as a relational database and currently contains comprehensive information for human and mouse. It is accompanied by a web query tool that allows searches for sets of exons with specific splicing characteristics or splicing regulatory element composition, or gives a graphical or sequence-level summary of splicing patterns for a specific gene. A streamlined graphical representation of gene splicing patterns is provided, and these patterns can alternatively be layered onto existing information in the UCSC Genome Browser. The database is accessible at .
Alternative splicing is known to increase the complexity of mammalian transcriptomes since nearly all mammalian genes express multiple pre-mRNA isoforms. However, our knowledge of the extent and function of alternative splicing in early embryonic development is based mainly on a few isolated examples. High throughput technologies now allow us to study genome-wide alternative splicing during mouse development.
A genome-wide analysis of alternative isoform expression in embryonic day 8.5, 9.5 and 11.5 mouse embryos and placenta was carried out using a splicing-sensitive exon microarray. We show that alternative splicing and isoform expression is frequent across developmental stages and tissues, and is comparable in frequency to the variation in whole-transcript expression. The genes that are alternatively spliced across our samples are disproportionately involved in important developmental processes. Finally, we find that a number of RNA binding proteins, including putative splicing factors, are differentially expressed and spliced across our samples suggesting that such proteins may be involved in regulating tissue and temporal variation in isoform expression. Using an example of a well characterized splicing factor, Fox2, we demonstrate that changes in Fox2 expression levels can be used to predict changes in inclusion levels of alternative exons that are flanked by Fox2 binding sites.
We propose that alternative splicing is an important developmental regulatory mechanism. We further propose that gene expression should routinely be monitored at both the whole transcript and the isoform level in developmental studies.
Tau protein, which binds to and stabilizes microtubules, is critical for neuronal survival and function. In the human brain, tau pre-mRNA splicing is regulated to maintain a delicate balance of exon 10-containing and exon 10-skipping isoforms. Splicing mutations affecting tau exon 10 alternative splicing lead to tauopathies, a group of neurodegenerative disorders including dementia. Molecular mechanisms regulating tau alternative splicing remain to be elucidated. In this study, we have developed an expression cloning strategy to identify splicing factors that stimulate tau exon 10 inclusion. Using this expression cloning approach, we have identified a previously unknown tau exon 10 splicing regulator, RBM4 (RNA binding motif protein 4). In cells transfected with a tau minigene, RBM4 overexpression leads to an increased inclusion of exon 10, whereas RBM4 down-regulation decreases exon 10 inclusion. The activity of RBM4 in stimulating tau exon 10 inclusion is abolished by mutations in its RNA-binding domain. A putative intronic splicing enhancer located in intron 10 of the tau gene is required for the splicing stimulatory activity of RBM4. Immunohistological analyses reveal that RBM4 is expressed in the human brain regions affected in tauopathy, including the hippocampus and frontal cortex. Our study demonstrates that RBM4 is involved in tau exon 10 alternative splicing. Our work also suggests that down-regulating tau exon 10 splicing activators, such as RBM4, may be of therapeutic potential in tauopathies involving excessive tau exon 10 inclusion.
Alternative splicing creates diverse mRNA isoforms from single genes and thereby enhances complexity of transcript structure and of gene function. We describe a method called spliceotyping, which translates combinatorial mRNA splicing patterns along transcripts into a library of binary strings of nucleic acid tags that encode the exon composition of individual mRNA molecules. The exon inclusion pattern of each analyzed transcript is thus represented as binary data, and the abundance of different splice variants is registered by counts of individual molecules. The technique is illustrated in a model experiment by analyzing the splicing patterns of the adenovirus early 1A gene and the beta actin reference transcript. The method permits many genes to be analyzed in parallel and it will be valuable for elucidating the complex effects of combinatorial splicing.
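The data representation described here, one binary string per molecule with variant abundance obtained by counting, can be sketched in a few lines. The code below illustrates only that bookkeeping step, not the laboratory method; the function names, the 1-based exon numbering, and the example molecules are assumptions.

```python
from collections import Counter


def encode_transcript(included_exons, n_exons):
    """Binary string with a '1' at position k if exon k (1-based) is present."""
    present = set(included_exons)
    return "".join("1" if k in present else "0" for k in range(1, n_exons + 1))


def count_splice_variants(molecules, n_exons):
    """Tally molecules by their exon-inclusion pattern."""
    return Counter(encode_transcript(m, n_exons) for m in molecules)


# Hypothetical readout for a three-exon transcript: each inner list is the
# set of exons detected on one individual mRNA molecule.
molecules = [[1, 2, 3], [1, 3], [1, 2, 3], [1, 3], [1, 3]]
variant_counts = count_splice_variants(molecules, n_exons=3)
```

Because every molecule maps to exactly one string, the counts sum to the number of molecules analyzed, and combinatorial patterns across distant exons are preserved rather than being averaged exon by exon.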
The human CD44 gene contains 10 variable exons (v1 to v10) that can be alternatively spliced to generate hundreds of different CD44 protein isoforms. Human CD44 variable exon v3 inclusion in the final mRNA depends on a multisite bipartite splicing enhancer located within the exon itself, which we have recently described, and provides the protein domain responsible for growth factor binding to CD44.
We have analyzed the sequence of CD44v3 in 95 mammalian species to report high conservation levels for both its splicing regulatory elements (the 3' splice site and the exonic splicing enhancer), and the functional glycosaminoglycan binding site coded by v3. We also report the functional expression of CD44v3 isoforms in peripheral blood cells of different mammalian taxa with both consensus and variant v3 sequences.
CD44v3 mammalian sequences maintain all functional splicing regulatory elements as well as the GAG binding site, with the same relative positions and sequence identity previously described for alternative splicing of human CD44. The sequence within the GAG attachment site, which in turn contains the Y motif of the exonic splicing enhancer, is more conserved than the rest of the exon. The fact that the CD44v3 sequence could be amplified from mammalian species but not from birds, fish or reptiles may support classifying CD44v3 as an exclusively mammalian gene trait.
We examine here the roles of cellular splicing factors and virus regulatory proteins in coordinately regulating alternative splicing of the tat/rev mRNA of equine infectious anemia virus (EIAV). This bicistronic mRNA contains four exons; exons 1 and 2 encode Tat, and exons 3 and 4 encode Rev. In the absence of Rev expression, the four-exon mRNA is synthesized exclusively, but when Rev is expressed, exon 3 is skipped to produce an mRNA that contains only exons 1, 2, and 4. We identify a purine-rich exonic splicing enhancer (ESE) in exon 3 that promotes exon inclusion. Similar to other cellular ESEs that have been identified by other laboratories, the EIAV ESE interacted specifically with SR proteins, a group of serine/arginine-rich splicing factors that function in constitutive and alternative mRNA splicing. Substitution of purines with pyrimidines in the ESE resulted in a switch from exon inclusion to exon skipping in vivo and abolished binding of SR proteins in vitro. Exon skipping was also induced by expression of EIAV Rev. We show that Rev binds to exon 3 RNA in vitro, and while the precise determinants have not been mapped, Rev function in vivo and RNA binding in vitro indicate that the RNA element necessary for Rev responsiveness overlaps or is adjacent to the ESE. We suggest that EIAV Rev promotes exon skipping by interfering with SR protein interactions with RNA or with other splicing factors.
The nuclear cap-binding protein complex (CBC) participates in 5′ splice site selection of introns that are proximal to the mRNA cap. However, it is not known whether CBC has a role in alternative splicing. Using an RT–PCR alternative splicing panel, we analysed 435 alternative splicing events in Arabidopsis thaliana genes, encoding mainly transcription factors, splicing factors and stress-related proteins. Splicing profiles were determined in wild-type plants, the cbp20 and cbp80(abh1) single mutants and the cbp20/80 double mutant. The alternative splicing events included alternative 5′ and 3′ splice site selection, exon skipping and intron retention. Significant changes in the ratios of alternative splicing isoforms were found in 101 genes. Of these, 41% were common to all three CBC mutants and 15% were observed only in the double mutant. The cbp80(abh1) and cbp20/80 mutants had many more changes in alternative splicing in common than did cbp20 and cbp20/80, suggesting that CBP80 plays a more significant role in alternative splicing than CBP20, probably by serving as a platform for interactions with other splicing factors. Cap-binding proteins and the CBC are therefore directly involved in alternative splicing of some Arabidopsis genes, and in most cases they influence alternative splicing of the first intron, particularly at the 5′ splice site.
The human immunodeficiency virus type 1 (HIV-1) RNA follows a complex splicing pathway in which a single primary transcript either remains unspliced or is alternatively spliced to more than 30 different singly and multiply spliced mRNAs. We have used an in vitro splicing assay to identify cis elements within the viral genome that regulate HIV-1 RNA splicing. A novel splicing regulatory element (SRE) within the first tat coding exon has been detected. This element specifically inhibits splicing at the upstream 3' splice site flanking this tat exon. The element only functions when in the sense orientation and is position dependent when inserted downstream of a heterologous 3' splice site. In vivo, an HIV-1 SRE mutant demonstrated a decrease in unspliced viral RNA, increased levels of single- and double-spliced tat mRNA, and reduced levels of env and rev mRNAs. In addition to the negative cis-acting SRE, the flanking 5' splice site downstream of the first tat coding exon acts positively to increase splicing at the upstream 3' splice sites. These results are consistent with hypotheses of bridging interactions between cellular factors that bind to the 5' splice site and those that bind at the upstream 3' splice site.
Alternative splicing, alternative polyadenylation of pre-messenger RNA molecules and differential promoter usage can produce a variety of transcript isoforms whose respective expression levels are regulated in time and space, thus contributing to specific biological functions. However, the repertoire of mammalian alternative transcripts and their regulation are still poorly understood. Second-generation sequencing is now opening unprecedented routes to the analysis of entire transcriptomes. Here, we developed methods that allow the prediction and quantification of alternative isoforms derived solely from exon expression levels in RNA-Seq data. These methods are based on an explicit statistical model and enable the prediction of alternative isoforms within or between conditions using any known gene annotation, as well as the relative quantification of known transcript structures. Applying these methods to a human RNA-Seq dataset, we validated a significant fraction of the predictions by RT-PCR. The data further showed that these predictions correlated well with information originating from junction reads. A direct comparison with exon arrays indicated improved performance of RNA-Seq over microarrays in the prediction of skipped exons. Altogether, the set of methods presented here comprehensively addresses multiple aspects of alternative isoform analysis. The software is available as an open-source R package called Solas at http://cmb.molgen.mpg.de/2ndGenerationSequencing/Solas/.
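To make the idea of quantifying isoforms "solely from exon expression levels" concrete, the following is a minimal sketch for the simplest case, a single cassette (skipped) exon with two known isoforms. It is not the statistical model implemented in Solas, which the abstract does not specify. It assumes that read density on the constitutive exons reflects the sum of both isoforms, while density on the cassette exon reflects the inclusion isoform alone. The function names, the RPKM-style density, and the input layout are all illustrative assumptions.

```python
def exon_density(read_count, exon_length, total_reads):
    """Reads per kilobase of exon per million mapped reads (RPKM-style)."""
    return read_count * 1e9 / (exon_length * total_reads)


def cassette_exon_estimate(constitutive, cassette, total_reads):
    """Estimate inclusion and skipping isoform levels for one cassette exon.

    constitutive: list of (read_count, length) for exons shared by both isoforms
    cassette:     (read_count, length) for the exon found only in the
                  inclusion isoform
    Returns (included, skipped, psi), where psi is the inclusion fraction.
    Hypothetical helper for illustration; not part of any published tool.
    """
    # Pool the constitutive exons: their density reflects both isoforms together.
    reads = sum(count for count, _ in constitutive)
    length = sum(size for _, size in constitutive)
    total = exon_density(reads, length, total_reads)

    # The cassette exon's density reflects the inclusion isoform alone.
    included = exon_density(cassette[0], cassette[1], total_reads)
    # Clamp sampling noise so the skipping level cannot become negative.
    included = min(included, total)

    skipped = total - included
    psi = included / total if total > 0 else float("nan")
    return included, skipped, psi


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    inc, skip, psi = cassette_exon_estimate(
        constitutive=[(200, 1000), (100, 500)],
        cassette=(25, 250),
        total_reads=1_000_000,
    )
    print(inc, skip, psi)
```

With more than two isoforms the same reasoning leads to a system of linear constraints (one per exon) that has to be solved under non-negativity, and a full statistical treatment would also model read-count noise rather than treating densities as exact.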