Precursor mRNA splicing is one of the most highly regulated processes in metazoan species. In addition to generating vast repertoires of RNAs and proteins, splicing has a profound impact on other gene regulatory layers, including mRNA transcription, turnover, transport and translation. Conversely, factors regulating chromatin and transcription complexes impact the splicing process. This extensive cross-talk between gene regulatory layers takes advantage of dynamic spatial, physical and temporal organizational properties of the cell nucleus, and further emphasizes the importance of developing a multidimensional understanding of splicing control.
With the rapidly declining cost of data generation, and the accumulation of massive datasets, molecular biology is entering an era in which incisive analysis of existing data will play an increasingly prominent role in the discovery of new biological phenomena and the elucidation of molecular mechanisms. Here, we discuss resources of publicly available sequencing data most useful for interrogating the mechanisms of gene expression. Existing next-generation sequence datasets, however, come with significant challenges in the form of technical and bioinformatic artifacts, which we discuss in detail. We also recount several breakthroughs made largely through the analysis of existing data, primarily in the RNA field.
Genetic association studies, pharmacological investigations, and analysis of mice lacking individual genes have made it clear that cocaine administration and withdrawal have a profound impact on multiple neurotransmitter systems. The GABAergic medium spiny neurons of the nucleus accumbens (NAc) exhibit changes in the expression of genes encoding receptors for glutamate and in the signaling pathways triggered by dopamine binding to G-protein coupled dopamine receptors. Deep sequence analysis provides a sensitive, quantitative and global analysis of the effects of cocaine on the NAc transcriptome. RNA prepared from the NAc of adult male mice receiving daily injections of saline or cocaine, or cocaine followed by a period of withdrawal, was used for high-throughput sequence analysis. Changes were validated by qPCR or Western blot. Based on pathway analysis, a preponderance of the genes affected by cocaine and withdrawal were involved in the cadherin, heterotrimeric G-protein, and Wnt signaling pathways. Distinct subsets of cadherins and protocadherins exhibited a sustained increase or decrease in expression. Sustained down-regulation of several heterotrimeric G-protein β- and γ-subunits was observed. In addition to altered expression of receptors for small molecule neurotransmitters, neuropeptides and endocannabinoids, changes in the expression of plasma membrane transporters and vesicular neurotransmitter transporters were also observed. The effects of chronic cocaine and withdrawal on the expression of genes essential to cholinergic, glutamatergic, GABAergic, peptidergic, and endocannabinoid signaling are as profound as their effects on dopaminergic transmission. Simultaneous targeting of multiple withdrawal-specific changes in gene expression may facilitate development of new therapeutic approaches that are better able to prevent relapse.
RNA-Seq; pathway analysis; Wnt/cadherin signaling; heterotrimeric G-protein; glutamate; neuropeptide; acetylcholine; GABA
Anopheles darlingi is the principal neotropical malaria vector, responsible for more than a million cases of malaria per year on the American continent. Anopheles darlingi diverged from the African and Asian malaria vectors ∼100 million years ago (mya) and successfully adapted to the New World environment. Here we present an annotated reference A. darlingi genome, sequenced from a wild population of males and females collected in the Brazilian Amazon. A total of 10 481 predicted protein-coding genes were annotated, 72% of which have their closest counterpart in Anopheles gambiae and 21% have highest similarity with other mosquito species. In spite of a long period of divergent evolution, conserved gene synteny was observed between A. darlingi and A. gambiae. More than 10 million single nucleotide polymorphisms and short indels with potential use as genetic markers were identified. Transposable elements correspond to 2.3% of the A. darlingi genome. Genes associated with hematophagy, immunity and insecticide resistance, directly involved in vector–human and vector–parasite interactions, were identified and discussed. This study represents the first effort to sequence the genome of a neotropical malaria vector, and opens a new window through which we can contemplate the evolutionary history of anopheline mosquitoes. It also provides valuable information that may lead to novel strategies to reduce malaria transmission on the South American continent. The A. darlingi genome is accessible at www.labinfo.lncc.br/index.php/anopheles-darlingi.
CRISPR-Cas systems are RNA-guided immune systems that protect prokaryotes against viruses and other invaders. The CRISPR locus encodes crRNAs that recognize invading nucleic acid sequences and trigger silencing by the associated Cas proteins. There are multiple CRISPR-Cas systems with distinct compositions and mechanistic processes. Thermococcus kodakarensis (Tko) is a hyperthermophilic euryarchaeon that has both a Type I-A Csa and a Type I-B Cst CRISPR-Cas system. We have analyzed the expression and composition of crRNAs from the three CRISPRs in Tko by RNA deep sequencing and northern analysis. Our results indicate that crRNAs associated with these two CRISPR-Cas systems include an 8-nucleotide conserved sequence tag at the 5′ end. We challenged Tko with plasmid invaders containing sequences targeted by endogenous crRNAs and observed active CRISPR-Cas-mediated silencing. Plasmid silencing was dependent on complementarity with a crRNA as well as on a sequence element found immediately adjacent to the crRNA recognition site in the target termed the PAM (protospacer adjacent motif). Silencing occurred independently of the orientation of the target sequence in the plasmid, and appears to occur at the DNA level, presumably via DNA degradation. In addition, we have directed silencing of an invader plasmid by genetically engineering the chromosomal CRISPR locus to express customized crRNAs directed against the plasmid. Our results support CRISPR engineering as a feasible approach to develop prokaryotic strains that are resistant to infection for use in industry.
CRISPR; Cas; archaea; Thermococcus; hyperthermophile; immune; RNA; DNA; silencing; interference
Small RNAs target invaders for silencing in the CRISPR-Cas pathways that protect bacteria and archaea from viruses and plasmids. The CRISPR RNAs (crRNAs) contain sequence elements acquired from invaders that guide CRISPR-associated (Cas) proteins back to the complementary invading DNA or RNA. Here, we have analyzed essential features of the crRNAs associated with the Cas RAMP module (Cmr) effector complex, which cleaves targeted RNAs. We show that Cmr crRNAs contain an 8-nucleotide 5′ sequence tag (also found on crRNAs associated with other CRISPR-Cas pathways) that is critical for crRNA function and can be used to engineer crRNAs that direct cleavage of novel targets. We also present data indicating that the Cmr complex cleaves an endogenous complementary RNA in Pyrococcus furiosus, providing direct in vivo evidence of RNA targeting by the CRISPR-Cas system. Our findings indicate that the CRISPR RNA-Cmr protein pathway may be exploited to cleave RNAs of interest.
Genomic imprinting occurs when expression of an allele differs based on the sex of the parent that transmitted the allele. In D. melanogaster, imprinting can occur, but its impact on allelic expression genome-wide is unclear. Here, we search for imprinted genes in D. melanogaster using RNA-seq to compare allele-specific expression between pools of 7–10 day old adult female progeny from reciprocal crosses. 119 genes with allelic expression patterns consistent with imprinting were identified and showed significant clustering within the genome. Surprisingly, additional analysis of several of these genes showed that either genomic heterogeneity or high levels of intrinsic noise caused imprinting-like allelic expression. Consequently, our data provide no convincing evidence of imprinting for D. melanogaster genes in their native genomic context. Elucidating sources of false positive signals for imprinting in allele-specific RNA-seq data, as done here, is critical given the growing popularity of this method for identifying imprinted genes.
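The logic of the reciprocal-cross comparison can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the read-count inputs, and the `min_bias` threshold are all hypothetical. It separates a parent-of-origin pattern (the maternal allele is favored, or disfavored, in both crosses) from a strain effect (the same strain's allele is favored in both crosses).

```python
def maternal_fraction(maternal_reads, paternal_reads):
    """Fraction of allele-specific reads that come from the maternal allele."""
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("no allele-specific reads for this gene")
    return maternal_reads / total


def classify(cross1, cross2, min_bias=0.2):
    """Classify one gene from a pair of reciprocal crosses.

    cross1: (strain A reads, strain B reads) in progeny of A mothers.
    cross2: (strain A reads, strain B reads) in progeny of B mothers.
    min_bias is an assumed cutoff on the deviation from 0.5, chosen
    only for illustration.
    """
    a1, b1 = cross1
    a2, b2 = cross2
    # Maternal-allele deviation from balanced expression in each cross.
    mat1 = maternal_fraction(a1, b1) - 0.5  # A is maternal
    mat2 = maternal_fraction(b2, a2) - 0.5  # B is maternal
    # Strain A deviation from balanced expression in each cross.
    strain1, strain2 = mat1, -mat2
    strong = abs(mat1) >= min_bias and abs(mat2) >= min_bias
    if strong and mat1 * mat2 > 0:
        return "parent-of-origin"
    if strong and strain1 * strain2 > 0:
        return "strain"
    return "unbiased-or-inconsistent"


print(classify((90, 10), (10, 90)))  # maternal allele favored in both crosses
print(classify((90, 10), (90, 10)))  # strain A allele favored in both crosses
```

A "parent-of-origin" call from a rule like this is only a candidate: as the abstract stresses, genomic heterogeneity and intrinsic noise can produce the same pattern, so it needs independent confirmation.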
The collection of components required to carry out the intricate processes involved in generating and maintaining a living, breathing and, sometimes, thinking organism is staggeringly complex. Where do all of the parts come from? Early estimates stated that about 100,000 genes would be required to make up a mammal; however, the actual number is less than one-quarter of that, barely four times the number of genes in budding yeast. It is now clear that the ‘missing’ information is in large part provided by alternative splicing, the process by which multiple different functional messenger RNAs, and therefore proteins, can be synthesized from a single gene.
Alternative splicing is a widespread means of increasing protein diversity and regulating gene expression in eukaryotes. Much progress has been made in understanding the proteins involved in regulating alternative splicing, the sequences they bind to, and how these interactions lead to changes in splicing patterns. However, several recent studies have identified other players involved in regulating alternative splicing. A major theme emerging from these studies is that RNA secondary structures play an underappreciated role in the regulation of alternative splicing. This review provides an overview of the basic aspects of splicing regulation and highlights recent progress in understanding the role of RNA secondary structure in this process.
We analyzed the usage and consequences of alternative cleavage and polyadenylation (APA) in Drosophila melanogaster by using >1 billion reads of stranded mRNA-seq across a variety of dissected tissues. Beyond demonstrating that a majority of fly transcripts are subject to APA, we observed broad trends for 3′ untranslated region (UTR) shortening in the testis and lengthening in the central nervous system (CNS); the latter included hundreds of unannotated extensions ranging up to 18 kb. Extensive northern analyses validated the accumulation of full-length neural extended transcripts, and in situ hybridization indicated their spatial restriction to the CNS. Genes encoding RNA binding proteins (RBPs) and transcription factors were preferentially subject to 3′ UTR extensions. Motif analysis indicated enrichment of miRNA and RBP sites in the neural extensions, and their termini were enriched in canonical cis elements that promote cleavage and polyadenylation. Altogether, we reveal broad tissue-specific patterns of APA in Drosophila and transcripts with unprecedented 3′ UTR length in the nervous system.
Drosophila melanogaster is one of the best-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Lynch syndrome (LS) leads to an increased risk of early-onset colorectal and other types of cancer and is caused by germline mutations in DNA mismatch repair (MMR) genes. Loss of MMR function results in a mutator phenotype that likely underlies its role in tumorigenesis. However, loss of MMR also results in the elimination of a DNA damage-induced checkpoint/apoptosis activation barrier that may allow damaged cells to grow unchecked. A fundamental question is whether loss of MMR provides pre-cancerous stem cells an immediate selective advantage in addition to establishing a mutator phenotype. To test this hypothesis in an in vivo system, we utilized the planarian Schmidtea mediterranea, which contains a significant population of identifiable adult stem cells. We identified a planarian homolog of human MSH2, an MMR gene that is mutated in 38% of LS cases. The planarian Smed-msh2 is expressed in stem cells and some progeny. We depleted Smed-msh2 mRNA levels by RNA interference and found a striking survival advantage in these animals treated with a cytotoxic DNA alkylating agent compared to control animals. We demonstrated that this tolerance to DNA damage is due to the survival of mitotically active, MMR-deficient stem cells. Our results suggest that loss of MMR provides an in vivo survival advantage to the stem cell population in the presence of DNA damage that may have implications for tumorigenesis.
The Down syndrome cell adhesion molecule (Dscam) gene has essential roles in neural wiring and pathogen recognition in Drosophila melanogaster. Dscam encodes 38,016 distinct isoforms via extensive alternative splicing. The 95 alternative exons in Dscam are organized into clusters that are spliced in a mutually exclusive manner. The exon 6 cluster contains 48 variable exons and uses a complex system of competing RNA structures to ensure that only one variable exon is included. Here we show that the heterogeneous nuclear ribonucleoprotein hrp36 acts specifically within, and throughout, the exon 6 cluster to prevent the inclusion of multiple exons. Moreover, hrp36 prevents serine/arginine-rich proteins from promoting the ectopic inclusion of multiple exon 6 variants. Thus, the fidelity of mutually exclusive splicing in the exon 6 cluster is governed by an intricate combination of alternative RNA structures and a globally acting splicing repressor.
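The isoform count follows from multiplying the number of alternatives in each mutually exclusive cluster, since exactly one variant is chosen per cluster. The sketch below works through that arithmetic. Only the exon 6 cluster size (48) and the total of 95 alternative exons are stated above; the sizes used for the exon 4, 9 and 17 clusters (12, 33 and 2) are the commonly cited values and are an assumption here.

```python
from math import prod

# Variable exons per mutually exclusive cluster in Dscam.
# Exon 6 (48) comes from the text; the other three sizes are assumed.
CLUSTER_SIZES = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def count_isoforms(cluster_sizes):
    """One variant is included per cluster, so the cluster sizes multiply."""
    return prod(cluster_sizes.values())


total_variable_exons = sum(CLUSTER_SIZES.values())
total_isoforms = count_isoforms(CLUSTER_SIZES)
print(total_variable_exons, total_isoforms)  # 95 38016
```

The product only holds if splicing within each cluster is strictly mutually exclusive, which is why the fidelity mechanism described above matters for the isoform count.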
RNAs can be physically classified into poly(A)+ or poly(A)- transcripts according to the presence or absence of a poly(A) tail at their 3' ends. Current deep sequencing approaches largely depend on the enrichment of transcripts with a poly(A) tail, and therefore offer little insight into the nature and expression of transcripts that lack poly(A) tails.
We have used deep sequencing to explore the repertoire of both poly(A)+ and poly(A)- RNAs from HeLa cells and H9 human embryonic stem cells (hESCs). Using stringent criteria, we found that while the majority of transcripts are poly(A)+, a significant portion of transcripts are either poly(A)- or bimorphic, being found in both the poly(A)+ and poly(A)- populations. Further analyses revealed that many mRNAs may not contain classical long poly(A) tails and such messages are overrepresented in specific functional categories. In addition, we surprisingly found that a few excised introns accumulate in cells and thus constitute a new class of non-polyadenylated long non-coding RNAs. Finally, we have identified a specific subset of poly(A)- histone mRNAs, including two histone H1 variants, that are expressed in undifferentiated hESCs and are rapidly diminished upon differentiation; further, these same histone genes are induced upon reprogramming of fibroblasts to induced pluripotent stem cells.
We offer a rich source of data that allows a deeper exploration of the poly(A)- landscape of the eukaryotic transcriptome. The approach we present here also applies to the analysis of the poly(A)- transcriptomes of other organisms.
Alternative splicing is typically thought to be controlled by RNA binding proteins that modulate the activity of the spliceosome. A new study not only demonstrates that alternative splicing can be regulated without the involvement of auxiliary splicing factors, but also provides mechanistic insight into how this can occur.
In this issue of Molecular Cell, Schwer (2008) demonstrates that during the latest stage of the splicing reaction the RNA-dependent helicase Prp22 is deposited upon the downstream exon where it subsequently strips the spliced messenger RNA from the spliceosome.
A new study reveals that extracellular signals can activate a signal-transduction cascade that simultaneously alters alternative splicing and translation of the same target. These concerted efforts probably serve to increase the speed and strength of the cellular response to changes in the extracellular environment.
The Drosophila fruitless (fru) gene encodes a transcription factor that regulates essentially all aspects of male courtship behavior. The use of alternative 5′-splice sites generates fru isoforms that determine gender-appropriate sexual behaviors. Alternative splicing of fru is regulated by TRA and TRA2 and depends on an exonic splicing enhancer (fruRE) consisting of three 13-nucleotide repeat elements, nearly identical to those that regulate alternative sex-specific 3′-splice site choice in the doublesex (dsx) gene. dsx has provided a useful model system to investigate the mechanisms of enhancer-dependent 3′-splice site choice. However, little is known about enhancer-dependent regulation of alternative 5′-splice sites. The mechanisms of this process were investigated using an in vitro system in which recombinant TRA/TRA2 could activate the female-specific 5′-splice site of fru. Mutational analysis demonstrated that one 13-nucleotide repeat element within the fruRE is necessary and sufficient to activate the regulated female-specific splice site. As was established for dsx, the fruRE can be replaced by a short element encompassing tandem 13-nucleotide repeat elements, by heterologous splicing enhancers, and by artificially tethering a splicing activator to the pre-mRNA. Complementation experiments showed that Ser/Arg-rich proteins facilitate enhancer-dependent 5′-splice site activation. We conclude that splicing enhancers function similarly in activating regulated 5′- and 3′-splice sites. These results suggest that exonic splicing enhancers recruit multiple spliceosomal components required for the initial recognition of 5′- and 3′-splice sites.
A new study in this issue of Molecular Cell (Pleiss et al., 2007b) shows that changes in the environment rapidly alter the splicing efficiency of specific pre-mRNAs in yeast.
RNA interference (RNAi) is a useful tool for degrading targeted messenger RNAs (mRNAs) and thus “knocking down” the abundance of the encoded protein. We have been using RNAi in cultured Drosophila cells to evaluate the effect of “knocking down” numerous mRNA processing factors on the alternative splicing of specific pre-mRNAs. This relatively simple technique has allowed us to identify a number of splicing factors that impact the alternative splicing of particular alternatively spliced exons. This approach can be extended to examine the splicing of nearly any gene.
RNA interference (RNAi); Drosophila melanogaster; Schneider (S2) cells; knock-down
Single-strand conformational polymorphism analysis has been used successfully to identify single nucleotide changes within sequences based on the fact that mutation detection enhancement gels will separate molecules based on their conformation rather than their size. We have expanded the utility of this technique to easily analyze the alternative splicing of pre-mRNAs containing multiple mutually exclusive exons of the same size. We have used this technique to study the Caenorhabditis elegans let-2 gene containing two alternative exons and the Drosophila melanogaster Dscam gene, which contains 12 mutually exclusive exons. The ease and the quantitative nature of this technique should be very useful.
Alternative splicing; single-strand conformational polymorphism (SSCP); exons
Numerous inherited human genetic disorders are caused by defects in pre-mRNA splicing. Two recent studies have added a new twist to the link between genetic variation and pre-mRNA splicing by identifying SNPs that correlate with heritable changes in alternative splicing but do not cause disease. This suggests that allele-specific alternative splicing is a mechanism that accounts for individual variation in the human population.
RNA interference (RNAi) is becoming a popular method for analyzing gene function in a variety of biological processes. We have used RNAi in cultured Drosophila cells to identify trans-acting factors that regulate the alternative splicing of endogenously transcribed pre-mRNAs. We have generated a dsRNA library comprising ~70% of the Drosophila genes encoding RNA binding proteins and assessed the function of each protein in the regulation of alternative splicing. This approach not only identifies trans-acting factors regulating specific alternative splicing events, but also can provide insight into the alternative splicing regulatory networks of Drosophila. Here, we describe this RNAi approach to identify alternative splicing regulatory proteins in detail.
Alternative splicing; RNA interference; Drosophila
Drosophila Dscam encodes 38,016 distinct axon guidance receptors through the mutually exclusive alternative splicing of 95 variable exons. Importantly, known mechanisms that ensure the mutually exclusive splicing of pairs of exons cannot explain this phenomenon in Dscam. I have identified two classes of conserved elements in the Dscam exon 6 cluster, which contains 48 alternative exons—the docking site, located in the intron downstream of constitutive exon 5, and the selector sequences, which are located upstream of each exon 6 variant. Strikingly, each selector sequence is complementary to a portion of the docking site, and this pairing juxtaposes one, and only one, alternative exon to the upstream constitutive exon. The mutually exclusive nature of the docking site:selector sequence interactions suggests that the formation of these competing RNA structures is a central component of the mechanism guaranteeing that only one exon 6 variant is included in each Dscam mRNA.
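The docking site:selector relationship can be illustrated with a simple complementarity check. This is a minimal sketch under stated assumptions: the sequences are invented for illustration and are not Dscam sequences, the `min_len` cutoff is hypothetical, and only Watson-Crick pairs are considered (G-U wobble pairs and structural context are ignored).

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def pairs_with_docking_site(docking_site, selector, min_len=8):
    """True if some stretch of at least min_len nucleotides in the selector
    is perfectly complementary to a portion of the docking site."""
    for start in range(len(selector) - min_len + 1):
        window = selector[start:start + min_len]
        if reverse_complement(window) in docking_site:
            return True
    return False


# Hypothetical sequences, invented for this example.
docking_site = "AAGGCUACGAUCCGUAGG"
selector = "CC" + reverse_complement(docking_site[4:14]) + "UU"
unrelated = "AAAAAAAAAAAA"

print(pairs_with_docking_site(docking_site, selector))   # True
print(pairs_with_docking_site(docking_site, unrelated))  # False
```

Because every selector competes for the same docking site, only one such pairing can form in a given pre-mRNA, which is the basis of the mutually exclusive choice described above.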