Precursor mRNA splicing is one of the most highly regulated processes in metazoan species. In addition to generating vast repertoires of RNAs and proteins, splicing has a profound impact on other gene regulatory layers, including mRNA transcription, turnover, transport and translation. Conversely, factors regulating chromatin and transcription complexes impact the splicing process. This extensive cross-talk between gene regulatory layers takes advantage of dynamic spatial, physical and temporal organizational properties of the cell nucleus, and further emphasizes the importance of developing a multidimensional understanding of splicing control.
In this issue of Molecular Cell, Zhang and colleagues (2013) identify a new class of intron-derived circular RNAs (ciRNAs) and show that they have the potential to enhance transcription of their host gene.
Duplications of the chromosome 15q11-q13.1 region are associated with an estimated 1 to 3% of all autism cases, making this copy number variation (CNV) one of the most frequent chromosome abnormalities associated with autism spectrum disorder (ASD). Several genes located within the 15q11-q13.1 duplication region including ubiquitin protein ligase E3A (UBE3A), the gene disrupted in Angelman syndrome (AS), are involved in neural function and may play important roles in the neurobehavioral phenotypes associated with chromosome 15q11-q13.1 duplication (Dup15q) syndrome.
We have generated induced pluripotent stem cell (iPSC) lines from five different individuals containing CNVs of 15q11-q13.1. The iPSC lines were differentiated into mature, functional neurons. Gene expression across the 15q11-q13.1 locus was compared among the five iPSC lines and corresponding iPSC-derived neurons using quantitative reverse transcription PCR (qRT-PCR). Genome-wide gene expression was compared between neurons derived from three iPSC lines using mRNA-Seq.
Analysis of 15q11-q13.1 gene expression in neurons derived from Dup15q iPSCs reveals that gene copy number does not consistently predict expression levels in cells with interstitial duplications of 15q11-q13.1. mRNA-Seq experiments show that there is substantial overlap in the genes differentially expressed between 15q11-q13.1 deletion and duplication neurons. Finally, we demonstrate that UBE3A transcripts can be pharmacologically rescued to normal levels in iPSC-derived neurons with a 15q11-q13.1 duplication.
Chromatin structure may influence gene expression across the 15q11-q13.1 region in neurons. Genome-wide analyses suggest that common neuronal pathways may be disrupted in both the Angelman and Dup15q syndromes. These data demonstrate that our disease-specific stem cell models provide a new tool to decipher the underlying cellular and genetic disease mechanisms of ASD and may also offer a pathway to novel therapeutic intervention in Dup15q syndrome.
UBE3A; autism; induced pluripotent stem cells; 15q duplication; Angelman syndrome
The Drosophila Dscam1 gene encodes a vast number of cell recognition molecules through alternative splicing. These exhibit isoform-specific homophilic binding and regulate self-avoidance, the tendency of neurites from the same cell to repel one another. Genetic experiments indicate that different cells must express different isoforms. How this is achieved is not known, as the expression of alternative exons in vivo has not been shown. Here, we modified the endogenous Dscam1 locus to generate splicing reporters for all variants of exon 4. We demonstrate that splicing does not occur in a cell-type specific fashion, that cells identified by their unique locations express different exon 4 variants in different animals, and that splicing in identified neurons can change over time. Probabilistic expression is compatible with a widespread role in neural circuit assembly through self-avoidance and is incompatible with models in which specific isoforms of Dscam1 mediate recognition between processes of different cells.
Sixty years after Watson and Crick published the double helix model of DNA's structure, thirteen members of Genome Biology's Editorial Board select key advances in the field of genome biology subsequent to that discovery.
With the rapidly declining cost of data generation, and the accumulation of massive datasets, molecular biology is entering an era in which incisive analysis of existing data will play an increasingly prominent role in the discovery of new biological phenomena and the elucidation of molecular mechanisms. Here, we discuss resources of publicly available sequencing data most useful for interrogating the mechanisms of gene expression. Existing next-generation sequence datasets, however, come with significant challenges in the form of technical and bioinformatic artifacts, which we discuss in detail. We also recount several breakthroughs made largely through the analysis of existing data, primarily in the RNA field.
Genetic association studies, pharmacological investigations, and analysis of mice lacking individual genes have made it clear that cocaine administration and withdrawal have a profound impact on multiple neurotransmitter systems. The GABAergic medium spiny neurons of the nucleus accumbens (NAc) exhibit changes in the expression of genes encoding receptors for glutamate and in the signaling pathways triggered by dopamine binding to G-protein coupled dopamine receptors. Deep sequence analysis provides a sensitive, quantitative and global analysis of the effects of cocaine on the NAc transcriptome. RNA prepared from the NAc of adult male mice receiving daily injections of saline or cocaine, or cocaine followed by a period of withdrawal, was used for high-throughput sequence analysis. Changes were validated by qPCR or Western blot. Based on pathway analysis, a preponderance of the genes affected by cocaine and withdrawal were involved in the cadherin, heterotrimeric G-protein, and Wnt signaling pathways. Distinct subsets of cadherins and protocadherins exhibited a sustained increase or decrease in expression. Sustained down-regulation of several heterotrimeric G-protein β- and γ-subunits was observed. In addition to altered expression of receptors for small molecule neurotransmitters, neuropeptides and endocannabinoids, changes in the expression of plasma membrane transporters and vesicular neurotransmitter transporters were also observed. The effects of chronic cocaine and withdrawal on the expression of genes essential to cholinergic, glutamatergic, GABAergic, peptidergic, and endocannabinoid signaling are as profound as their effects on dopaminergic transmission. Simultaneous targeting of multiple withdrawal-specific changes in gene expression may facilitate development of new therapeutic approaches that are better able to prevent relapse.
RNA-Seq; pathway analysis; Wnt/cadherin signaling; heterotrimeric G-protein; glutamate; neuropeptide; acetylcholine; GABA
Anopheles darlingi is the principal neotropical malaria vector, responsible for more than a million cases of malaria per year on the American continent. Anopheles darlingi diverged from the African and Asian malaria vectors ∼100 million years ago (mya) and successfully adapted to the New World environment. Here we present an annotated reference A. darlingi genome, sequenced from a wild population of males and females collected in the Brazilian Amazon. A total of 10 481 predicted protein-coding genes were annotated, 72% of which have their closest counterpart in Anopheles gambiae and 21% have highest similarity with other mosquito species. In spite of a long period of divergent evolution, conserved gene synteny was observed between A. darlingi and A. gambiae. More than 10 million single nucleotide polymorphisms and short indels with potential use as genetic markers were identified. Transposable elements correspond to 2.3% of the A. darlingi genome. Genes associated with hematophagy, immunity and insecticide resistance, directly involved in vector–human and vector–parasite interactions, were identified and discussed. This study represents the first effort to sequence the genome of a neotropical malaria vector, and opens a new window through which we can contemplate the evolutionary history of anopheline mosquitoes. It also provides valuable information that may lead to novel strategies to reduce malaria transmission on the South American continent. The A. darlingi genome is accessible at www.labinfo.lncc.br/index.php/anopheles-darlingi.
CRISPR-Cas systems are RNA-guided immune systems that protect prokaryotes against viruses and other invaders. The CRISPR locus encodes crRNAs that recognize invading nucleic acid sequences and trigger silencing by the associated Cas proteins. There are multiple CRISPR-Cas systems with distinct compositions and mechanistic processes. Thermococcus kodakarensis (Tko) is a hyperthermophilic euryarchaeon that has both a Type I-A Csa and a Type I-B Cst CRISPR-Cas system. We have analyzed the expression and composition of crRNAs from the three CRISPRs in Tko by RNA deep sequencing and northern analysis. Our results indicate that crRNAs associated with these two CRISPR-Cas systems include an 8-nucleotide conserved sequence tag at the 5′ end. We challenged Tko with plasmid invaders containing sequences targeted by endogenous crRNAs and observed active CRISPR-Cas-mediated silencing. Plasmid silencing was dependent on complementarity with a crRNA as well as on a sequence element found immediately adjacent to the crRNA recognition site in the target termed the PAM (protospacer adjacent motif). Silencing occurred independently of the orientation of the target sequence in the plasmid, and appears to occur at the DNA level, presumably via DNA degradation. In addition, we have directed silencing of an invader plasmid by genetically engineering the chromosomal CRISPR locus to express customized crRNAs directed against the plasmid. Our results support CRISPR engineering as a feasible approach to develop prokaryotic strains that are resistant to infection for use in industry.
CRISPR; Cas; archaea; Thermococcus; hyperthermophile; immune; RNA; DNA; silencing; interference
Small RNAs target invaders for silencing in the CRISPR-Cas pathways that protect bacteria and archaea from viruses and plasmids. The CRISPR RNAs (crRNAs) contain sequence elements acquired from invaders that guide CRISPR-associated (Cas) proteins back to the complementary invading DNA or RNA. Here, we have analyzed essential features of the crRNAs associated with the Cas RAMP module (Cmr) effector complex, which cleaves targeted RNAs. We show that Cmr crRNAs contain an 8-nucleotide 5′ sequence tag (also found on crRNAs associated with other CRISPR-Cas pathways) that is critical for crRNA function and can be used to engineer crRNAs that direct cleavage of novel targets. We also present data indicating that the Cmr complex cleaves an endogenous complementary RNA in Pyrococcus furiosus, providing direct in vivo evidence of RNA targeting by the CRISPR-Cas system. Our findings indicate that the CRISPR RNA-Cmr protein pathway may be exploited to cleave RNAs of interest.
Genomic imprinting occurs when expression of an allele differs based on the sex of the parent that transmitted the allele. In D. melanogaster, imprinting can occur, but its impact on allelic expression genome-wide is unclear. Here, we search for imprinted genes in D. melanogaster using RNA-seq to compare allele-specific expression between pools of 7–10 day old adult female progeny from reciprocal crosses. 119 genes with allelic expression patterns consistent with imprinting were identified and showed significant clustering within the genome. Surprisingly, additional analysis of several of these genes showed that either genomic heterogeneity or high levels of intrinsic noise caused imprinting-like allelic expression. Consequently, our data provide no convincing evidence of imprinting for D. melanogaster genes in their native genomic context. Elucidating sources of false positive signals for imprinting in allele-specific RNA-seq data, as done here, is critical given the growing popularity of this method for identifying imprinted genes.
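The reciprocal-cross logic described above can be illustrated with a minimal sketch. It assumes allele-specific read counts are already available per gene for each cross; the function names and the 0.7 bias threshold are hypothetical choices for illustration, not the criteria used in the study. A parent-of-origin effect shows the same parental allele favored in both crosses, whereas a strain-specific (cis-regulatory) effect flips when the parents are swapped.

```python
def maternal_fraction(maternal_reads, paternal_reads):
    """Fraction of allele-specific reads that come from the maternal allele."""
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("no allele-specific reads for this gene")
    return maternal_reads / total


def classify_gene(cross1, cross2, threshold=0.7):
    """Classify one gene from two reciprocal crosses.

    Each cross is a (maternal_reads, paternal_reads) tuple.
    The threshold is a hypothetical cutoff, not taken from the abstract.
    """
    m1 = maternal_fraction(*cross1)
    m2 = maternal_fraction(*cross2)
    if m1 >= threshold and m2 >= threshold:
        return "maternal-biased"
    if m1 <= 1 - threshold and m2 <= 1 - threshold:
        return "paternal-biased"
    return "no parent-of-origin signal"


# Maternal allele favored regardless of which strain is the mother.
print(classify_gene((80, 20), (75, 25)))
# Strain A allele favored in both crosses: the bias flips with the mother.
print(classify_gene((80, 20), (20, 80)))
```

A gene flagged by a rule of this kind is only a candidate: as the abstract reports, genomic heterogeneity or intrinsic noise can produce the same pattern, so follow-up analysis is still needed.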
The collection of components required to carry out the intricate processes involved in generating and maintaining a living, breathing and, sometimes, thinking organism is staggeringly complex. Where do all of the parts come from? Early estimates stated that about 100,000 genes would be required to make up a mammal; however, the actual number is less than one-quarter of that, barely four times the number of genes in budding yeast. It is now clear that the ‘missing’ information is in large part provided by alternative splicing, the process by which multiple different functional messenger RNAs, and therefore proteins, can be synthesized from a single gene.
Alternative splicing is a widespread means of increasing protein diversity and regulating gene expression in eukaryotes. Much progress has been made in understanding the proteins involved in regulating alternative splicing, the sequences they bind to, and how these interactions lead to changes in splicing patterns. However, several recent studies have identified other players involved in regulating alternative splicing. A major theme emerging from these studies is that RNA secondary structures play an underappreciated role in the regulation of alternative splicing. This review provides an overview of the basic aspects of splicing regulation and highlights recent progress in understanding the role of RNA secondary structure in this process.
We analyzed the usage and consequences of alternative cleavage and polyadenylation (APA) in Drosophila melanogaster by using >1 billion reads of stranded mRNA-seq across a variety of dissected tissues. Beyond demonstrating that a majority of fly transcripts are subject to APA, we observed broad trends for 3′ untranslated region (UTR) shortening in the testis and lengthening in the central nervous system (CNS); the latter included hundreds of unannotated extensions ranging up to 18 kb. Extensive northern analyses validated the accumulation of full-length neural extended transcripts, and in situ hybridization indicated their spatial restriction to the CNS. Genes encoding RNA binding proteins (RBPs) and transcription factors were preferentially subject to 3′ UTR extensions. Motif analysis indicated enrichment of miRNA and RBP sites in the neural extensions, and their termini were enriched in canonical cis elements that promote cleavage and polyadenylation. Altogether, we reveal broad tissue-specific patterns of APA in Drosophila and transcripts with unprecedented 3′ UTR length in the nervous system.
The planarian Schmidtea mediterranea is a powerful model organism for studying stem cell biology due to its extraordinary regenerative ability mediated by neoblasts, a population of adult somatic stem cells. Elucidation of the S. mediterranea transcriptome and the dynamics of transcript expression will increase our understanding of the gene regulatory programs that control stem cell function and differentiation. Here, we have used RNA-Seq to characterize the S. mediterranea transcriptome in sexual and asexual animals and in purified neoblast and differentiated cell populations. Our analysis identified many uncharacterized genes, transcripts, and alternatively spliced isoforms that are differentially expressed in a strain- or cell type-specific manner. Transcriptome profiling of purified neoblasts and differentiated cells identified neoblast-enriched transcripts, many of which likely play important roles in regeneration and stem cell function. Strikingly, many of the neoblast-enriched genes are orthologs of genes whose expression is enriched in human embryonic stem cells, suggesting that a core set of genes that regulate stem cell function is conserved across metazoan species.
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
Drosophila melanogaster is one of the best-studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons, and RNA editing sites. Full discovery and annotation are prerequisites for understanding how the regulation of transcription, splicing, and RNA editing directs development of this complex organism. We used RNA-Seq, tiling microarrays, and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events and inferred protein isoforms that previously eluded discovery using established experimental, prediction and conservation-based approaches. Together, these data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development.
Lynch syndrome (LS) leads to an increased risk of early-onset colorectal and other types of cancer and is caused by germline mutations in DNA mismatch repair (MMR) genes. Loss of MMR function results in a mutator phenotype that likely underlies its role in tumorigenesis. However, loss of MMR also results in the elimination of a DNA damage-induced checkpoint/apoptosis activation barrier that may allow damaged cells to grow unchecked. A fundamental question is whether loss of MMR provides pre-cancerous stem cells an immediate selective advantage in addition to establishing a mutator phenotype. To test this hypothesis in an in vivo system, we utilized the planarian Schmidtea mediterranea, which contains a significant population of identifiable adult stem cells. We identified a planarian homolog of human MSH2, an MMR gene that is mutated in 38% of LS cases. The planarian Smed-msh2 is expressed in stem cells and some progeny. We depleted Smed-msh2 mRNA levels by RNA-interference and found a striking survival advantage in these animals treated with a cytotoxic DNA alkylating agent compared to control animals. We demonstrated that this tolerance to DNA damage is due to the survival of mitotically active, MMR-deficient stem cells. Our results suggest that loss of MMR provides an in vivo survival advantage to the stem cell population in the presence of DNA damage that may have implications for tumorigenesis.
The Down syndrome cell adhesion molecule (Dscam) gene has essential roles in neural wiring and pathogen recognition in Drosophila melanogaster. Dscam encodes 38,016 distinct isoforms via extensive alternative splicing. The 95 alternative exons in Dscam are organized into clusters that are spliced in a mutually exclusive manner. The exon 6 cluster contains 48 variable exons and uses a complex system of competing RNA structures to ensure that only one variable exon is included. Here we show that the heterogeneous nuclear ribonucleoprotein hrp36 acts specifically within, and throughout, the exon 6 cluster to prevent the inclusion of multiple exons. Moreover, hrp36 prevents serine/arginine-rich proteins from promoting the ectopic inclusion of multiple exon 6 variants. Thus, the fidelity of mutually exclusive splicing in the exon 6 cluster is governed by an intricate combination of alternative RNA structures and a globally acting splicing repressor.
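The isoform count quoted above follows from simple combinatorics, because exactly one variant is chosen from each mutually exclusive cluster. The sketch below is a worked check, assuming the commonly cited cluster sizes for exons 4, 9 and 17 (12, 33 and 2 variants); only the 48 variants of exon 6, the total of 95 alternative exons and the 38,016 isoforms are stated in this abstract.

```python
from math import prod

# Number of variable exons in each mutually exclusive cluster.
# Only the exon 6 value (48) appears in the abstract; the other three
# are assumed from the commonly cited description of the Dscam locus.
CLUSTER_SIZES = {"exon4": 12, "exon6": 48, "exon9": 33, "exon17": 2}


def total_alternative_exons(sizes):
    """Sum of variable exons across all clusters."""
    return sum(sizes.values())


def isoform_count(sizes):
    """Number of distinct isoforms when one exon is picked per cluster."""
    return prod(sizes.values())


print(total_alternative_exons(CLUSTER_SIZES))  # 95
print(isoform_count(CLUSTER_SIZES))            # 38016
```

Under these assumed cluster sizes, both totals match the figures given in the abstract.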
RNAs can be physically classified into poly(A)+ or poly(A)- transcripts according to the presence or absence of a poly(A) tail at their 3' ends. Current deep sequencing approaches largely depend on the enrichment of transcripts with a poly(A) tail, and therefore offer little insight into the nature and expression of transcripts that lack poly(A) tails.
We have used deep sequencing to explore the repertoire of both poly(A)+ and poly(A)- RNAs from HeLa cells and H9 human embryonic stem cells (hESCs). Using stringent criteria, we found that while the majority of transcripts are poly(A)+, a significant portion of transcripts are either poly(A)- or bimorphic, being found in both the poly(A)+ and poly(A)- populations. Further analyses revealed that many mRNAs may not contain classical long poly(A) tails and such messages are overrepresented in specific functional categories. In addition, we surprisingly found that a few excised introns accumulate in cells and thus constitute a new class of non-polyadenylated long non-coding RNAs. Finally, we have identified a specific subset of poly(A)- histone mRNAs, including two histone H1 variants, that are expressed in undifferentiated hESCs and are rapidly diminished upon differentiation; further, these same histone genes are induced upon reprogramming of fibroblasts to induced pluripotent stem cells.
We offer a rich source of data that allows a deeper exploration of the poly(A)- landscape of the eukaryotic transcriptome. The approach we present here also applies to the analysis of the poly(A)- transcriptomes of other organisms.
Compelling evidence indicates that the CRISPR-Cas system protects prokaryotes from viruses and other potential genome invaders. This adaptive prokaryotic immune system arises from the clustered regularly interspaced short palindromic repeats (CRISPRs) found in prokaryotic genomes, which harbor short invader-derived sequences, and the CRISPR-associated (Cas) protein-coding genes. Here we have identified a CRISPR-Cas effector complex that comprises small invader-targeting RNAs from the CRISPR loci (termed prokaryotic silencing (psi)RNAs) and the RAMP module (or Cmr) Cas proteins. The psiRNA-Cmr protein complexes cleave complementary target RNAs at a fixed distance from the 3' end of the integral psiRNAs. In Pyrococcus furiosus, psiRNAs occur in two size forms that share a common 5' sequence tag but have distinct 3' ends that direct cleavage of a given target RNA at two distinct sites. Our results indicate that prokaryotes possess a unique RNA silencing system that functions by homology-dependent cleavage of invader RNAs.
Alternative splicing is typically thought to be controlled by RNA binding proteins that modulate the activity of the spliceosome. A new study not only demonstrates that alternative splicing can be regulated without the involvement of auxiliary splicing factors, but also provides mechanistic insight into how this can occur.
In this issue of Molecular Cell, Schwer (2008) demonstrates that during the latest stage of the splicing reaction the RNA-dependent helicase Prp22 is deposited upon the downstream exon where it subsequently strips the spliced messenger RNA from the spliceosome.
A new study reveals that extracellular signals can activate a signal-transduction cascade that simultaneously alters alternative splicing and translation of the same target. These concerted efforts probably serve to increase the speed and strength of the cellular response to changes in the extracellular environment.
The Drosophila fruitless (fru) gene encodes a transcription factor that regulates essentially all aspects of male courtship behavior. The use of alternative 5′-splice sites generates fru isoforms that determine gender-appropriate sexual behaviors. Alternative splicing of fru is regulated by TRA and TRA2 and depends on an exonic splicing enhancer (fruRE) consisting of three 13-nucleotide repeat elements, nearly identical to those that regulate alternative sex-specific 3′-splice site choice in the doublesex (dsx) gene. dsx has provided a useful model system to investigate the mechanisms of enhancer-dependent 3′-splice site choice. However, little is known about enhancer-dependent regulation of alternative 5′-splice sites. The mechanisms of this process were investigated using an in vitro system in which recombinant TRA/TRA2 could activate the female-specific 5′-splice site of fru. Mutational analysis demonstrated that one 13-nucleotide repeat element within the fruRE is necessary and sufficient to activate the regulated female-specific splice site. As was established for dsx, the fruRE can be replaced by a short element encompassing tandem 13-nucleotide repeat elements, by heterologous splicing enhancers, and by artificially tethering a splicing activator to the pre-mRNA. Complementation experiments showed that Ser/Arg-rich proteins facilitate enhancer-dependent 5′-splice site activation. We conclude that splicing enhancers function similarly in activating regulated 5′- and 3′-splice sites. These results suggest that exonic splicing enhancers recruit multiple spliceosomal components required for the initial recognition of 5′- and 3′-splice sites.