1.  Assessing long-distance RNA sequence connectivity via RNA-templated DNA–DNA ligation 
eLife  2015;4:e03700.
Many RNAs, including pre-mRNAs and long non-coding RNAs, can be thousands of nucleotides long and undergo complex post-transcriptional processing. Multiple sites of alternative splicing within a single gene exponentially increase the number of possible spliced isoforms, with most human genes currently estimated to express at least ten. To understand the mechanisms underlying these complex isoform expression patterns, methods are needed that faithfully maintain long-range exon connectivity information in individual RNA molecules. In this study, we describe SeqZip, a methodology that uses RNA-templated DNA–DNA ligation to retain and compress connectivity between distant sequences within single RNA molecules. Using this assay, we test proposed coordination between distant sites of alternative exon utilization in mouse Fn1, and we characterize the extraordinary exon diversity of Drosophila melanogaster Dscam1.
eLife digest
A flow chart can show how an outcome can be achieved from a particular start point by breaking down an activity into a list of possible steps. Often, a flow chart contains several alternative steps, not all of which are taken every time the flow chart is used. The same can be said of genes, which are biological instructions that often contain many options within their DNA sequences.
Proteins—which perform many roles in cells—are built following the instructions contained in genes. First, the DNA sequence of the gene is copied. This produces a molecule of ribonucleic acid (RNA), which is able to move around the cell to find the machinery that can use the genetic information to make a protein. Genes and their RNA copies contain instructions with more steps—called exons—than are necessary to make a working protein, so extra exons are removed (‘spliced’) from the RNA copies. Different combinations of exons can be removed, so splicing can make different versions of the RNA called isoforms. These allow a single gene to build many different proteins. In fruit flies, for example, the different exons of the gene Dscam1 can be spliced into one of 38,016 unique RNA isoforms.
Current technology only allows researchers to deduce the sequence of RNA molecules by combining sequences recorded from short fragments of the molecule. However, before splicing, RNA molecules tend to be much longer than this, so this restricts our understanding of the RNA isoforms found in cells. Here, Roy et al. devised and tested a new method called SeqZip to solve this problem.
SeqZip uses short fragments of DNA called ligamers that can only stick to the sections of RNA that will remain after the molecule has been spliced. After splicing, the ligamers can be stuck together to make a DNA replica of the spliced RNA. The end product is at least 49 times shorter than the original RNA, so it is easier to sequence. In addition, the combinations of the ligamers in the DNA replica show which exons of a specific gene are kept and which ones are spliced out.
To test the method, Roy et al. studied a mouse gene that has six RNA isoforms. SeqZip reduced the length of the RNA by five times and made it possible to measure how frequently the different isoforms naturally arise. Roy et al. also used SeqZip to work out which isoforms of the Dscam1 gene are used at different stages in the life of fruit fly larvae. SeqZip can provide insights into how complex organisms like flies, mice, and humans have evolved with relatively few—a little over 20,000—genes in their genomes.
PMCID: PMC4442144  PMID: 25866926
ligation; Dscam1; RNA-templated; isoform; alternative splicing; fibronectin; D. melanogaster; mouse
2.  The majority of transcripts in the squid nervous system are extensively recoded by A-to-I RNA editing 
eLife  2015;4:e05198.
RNA editing by adenosine deamination alters genetic information from the genomic blueprint. When it recodes mRNAs, it gives organisms the option to express diverse, functionally distinct, protein isoforms. All eumetazoans, from cnidarians to humans, express RNA editing enzymes. However, transcriptome-wide screens have only uncovered about 25 transcripts harboring conserved recoding RNA editing sites in mammals and several hundred recoding sites in Drosophila. These studies on a few established models have led to the general assumption that recoding by RNA editing is extremely rare. Here we employ a novel bioinformatic approach with extensive validation to show that the squid Doryteuthis pealeii recodes proteins by RNA editing to an unprecedented extent. We identify 57,108 recoding sites in the nervous system, affecting the majority of the proteins studied. Recoding is tissue-dependent, and enriched in genes with neuronal and cytoskeletal functions, suggesting it plays an important role in brain physiology.
eLife digest
For living cells to create a protein, a genetic code found in their DNA must first be ‘transcribed’ to create a corresponding molecule of messenger RNA (mRNA). DNA and RNA are both made from smaller molecules called nucleotides that are linked together into long chains; the information in both DNA and RNA is contained in the sequence of these molecules. The mRNA nucleotides coding for proteins are ‘translated’ in groups of three, and most of these nucleotide triplets instruct for a specific amino acid to be added to the newly forming protein.
DNA sequences were thought to exactly correspond with the sequence of amino acids in the resulting protein. However, it is now known that processes called RNA editing can change the nucleotide sequence of the mRNA molecules after they have been transcribed from the DNA. One such editing process, called A-to-I editing, alters the ‘A’ nucleotide so that the translation machinery reads it as a ‘G’ nucleotide instead. In some—but not all—cases, this event will change, or ‘recode’, the amino acid encoded by this stretch of mRNA, which may change how the protein behaves. This ability to create a range of proteins from a single DNA sequence could help organisms to evolve new traits.
Evidence of amino acid recoding has only been found to a very limited extent in the few species investigated so far. There has been some evidence that suggests that recoding might occur more often, and alter more proteins, in squids and octopuses. However, this could not be confirmed as the genomes of these species have not been sequenced, and these sequences were required to investigate RNA recoding using existing techniques.
Alon et al. have now developed a new approach that allows the recoding sites to be identified in organisms whose genomes have not been sequenced. Using this technique—which compares mRNA sequences with the DNA sequence they have been transcribed from—to examine the squid nervous system revealed over 57,000 recoding sites where an ‘A’ nucleotide had been modified to ‘G’ and thereby changed the coded amino acid. Many of the identified mRNA molecules had been recoded in more than one place, and many more of these than expected changed the amino acid sequence of the protein translated from them. Alon et al. therefore suggest that RNA editing may have been crucial in the evolution of the squid's nervous system, and suggest that recoding should be considered a normal part of the process used by squids to make proteins.
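The point in the digest that an A-to-I edit changes the encoded amino acid in some cases but not others can be illustrated with a minimal Python sketch. It assumes the standard genetic code and treats each edited ‘A’ as a ‘G’, as the digest describes. The function name `a_to_i_outcomes` and the example codons are hypothetical illustrations and are not taken from Alon et al.

```python
# Standard genetic code (NCBI translation table 1), with codons ordered
# T, C, A, G at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def a_to_i_outcomes(codon):
    """List the effect of editing each 'A' in a codon (hypothetical helper).

    Returns tuples of (position, edited codon, original amino acid,
    new amino acid, recoded?). Inosine is read as 'G' during translation,
    so every edit is modelled as an A-to-G substitution.
    """
    codon = codon.upper().replace("U", "T")  # accept RNA or DNA spelling
    original = CODON_TABLE[codon]
    outcomes = []
    for pos, base in enumerate(codon):
        if base != "A":
            continue
        edited = codon[:pos] + "G" + codon[pos + 1:]
        new = CODON_TABLE[edited]
        outcomes.append((pos, edited, original, new, new != original))
    return outcomes


if __name__ == "__main__":
    for codon in ("AAA", "CAG", "ATG"):
        for pos, edited, old, new, recoded in a_to_i_outcomes(codon):
            label = "recoding" if recoded else "synonymous"
            print(f"{codon} -> {edited} (position {pos}): {old} -> {new}, {label}")
```

In this sketch, editing the third position of the lysine codon AAA leaves the amino acid unchanged, whereas editing either of the first two positions recodes it, which is why only a subset of edited sites count as recoding sites.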
PMCID: PMC4384741  PMID: 25569156
Doryteuthis pealeii; recoding; RNA editing; other
3.  Dynamic integration of splicing within gene regulatory pathways 
Cell  2013;152(6):1252-1269.
Precursor mRNA splicing is one of the most highly regulated processes in metazoan species. In addition to generating vast repertoires of RNAs and proteins, splicing has a profound impact on other gene regulatory layers, including mRNA transcription, turnover, transport and translation. Conversely, factors regulating chromatin and transcription complexes impact the splicing process. This extensive cross-talk between gene regulatory layers takes advantage of dynamic spatial, physical and temporal organizational properties of the cell nucleus, and further emphasizes the importance of developing a multidimensional understanding of splicing control.
PMCID: PMC3642998  PMID: 23498935
4.  Genomewide analysis of Drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation 
Cell reports  2014;9(5):1966-1980.
Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues and cultured cells, to rigorously annotate >2500 fruitfly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs, and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase dramatically relative to linear isoforms during CNS aging, and constitute a novel aging biomarker.
PMCID: PMC4279448  PMID: 25544350
5.  Genome-wide Identification of Zero Nucleotide Recursive Splicing in Drosophila 
Nature  2015;521(7552):376-379.
Recursive splicing is a process in which large introns are removed in multiple steps by resplicing at ratchet points, 5′ splice sites recreated after splicing [1]. Recursive splicing was first identified in the Drosophila Ultrabithorax (Ubx) gene [1], and only three additional Drosophila genes have since been experimentally shown to undergo recursive splicing [2,3]. Here, we identify 197 zero nucleotide exon ratchet points in 130 introns of 115 Drosophila genes from total RNA sequencing data generated from developmental time points, dissected tissues, and cultured cells. The sequential nature of recursive splicing was confirmed by identification of lariat introns generated by splicing to and from the ratchet points. We also show that recursive splicing is a constitutive process, that depletion of U2AF inhibits recursive splicing, and that the sequence and function of ratchet points are evolutionarily conserved in Drosophila. Finally, we identified four recursively spliced human genes, one of which is also recursively spliced in Drosophila. Together these results indicate that recursive splicing is commonly used in Drosophila, occurs in humans, and provides insight into the mechanisms by which some large introns are removed.
PMCID: PMC4529404  PMID: 25970244
6.  Determining exon connectivity in complex mRNAs by nanopore sequencing 
Genome Biology  2015;16:204.
Short-read high-throughput RNA sequencing, though powerful, is limited in its ability to directly measure exon connectivity in mRNAs that contain multiple alternative exons located farther apart than the maximum read length. Here, we use the Oxford Nanopore MinION sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be a powerful method for comprehensive transcriptome characterization.
PMCID: PMC4588896  PMID: 26420219
7.  The three major types of CRISPR-Cas systems function independently in CRISPR RNA biogenesis in Streptococcus thermophilus 
Molecular microbiology  2014;93(1):98-112.
CRISPR-Cas systems are small RNA-based immune systems that protect prokaryotes from invaders such as viruses and plasmids. We have investigated the features and biogenesis of the CRISPR (cr)RNAs in Streptococcus thermophilus (Sth) strain DGCC7710, which possesses four different CRISPR-Cas systems including representatives from the three major types of CRISPR-Cas systems. Our results indicate that the crRNAs from each CRISPR locus are specifically processed into divergent crRNA species by Cas proteins (and non-coding RNAs) associated with the respective locus. We find that the Csm Type III-A and Cse Type I-E crRNAs are specifically processed by Cas6 and Cse3 (Cas6e), respectively, and retain an 8-nucleotide CRISPR repeat sequence tag 5′ of the invader-targeting sequence. The Cse Type I-E crRNAs also retain a 21-nucleotide 3′ repeat tag. The crRNAs from the two Csn Type II-A systems in Sth consist of a 5′-truncated targeting sequence and a 3′ tag; however these are distinct in size between the two. Moreover, the Csn1 (Cas9) protein associated with one Csn locus functions specifically in the production of crRNAs from that locus. Our findings indicate that multiple CRISPR-Cas systems can function independently in crRNA biogenesis within a given organism – an important consideration in engineering co-existing CRISPR-Cas pathways.
PMCID: PMC4095994  PMID: 24811454
CRISPR RNA biogenesis; Cas6; Cas9; tracrRNA; Streptococcus thermophilus
8.  Complex Alternative Splicing 
Alternative splicing is a powerful means of controlling gene expression and increasing protein diversity. Most genes express a limited number of mRNA isoforms, but there are several examples of genes that use alternative splicing to generate hundreds, thousands, and even tens of thousands of isoforms. Collectively such genes are considered to undergo complex alternative splicing. The best example is the Drosophila Down syndrome cell adhesion molecule (Dscam) gene, which can generate 38,016 isoforms by the alternative splicing of 95 variable exons. In this review, we will describe several genes that use complex alternative splicing to generate large repertoires of mRNAs and what is known about the mechanisms by which they do so.
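The figures quoted above (38,016 isoforms from 95 variable exons) follow from simple combinatorics if one variant is chosen independently from each cluster of mutually exclusive exons. The short Python sketch below works through that arithmetic. The cluster sizes (12, 48, 33 and 2 variants for the exon 4, 6, 9 and 17 clusters) are not stated in this abstract; they are the commonly cited values for Dscam1 and should be read as an assumption of the example, as should the function names.

```python
from math import prod

# Assumed sizes of the four mutually exclusive exon clusters in Dscam1.
# These values are not given in the abstract above; they are the commonly
# cited ones and are used here only for illustration.
DSCAM1_CLUSTERS = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}


def count_isoforms(clusters):
    """Number of isoforms when exactly one variant is chosen per cluster."""
    return prod(clusters.values())


def count_variable_exons(clusters):
    """Total number of alternative exons across all clusters."""
    return sum(clusters.values())


if __name__ == "__main__":
    print("variable exons:", count_variable_exons(DSCAM1_CLUSTERS))
    print("possible isoforms:", count_isoforms(DSCAM1_CLUSTERS))
```

Because the counts multiply, the number of isoforms grows far faster than the number of exons: 12 × 48 × 33 × 2 = 38,016 combinations from 12 + 48 + 33 + 2 = 95 exons.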
PMCID: PMC4387867  PMID: 18380340
9.  Molecular Biology 
Nature  2008;453(7199):1197-1198.
Advances in DNA-sequencing technology provide unprecedented insight into the entire collection of four genomes' transcribed sequences; they herald a new era in the study of gene regulation and genome function.
PMCID: PMC4386836  PMID: 18580940
10.  Diversity and dynamics of the Drosophila transcriptome 
Nature  2014;512(7515):393-399.
Animal transcriptomes are dynamic, each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. We identified new genes, transcripts, and proteins using poly(A)+ RNA sequence from Drosophila melanogaster cultured cell lines, dissected organ systems, and environmental perturbations. We found a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long noncoding RNAs (lncRNAs) some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized arising from combinatorial usage of promoters, splice sites, and polyadenylation sites.
PMCID: PMC4152413  PMID: 24670639
11.  Comparative Analysis of the Transcriptome across Distant Species 
Gerstein, Mark B. | Rozowsky, Joel | Yan, Koon-Kiu | Wang, Daifeng | Cheng, Chao | Brown, James B. | Davis, Carrie A | Hillier, LaDeana | Sisu, Cristina | Li, Jingyi Jessica | Pei, Baikang | Harmanci, Arif O. | Duff, Michael O. | Djebali, Sarah | Alexander, Roger P. | Alver, Burak H. | Auerbach, Raymond | Bell, Kimberly | Bickel, Peter J. | Boeck, Max E. | Boley, Nathan P. | Booth, Benjamin W. | Cherbas, Lucy | Cherbas, Peter | Di, Chao | Dobin, Alex | Drenkow, Jorg | Ewing, Brent | Fang, Gang | Fastuca, Megan | Feingold, Elise A. | Frankish, Adam | Gao, Guanjun | Good, Peter J. | Guigó, Roderic | Hammonds, Ann | Harrow, Jen | Hoskins, Roger A. | Howald, Cédric | Hu, Long | Huang, Haiyan | Hubbard, Tim J. P. | Huynh, Chau | Jha, Sonali | Kasper, Dionna | Kato, Masaomi | Kaufman, Thomas C. | Kitchen, Robert R. | Ladewig, Erik | Lagarde, Julien | Lai, Eric | Leng, Jing | Lu, Zhi | MacCoss, Michael | May, Gemma | McWhirter, Rebecca | Merrihew, Gennifer | Miller, David M. | Mortazavi, Ali | Murad, Rabi | Oliver, Brian | Olson, Sara | Park, Peter J. | Pazin, Michael J. | Perrimon, Norbert | Pervouchine, Dmitri | Reinke, Valerie | Reymond, Alexandre | Robinson, Garrett | Samsonova, Anastasia | Saunders, Gary I. | Schlesinger, Felix | Sethi, Anurag | Slack, Frank J. | Spencer, William C. | Stoiber, Marcus H. | Strasbourger, Pnina | Tanzer, Andrea | Thompson, Owen A. | Wan, Kenneth H. | Wang, Guilin | Wang, Huaien | Watkins, Kathie L. | Wen, Jiayu | Wen, Kejia | Xue, Chenghai | Yang, Li | Yip, Kevin | Zaleski, Chris | Zhang, Yan | Zheng, Henry | Brenner, Steven E. | Graveley, Brenton R. | Celniker, Susan E. | Gingeras, Thomas R | Waterston, Robert
Nature  2014;512(7515):445-448.
PMCID: PMC4155737  PMID: 25164755
12.  Circuitous route to transcription regulation 
Molecular cell  2013;51(6):10.1016/j.molcel.2013.09.012.
In this issue of Molecular Cell, Zhang and colleagues (2013) identify a new class of intron-derived circular RNAs (ciRNAs) and show that they have the potential to enhance transcription of their host gene.
PMCID: PMC3839245  PMID: 24074951
13.  Gene expression analysis of human induced pluripotent stem cell-derived neurons carrying copy number variants of chromosome 15q11-q13.1 
Molecular Autism  2014;5:44.
Duplications of the chromosome 15q11-q13.1 region are associated with an estimated 1 to 3% of all autism cases, making this copy number variation (CNV) one of the most frequent chromosome abnormalities associated with autism spectrum disorder (ASD). Several genes located within the 15q11-q13.1 duplication region including ubiquitin protein ligase E3A (UBE3A), the gene disrupted in Angelman syndrome (AS), are involved in neural function and may play important roles in the neurobehavioral phenotypes associated with chromosome 15q11-q13.1 duplication (Dup15q) syndrome.
We have generated induced pluripotent stem cell (iPSC) lines from five different individuals containing CNVs of 15q11-q13.1. The iPSC lines were differentiated into mature, functional neurons. Gene expression across the 15q11-q13.1 locus was compared among the five iPSC lines and corresponding iPSC-derived neurons using quantitative reverse transcription PCR (qRT-PCR). Genome-wide gene expression was compared between neurons derived from three iPSC lines using mRNA-Seq.
Analysis of 15q11-q13.1 gene expression in neurons derived from Dup15q iPSCs reveals that gene copy number does not consistently predict expression levels in cells with interstitial duplications of 15q11-q13.1. mRNA-Seq experiments show that there is substantial overlap in the genes differentially expressed between 15q11-q13.1 deletion and duplication neurons. Finally, we demonstrate that UBE3A transcripts can be pharmacologically rescued to normal levels in iPSC-derived neurons with a 15q11-q13.1 duplication.
Chromatin structure may influence gene expression across the 15q11-q13.1 region in neurons. Genome-wide analyses suggest that common neuronal pathways may be disrupted in both the Angelman and Dup15q syndromes. These data demonstrate that our disease-specific stem cell models provide a new tool to decipher the underlying cellular and genetic disease mechanisms of ASD and may also offer a pathway to novel therapeutic intervention in Dup15q syndrome.
PMCID: PMC4332023  PMID: 25694803
UBE3A; autism; induced pluripotent stem cells; 15q duplication; Angelman syndrome
14.  Probabilistic Splicing of Dscam1 Establishes Identity at the Level of Single Neurons 
Cell  2013;155(5):1166-1177.
The Drosophila Dscam1 gene encodes a vast number of cell recognition molecules through alternative splicing. These exhibit isoform-specific homophilic binding and regulate self-avoidance, the tendency of neurites from the same cell to repel one another. Genetic experiments indicate that different cells must express different isoforms. How this is achieved is not known, as the expression of alternative exons in vivo has not been shown. Here, we modified the endogenous Dscam1 locus to generate splicing reporters for all variants of exon 4. We demonstrate that splicing does not occur in a cell-type specific fashion, that cells identified by their unique locations express different exon 4 variants in different animals, and that splicing in identified neurons can change over time. Probabilistic expression is compatible with a widespread role in neural circuit assembly through self-avoidance and is incompatible with models in which specific isoforms of Dscam1 mediate recognition between processes of different cells.
PMCID: PMC3950301  PMID: 24267895
15.  Sixty years of genome biology 
Genome Biology  2013;14(4):113.
Sixty years after Watson and Crick published the double helix model of DNA's structure, thirteen members of Genome Biology's Editorial Board select key advances in the field of genome biology subsequent to that discovery.
PMCID: PMC3663092  PMID: 23651518
16.  New insights from existing sequence data: generating breakthroughs without a pipette 
Molecular cell  2013;49(4):605-617.
With the rapidly declining cost of data generation, and the accumulation of massive datasets, molecular biology is entering an era in which incisive analysis of existing data will play an increasingly prominent role in the discovery of new biological phenomena and the elucidation of molecular mechanisms. Here, we discuss resources of publicly available sequencing data most useful for interrogating the mechanisms of gene expression. Existing next-generation sequence datasets, however, come with significant challenges in the form of technical and bioinformatic artifacts, which we discuss in detail. We also recount several breakthroughs made largely through the analysis of existing data, primarily in the RNA field.
PMCID: PMC3590807  PMID: 23438857
17.  Effects of Cocaine and Withdrawal on the Mouse Nucleus Accumbens Transcriptome 
Genes, brain, and behavior  2012;12(1):21-33.
Genetic association studies, pharmacological investigations, and analysis of mice lacking individual genes have made it clear that cocaine administration and withdrawal have a profound impact on multiple neurotransmitter systems. The GABAergic medium spiny neurons of the nucleus accumbens (NAc) exhibit changes in the expression of genes encoding receptors for glutamate and in the signaling pathways triggered by dopamine binding to G-protein coupled dopamine receptors. Deep sequence analysis provides a sensitive, quantitative and global analysis of the effects of cocaine on the NAc transcriptome. RNA prepared from the NAc of adult male mice receiving daily injections of saline or cocaine, or cocaine followed by a period of withdrawal, was used for high-throughput sequence analysis. Changes were validated by qPCR or Western blot. Based on pathway analysis, a preponderance of the genes affected by cocaine and withdrawal were involved in the cadherin, heterotrimeric G-protein, and Wnt signaling pathways. Distinct subsets of cadherins and protocadherins exhibited a sustained increase or decrease in expression. Sustained down-regulation of several heterotrimeric G-protein β- and γ-subunits was observed. In addition to altered expression of receptors for small molecule neurotransmitters, neuropeptides and endocannabinoids, changes in the expression of plasma membrane transporters and vesicular neurotransmitter transporters were also observed. The effects of chronic cocaine and withdrawal on the expression of genes essential to cholinergic, glutamatergic, GABAergic, peptidergic, and endocannabinoid signaling are as profound as their effects on dopaminergic transmission. Simultaneous targeting of multiple withdrawal-specific changes in gene expression may facilitate development of new therapeutic approaches that are better able to prevent relapse.
PMCID: PMC3553295  PMID: 23094851
RNA-Seq; pathway analysis; Wnt/cadherin signaling; heterotrimeric G-protein; glutamate; neuropeptide; acetylcholine; GABA
18.  The Genome of Anopheles darlingi, the main neotropical malaria vector 
Marinotti, Osvaldo | Cerqueira, Gustavo C. | de Almeida, Luiz Gonzaga Paula | Ferro, Maria Inês Tiraboschi | Loreto, Elgion Lucio da Silva | Zaha, Arnaldo | Teixeira, Santuza M. R. | Wespiser, Adam R. | Almeida e Silva, Alexandre | Schlindwein, Aline Daiane | Pacheco, Ana Carolina Landim | da Silva, Artur Luiz da Costa | Graveley, Brenton R. | Walenz, Brian P. | Lima, Bruna de Araujo | Ribeiro, Carlos Alexandre Gomes | Nunes-Silva, Carlos Gustavo | de Carvalho, Carlos Roberto | Soares, Célia Maria de Almeida | de Menezes, Claudia Beatriz Afonso | Matiolli, Cleverson | Caffrey, Daniel | Araújo, Demetrius Antonio M. | de Oliveira, Diana Magalhães | Golenbock, Douglas | Grisard, Edmundo Carlos | Fantinatti-Garboggini, Fabiana | de Carvalho, Fabíola Marques | Barcellos, Fernando Gomes | Prosdocimi, Francisco | May, Gemma | de Azevedo Junior, Gilson Martins | Guimarães, Giselle Moura | Goldman, Gustavo Henrique | Padilha, Itácio Q. M. | Batista, Jacqueline da Silva | Ferro, Jesus Aparecido | Ribeiro, José M. C. | Fietto, Juliana Lopes Rangel | Dabbas, Karina Maia | Cerdeira, Louise | Agnez-Lima, Lucymara Fassarella | Brocchi, Marcelo | de Carvalho, Marcos Oliveira | Teixeira, Marcus de Melo | Diniz Maia, Maria de Mascena | Goldman, Maria Helena S. | Cruz Schneider, Maria Paula | Felipe, Maria Sueli Soares | Hungria, Mariangela | Nicolás, Marisa Fabiana | Pereira, Maristela | Montes, Martín Alejandro | Cantão, Maurício E. | Vincentz, Michel | Rafael, Miriam Silva | Silverman, Neal | Stoco, Patrícia Hermes | Souza, Rangel Celso | Vicentini, Renato | Gazzinelli, Ricardo Tostes | Neves, Rogério de Oliveira | Silva, Rosane | Astolfi-Filho, Spartaco | Maciel, Talles Eduardo Ferreira | Ürményi, Turán P. | Tadei, Wanderli Pedro | Camargo, Erney Plessmann | de Vasconcelos, Ana Tereza Ribeiro
Nucleic Acids Research  2013;41(15):7387-7400.
Anopheles darlingi is the principal neotropical malaria vector, responsible for more than a million cases of malaria per year on the American continent. Anopheles darlingi diverged from the African and Asian malaria vectors ∼100 million years ago (mya) and successfully adapted to the New World environment. Here we present an annotated reference A. darlingi genome, sequenced from a wild population of males and females collected in the Brazilian Amazon. A total of 10 481 predicted protein-coding genes were annotated, 72% of which have their closest counterpart in Anopheles gambiae and 21% have highest similarity with other mosquito species. In spite of a long period of divergent evolution, conserved gene synteny was observed between A. darlingi and A. gambiae. More than 10 million single nucleotide polymorphisms and short indels with potential use as genetic markers were identified. Transposable elements correspond to 2.3% of the A. darlingi genome. Genes associated with hematophagy, immunity and insecticide resistance, directly involved in vector–human and vector–parasite interactions, were identified and discussed. This study represents the first effort to sequence the genome of a neotropical malaria vector, and opens a new window through which we can contemplate the evolutionary history of anopheline mosquitoes. It also provides valuable information that may lead to novel strategies to reduce malaria transmission on the South American continent. The A. darlingi genome is accessible at
PMCID: PMC3753621  PMID: 23761445
19.  Programmable plasmid interference by the CRISPR-Cas system in Thermococcus kodakarensis 
RNA Biology  2013;10(5):828-840.
CRISPR-Cas systems are RNA-guided immune systems that protect prokaryotes against viruses and other invaders. The CRISPR locus encodes crRNAs that recognize invading nucleic acid sequences and trigger silencing by the associated Cas proteins. There are multiple CRISPR-Cas systems with distinct compositions and mechanistic processes. Thermococcus kodakarensis (Tko) is a hyperthermophilic euryarchaeon that has both a Type I-A Csa and a Type I-B Cst CRISPR-Cas system. We have analyzed the expression and composition of crRNAs from the three CRISPRs in Tko by RNA deep sequencing and northern analysis. Our results indicate that crRNAs associated with these two CRISPR-Cas systems include an 8-nucleotide conserved sequence tag at the 5′ end. We challenged Tko with plasmid invaders containing sequences targeted by endogenous crRNAs and observed active CRISPR-Cas-mediated silencing. Plasmid silencing was dependent on complementarity with a crRNA as well as on a sequence element found immediately adjacent to the crRNA recognition site in the target termed the PAM (protospacer adjacent motif). Silencing occurred independently of the orientation of the target sequence in the plasmid, and appears to occur at the DNA level, presumably via DNA degradation. In addition, we have directed silencing of an invader plasmid by genetically engineering the chromosomal CRISPR locus to express customized crRNAs directed against the plasmid. Our results support CRISPR engineering as a feasible approach to develop prokaryotic strains that are resistant to infection for use in industry.
PMCID: PMC3737340  PMID: 23535213
CRISPR; Cas; archaea; Thermococcus; hyperthermophile; immune; RNA; DNA; silencing; interference
20.  Essential features and rational design of CRISPR RNAs that function with the Cas RAMP module complex to cleave RNAs 
Molecular Cell  2012;45(3):292-302.
Small RNAs target invaders for silencing in the CRISPR-Cas pathways that protect bacteria and archaea from viruses and plasmids. The CRISPR RNAs (crRNAs) contain sequence elements acquired from invaders that guide CRISPR-associated (Cas) proteins back to the complementary invading DNA or RNA. Here, we have analyzed essential features of the crRNAs associated with the Cas RAMP module (Cmr) effector complex, which cleaves targeted RNAs. We show that Cmr crRNAs contain an 8-nucleotide 5′ sequence tag (also found on crRNAs associated with other CRISPR-Cas pathways) that is critical for crRNA function and can be used to engineer crRNAs that direct cleavage of novel targets. We also present data indicating that the Cmr complex cleaves an endogenous complementary RNA in Pyrococcus furiosus, providing direct in vivo evidence of RNA targeting by the CRISPR-Cas system. Our findings indicate that the CRISPR RNA-Cmr protein pathway may be exploited to cleave RNAs of interest.
PMCID: PMC3278580  PMID: 22227116
21.  Genomic imprinting absent in Drosophila melanogaster adult females 
Cell reports  2012;2(1):69-75.
Genomic imprinting occurs when expression of an allele differs based on the sex of the parent that transmitted the allele. In D. melanogaster, imprinting can occur, but its impact on allelic expression genome-wide is unclear. Here, we search for imprinted genes in D. melanogaster using RNA-seq to compare allele-specific expression between pools of 7–10 day old adult female progeny from reciprocal crosses. 119 genes with allelic expression patterns consistent with imprinting were identified and showed significant clustering within the genome. Surprisingly, additional analysis of several of these genes showed that either genomic heterogeneity or high levels of intrinsic noise caused imprinting-like allelic expression. Consequently, our data provide no convincing evidence of imprinting for D. melanogaster genes in their native genomic context. Elucidating sources of false positive signals for imprinting in allele-specific RNA-seq data, as done here, is critical given the growing popularity of this method for identifying imprinted genes.
PMCID: PMC3565465  PMID: 22840398
22.  Expansion of the eukaryotic proteome by alternative splicing 
Nature  2010;463(7280):457-463.
The collection of components required to carry out the intricate processes involved in generating and maintaining a living, breathing and, sometimes, thinking organism is staggeringly complex. Where do all of the parts come from? Early estimates stated that about 100,000 genes would be required to make up a mammal; however, the actual number is less than one-quarter of that, barely four times the number of genes in budding yeast. It is now clear that the ‘missing’ information is in large part provided by alternative splicing, the process by which multiple different functional messenger RNAs, and therefore proteins, can be synthesized from a single gene.
PMCID: PMC3443858  PMID: 20110989
23.  RNA structure and the mechanisms of alternative splicing 
Alternative splicing is a widespread means of increasing protein diversity and regulating gene expression in eukaryotes. Much progress has been made in understanding the proteins involved in regulating alternative splicing, the sequences they bind to, and how these interactions lead to changes in splicing patterns. However, several recent studies have identified other players involved in regulating alternative splicing. A major theme emerging from these studies is that RNA secondary structures play an underappreciated role in the regulation of alternative splicing. This review provides an overview of the basic aspects of splicing regulation and highlights recent progress in understanding the role of RNA secondary structure in this process.
PMCID: PMC3149766  PMID: 21530232
24.  Global Patterns of Tissue-Specific Alternative Polyadenylation in Drosophila 
Cell reports  2012;1(3):277-289.
We analyzed the usage and consequences of alternative cleavage and polyadenylation (APA) in Drosophila melanogaster by using >1 billion reads of stranded mRNA-seq across a variety of dissected tissues. Beyond demonstrating that a majority of fly transcripts are subject to APA, we observed broad trends for 3′ untranslated region (UTR) shortening in the testis and lengthening in the central nervous system (CNS); the latter included hundreds of unannotated extensions ranging up to 18 kb. Extensive northern analyses validated the accumulation of full-length neural extended transcripts, and in situ hybridization indicated their spatial restriction to the CNS. Genes encoding RNA binding proteins (RBPs) and transcription factors were preferentially subject to 3′ UTR extensions. Motif analysis indicated enrichment of miRNA and RBP sites in the neural extensions, and their termini were enriched in canonical cis elements that promote cleavage and polyadenylation. Altogether, we reveal broad tissue-specific patterns of APA in Drosophila and transcripts with unprecedented 3′ UTR length in the nervous system.
PMCID: PMC3368434  PMID: 22685694
25.  Transcriptome Analysis Reveals Strain-Specific and Conserved Stemness Genes in Schmidtea mediterranea 
PLoS ONE  2012;7(4):e34447.
The planarian Schmidtea mediterranea is a powerful model organism for studying stem cell biology due to its extraordinary regenerative ability mediated by neoblasts, a population of adult somatic stem cells. Elucidation of the S. mediterranea transcriptome and the dynamics of transcript expression will increase our understanding of the gene regulatory programs that control stem cell function and differentiation. Here, we have used RNA-Seq to characterize the S. mediterranea transcriptome in sexual and asexual animals and in purified neoblast and differentiated cell populations. Our analysis identified many uncharacterized genes, transcripts, and alternatively spliced isoforms that are differentially expressed in a strain- or cell type-specific manner. Transcriptome profiling of purified neoblasts and differentiated cells identified neoblast-enriched transcripts, many of which likely play important roles in regeneration and stem cell function. Strikingly, many of the neoblast-enriched genes are orthologs of genes whose expression is enriched in human embryonic stem cells, suggesting that a core set of genes that regulate stem cell function is conserved across metazoan species.
PMCID: PMC3319590  PMID: 22496805
