Congenital heart disease (CHD) is the most frequent birth defect, affecting 0.8% of live births[1]. Many cases occur sporadically and impair reproductive fitness, suggesting a role for de novo mutations. By analyzing exome sequences from parent-offspring trios, we compared the incidence of de novo mutations in 362 severe CHD cases and 264 controls. CHD cases showed a significant excess of protein-altering de novo mutations in genes expressed in the developing heart, with an odds ratio of 7.5 for damaging mutations. Similar odds ratios were seen across major classes of severe CHD. We found a marked excess of de novo mutations in genes involved in the production, removal or reading of H3K4 methylation (H3K4me), or in ubiquitination of H2BK120, which is required for H3K4 methylation[2–4]. There were also two de novo mutations in SMAD2; SMAD2 signaling in the embryonic left-right organizer induces demethylation of H3K27me[5]. H3K4me and H3K27me mark 'poised' promoters and enhancers that regulate expression of key developmental genes[6]. These findings implicate de novo point mutations in several hundred genes that collectively contribute to ~10% of severe CHD.
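The odds ratio quoted for damaging mutations compares the odds of carrying such a mutation in cases with the odds in controls. A minimal Python sketch of that calculation from a 2×2 table, with a Woolf (log-based) 95% confidence interval, is shown below. The counts are hypothetical, and the abstract does not state which statistical model the study actually used.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2x2 table: subjects with / without >=1 damaging de novo mutation."""
    case_carriers: int
    case_noncarriers: int
    control_carriers: int
    control_noncarriers: int


def odds_ratio(t: TwoByTwo) -> float:
    # OR = (a/b) / (c/d) = (a*d) / (b*c); all four cells must be non-zero
    return (t.case_carriers * t.control_noncarriers) / (
        t.case_noncarriers * t.control_carriers
    )


def woolf_ci(t: TwoByTwo, z: float = 1.96) -> tuple[float, float]:
    # Woolf's method: standard error of log(OR) from the four cell counts
    se = math.sqrt(
        1 / t.case_carriers
        + 1 / t.case_noncarriers
        + 1 / t.control_carriers
        + 1 / t.control_noncarriers
    )
    log_or = math.log(odds_ratio(t))
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not the study's data
table = TwoByTwo(case_carriers=10, case_noncarriers=90,
                 control_carriers=2, control_noncarriers=98)
estimate, interval = odds_ratio(table), woolf_ci(table)
```

With zero cells (common for rare mutation classes) this formula breaks down, and an exact test or a continuity correction would be needed instead.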
Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms.
MiST is a novel approach to variant calling from deep sequencing data, using the inverted mapping approach developed for Geoseq. Reads that can map to a targeted exonic region are identified using exact matches to tiles from the region. The reads are then aligned to the targets to discover variants. MiST carefully handles paralogous reads that map ambiguously to the genome and clonal reads arising from PCR bias, which are the two major sources of errors in variant calling. The reduced computational complexity of mapping selected reads to targeted regions of the genome improves speed, specificity and sensitivity of variant detection. Compared with variant calls from the GATK platform, MiST showed better concordance with SNPs from dbSNP and genotypes determined by an exonic-SNP array. Variant calls made only by MiST were confirmed at a high rate (>90%) by Sanger sequencing. Thus, MiST is a valuable alternative tool to analyse variants in deep sequencing data.
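The read-selection step (exact matches to tiles from a target region) and a simple treatment of clonal reads can be sketched in Python as follows. This is an illustration under stated assumptions, not MiST's implementation: the tile length `k`, forward-strand-only matching, and treating identical read sequences as PCR clones are all assumptions, and the handling of ambiguously mapping paralogous reads is omitted.

```python
def make_tiles(target: str, k: int) -> set[str]:
    """All overlapping k-mers ("tiles") of a target region."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def select_reads(reads: list[str], target: str, k: int = 12) -> list[str]:
    """Keep reads sharing at least one exact k-mer with the target (forward strand only)."""
    tiles = make_tiles(target, k)
    return [
        read for read in reads
        if any(read[i:i + k] in tiles for i in range(len(read) - k + 1))
    ]


def collapse_clonal(reads: list[str]) -> list[str]:
    """Keep one copy of each identical read sequence (a crude proxy for PCR clones)."""
    return list(dict.fromkeys(reads))


target = "AAAACCCCGGGGTTTT"                          # hypothetical target region
reads = ["CCCCGGGGAA", "TTTTTTTTTT", "CCCCGGGGAA"]   # hypothetical reads
candidates = collapse_clonal(select_reads(reads, target, k=8))
# `candidates` would then be aligned to the target to call variants
```

Filtering by exact tile matches first means only a small subset of the library reaches the alignment step, which is where the speed gain of this kind of targeted approach comes from.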
Influenza A virus (IAV) is an unremitting virus that results in significant morbidity and mortality worldwide. Key to the viral life cycle is the RNA-dependent RNA polymerase (RdRp), a heterotrimeric complex responsible for both transcription and replication of the segmented genome. Here, we demonstrate that the viral polymerase uses a small RNA enhancer to regulate enzymatic activity and maintain stoichiometric balance of the viral genome. We show that IAV synthesizes small viral RNAs (svRNAs) that interact with the viral RdRp to promote genome replication in a segment-specific manner. svRNAs localize to the nucleus, the site of IAV replication, are synthesized from the positive-sense genomic intermediate, and interact within a novel RNA-binding channel of the polymerase PA subunit. Synthetic svRNAs promote polymerase activity in vitro, whereas loss of svRNA inhibits viral RNA synthesis in a segment-specific manner. Taken together, these observations mechanistically define svRNA as a small regulatory enhancer RNA that promotes genome replication and maintains segment balance through allosteric modulation of polymerase activity.
We introduce two large-scale resources for functional analysis of microRNA—a decoy/sponge library for inhibiting microRNA function and a sensor library for monitoring microRNA activity. To take advantage of the sensor library, we developed a high-throughput assay called Sensor-seq, which permits the activity of hundreds of microRNAs to be quantified simultaneously. Using this approach, we show that only the most abundant microRNAs within a cell mediate significant target suppression. Over 60% of detected microRNAs had no discernible activity, indicating that the functional ‘miRNome’ of a cell is considerably smaller than currently inferred from profiling studies. Moreover, some highly expressed microRNAs exhibit relatively weak activity, which in some cases correlated with a high target-to-microRNA ratio or increased nuclear localization of the microRNA. Finally, we show that the microRNA decoy library can be used for pooled loss-of-function studies. These tools provide valuable resources for studying microRNA biology and for microRNA-based therapeutics.
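The abstract does not say how Sensor-seq converts read counts into an activity score. One plausible scheme, shown here purely as an assumption, compares each sensor's normalized read frequency in a test sample with its frequency in a reference library: a sensor repressed by its microRNA is depleted, giving a positive log2 score. Sensor names, counts and the pseudocount are hypothetical.

```python
import math


def sensor_activity(sample: dict[str, int], reference: dict[str, int],
                    pseudocount: float = 1.0) -> dict[str, float]:
    """log2(reference frequency / sample frequency) per sensor; > 0 means depletion."""
    keys = sorted(set(sample) | set(reference))

    def frequencies(counts: dict[str, int]) -> dict[str, float]:
        # the pseudocount avoids division by zero for sensors with no reads
        total = sum(counts.get(k, 0) + pseudocount for k in keys)
        return {k: (counts.get(k, 0) + pseudocount) / total for k in keys}

    s, r = frequencies(sample), frequencies(reference)
    return {k: math.log2(r[k] / s[k]) for k in keys}


# Hypothetical read counts for two sensors
activity = sensor_activity(
    sample={"sensor-miR-A": 10, "sensor-miR-B": 100},
    reference={"sensor-miR-A": 100, "sensor-miR-B": 100},
)
```

Under this scheme a microRNA would be called inactive when its sensor's score is indistinguishable from that of control sensors; choosing that cutoff is a separate statistical decision.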
In animal gonads, PIWI proteins and their bound 23–30 nt piRNAs guard genome integrity through sequence-specific silencing of transposons. Two branches of piRNA biogenesis, primary processing and ping-pong amplification, have been proposed. Despite an overall conceptual understanding of piRNA biogenesis, the identity and/or function of the factors involved remain largely unknown. Here, we demonstrate an essential role for the female sterility gene shutdown in piRNA biology. Shutdown, an evolutionarily conserved cochaperone, collaborates with Hsp90 during piRNA biogenesis, potentially at the step in which RNAs are loaded into PIWI proteins. We show that Shutdown is essential for both primary and secondary piRNA populations in Drosophila. Extending our study to previously described piRNA pathway members revealed three distinct groups of biogenesis factors. Together with data on how PIWI proteins are wired into primary and secondary processing, these results lead us to propose a unified model for piRNA biogenesis.
► The cochaperone Shutdown is an essential piRNA biogenesis factor
► Primary and secondary piRNA biogenesis feed into a common biogenesis step
► Piwi and Aub, but not AGO3, are loaded with primary piRNAs
Considerable detail about microRNA (miRNA) biogenesis and regulation has been uncovered; however, little is known about the fate of the miRNA after target regulation. To gain insight into this process, we carried out kinetic analysis of a miRNA's turnover following termination of its biogenesis and during regulation of a target that is not subject to Ago2-mediated catalytic cleavage. By quantifying the number of molecules of the miRNA and its target at steady state and during the course of miRNA decay, we found that each miRNA molecule was able to regulate at least two target transcripts. This provides in vivo evidence that the miRNA is not irreversibly sequestered with its target and that the non-slicing pathway of miRNA regulation is multiple-turnover. Using deep sequencing, we further show that miRNA recycling is limited by target regulation, which promotes post-transcriptional modifications to the 3′ end of the miRNA and accelerates the miRNA's rate of decay. These studies provide new insight into the efficiency of miRNA regulation, help explain how a miRNA can regulate a vast number of transcripts, and identify one of the mechanisms that impart specificity to miRNA decay in mammalian cells.
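One simplified way to read the "at least two target transcripts" estimate is as a ratio of molecule counts per cell. The symbols are introduced here for illustration and are not taken from the study: M is the number of miRNA molecules per cell, and T0 and T1 are the steady-state numbers of target transcripts without and with the miRNA.

```latex
n \;=\; \frac{\Delta T}{M}, \qquad \Delta T \;=\; T_0 - T_1
```

If every miRNA molecule were irreversibly sequestered after acting on a single target, the number of repressed targets could not exceed the number of miRNA molecules, so n ≤ 1. An observed n ≥ 2 therefore implies that each miRNA molecule acts on more than one target, that is, multiple turnover. The study's kinetic analysis is more detailed than this steady-state bound.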
Protecting the genome from transposable element (TE) mobilization is critical for germline development. In Drosophila, Piwi proteins and their bound small RNAs (piRNAs) provide a potent defense against TE activity. TE-targeting piRNAs are processed from TE-dense heterochromatic loci termed ‘piRNA clusters’. While piRNA biogenesis from cluster precursors is beginning to be understood, little is known about the transcriptional regulation of piRNA clusters. Here we show that methylation of histone 3 lysine 9 (H3K9) by the methyltransferase dSETDB1 (egg) is required for piRNA cluster transcription. In the absence of dSETDB1, cluster precursor transcription collapses in germline and somatic gonadal cells and TEs are activated, resulting in germline loss and a block in germline stem cell differentiation. We propose that heterochromatin protects the germline by activating the piRNA pathway.
RNA-Seq allows a theoretically unbiased analysis of both genome-wide transcription levels and mutation status of a tumor. Using this technique we sought to identify novel candidate therapeutic targets expressed in epithelial ovarian cancer (EOC).
Specifically, we sought candidate invasion/migration targets based on expression levels across all tumors, novelty of expression in EOC, and known function. RNA-Seq analysis revealed the high expression of CD151, a transmembrane protein, across all stages of EOC. Expression was confirmed at both the mRNA and protein levels using RT-PCR and immunohistochemical staining, respectively.
We showed that CD151 localizes to the membrane and cell-cell junctions in EOC tumors and normal ovarian surface epithelial cells, as well as in patient-derived and established EOC cell lines. We next evaluated its role in EOC dissemination using two ovarian cancer-derived cell lines with differing levels of CD151 expression. Antibody-mediated inhibition or siRNA-mediated loss of CD151 in SKOV3 and OVCAR5 cells effectively inhibited their migration and invasion.
Taken together, these findings provide the first proof-of-principle demonstration of a next-generation sequencing approach to identifying candidate therapeutic targets and reveal a role for CD151 in EOC dissemination.
CD151; Epithelial Ovarian Cancer; Invasion; Migration; Metastasis; RNA-Seq
We sought to identify candidate serum biomarkers for the detection and surveillance of EOC. Based on RNA-Seq transcriptome analysis of patient-derived tumors, highly expressed secreted proteins were identified using a bioinformatic approach.
RNA-Seq was used to quantify papillary serous ovarian cancer transcriptomes. Paired-end sequencing of 22 flash-frozen tumors was performed. Reads were aligned with ELAND, expression levels were estimated with ERANGE, and the resulting profiles were screened bioinformatically for secreted protein signatures. IGFBP-4 was measured by ELISA in serum samples from women with benign and malignant pelvic masses and in serial samples from women undergoing chemotherapy. Student's t-test, ANOVA, and ROC curves were used for statistical analysis.
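The screen for highly expressed secreted proteins can be pictured as two filters: keep genes that rank in the top fraction of expression in every tumor, then intersect with a list of genes annotated as secreted. The sketch below is an assumption about how such a screen could be coded, not the study's pipeline; gene names, expression values and the secreted-protein annotation are hypothetical, and the 7.5% default only echoes the figure reported for IGFBP-4.

```python
import math


def top_fraction(expression: dict[str, float], fraction: float) -> set[str]:
    """Genes in the top `fraction` of one sample, ranked by expression value."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    return set(ranked[:n_keep])


def consistently_high_secreted(samples: list[dict[str, float]], secreted: set[str],
                               fraction: float = 0.075) -> set[str]:
    """Secreted genes in the top fraction of every sample (samples must be non-empty)."""
    in_every_sample = set.intersection(*(top_fraction(s, fraction) for s in samples))
    return in_every_sample & secreted


# Hypothetical expression values for two tumors
samples = [
    {"GENE1": 100.0, "GENE2": 90.0, "GENE3": 5.0, "GENE4": 1.0},
    {"GENE1": 80.0, "GENE2": 1.0, "GENE3": 70.0, "GENE4": 2.0},
]
hits = consistently_high_secreted(samples, secreted={"GENE1", "GENE3"}, fraction=0.5)
```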
Insulin-like growth factor binding protein 4 (IGFBP-4) was consistently among the top 7.5% of expressed genes in every tumor sample. We then screened serum samples to determine whether increased tumor expression correlated with serum levels. In an initial discovery set of 21 samples, IGFBP-4 levels were elevated in patients, including those with early-stage disease and normal CA125 levels. In a larger, independent validation set (82 controls, 78 cases), IGFBP-4 levels were significantly increased (p < 5 × 10⁻⁵). IGFBP-4 levels were ~3× higher in women with malignant pelvic masses than in women with benign masses. At 93% specificity, ROC sensitivity was 73% (AUC 0.816). In women receiving chemotherapy, average IGFBP-4 levels were below the ROC-determined threshold and were lower in patients with no evidence of disease (NED) than in patients alive with disease (AWD).
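Sensitivity at a fixed threshold and the area under the ROC curve can both be computed directly from case and control measurements. A minimal sketch follows; the serum values are hypothetical, and the AUC uses the pairwise (Mann-Whitney) formulation, which equals the area under the empirical ROC curve when ties are counted as one half.

```python
def sensitivity_specificity(cases: list[float], controls: list[float],
                            threshold: float) -> tuple[float, float]:
    """Call a sample positive when its level is >= threshold."""
    sensitivity = sum(x >= threshold for x in cases) / len(cases)
    specificity = sum(y < threshold for y in controls) / len(controls)
    return sensitivity, specificity


def roc_auc(cases: list[float], controls: list[float]) -> float:
    """P(random case > random control), counting ties as 1/2 (Mann-Whitney form)."""
    score = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(cases) * len(controls))


cases = [3.0, 4.0, 5.0]     # hypothetical levels, malignant masses
controls = [1.0, 2.0, 3.0]  # hypothetical levels, benign masses
sens, spec = sensitivity_specificity(cases, controls, threshold=3.0)
auc = roc_auc(cases, controls)
```

Sweeping the threshold over all observed values and recording each (1 − specificity, sensitivity) pair traces the full ROC curve; a reported operating point such as 73% sensitivity at 93% specificity is one point on that curve.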
This study, the first to our knowledge to use RNA-Seq for biomarker discovery, identified IGFBP-4 as overexpressed in ovarian cancer patients. Beyond this, these studies identified two additional intriguing findings. First, IGFBP-4 can be elevated in early stage disease without elevated CA125. Second, IGFBP-4 levels are significantly elevated with malignant versus benign disease. These findings provide the rationale for future validation studies.
IGFBP-4; epithelial ovarian cancer; serum biomarker; RNA-Seq; transcriptome
In Drosophila, Piwi proteins associate with Piwi-interacting RNAs (piRNAs) and protect the germline genome by silencing mobile genetic elements. This defense system acts in germline and gonadal somatic tissue to preserve germline development. Genetic control of these silencing pathways varies greatly between tissues of the gonad. Here, we identified Vreteno (Vret), a novel gonad-specific protein essential for germline development. Vret is required for piRNA-based transposon regulation in both germline and somatic gonadal tissues. We show that Vret, which contains Tudor domains, associates physically with Piwi and Aubergine (Aub), stabilizing these proteins via a gonad-specific mechanism that is absent in other fly tissues. In the absence of vret, Piwi-bound piRNAs are lost without changes in piRNA precursor transcript production, supporting a role for Vret in primary piRNA biogenesis. In the germline, piRNAs can engage in an Aub- and Argonaute 3 (AGO3)-dependent amplification in the absence of Vret, suggesting that Vret function can distinguish between primary piRNAs loaded into Piwi-Aub complexes and piRNAs engaged in the amplification cycle. We propose that Vret plays an essential role in transposon regulation at an early stage of primary piRNA processing.
Germline stem cell; Soma; Transposon; Piwi; Aubergine; piRNAs; Tudor; Drosophila
Deep sequencing of small RNAs (sRNA-seq) is now the gold standard for small RNA profiling and discovery. Biases in sRNA-seq have been reported, but their etiology has remained unidentified. Through a comprehensive series of sRNA-seq experiments, we establish that RNA ligases are the predominant cause of this bias. We further demonstrate that RNA ligases have strong sequence-specific biases that distort small RNA profiles considerably. We devised a pooled-adapter strategy to overcome this bias and validated the method against microarray and qPCR data. In light of our findings, published small RNA profiles, as well as barcoding strategies that use adapter-end modifications, may need to be revisited. Importantly, by providing a wide spectrum of substrates for the ligase, the pooled-adapter strategy developed here offers a means to overcome bias and generate more accurate small RNA profiles.
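The idea behind a pooled adapter is to present the ligase with many adapter sequences at once, so that no single junction sequence dominates. One way to build such a pool, sketched below as an assumption (the abstract does not specify the design), is to randomize a few nucleotides at the ligating end of a 3′ adapter. The core sequence and the number of randomized positions are hypothetical placeholders, not a published adapter.

```python
from itertools import product


def pooled_adapters(core: str, n_random: int) -> list[str]:
    """All adapters whose 5' (ligating) end carries n_random randomized RNA bases.

    For a 3' adapter, the 5' end is the one joined to the small RNA's 3' end.
    """
    return ["".join(prefix) + core for prefix in product("ACGU", repeat=n_random)]


pool = pooled_adapters(core="CUGUAGGC", n_random=2)  # hypothetical core sequence
```

With four variable positions the pool would hold 4⁴ = 256 distinct adapters; in practice such a mix can be synthesized as a single oligonucleotide with degenerate positions rather than enumerated.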
Pseudogenes populate the mammalian genome as remnants of artefactual incorporation of coding messenger RNAs into transposon pathways[1]. Here we show that a subset of pseudogenes generates endogenous small interfering RNAs (endo-siRNAs) in mouse oocytes. These endo-siRNAs are often processed from double-stranded RNAs formed by hybridization of spliced transcripts from protein-coding genes to antisense transcripts from homologous pseudogenes. An inverted repeat pseudogene can also generate abundant small RNAs directly. A second class of endo-siRNAs may enforce repression of mobile genetic elements, acting together with Piwi-interacting RNAs. Loss of Dicer, a protein integral to small RNA production, increases expression of endo-siRNA targets, demonstrating their regulatory activity. Our findings indicate a function for pseudogenes in regulating gene expression by means of the RNA interference pathway and may, in part, explain the evolutionary pressure to conserve argonaute-mediated catalysis in mammals.
Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA), both hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they remain underused because datasets of interest are difficult to locate and analyze.
Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library using suffix arrays. Users can upload custom target sequences or search by gene or miRNA name, and receive results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing relevant libraries to be identified by organism, tissue and type of experiment.
Geoseq simplifies both the analysis of small sets of sequences against deep-sequencing datasets and the identification of public datasets of interest. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries, including their mature and star sequences, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
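The Geoseq approach of cutting a target into tiles and counting each tile in a library via a suffix array can be illustrated with a small Python sketch: reads are concatenated with a separator, a suffix array is built, and each tile is counted by binary search. This toy version assumes exact matching on the forward strand and uses a naive suffix-array construction; it is not Geoseq's code, and a real library would need a linear-time construction and reverse-complement handling.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction (sorts whole suffixes); adequate only for small examples.
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, sa: list[int], pattern: str) -> int:
    """Number of suffixes that start with `pattern`, found by two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def tile_coverage(target: str, reads: list[str], k: int) -> list[int]:
    """Occurrences of each overlapping k-mer tile of `target` in the read library."""
    text = "$".join(reads) + "$"  # '$' keeps matches from spanning two reads
    sa = build_suffix_array(text)
    return [count_occurrences(text, sa, target[i:i + k])
            for i in range(len(target) - k + 1)]


# Hypothetical target and reads
coverage = tile_coverage("ACGTACG", ["ACGTAC", "CGTACG", "TTTT"], k=4)
```

Because the suffix array is built once per library, many different targets can be queried against it cheaply, which fits the idea of holding pre-computed data for public libraries.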
Drosophila endogenous small RNAs are categorized according to their mechanisms of biogenesis and the Argonaute protein to which they bind. MicroRNAs are a class of ubiquitously expressed RNAs of ~22 nucleotides in length, which arise from structured precursors through the action of Drosha–Pasha and Dicer-1–Loquacious complexes[1–7]. These join Argonaute-1 to regulate gene expression[8,9]. A second endogenous small RNA class, the Piwi-interacting RNAs, bind Piwi proteins and suppress transposons[10,11]. Piwi-interacting RNAs are restricted to the gonad, and at least a subset of these arises by Piwi-catalysed cleavage of single-stranded RNAs[12,13]. Here we show that Drosophila generates a third small RNA class, endogenous small interfering RNAs, in both gonadal and somatic tissues. Production of these RNAs requires Dicer-2, but a subset depends preferentially on Loquacious[1,4,5] rather than the canonical Dicer-2 partner, R2D2 (ref. 14). Endogenous small interfering RNAs arise both from convergent transcription units and from structured genomic loci in a tissue-specific fashion. They predominantly join Argonaute-2 and have the capacity, as a class, to target both protein-coding genes and mobile elements. These observations expand the repertoire of small RNAs in Drosophila, adding a class that blurs distinctions based on known biogenesis mechanisms and functional roles.
In Drosophila gonads, Piwi proteins and associated piRNAs collaborate with additional factors to form a small RNA-based immune system that silences mobile elements. Here, we analyzed nine Drosophila piRNA pathway mutants for their impacts on both small RNA populations and the subcellular localization patterns of Piwi proteins. We find that distinct piRNA pathways with differing components function in ovarian germ and somatic cells. In the soma, Piwi acts singularly with the conserved flamenco piRNA cluster to enforce silencing of retroviral elements that may propagate by infecting neighboring germ cells. In the germline, silencing programs encoded within piRNA clusters are optimized via a slicer-dependent amplification loop to suppress a broad spectrum of elements. The classes of transposons targeted by germline and somatic piRNA clusters, though not the precise elements, are conserved among Drosophilids, demonstrating that the architecture of piRNA clusters has coevolved with the transposons that they are tasked to control.
In plants and mammals, small RNAs indirectly mediate epigenetic inheritance by specifying cytosine methylation. We found that small RNAs themselves serve as vectors for epigenetic information. Crosses between Drosophila strains that differ in the presence of a particular transposon can produce sterile progeny, a phenomenon called hybrid dysgenesis. This phenotype manifests itself only if the transposon is paternally inherited, suggesting maternal transmission of a factor that maintains fertility. In both P- and I-element–mediated hybrid dysgenesis models, daughters show a markedly different content of Piwi-interacting RNAs (piRNAs) targeting each element, depending on their parents of origin. Such differences persist from fertilization through adulthood. This indicates that maternally deposited piRNAs are important for mounting an effective silencing response and that a lack of maternal piRNA inheritance underlies hybrid dysgenesis.
piRNAs and Piwi proteins have been implicated in transposon control and are linked to transposon methylation in mammals. Here, we examined the construction of the piRNA system in the restricted developmental window in which methylation patterns are set during mammalian embryogenesis. We find robust expression of two Piwi family proteins, MIWI2 and MILI. Their associated piRNA profiles reveal differences from Drosophila, in which large piRNA clusters act as master regulators of silencing. Instead, in mammals, dispersed transposon copies initiate the pathway, producing primary piRNAs, which predominantly join MILI in the cytoplasm. MIWI2, whose nuclear localization and association with piRNAs depend upon MILI, is enriched for secondary piRNAs antisense to the elements that it controls. The Piwi pathway lies upstream of known mediators of DNA methylation, since piRNAs are still produced in Dnmt3L mutants, which fail to methylate transposons. This implicates piRNAs as specificity determinants of DNA methylation in germ cells.
The transcriptomes of eukaryotic cells are incredibly complex. Individual non-coding RNAs far outnumber protein-coding genes and include classes that are well understood as well as classes whose nature, extent and functional roles are obscure[1]. Deep sequencing of small RNAs (<200 nucleotides) from human HeLa and HepG2 cells revealed a remarkable breadth of species. These arose both from within annotated genes and from unannotated intergenic regions. Overall, small RNAs tended to align with CAGE (cap-analysis of gene expression) tags[2], which mark the 5′ ends of capped, long RNA transcripts. Many small RNAs, including the previously described promoter-associated small RNAs[3], appeared to possess cap structures. Members of an extensive class of both small RNAs and CAGE tags were distributed across internal exons of annotated protein coding and non-coding genes, sometimes crossing exon–exon junctions. Here we show that processing of mature mRNAs through an as yet unknown mechanism may generate complex populations of both long and short RNAs whose apparently capped 5′ ends coincide. Supplying synthetic promoter-associated small RNAs corresponding to the c-MYC transcriptional start site reduced MYC messenger RNA abundance. The studies presented here expand the catalogue of cellular small RNAs and demonstrate a biological impact for at least one class of non-canonical small RNAs.
There is great interest in probing the temporal and spatial patterns of cytosine methylation states in genomes of a variety of organisms. It is hoped that this will shed light on the biological roles of DNA methylation in the epigenetic control of gene expression. Bisulfite sequencing refers to the treatment of isolated DNA with sodium bisulfite to convert unmethylated cytosine to uracil, with PCR converting the uracil to thymine, followed by sequencing of the resultant DNA to detect DNA methylation. For the study of DNA methylation, plants provide an excellent model system, since they can tolerate major changes in their DNA methylation patterns and have long been studied for the effects of DNA methylation on transposons and epimutations. However, in contrast to the situation in animals, few tools are available to analyze bisulfite data from plants, which can exhibit methylation of cytosines in a variety of sequence contexts (CG, CHG, and CHH).
Kismeth is a web-based tool for bisulfite sequencing analysis. Kismeth was designed to be used with plants, since it considers potential cytosine methylation in any sequence context (CG, CHG, and CHH). It provides a tool for the design of bisulfite primers as well as several tools for the analysis of the bisulfite sequencing results. Kismeth is not limited to data from plants, as it can be used with data from any species.
Kismeth simplifies bisulfite sequencing analysis. It is the only publicly available tool for the design of bisulfite primers for plants, and one of the few tools for the analysis of methylation patterns in plants. It facilitates analysis at both global and local scales, demonstrated in the examples cited in the text, allowing dissection of the genetic pathways involved in DNA methylation. Kismeth can also be used to study methylation states in different tissues and disease cells compared to a reference sequence.
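Calling methylation per sequence context, as Kismeth does for plant data, rests on two steps: classifying each reference cytosine as CG, CHG or CHH (H = A, C or T), and checking whether the aligned bisulfite read still shows C (methylated, protected from conversion) or T (unmethylated, converted). The sketch below is a simplified illustration rather than Kismeth's code; it assumes a gapless alignment, the top strand only, and an uppercase A/C/G/T reference. The example sequences are hypothetical.

```python
from collections import Counter
from typing import Optional


def cytosine_context(ref: str, i: int) -> Optional[str]:
    """CG, CHG or CHH for a C at position i of the top strand; None if not classifiable."""
    if ref[i] != "C":
        return None
    if i + 1 < len(ref) and ref[i + 1] == "G":
        return "CG"
    if i + 2 < len(ref):
        # position i+1 is not G here, so it is an H base (A, C or T)
        return "CHG" if ref[i + 2] == "G" else "CHH"
    return None  # too close to the 3' end to classify


def methylation_by_context(ref: str, read: str) -> dict[str, float]:
    """Fraction of cytosines read as C (methylated) per context, for a gapless alignment."""
    methylated, informative = Counter(), Counter()
    for i, base in enumerate(read[: len(ref)]):
        context = cytosine_context(ref, i)
        if context is None or base not in "CT":
            continue  # skip non-C reference positions and uninformative read bases
        informative[context] += 1
        if base == "C":
            methylated[context] += 1
    return {ctx: methylated[ctx] / informative[ctx] for ctx in informative}


ref = "ACGTCAGTCTTA"
read = "ACGTTAGTTTTA"  # C in CG retained; CHG and CHH cytosines converted to T
levels = methylation_by_context(ref, read)
```

Aggregating these per-read results over many clones gives the per-context methylation percentages typically reported for a locus.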
The idea that conversion of glucose to ATP is an attractive target for cancer therapy has been supported in part by the observation that glucose deprivation induces apoptosis in rodent cells transduced with the proto-oncogene MYC, but not in the parental line. Here, we found that depletion of glucose killed normal human cells irrespective of induced MYC activity and by a mechanism different from apoptosis. However, depletion of glutamine, another major nutrient consumed by cancer cells, induced apoptosis depending on MYC activity. This apoptosis was preceded by depletion of the Krebs cycle intermediates, was prevented by two Krebs cycle substrates, but was unrelated to ATP synthesis or several other reported consequences of glutamine starvation. Our results suggest that the fate of normal human cells should be considered in evaluating nutrient deprivation as a strategy for cancer therapy, and that understanding how glutamine metabolism is linked to cell viability might provide new approaches for treatment of cancer.
Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires displaying novel data together with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that must be tracked for removal.
lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales.
lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.
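The lwgv pattern, in which transient result files import static annotation from a more permanent file, can be illustrated with a small resolver for a flat, tab-separated annotation format. The `include` directive and the four-column layout below are hypothetical and are not lwgv's actual file syntax; file contents are supplied through a lookup function so the example runs without touching the disk.

```python
from typing import Callable

Feature = tuple[str, int, int, str]  # (chromosome, start, end, label)


def load_track(name: str, read_text: Callable[[str], str],
               stack: tuple[str, ...] = ()) -> list[Feature]:
    """Parse a flat annotation file, expanding 'include <file>' lines recursively."""
    if name in stack:
        raise ValueError(f"circular include: {' -> '.join(stack + (name,))}")
    features: list[Feature] = []
    for line in read_text(name).splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("include "):
            features.extend(load_track(line.split(maxsplit=1)[1], read_text, stack + (name,)))
            continue
        chrom, start, end, label = line.split("\t")
        features.append((chrom, int(start), int(end), label))
    return features


def in_window(features: list[Feature], chrom: str, start: int, end: int) -> list[Feature]:
    """Features overlapping the half-open window [start, end), sorted by start."""
    hits = [f for f in features if f[0] == chrom and f[1] < end and f[2] > start]
    return sorted(hits, key=lambda f: f[1])


files = {  # hypothetical static annotation and transient result file
    "static.txt": "chr1\t100\t200\tgeneA\nchr1\t500\t600\tgeneB\n",
    "results.txt": "include static.txt\nchr1\t150\t170\thit1\n",
}
features = load_track("results.txt", files.__getitem__)
```

Keeping static annotation in one permanent file and referencing it from short-lived result files means the transient files can be deleted freely, with no temporary database tables to track.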
We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT–AG and GC–AG and U12-type GT–AG and AT–AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT–AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3′ splice sites (3′ss) and (iv) distinct evolutionary histories of 5′ and 3′ss. Our study shows that the identification of broad patterns in naturally occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.
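The information content of a splice-site motif is commonly computed per position as the maximum entropy for DNA (2 bits) minus the Shannon entropy of the observed base frequencies. A minimal sketch follows; it assumes equal-length aligned sites, a uniform background and no small-sample correction, any of which may differ from the computation used for SpliceRack. The example sites are hypothetical.

```python
import math
from collections import Counter


def information_content(sites: list[str]) -> list[float]:
    """Per-position information content in bits: 2 - H, with H the Shannon entropy."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    n = len(sites)
    ic = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


donor_sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTAGGT"]  # hypothetical 5' splice sites
donor_ic = information_content(donor_sites)
```

Summing the per-position values gives a single motif score, which is one way such scores can be compared between exonic and intronic portions of a site or across species.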
Independent identification of genes in different organisms and assays has led to a multitude of names for each gene. This balkanization makes it difficult to use gene names to locate genomic resources, homologs in other species and relevant publications.
We solve the naming problem by collecting data from a variety of sources and building a name-translation database. We have also built a table of homologs across several model organisms: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. cerevisiae, S. pombe and A. thaliana. This allows GeneSeer to draw phylogenetic trees and identify the closest homologs. This, in turn, allows the use of names from one species to identify homologous genes in another species. A website is connected to the database to allow user-friendly access to our tools and external genomic resources using familiar gene names.
GeneSeer allows access to gene information through common names and can map sequences to names. GeneSeer also allows identification of homologs and paralogs for a given gene. A variety of genomic data such as sequences, SNPs, splice variants, expression patterns and others can be accessed through the GeneSeer interface. It is freely available over the web and can be incorporated in other tools through an HTTP-based software interface described on the website. It is currently used as the search engine in the RNAi codex resource, which is a portal for short hairpin RNA (shRNA) gene-silencing constructs.
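The core of a name-translation service like the one described for GeneSeer is a many-to-many mapping from names and aliases to gene identifiers, plus a homolog table keyed by identifier. The sketch below shows that structure in memory; the class, the identifiers and the example aliases are hypothetical, and a production system would load them from the collected source databases rather than hard-code them.

```python
from collections import defaultdict


class GeneNameIndex:
    """Hypothetical in-memory name-translation and homolog lookup."""

    def __init__(self) -> None:
        self._ids_by_name: defaultdict[str, set[str]] = defaultdict(set)
        self._homologs: defaultdict[str, set[str]] = defaultdict(set)

    def add_gene(self, gene_id: str, names: list[str]) -> None:
        for name in names:
            self._ids_by_name[name.lower()].add(gene_id)  # case-insensitive lookup

    def add_homolog_pair(self, id_a: str, id_b: str) -> None:
        self._homologs[id_a].add(id_b)
        self._homologs[id_b].add(id_a)

    def resolve(self, name: str) -> set[str]:
        """All gene IDs sharing this name or alias (may be more than one)."""
        return set(self._ids_by_name.get(name.lower(), set()))

    def homologs(self, name: str) -> set[str]:
        """Homolog IDs of every gene matching `name`, excluding those genes themselves."""
        ids = self.resolve(name)
        found: set[str] = set()
        for gene_id in ids:
            found |= self._homologs.get(gene_id, set())
        return found - ids


index = GeneNameIndex()
index.add_gene("human:TP53", ["TP53", "p53"])     # hypothetical identifiers
index.add_gene("mouse:Trp53", ["Trp53", "p53"])
index.add_homolog_pair("human:TP53", "mouse:Trp53")
```

Returning a set from `resolve` keeps ambiguous aliases such as p53, which here matches both the human and mouse entries, visible to the caller instead of silently picking one.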