By coupling laser capture microdissection to nanoCAGE technology and next-generation sequencing, we have identified the genome-wide collection of active promoters in the mouse Main Olfactory Epithelium (MOE). Transcription start sites (TSSs) for the large majority of Olfactory Receptors (ORs) have previously been mapped, increasing our understanding of their promoter architecture. Here we show that our nanoCAGE libraries of the mouse MOE contain a large number of tags mapped to loci hosting Type-1 and Type-2 Vomeronasal Receptor genes (V1Rs and V2Rs). These loci also show massive expression of Long Interspersed Nuclear Elements (LINEs). We have validated the expression of selected receptors detected by nanoCAGE with in situ hybridization, RT-PCR and qRT-PCR. This work extends the repertoire of receptors capable of sensing chemical signals in the MOE. It suggests intriguing interplay between the MOE and the vomeronasal organ (VNO) in pheromone processing, and it positions transcribed LINEs as candidate regulatory RNAs for VR expression.
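Claims such as the one above rest on counting how many nanoCAGE tags fall within receptor loci. The sketch below is a minimal illustration. It assumes that tag 5′ ends have already been mapped and that loci are given as half-open genomic intervals. The function name, locus names and coordinates are hypothetical, and strand is ignored for simplicity.

```python
from bisect import bisect_left
from collections import defaultdict


def count_tags_in_loci(tags, loci):
    """Count mapped tag 5' ends inside each locus.

    tags: iterable of (chrom, position) pairs.
    loci: iterable of (name, chrom, start, end) with half-open [start, end).
    Strand is ignored in this simplified sketch.
    """
    # Group tag positions by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos in tags:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = {}
    for name, chrom, start, end in loci:
        positions = by_chrom.get(chrom, [])
        # Number of positions p with start <= p < end.
        counts[name] = bisect_left(positions, end) - bisect_left(positions, start)
    return counts
```

With tags sorted per chromosome, each locus costs two binary searches. This keeps the counting fast even for libraries with millions of tags.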
vomeronasal receptors; main olfactory epithelium; vomeronasal organ; VNO; MOE; V1Rs; V2Rs
Brain function is shaped by postnatal experience and is vulnerable to disruption of methyl-CpG-binding protein 2 (Mecp2) in multiple neurodevelopmental disorders. How Mecp2 contributes to the experience-dependent refinement of specific cortical circuits, and to their impairment, remains unknown. We analyzed vision in gene-targeted mice and observed initially normal development in the absence of Mecp2. Visual acuity then rapidly regressed after postnatal day 35–40 (P35–40), and cortical circuits largely fell silent by P55–60. Enhanced inhibitory gating and an excess of parvalbumin-positive, perisomatic input preceded the loss of vision. Both cortical function and inhibitory hyperconnectivity were strikingly rescued, independent of Mecp2, by early sensory deprivation or genetic deletion of the excitatory NMDA receptor subunit NR2A. Thus, vision is a sensitive biomarker of progressive cortical dysfunction and may guide novel, circuit-based therapies for Mecp2 deficiency.
PROM1 is the gene encoding prominin-1, or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSSs) associated with distinct tissue and cancer expression profiles. Using high-resolution Cap Analysis of Gene Expression (CAGE) sequencing, we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) within CD133+ melanoma cell lines and stem cells. Additional exon array sampling shows P6 to be active in populations enriched for mesenchyme, in neural stem cells and in CD133+ enriched Ewing sarcomas. Relative to previously characterized PROM1 promoters, P6 is enriched for an HMGI/Y (HMGA1) family transcription factor binding site motif. It also carries different epigenetic modifications from the canonical PROM1 promoter region.
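Characterizing TSS utilization usually amounts to comparing how tags are distributed among a gene's promoters in each sample. The following is a minimal sketch. It assumes that CAGE tag counts have already been assigned to named promoters. The promoter labels, tissue names and counts are hypothetical. The sketch computes per-tissue usage fractions and picks the dominant promoter.

```python
def promoter_usage(counts_by_tissue):
    """Convert raw tag counts per promoter into per-tissue usage fractions.

    counts_by_tissue: {tissue: {promoter: tag_count}}
    """
    usage = {}
    for tissue, counts in counts_by_tissue.items():
        total = sum(counts.values())
        # Tissues with no tags get an empty mapping instead of dividing by zero.
        usage[tissue] = {p: c / total for p, c in counts.items()} if total else {}
    return usage


def dominant_promoter(usage_for_tissue):
    """Return the promoter with the largest usage fraction, or None if there is no signal."""
    if not usage_for_tissue:
        return None
    return max(usage_for_tissue, key=usage_for_tissue.get)
```

Fractions rather than raw counts make promoter choice comparable between samples sequenced to different depths.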
PROM1 protein, human; AC133 antigen; transcription start site; promoter regions, genetic; melanoma; cancer stem cells
Analyzing the RNA pool or transcription start sites requires effective means to convert RNA into cDNA libraries for digital expression counting. With current high-speed sequencers, the cDNAs must be flanked with specific adapters. Adding template-switching oligonucleotides to reverse transcription reactions is the most commonly used approach when working with very small quantities of RNA, including RNA from single cells.
Here we compared the performance of DNA-RNA, DNA-LNA and DNA oligonucleotides in template switching during nanoCAGE library preparation. Test libraries from rat muscle and HeLa cell RNA were prepared in technical triplicate and sequenced to compare gene coverage and the distribution of reads within transcripts. The DNA-RNA oligonucleotide showed the highest specificity for capped 5′ ends of mRNA, whereas the DNA-LNA oligonucleotide provided similar gene coverage with more reads falling within exons.
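One of the comparisons mentioned above is the share of reads falling within exons. It can be computed with a binary search over sorted exon intervals. This is a minimal sketch for a single chromosome and strand. It assumes that each read is represented by one mapped position and that exons are already merged into non-overlapping half-open intervals. It is not the pipeline used in the study.

```python
from bisect import bisect_right


def fraction_in_exons(read_positions, exons):
    """Fraction of read positions that fall inside any exon.

    exons: sorted list of non-overlapping half-open (start, end) intervals.
    """
    if not read_positions:
        return 0.0
    starts = [start for start, _ in exons]
    inside = 0
    for pos in read_positions:
        # Index of the last exon that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < exons[i][1]:
            inside += 1
    return inside / len(read_positions)
```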
While confirming the cap-specific preference of DNA-RNA oligonucleotides in template-switching reactions, our data indicate that DNA-LNA hybrid oligonucleotides could potentially find other applications in random RNA sequencing.
CAGE; Template-switching; LNA; Transcriptome; Quantitative sequencing
Efficient isolation of specific, intact, living neurons from the adult brain is problematic because of the complex extracellular matrix that consolidates the neuronal network. Here, we present significant improvements to the protocol for isolating pure populations of neurons from the mature postnatal mouse brain using fluorescence-activated cell sorting (FACS). The 10-fold increase in cell yield enables cell-specific transcriptome analysis by protocols such as nanoCAGE and RNA-seq.
FACS; parvalbumin; pyramidal; nanoCAGE; RNA seq
LINE-1 (L1) retrotransposons are mobile genetic elements comprising ∼17% of the human genome. New L1 insertions can profoundly alter gene function and cause disease, though their significance in cancer remains unclear. Here, we applied enhanced retrotransposon capture sequencing (RC-seq) to 19 hepatocellular carcinoma (HCC) genomes and elucidated two archetypal L1-mediated mechanisms enabling tumorigenesis. In the first example, 4 of 19 donors (21.1%) carried germline retrotransposition events in the tumor suppressor gene mutated in colorectal cancers (MCC). MCC expression was ablated in each case, enabling oncogenic β-catenin/Wnt signaling. In the second example, suppression of tumorigenicity 18 (ST18) was activated by a tumor-specific L1 insertion. Experimental assays confirmed that the L1 interrupted a negative feedback loop by blocking ST18 repression of its enhancer. ST18 was also frequently amplified in HCC nodules from Mdr2−/− mice, supporting its assignment as a candidate liver oncogene. These proof-of-principle results substantiate L1-mediated retrotransposition as an important etiological factor in HCC.
► L1 retrotransposons promote tumorigenesis in hepatocellular carcinoma (HCC) ► Germline L1 and Alu insertions in MCC activate β-catenin/Wnt signaling ► L1 mobilization in tumor cells accelerates transformation of the HCC genome ► A tumor-specific L1 insertion interrupts a negative feedback loop regulating ST18
L1 retrotransposons, which are widespread in the human genome, can mobilize and activate oncogenes in the livers of individuals infected with the hepatitis B or hepatitis C virus, promoting the development and growth of hepatocellular carcinoma. Genes identified by the L1 insertions present new options for cancer screening and intervention.
Non-coding RNAs (ncRNAs) are involved in an increasing number of cellular events1. Some ncRNAs are processed by the DICER and DROSHA ribonucleases to give rise to small double-stranded RNAs involved in RNA interference (RNAi)2. The DNA-damage response (DDR) is a signaling pathway that originates at the DNA lesion and arrests cell proliferation3. So far, DICER or DROSHA RNA products have not been reported to control DDR activation. Here we show that DICER and DROSHA, but not downstream elements of the RNAi pathway, are necessary to activate the DDR upon oncogene-induced genotoxic stress and exogenous DNA damage. This was assessed by DDR foci formation in mammalian cells and zebrafish, and by checkpoint assays. DDR foci are sensitive to RNase A treatment, and DICER- and DROSHA-dependent RNA products are required to restore DDR foci in treated cells. We used RNA deep sequencing and studied DDR activation at an inducible unique DNA double-strand break (DSB). These experiments demonstrate that DDR foci formation requires site-specific DICER- and DROSHA-dependent small RNAs, named DDRNAs, which act in an MRE11-RAD50-NBS1 (MRN) complex-dependent manner. Whether chemically synthesized or generated in vitro by DICER cleavage, DDRNAs are sufficient to restore the DDR in RNase A-treated cells, even in the absence of other cellular RNAs. Our results describe an unanticipated direct role for a novel class of ncRNAs in the control of DDR activation at sites of DNA damage.
DICER; DROSHA; small non-coding RNAs; DNA damage response (DDR); ATM; cellular senescence; zebrafish
Mutations in the MECP2 gene are found in a large proportion of girls with Rett syndrome. Despite extensive research, the principal role of the MeCP2 protein remains elusive. Is MeCP2 a gene regulator that acts in concert with co-activators and co-repressors, predominantly as an activator of target genes? Or is it a methyl-CpG-binding protein that acts globally to change the chromatin state and to suppress transcription from repeat elements? If MeCP2 has no specific targets in the genome, what causes the differential expression of specific genes in the Mecp2 knockout mouse brain? We discuss the discrepancies in current data and propose a hypothesis to reconcile some of the differences between the two viewpoints. Transcripts from repeat elements contribute to piRNA biogenesis. We therefore propose that piRNA levels may be higher in the absence of MeCP2, and that increased piRNA levels may contribute to the mis-regulation of some genes seen in the Mecp2 knockout mouse brain. We provide preliminary data showing an increase in piRNAs in the Mecp2 knockout mouse cerebellum. These data suggest that global piRNA levels may be elevated in the Mecp2 knockout mouse cerebellum and strongly support further investigation of piRNAs in Rett syndrome.
Rett Syndrome; MeCP2; piRNAs; LINE-1; short RNAs
Template switching (TS) is an inherent property of reverse transcriptase that has been exploited in several transcriptome analysis methods, such as CAGE, RNA-Seq and short RNA sequencing. TS is an attractive option because the protocol is simple: it does not require an adaptor ligation step and thus minimizes sample loss. As such, it has been used in several studies that deal with limited amounts of RNA, such as single-cell studies. TS has also been used to introduce DNA barcodes or indexes into different samples, cells or molecules. This labeling allows several samples to be pooled in one sequencing flow cell, which increases data throughput and takes advantage of the growing capacity of current sequencers. Here, we report TS artifacts that form through a process called strand invasion. Because of the way barcodes/indexes are introduced by TS, strand invasion is particularly problematic: it introduces unsystematic biases. We describe a strategy that eliminates these artifacts in silico and propose an experimental solution that suppresses these biases.
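One plausible way to flag strand-invasion artifacts in silico relies on the following mechanism. An oligonucleotide can anneal to a complementary internal site of the first-strand cDNA. The resulting read then starts just downstream of a genomic stretch that resembles the 3′ end of the template-switching oligonucleotide. The sketch below assumes this criterion together with an edit-distance cut-off. The linker sequence `TATAGGG` and the threshold `max_dist=1` are hypothetical placeholders, not parameters reported by the authors.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with two rows of dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def is_strand_invasion(upstream_genomic, linker_tail, max_dist=1):
    """Flag a read whose upstream genomic context resembles the TS linker tail.

    upstream_genomic: genome sequence (read's strand) ending right before the
    read's 5' end. linker_tail: hypothetical 3'-terminal bases of the TS oligo.
    """
    if not linker_tail:
        raise ValueError("linker_tail must be non-empty")
    # Compare the bases directly adjacent to the read with the linker tail.
    window = upstream_genomic[-len(linker_tail):]
    return edit_distance(window, linker_tail) <= max_dist
```

Edit distance rather than exact matching tolerates the mismatches that imperfect annealing can leave in the genomic context.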
Retrotransposons are mobile genetic elements that employ a germ line “copy-and-paste” mechanism to spread throughout metazoan genomes1. At least 50% of the human genome is derived from retrotransposons, with three active families (L1, Alu and SVA) associated with insertional mutagenesis and disease2-3. Epigenetic and post-transcriptional suppression block retrotransposition in somatic cells4-5, except during early embryonic development and in some malignancies6-7. Recent reports of L1 expression8-9 and copy number variation (CNV)10-11 in the human brain suggest that L1 mobilization may also occur during later development. However, the corresponding integration sites have not been mapped. Here we apply a high-throughput method to identify numerous L1, Alu and SVA germ line mutations, as well as 7,743 putative somatic L1 insertions, in the hippocampus and caudate nucleus of three individuals. Surprisingly, we also found 13,692 somatic Alu insertions and 1,350 somatic SVA insertions. Our results demonstrate that retrotransposons mobilize into protein-coding genes that are differentially expressed and active in the brain. Thus, somatic genome mosaicism driven by retrotransposition may reshape the genetic circuitry that underpins normal and abnormal neurobiological processes.
Cap analysis of gene expression (CAGE) is a 5′ sequence tag technology that globally determines transcription start sites in the genome and their expression levels. It has most recently been adapted to the HeliScope single-molecule sequencer. Despite significant simplifications, the CAGE protocol has until now remained labour-intensive.
In this study we set out to adapt the protocol to a robotic workflow that would increase throughput and reduce handling. The automated CAGE cDNA preparation system we present here can prepare 96 ‘HeliScope ready’ CAGE cDNA libraries in 8 days, as opposed to 6 weeks for a manual operator. We compare the results obtained using the same RNA in manual libraries and across multiple automation batches to assess reproducibility.
We show that the sequencing was highly reproducible and comparable to that of manual libraries, with an 8-fold increase in productivity. The automated CAGE cDNA preparation system can prepare 96 CAGE sequencing samples simultaneously. Finally, we discuss how the system could be used for CAGE on Illumina/SOLiD platforms, RNA-seq and full-length cDNA generation.
The RNA interference pathway is widespread in metazoans and participates in numerous cellular tasks, from gene silencing to chromatin remodeling and protection against retrotransposition. The unicellular eukaryote Trypanosoma cruzi lacks the canonical RNAi pathway and is unable to induce RNAi-related processes. To further understand alternative RNA pathways operating in this organism, we have performed deep sequencing and genome-wide analyses of a size-fractionated cDNA library (16–61 nt) from the epimastigote life stage. Deep sequencing generated 582,243 short sequences, of which 91% could be aligned to the genome sequence. About 95–98% of the aligned data (depending on the haplotype) corresponded to small RNAs derived from tRNAs, rRNAs, snRNAs and snoRNAs. The largest class consisted of tRNA-derived small RNAs, which primarily originated from the 3′ end of tRNAs, followed by small RNAs derived from rRNAs. The remaining sequences revealed the presence of 92 novel transcribed loci, 79 of which did not show homology to known RNA classes.
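The observation that tRNA-derived small RNAs come mainly from the 3′ end of tRNAs requires classifying each fragment by its position on the parent tRNA. Below is a minimal sketch. It assumes that fragment coordinates are already expressed relative to the mature tRNA and uses a hypothetical tolerance of 2 nt at each end. Non-templated 3′ CCA tails are ignored.

```python
from collections import Counter


def classify_trna_fragment(frag_start, frag_end, trna_length, tolerance=2):
    """Label a small RNA by where it sits on its parent tRNA.

    Coordinates are 0-based, half-open and relative to the tRNA 5' end.
    """
    at_5p = frag_start <= tolerance
    at_3p = frag_end >= trna_length - tolerance
    if at_5p and at_3p:
        return "full-length"
    if at_5p:
        return "5' end"
    if at_3p:
        return "3' end"
    return "internal"


def tally_fragments(fragments, tolerance=2):
    """Count classes for (start, end, trna_length) triples."""
    return Counter(classify_trna_fragment(s, e, n, tolerance) for s, e, n in fragments)
```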
Chagas' disease is a major health problem in Latin America and is caused by the protozoan parasite Trypanosoma cruzi. T. cruzi lacks the RNA interference pathway, which is widespread among eukaryotes, and is therefore unable to induce RNAi-related processes. In many organisms, small RNAs play an important role in regulating gene expression and other cellular processes. To determine whether other small RNA pathways operate in this organism, we performed high-throughput sequencing and genome-wide analyses of the short transcriptome. We identified an abundance of small RNAs derived from non-coding RNA genes, including transfer RNAs, ribosomal RNAs, small nucleolar RNAs and small nuclear RNAs. Certain tRNA types were overrepresented as precursors of small RNAs. Further, we identified 79 novel small non-coding RNAs. We did not identify canonical small RNAs, such as microRNAs and small interfering RNAs, and concluded that these do not exist in T. cruzi. This study provides insights into the short transcriptome of a major human pathogen, as well as starting points for further functional investigation of small RNAs and their biological roles.
Human DICER1 protein cleaves double-stranded RNA into small fragments, a crucial step in the production of the single-stranded RNAs that mediate cytoplasmic RNA interference. Here, we demonstrate that human DICER1 protein localizes not only to the cytoplasm but also to the nucleoplasm. We also find that human DICER1 protein associates with NUP153, a component of the nuclear pore complex. This association is detected predominantly in the cytoplasm but is also clearly distinguishable at the nuclear periphery. Further characterization of the NUP153-DICER1 association suggests that NUP153 plays a crucial role in the nuclear localization of DICER1.
Genetic cases of congenital pituitary hormone deficiency are common, and many are caused by transcription factor defects. Mouse models with orthologous mutations are invaluable for uncovering the molecular mechanisms that lead to problems in organ development and to typical patient characteristics. We are using mutant mice defective in the transcription factors PROP1 and POU1F1 for gene expression profiling. The aim is to identify target genes for these critical transcription factors and candidates for cases of pituitary hormone deficiency of unknown etiology. These studies reveal critical roles for Wnt signalling pathways, including the TCF/LEF transcription factors and interacting proteins of the Groucho family, for bone morphogenetic protein antagonists, and for targets of Notch signalling. Current studies are investigating the roles of novel homeobox genes and of pathways that regulate the transition from proliferation to differentiation, cell adhesion and cell migration.
Pituitary adenomas are a common human health problem, yet most cases are sporadic, necessitating alternative approaches to traditional Mendelian genetic studies. Mouse models of adenoma formation offer the opportunity for gene expression profiling during progressive stages of hyperplasia, adenoma and tumorigenesis. This approach holds promise for identification of relevant pathways and candidate genes as risk factors for adenoma formation, understanding mechanisms of progression, and identifying drug targets and clinically relevant biomarkers.
cell proliferation; apoptosis; transcription factors; Prop1; Emx2
Despite recent controversies, the evidence that the majority of the human genome is transcribed into RNA remains strong.
Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and non-coding RNAs with capped 5′ ends that vary in size. Methods that identify the 5′ ends of transcripts will facilitate the discovery of novel promoters and of 5′ ends derived from secondary capping events. Such methods often require large amounts of input RNA, which cannot be obtained from highly refined samples such as tissue microdissections and subcellular fractions. We have therefore developed two methods. nanoCAGE (Cap Analysis of Gene Expression) captures the 5′ ends of transcripts from as little as 10 nanograms of total RNA. CAGEscan, a mate-pair adaptation of nanoCAGE, captures transcript 5′ ends linked to a downstream region. Both methods enable further annotation-agnostic studies of the complex human transcriptome.
Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTRs and coding exons), most genomic locations are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). Identifying large portions of NTLs can help focus the search for TSS locations and thus contribute to promoter and gene finding. It can also help in assessing the 5′ completeness of expressed sequences, and contribute to more successful experimental designs and more accurate gene annotation.
Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from the mouse and human genomes, we developed a methodology that performs computational TSS prediction with very high sensitivity. Based on these predictions, it annotates, with high accuracy and in a strand-specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). We use the properties of the immediate genomic neighborhood of 98,682 accurately determined mouse TSSs and 113,814 human TSSs to determine features that distinguish transcription initiation locations from those that are unlikely to initiate transcription. Our algorithm uses various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.
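The rationale for a very sensitive TSS predictor is that few true TSSs escape it. Positions far from every prediction can therefore be called NTLs with confidence. The sketch below illustrates only that final complement step on one strand. It assumes that predicted TSS positions are given and uses a hypothetical exclusion margin. The published method additionally relies on the sequence features and statistics described above.

```python
def non_tss_locations(predicted_tss, chrom_length, margin):
    """Half-open intervals on one strand lying more than `margin` bases
    from every predicted TSS (a simplified, hypothetical NTL call)."""
    # Protected window around each prediction, clipped to the chromosome.
    windows = sorted((max(0, p - margin), min(chrom_length, p + margin + 1))
                     for p in predicted_tss)
    ntls, cursor = [], 0
    for start, end in windows:
        if start > cursor:
            ntls.append((cursor, start))  # gap before this window is an NTL
        cursor = max(cursor, end)         # overlapping windows merge here
    if cursor < chrom_length:
        ntls.append((cursor, chrom_length))
    return ntls


def ntl_fraction(ntls, chrom_length):
    """Share of the strand covered by NTL intervals."""
    return sum(end - start for start, end in ntls) / chrom_length
```

Running the function separately for each strand gives the strand-specific annotation that the abstract describes.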
Our analysis of human chromosomes 4, 21 and 22 estimates ∼46%, ∼41% and ∼27% of these chromosomes, respectively, to be NTLs. This suggests that, on average, more than 40% of the human genome can be expected to be highly unlikely to initiate transcription. Our method is the first to use high-sensitivity TSS prediction to identify, with high accuracy, large portions of mammalian genomes as NTLs. A server implementing our algorithm is available at http://cbrc.kaust.edu.sa/ddm/.
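If each chromosome counts equally, the three per-chromosome estimates average 38%. The "more than 40%" figure is easier to follow once chromosome size is taken into account: chromosome 4 is roughly four times longer than chromosomes 21 and 22 and has the highest NTL fraction. The arithmetic below is a hedged illustration. It assumes approximate GRCh37 chromosome lengths (the study's assembly may differ) and assumes length weighting, which the abstract does not state explicitly.

```python
# Assumed GRCh37 chromosome lengths in bp; the study may have used another assembly.
lengths = {"chr4": 191_154_276, "chr21": 48_129_895, "chr22": 51_304_566}
# Per-chromosome NTL fractions as reported in the abstract.
ntl_fraction = {"chr4": 0.46, "chr21": 0.41, "chr22": 0.27}

# Every chromosome counts equally.
unweighted = sum(ntl_fraction.values()) / len(ntl_fraction)
# Each chromosome counts in proportion to its length.
weighted = sum(ntl_fraction[c] * lengths[c] for c in lengths) / sum(lengths.values())

print(f"unweighted: {unweighted:.1%}")  # unweighted: 38.0%
print(f"weighted:   {weighted:.1%}")    # weighted:   41.8%
```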
The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome by employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE). CAGE sequences mRNA 5′ ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup, including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments, and ChIP–chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors. We also monitored their expression profiles across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other, and are published in related but distinct papers. We provide all data, together with systematic annotation, in an integrated view as a resource for the scientific community (http://fantom.gsc.riken.jp/4/). Additionally, we assembled a rich set of derived analysis results, including published predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.
Non-coding RNA (ncRNA) transcripts are RNA molecules that do not code for proteins but elicit function by other mechanisms. The vast majority of RNA produced in a cell is non-coding ribosomal RNA, produced from relatively few loci. More recently, however, complementary DNA (cDNA) cloning, tag sequencing and genome tiling array studies have suggested that ncRNAs also account for the majority of RNA species produced by a cell. ncRNA-based regulation has been referred to as a ‘hidden layer’ of signals or ‘dark matter’ that controls gene expression in cellular processes by poorly described mechanisms. These terms arose for two reasons. Until recently, ncRNAs were ignored by expression profiling and cDNA annotation projects. In addition, their modes of action are diverse (e.g. influencing chromatin structure and epigenetics, translational silencing, transcriptional silencing). Here, we highlight recent functional genomics strategies for identifying ncRNA transcription and assigning function to it.
non-coding RNA; Sequencing; transcription; annotation
We report a catalog of the mouse embryonic pituitary gland transcriptome consisting of five cDNA libraries. These comprise wild-type tissue from E12.5 and E14.5, Prop1df/df mutant tissue at E14.5, and two cDNA subtractions: E14.5 WT-E14.5 Prop1df/df and E14.5 WT-E12.5 WT. DNA sequence information is assembled into a searchable database with gene ontology terms representing 12,009 expressed genes. We validated coverage of the libraries by detecting most known homeobox gene transcription factor cDNAs. A total of 45 homeobox genes were detected as part of the pituitary transcriptome. They include most of the expected ones and many novel ones, underscoring the utility of this resource as a discovery tool. We took a similar approach for signaling-pathway members with novel pituitary expression and found 157 genes related to the BMP, FGF, WNT, SHH and NOTCH pathways. These genes are exciting candidates for regulators of pituitary development and function.
Cap trapper; Homeobox gene; Prop1; Gene expression; Ames dwarf
Perturbation and time-course data sets, in combination with computational approaches, can be used to infer the transcriptional regulatory networks that ultimately govern the developmental pathways and responses of cells. Here, we individually knocked down the four transcription factors PU.1, IRF8, MYB and SP1 in the human monocytic leukemia cell line THP-1. We then profiled the genome-wide transcriptional response of individual transcription start sites using deep-sequencing-based Cap Analysis of Gene Expression. From the proximal promoter regions of the responding transcription start sites, we derived de novo binding-site motifs, characterized their biological function and constructed a network. We found a previously described composite motif for PU.1 and IRF8 that explains the overlapping set of transcriptional responses upon knockdown of either factor.
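Deriving de novo motifs starts from the proximal promoter sequences of the responding TSSs, extracted in the orientation of transcription. A minimal, strand-aware sketch follows. The window of 300 nt upstream and 100 nt downstream is an assumed default, not the window used in the study. The motif-discovery step itself is not shown.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def proximal_promoter(genome, tss, strand, upstream=300, downstream=100):
    """Promoter region around a TSS, read 5'->3' on the transcript strand.

    tss: 0-based coordinate of the first transcribed base.
    The upstream/downstream window sizes are hypothetical defaults.
    """
    if strand == "+":
        # Upstream bases precede the TSS; the TSS is the first downstream base.
        return genome[max(0, tss - upstream):tss + downstream]
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates, so slice
        # and then reverse-complement to follow the direction of transcription.
        region = genome[max(0, tss - downstream + 1):tss + upstream + 1]
        return region.translate(COMPLEMENT)[::-1]
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")
```

Orienting every sequence on its transcript strand means that a motif is reported consistently relative to the TSS, whichever strand the gene lies on.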