Identification of functionally relevant differences between induced pluripotent stem cells (iPSC) and reference embryonic stem cells (ESC) remains a central question for therapeutic applications. Differences in gene expression between iPSC and ESC have been examined by microarray and, more recently, with RNA-seq technologies. Here we report an in-depth analysis of nuclear and cytoplasmic transcriptomes, using the CAGE (cap analysis of gene expression) technology, for 5 iPSC clones derived from mouse B lymphocytes and 3 ESC lines. This approach reveals that nuclear transcriptomes are significantly more complex in ESC than in iPSC. Hundreds of not yet annotated putative non-coding RNAs and enhancer-associated transcripts specifically transcribed in ESC have been detected and supported with epigenetic and chromatin-chromatin interaction data. We identified super-enhancers transcriptionally active specifically in ESC and associated with genes implicated in the maintenance of pluripotency. Similarly, we detected non-coding transcripts of as yet unknown function that are regulated by ESC-specific super-enhancers. Taken together, these results demonstrate that current protocols of iPSC reprogramming do not trigger activation of numerous cis-regulatory regions. They thus reinforce the previously suggested need for deeper monitoring of the non-coding transcriptome when characterizing iPSC clones. Such differences in regulatory transcript expression may indeed impact their potential for clinical applications.
iPS; lncRNAs; non-coding RNA; pluripotency; Stem cells; super-enhancers; transcriptome
Mammalian chromosomes fold into arrays of megabase‐sized topologically associating domains (TADs), which are arranged into compartments spanning multiple megabases of genomic DNA. TADs have internal substructures that are often cell type specific, but their higher‐order organization remains elusive. Here, we investigate TAD higher‐order interactions with Hi‐C through neuronal differentiation and show that they form a hierarchy of domains‐within‐domains (metaTADs) extending across genomic scales up to the range of entire chromosomes. We find that TAD interactions are well captured by tree‐like, hierarchical structures irrespective of cell type. metaTAD tree structures correlate with genetic, epigenomic and expression features, and structural tree rearrangements during differentiation are linked to transcriptional state changes. Using polymer modelling, we demonstrate that hierarchical folding promotes efficient chromatin packaging without the loss of contact specificity, highlighting a role far beyond the simple need for packing efficiency.
chromatin contacts; chromosome architecture; epigenetics; gene expression; polymer modelling; Chromatin, Epigenetics, Genomics & Functional Genomics; Development & Differentiation; Genome-Scale & Integrative Biology
Although it is generally accepted that cellular differentiation requires changes to transcriptional networks, dynamic regulation of promoters and enhancers at specific sets of genes has not been previously studied en masse. Exploiting the fact that active promoters and enhancers are transcribed, we simultaneously measured their activity in 19 human and 14 mouse time courses covering a wide range of cell types and biological stimuli. Enhancer RNAs, then messenger RNAs encoding transcription factors, dominated the earliest responses. Binding sites for key lineage transcription factors were simultaneously overrepresented in enhancers and promoters active in each cellular system. Our data support a highly generalizable model in which enhancer transcription is the earliest event in successive waves of transcriptional change during cellular differentiation or activation.
The capacity for plasticity in the adult brain is limited by the anatomical traces laid down during early postnatal life. Removing certain molecular brakes, such as histone deacetylases (HDACs), has proven to be effective in recapitulating juvenile plasticity in the mature visual cortex (V1). We investigated the chromatin structure and transcriptional control by genome-wide sequencing of DNase I hypersensitive sites (DHSS) and cap analysis of gene expression (CAGE) libraries after HDAC inhibition by valproic acid (VPA) in adult V1.
We found that VPA reliably reactivates the critical period plasticity and induces a dramatic change of chromatin organization in V1 yielding significantly greater accessibility distant from promoters, including at enhancer regions. VPA also induces nucleosome eviction specifically from retrotransposon (in particular SINE) elements. The transiently accessible SINE elements overlap with transcription factor-binding sites of the Fox family. Mapping of transcription start site activity using CAGE revealed transcription of epigenetic and neural plasticity-regulating genes following VPA treatment, which may help to re-program the genomic landscape and reactivate plasticity in the adult cortex.
Treatment with HDAC inhibitors increases accessibility to enhancers and repetitive elements underlying brain-specific gene expression and reactivation of visual cortical plasticity.
Electronic supplementary material
The online version of this article (doi:10.1186/s13072-015-0043-3) contains supplementary material, which is available to authorized users.
Visual cortex plasticity; DHSS; Chromatin; Retrotransposon elements; HDAC inhibitors; Enhancers
Understanding how cells use complex transcriptional programs to alter their fate in response to specific stimuli is an important question in biology. For the MCF-7 human breast cancer cell line, we applied gene expression trajectory models to identify the genes involved in driving cell fate transitions. We modified trajectory models to account for the scenario where cells were exposed to different stimuli, in this case epidermal growth factor and heregulin, to arrive at different cell fates, i.e. proliferation and differentiation respectively. Using genome-wide CAGE time series data collected from the FANTOM5 consortium, we identified the sets of promoters that were involved in the transition of MCF-7 cells to their specific fates versus those with expression changes that were generic to both stimuli. Of the 1,552 promoters identified, 1,091 had stimulus-specific expression while 461 promoters had generic expression profiles over the time course surveyed. Many of these stimulus-specific promoters mapped to key regulators of the ERK (extracellular signal-regulated kinases) signaling pathway such as FHL2 (four and a half LIM domains 2). We observed that in general, generic promoters peaked in their expression early on in the time course, while stimulus-specific promoters tended to show activation of their expression at a later stage. The genes that mapped to stimulus-specific promoters were enriched for pathways that control focal adhesion, p53 signaling and MAPK signaling while generic promoters were enriched for cell death, transcription and the cell cycle. We identified 162 genes that were controlled by an alternative promoter during the time course where a subset of 37 genes had separate promoters that were classified as stimulus-specific and generic. The results of our study highlighted the degree of complexity involved in regulating a cell fate transition where multiple promoters mapping to the same gene can demonstrate quite divergent expression profiles.
Mammals are composed of hundreds of different cell types with specialized functions. Each of these cellular phenotypes is controlled by a different combination of transcription factors. Using a human non-islet-cell insulinoma cell line (TC-YIK), which expresses insulin and the majority of known pancreatic beta cell specific genes, as an example, we describe a general approach to identify key cell-type-specific transcription factors (TFs) and their direct and indirect targets. By ranking all human TFs by their level of enriched expression in TC-YIK relative to a broad collection of samples (FANTOM5), we confirmed known key regulators of pancreatic function and development. Systematic siRNA-mediated perturbation of these TFs followed by qRT-PCR revealed their interconnections, with NEUROD1 at the top of the regulation hierarchy and its depletion drastically reducing insulin levels. For 15 of the TF knock-downs (KD), we then used Cap Analysis of Gene Expression (CAGE) to identify thousands of their targets genome-wide (KD-CAGE). The data confirm NEUROD1 as a key positive regulator in the transcriptional regulatory network (TRN), and ISL1 and PROX1 as antagonists. As a complementary approach, we used ChIP-seq on four of these factors to identify NEUROD1, LMX1A, PAX6, and RFX6 binding sites in the human genome. Examining the overlap between genes perturbed in the KD-CAGE experiments and genes with a ChIP-seq peak within 50 kb of their promoter, we identified direct transcriptional targets of these TFs. Integration of KD-CAGE and ChIP-seq data shows that both NEUROD1 and LMX1A work as the main transcriptional activators. In the core TRN (i.e., TF-TF only), NEUROD1 directly transcriptionally activates the pancreatic TFs HSF4, INSM1, MLXIPL, MYT1, NKX6-3, ONECUT2, PAX4, PROX1, RFX6, ST18, DACH1, and SHOX2, while LMX1A directly transcriptionally activates DACH1, SHOX2, PAX6, and PDX1.
Analysis of these complementary datasets suggests the need for caution in interpreting ChIP-seq datasets. (1) A large fraction of binding sites are at distal enhancer sites and cannot be directly associated to their targets, without chromatin conformation data. (2) Many peaks may be non-functional: even when there is a peak at a promoter, the expression of the gene may not be affected in the matching perturbation experiment.
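The overlap step described above (perturbed genes that also carry a ChIP-seq peak within 50 kb of their promoter) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the data layout (one representative TSS per gene, one summit position per peak) and the toy gene names and coordinates are all assumptions made for illustration.

```python
from collections import defaultdict

WINDOW = 50_000  # 50 kb around the promoter, the distance quoted in the text


def genes_near_peaks(promoters, peaks, window=WINDOW):
    """Return genes whose promoter lies within `window` bp of any peak.

    promoters: dict mapping gene -> (chromosome, TSS position)  [assumed layout]
    peaks:     iterable of (chromosome, peak summit position)   [assumed layout]
    """
    peaks_by_chrom = defaultdict(list)
    for chrom, pos in peaks:
        peaks_by_chrom[chrom].append(pos)

    near = set()
    for gene, (chrom, tss) in promoters.items():
        if any(abs(pos - tss) <= window for pos in peaks_by_chrom[chrom]):
            near.add(gene)
    return near


def direct_targets(perturbed_genes, promoters, peaks, window=WINDOW):
    """Genes that respond to the knock-down AND have a nearby peak."""
    return set(perturbed_genes) & genes_near_peaks(promoters, peaks, window)


# Hypothetical toy data, not taken from the study.
promoters = {
    "GENE_A": ("chr1", 100_000),
    "GENE_B": ("chr1", 500_000),
    "GENE_C": ("chr2", 100_000),
}
peaks = [("chr1", 130_000), ("chr2", 400_000)]
perturbed = {"GENE_A", "GENE_B"}

targets = direct_targets(perturbed, promoters, peaks)
```

In this toy case only GENE_A is both perturbed and within 50 kb of a peak. As the caution in the text suggests, such an overlap cannot assign distal enhancer peaks to their targets, and a peak near a promoter does not by itself imply a functional effect.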
ChIP-seq; transcriptional regulatory network; perturbation; pancreas; CAGE; FANTOM5
The transporter of phosphatidylcholine Mdr2/MDR3 not only plays an essential role for bile formation but also is involved in the maintenance of lipid homeostasis. Deficiency of Mdr2 leads to accumulation of ROS, cell transformation and susceptibility to intestinal carcinogenesis.
Multidrug resistance 2 (Mdr2), also called adenosine triphosphate-binding cassette B4 (ABCB4), is the transporter of phosphatidylcholine (PC) at the canalicular membrane of mouse hepatocytes, which plays an essential role for bile formation. Mutations in human homologue MDR3 are associated with several liver diseases. Knockout of Mdr2 results in hepatic inflammation, liver fibrosis and hepatocellular carcinoma (HCC). Whereas the pathogenesis in Mdr2−/− mice has been largely attributed to the toxicity of bile acids due to the absence of PC in the bile, the question of whether Mdr2 deficiency per se perturbs biological functions in the cell has been poorly addressed. As Mdr2 is expressed in many cell types, we used mouse embryonic fibroblasts (MEF) derived from Mdr2−/− embryos to show that deficiency of Mdr2 increases reactive oxygen species accumulation, lipid peroxidation and DNA damage. We found that Mdr2−/− MEFs undergo spontaneous transformation and that Mdr2−/− mice are more susceptible to chemical carcinogen-induced intestinal tumorigenesis. Microarray analysis in Mdr2−/− MEFs and cap analysis of gene expression in Mdr2−/− HCCs revealed extensively deregulated genes involved in oxidation reduction, fatty acid metabolism and lipid biosynthesis. Our findings imply a close link between Mdr2−/−-associated tumorigenesis and perturbation of these biological processes and suggest potential extrahepatic functions of Mdr2/MDR3.
We have performed cap-analysis gene expression (CAGE) sequencing to identify the regulatory networks that orchestrate genome-wide transcription in human papillomavirus type 16 (HPV16)-positive cervical cell lines of different grades: W12E, SiHa, and CaSki. Additionally, a cervical intraepithelial neoplasia grade 1 (CIN1) lesion was assessed for identifying the transcriptome expression profile. Here we have precisely identified a novel antisense noncoding viral transcript in HPV16. In conclusion, CAGE sequencing should pave the way for understanding a diversity of viral transcript expression.
Big leaps in science happen when scientists from different backgrounds interact. In the past 15 years, the FANTOM Consortium has brought together scientists from different fields to analyze and interpret genomic data produced with novel technologies, including mouse full-length cDNAs and, more recently, expression profiling at single-nucleotide resolution by cap-analysis gene expression. The FANTOM Consortium has provided the most comprehensive mouse cDNA collection for functional studies and extensive maps of the human and mouse transcriptome comprising promoters, enhancers, as well as the network of their regulatory interactions. More importantly, serendipitous observations of the FANTOM dataset led us to realize that the mammalian genome is pervasively transcribed, even from retrotransposon elements, which were previously considered junk DNA. The majority of products from the mammalian genome are long non-coding RNAs (lncRNAs), including sense-antisense, intergenic, and enhancer RNAs. While the biological function has been elucidated for some lncRNAs, more than 98 % of them remain without a known function. We argue that large-scale studies are urgently needed to address the functional role of lncRNAs.
The analysis of CAGE (Cap Analysis of Gene Expression) time-course has been proposed by the FANTOM5 Consortium to extend the understanding of the sequence of events facilitating cell state transition at the level of promoter regulation. To identify the most prominent transcriptional regulations induced by growth factors in human breast cancer, we apply here the Complexity Invariant Dynamic Time Warping motif EnRichment (CIDER) analysis approach to the CAGE time-course datasets of MCF-7 cells stimulated by epidermal growth factor (EGF) or heregulin (HRG). We identify a multi-level cascade of regulations rooted by the Serum Response Factor (SRF) transcription factor, connecting the MAPK-mediated transduction of the HRG stimulus to the negative regulation of the MAPK pathway by the members of the DUSP family phosphatases. The finding confirms the known primary role of FOS and FOSL1, members of AP-1 family, in shaping gene expression in response to HRG induction. Moreover, we identify a new potential regulation of DUSP5 and RARA (known to antagonize the transcriptional regulation induced by the estrogen receptors) by the activity of the AP-1 complex, specific to HRG response. The results indicate that a divergence in AP-1 regulation determines cellular changes of breast cancer cells stimulated by
Understanding the normal state of human tissue transcriptome profiles is essential for recognizing tissue disease states and identifying disease markers. Recently, the Human Protein Atlas and the FANTOM5 consortium have each published extensive transcriptome data for human samples using Illumina-sequenced RNA-Seq and Heliscope-sequenced CAGE. Here, we report on the first large-scale complex tissue transcriptome comparison between full-length versus 5′-capped mRNA sequencing data. Overall gene expression correlation was high between the 22 corresponding tissues analyzed (R > 0.8). For genes ubiquitously expressed across all tissues, the two data sets showed high genome-wide correlation (91% agreement), with differences observed for a small number of individual genes indicating the need to update their gene models. Among the identified single-tissue enriched genes, up to 75% showed consensus of 7-fold enrichment in the same tissue in both methods, while another 17% exhibited multiple tissue enrichment and/or high expression variety in the other data set, likely dependent on the cell type proportions included in each tissue sample. Our results show that RNA-Seq and CAGE tissue transcriptome data sets are highly complementary for improving gene model annotations and highlight biological complexities within tissue transcriptomes. Furthermore, integration with image-based protein expression data is highly advantageous for understanding expression specificities for many genes.
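The notion of consensus single-tissue enrichment between the two methods can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact enrichment definition, so here a gene is called enriched when its highest tissue value is at least 7 times its second-highest value. The function names and the expression values are hypothetical.

```python
def enriched_tissue(expression, fold=7.0):
    """Return the tissue in which a gene is enriched, or None.

    expression: dict mapping tissue -> expression value (at least two tissues).
    Assumed rule: top tissue >= `fold` x the second-highest tissue.
    """
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    (top_tissue, top_value), (_, second_value) = ranked[0], ranked[1]
    if top_value > 0 and top_value >= fold * second_value:
        return top_tissue
    return None


def consensus_enrichment(dataset_a, dataset_b, fold=7.0):
    """Genes called enriched in the same tissue by both data sets."""
    consensus = {}
    for gene in dataset_a.keys() & dataset_b.keys():
        tissue_a = enriched_tissue(dataset_a[gene], fold)
        tissue_b = enriched_tissue(dataset_b[gene], fold)
        if tissue_a is not None and tissue_a == tissue_b:
            consensus[gene] = tissue_a
    return consensus


# Hypothetical values standing in for RNA-Seq and CAGE expression tables.
rnaseq = {
    "GENE_X": {"liver": 80.0, "brain": 5.0, "kidney": 10.0},
    "GENE_Y": {"liver": 20.0, "brain": 15.0, "kidney": 10.0},
}
cage = {
    "GENE_X": {"liver": 140.0, "brain": 10.0, "kidney": 12.0},
    "GENE_Y": {"liver": 90.0, "brain": 6.0, "kidney": 8.0},
}

shared = consensus_enrichment(rnaseq, cage)
```

Here GENE_X is liver-enriched in both tables, whereas GENE_Y passes the threshold in only one of them, which corresponds to the kind of disagreement the text attributes to tissue composition or gene model differences.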
Classically or alternatively activated macrophages (M1 and M2, respectively) play distinct and important roles in microbiocidal activity, regulation of inflammation and tissue homeostasis. Despite this, their transcriptional regulatory dynamics are poorly understood. Using promoter-level expression profiling by non-biased deepCAGE, we have studied the transcriptional dynamics of classically and alternatively activated macrophages. Transcription factor (TF) binding motif activity analysis revealed four motifs, NFKB1_REL_RELA, IRF1,2, IRF7 and TBP, that are commonly activated but have distinct activity dynamics in M1 and M2 activation. We observe matching changes in the expression profiles of the corresponding TFs and show that only a restricted set of TFs change expression. There is an overall drastic and transient up-regulation in M1 and a weaker and more sustained up-regulation in M2. Novel TFs, such as Thap6 and Maff (M1) and Hivep1, Nfil3 and Prdm1 (M2), among others, were suggested to be involved in the activation processes. Additionally, 52 (M1) and 67 (M2) novel differentially expressed genes and, for the first time, several differentially expressed long non-coding RNA (lncRNA) transcriptome markers were identified. In conclusion, the finding of novel motifs, TFs, and protein-coding and lncRNA genes is an important step towards fully understanding the transcriptional machinery of macrophage activation.
The HSA21 encoded Single-minded 2 (SIM2) transcription factor has key neurological functions and is a good candidate to be involved in the cognitive impairment of Down syndrome. We aimed to explore the functional capacity of SIM2 by mapping its DNA binding sites in mouse embryonic stem cells. ChIP-sequencing revealed 1229 high-confidence SIM2-binding sites. Analysis of the SIM2 target genes confirmed the importance of SIM2 in developmental and neuronal processes and indicated that SIM2 may be a master transcription regulator. Indeed, SIM2 DNA binding sites share sequence specificity and overlapping domains of occupancy with master transcription factors such as SOX2, OCT4 (Pou5f1), NANOG or KLF4. The association between SIM2 and these pioneer factors is supported by co-immunoprecipitation of SIM2 with SOX2, OCT4, NANOG or KLF4. Furthermore, the binding of SIM2 marks a particular sub-category of enhancers known as super-enhancers. These regions are characterized by typical DNA modifications and Mediator co-occupancy (MED1 and MED12). Altogether, we provide evidence that SIM2 binds a specific set of enhancer elements thus explaining how SIM2 can regulate its gene network in neuronal features.
The immediate-early response mediates cell fate in response to a variety of extracellular stimuli and is dysregulated in many cancers. However, the specificity of the response across stimuli and cell types, and the roles of non-coding RNAs are not well understood. Using a large collection of densely-sampled time series expression data we have examined the induction of the immediate-early response in unparalleled detail, across cell types and stimuli. We exploit cap analysis of gene expression (CAGE) time series datasets to directly measure promoter activities over time. Using a novel analysis method for time series data we identify transcripts with expression patterns that closely resemble the dynamics of known immediate-early genes (IEGs) and this enables a comprehensive comparative study of these genes and their chromatin state. Surprisingly, these data suggest that the earliest transcriptional responses often involve promoters generating non-coding RNAs, many of which are produced in advance of canonical protein-coding IEGs. IEGs are known to be capable of induction without de novo protein synthesis. Consistent with this, we find that the response of both protein-coding and non-coding RNA IEGs can be explained by their transcriptionally poised, permissive chromatin state prior to stimulation. We also explore the function of non-coding RNAs in the attenuation of the immediate early response in a small RNA sequencing dataset matched to the CAGE data: We identify a novel set of microRNAs responsible for the attenuation of the IEG response in an estrogen receptor positive cancer cell line. Our computational statistical method is well suited to meta-analyses as there is no requirement for transcripts to pass thresholds for significant differential expression between time points, and it is agnostic to the number of time points per dataset.
Cells respond to stimuli through a set of genes that are primed for rapid activation. These genes, known as immediate-early genes (IEGs), are regulated at the level of transcription of the messenger RNA, and at subsequent RNA processing levels. These rapid responders are then rapidly switched off in normal cells. Immediate-early genes are involved in many cellular processes, including differentiation and proliferation, that are often dysregulated in cancer where they become continuously active. We characterise IEGs in a genome-wide sequencing dataset that captures their transcriptional response over time. Using a novel analysis technique, we identify both protein-coding and non-coding genes that are activated comparably to IEGs and investigate their properties. We examine how IEGs are switched off, including through microRNAs, small non-coding RNAs that act to control the level of key IEGs. We identify a novel set of microRNAs responsible for the attenuation of the IEG response in an estrogen receptor positive cancer cell line.
Animal transcriptomes are dynamic, each cell type, tissue and organ system expressing an ensemble of transcript isoforms that give rise to substantial diversity. We identified new genes, transcripts, and proteins using poly(A)+ RNA sequence from Drosophila melanogaster cultured cell lines, dissected organ systems, and environmental perturbations. We found a small set of mostly neural-specific genes has the potential to encode thousands of transcripts each through extensive alternative promoter usage and RNA splicing. The magnitudes of splicing changes are larger between tissues than between developmental stages, and most sex-specific splicing is gonad-specific. Gonads express hundreds of previously unknown coding and long noncoding RNAs (lncRNAs) some of which are antisense to protein-coding genes and produce short regulatory RNAs. Furthermore, previously identified pervasive intergenic transcription occurs primarily within newly identified introns. The fly transcriptome is substantially more complex than previously recognized arising from combinatorial usage of promoters, splice sites, and polyadenylation sites.
Analysis of the myeloid transcriptome by integrating 91 samples from the myeloid lineage and AML cell lines to predict novel regulatory interactions, enhancers, miRNAs, and lincRNAs.
The generation of myeloid cells from their progenitors is regulated at the level of transcription by combinatorial control of key transcription factors influencing cell-fate choice. To unravel the global dynamics of this process at the transcript level, we generated transcription profiles for 91 human cell types of myeloid origin by use of CAGE profiling. The CAGE sequencing of these samples has allowed us to investigate diverse aspects of transcription control during myelopoiesis, such as identification of novel transcription factors, miRNAs, and noncoding RNAs specific to the myeloid lineage. We further reconstructed a transcription regulatory network by clustering coexpressed transcripts and associating them with enriched cis-regulatory motifs. With the use of the bidirectional expression as a proxy for enhancers, we predicted over 2000 novel enhancers, including an enhancer 38 kb downstream of IRF8 and an intronic enhancer in the KIT gene locus. Finally, we highlighted relevance of these data to dissect transcription dynamics during progressive maturation of granulocyte precursors. A multifaceted analysis of the myeloid transcriptome is made available (www.myeloidome.roslin.ed.ac.uk). This high-quality dataset provides a powerful resource to study transcriptional regulation during myelopoiesis and to infer the likely functions of unannotated genes in human innate immunity.
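The idea of using bidirectional expression as a proxy for enhancers can be illustrated with a directionality score, D = (F − R) / (F + R), where F and R are CAGE tag counts on the forward and reverse strands flanking a candidate region. This is a minimal sketch, not the pipeline used in the study: the thresholds `max_abs_d` and `min_total` and the function names are assumptions chosen for illustration.

```python
def directionality(forward, reverse):
    """Directionality score in [-1, 1]; None when there is no signal.

    forward, reverse: CAGE tag counts on the two strands around a region.
    Values near 0 indicate balanced, bidirectional transcription.
    """
    total = forward + reverse
    if total == 0:
        return None
    return (forward - reverse) / total


def is_candidate_enhancer(forward, reverse, max_abs_d=0.8, min_total=2):
    """Flag regions with sufficient, roughly balanced bidirectional signal.

    max_abs_d and min_total are illustrative thresholds (assumptions).
    """
    d = directionality(forward, reverse)
    if d is None:
        return False
    return (forward + reverse) >= min_total and abs(d) <= max_abs_d


# Hypothetical counts: (forward, reverse) tags for three regions.
regions = {"region_1": (12, 8), "region_2": (30, 0), "region_3": (0, 0)}
calls = {name: is_candidate_enhancer(f, r) for name, (f, r) in regions.items()}
```

A strongly unidirectional signal such as region_2 looks more like a promoter than an enhancer under this rule, while region_1 is retained as a candidate. Real analyses would also need to exclude known promoters and exons before calling enhancers.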
transcriptome; CAGE; hematopoiesis
Cap analysis of gene expression (CAGE) is a high-throughput method for transcriptome analysis that provides a single base-pair resolution map of transcription start sites (TSS) and their relative usage. Despite their high resolution and functional significance, published CAGE data are still underused in promoter analysis due to the absence of tools that enable its efficient manipulation and integration with other genome data types. Here we present CAGEr, an R implementation of novel methods for the analysis of differential TSS usage and promoter dynamics, integrated with CAGE data processing and promoterome mining into a first comprehensive CAGE toolbox on a common analysis platform. Crucially, we provide collections of TSSs derived from most published CAGE datasets, as well as direct access to FANTOM5 resource of TSSs for numerous human and mouse cell/tissue types from within R, greatly increasing the accessibility of precise context-specific TSS data for integrative analyses. The CAGEr package is freely available from Bioconductor at http://www.bioconductor.org/packages/release/bioc/html/CAGEr.html.
The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0560-6) contains supplementary material, which is available to authorized users.
MicroRNAs are small non-coding RNAs that inhibit the translation of target mRNAs. In humans, most microRNAs are transcribed by RNA polymerase II as long primary transcripts and processed by sequential cleavage of the two RNase III enzymes, DROSHA and DICER, into precursor and mature microRNAs, respectively. Although the fundamental functions of microRNAs in RNA silencing have been gradually uncovered, less is known about the regulatory mechanisms of microRNA expression. Here, we report that telomerase reverse transcriptase (TERT) extensively affects the expression levels of mature microRNAs. Deep sequencing-based screens of short RNA populations revealed that the suppression of TERT resulted in the downregulation of microRNAs expressed in THP-1 cells and HeLa cells. Primary and precursor microRNA levels were also reduced under the suppression of TERT. Similar results were obtained with the suppression of either BRG1 (also called SMARCA4) or nucleostemin, which are proteins interacting with TERT and functioning beyond telomeres. These results suggest that TERT regulates microRNAs at the very early phases in their biogenesis, presumably through non-telomerase mechanism(s).
telomerase reverse transcriptase; microRNA; RNA-dependent RNA polymerase; cancer
Antisense (AS) transcripts are RNA molecules that are transcribed from the opposite strand to sense (S) genes, forming S/AS pairs. The most prominent configuration is when a lncRNA is antisense to a protein-coding gene. Increasing evidence indicates that antisense transcription may control sense gene expression, acting at distinct regulatory levels. However, its contribution to brain function and neurodegenerative diseases remains unclear. We have recently identified AS Uchl1, an antisense transcript to the mouse Ubiquitin carboxy-terminal hydrolase L1 (Uchl1) gene, the syntenic locus of UCHL1/PARK5. UCHL1 is mutated in rare cases of early-onset familial Parkinson's Disease (PD), and loss of UCHL1 activity has been reported in many neurodegenerative diseases. Importantly, manipulation of UchL1 expression has been proposed as a tool for therapeutic intervention. AS Uchl1 induces UchL1 expression by increasing its translation. It is the representative member of SINEUPs (SINEB2 sequence to UP-regulate translation), a new functional class of natural antisense lncRNAs that activate translation of their sense genes. Here we take advantage of the FANTOM5 dataset to identify the transcription start sites associated with the S/AS pair at the Uchl1 locus. We show that AS Uchl1 expression is under the regulation of Nurr1, a major transcription factor involved in the differentiation and maintenance of dopaminergic cells. Furthermore, AS Uchl1 RNA levels are strongly down-regulated in neurochemical models of PD in vitro and in vivo. This work positions AS Uchl1 RNA as a component of the Nurr1-dependent gene network and a target of cellular stress, extending our understanding of the role of antisense transcription in the brain.
antisense transcription; long non-coding RNA; Parkinson's disease; Nurr1; dopaminergic cells
Despite recent efforts in discovering novel long non-coding RNAs (lncRNAs) and unveiling their functions in a wide range of biological processes, their applications as biotechnological or therapeutic tools are still in their infancy. We have recently shown that AS Uchl1, a natural lncRNA antisense to the Parkinson's disease-associated gene Ubiquitin carboxyl-terminal esterase L1 (Uchl1), is able to increase UchL1 protein synthesis at the post-transcriptional level. Its activity requires two RNA elements: an embedded inverted SINEB2 sequence to increase translation and the overlapping region to target its sense mRNA. This functional organization is shared with several mouse lncRNAs antisense to protein-coding genes. The potential use of AS Uchl1-derived lncRNAs as enhancers of target mRNA translation remains unexplored. Here we define AS Uchl1 as the representative member of a new functional class of natural and synthetic antisense lncRNAs that activate translation. We named this class of RNAs SINEUPs for their requirement of the inverted SINEB2 sequence to UP-regulate translation in a gene-specific manner. The overlapping region is designated the Binding Domain (BD), while the embedded inverted SINEB2 element is the Effector Domain (ED). By swapping the BD, synthetic SINEUPs can be designed to target mRNAs of interest. SINEUPs function in an array of cell lines and can be efficiently directed toward N-terminally tagged proteins. Their biological activity is retained in a miniaturized version within the length range of small RNAs. Their modular structure was exploited to successfully design synthetic SINEUPs targeting endogenous Parkinson's disease-associated DJ-1, which proved to be active in different neuronal cell lines. In summary, SINEUPs represent the first scalable tool to increase synthesis of proteins of interest. We propose SINEUPs as reagents for molecular biology experiments, in protein manufacturing, and in therapy of haploinsufficiencies.
SINEUP; long non-coding RNA; antisense; protein expression; cell lines
Mutations in three functionally diverse genes cause Rett Syndrome. Although the functions of Forkhead box G1 (FOXG1), Methyl CpG binding protein 2 (MECP2) and Cyclin-dependent kinase-like 5 (CDKL5) have been studied individually, not much is known about their relation to each other with respect to expression levels and regulatory regions. Here we analyzed data from hundreds of mouse and human samples included in the FANTOM5 project, to identify transcript initiation sites, expression levels, expression correlations and regulatory regions of the three genes.
Our investigations reveal the predominantly used transcription start sites (TSSs) for each gene, including novel TSSs for FOXG1. We show that FOXG1 expression is poorly correlated with the expression of MECP2 and CDKL5. We identify promoter shapes for each TSS, the predicted location of enhancers for each gene and the common transcription factors likely to regulate the three genes. Our data imply Polycomb Repressive Complex 2 (PRC2)-mediated silencing of Foxg1 in the cerebellum.
Our analyses provide a comprehensive picture of the regulatory regions of the three genes involved in Rett Syndrome.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1177) contains supplementary material, which is available to authorized users.
Rett Syndrome; CAGE; Transcriptomics; Promoter architecture
The mesencephalic dopaminergic (mDA) cell system is composed of two major groups of projecting cells in the Substantia Nigra (SN) (A9 neurons) and the Ventral Tegmental Area (VTA) (A10 cells). Selective degeneration of A9 neurons occurs in Parkinson’s disease (PD) while abnormal function of A10 cells has been linked to schizophrenia, attention deficit and addiction. The molecular basis that underlies selective vulnerability of A9 and A10 neurons is presently unknown.
By combining transgenic labeling and laser capture microdissection with nano Cap-Analysis of Gene Expression (nanoCAGE) technology on isolated A9 and A10 cells, we found that a subset of Olfactory Receptors (ORs) is expressed in mDA neurons. Gene expression analysis was integrated with the FANTOM5 Helicos CAGE sequencing datasets, showing the presence of these ORs in selected tissues and brain areas outside of the olfactory epithelium. OR expression in the mesencephalon was validated by RT-PCR and in situ hybridization. By screening 16 potential ligands on 5 mDA ORs recombinantly expressed in a heterologous in vitro system, we identified carvone enantiomers as agonists of Olfr287 that are able to evoke an intracellular Ca2+ increase in solitary mDA neurons. ORs were found to be expressed in human SN and down-regulated in PD post-mortem brains.
Our study indicates that mDA neurons express ORs and respond to odor-like molecules, providing new opportunities for pharmacological intervention in disease.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-729) contains supplementary material, which is available to authorized users.
NanoCAGE; Odors; Odorant receptors; Dopaminergic neurons; Ventral midbrain
Most RNA molecules are co- or post-transcriptionally modified to alter their chemical and functional properties to assist in their ultimate biological function. Among these modifications, the addition of a 5′ cap structure has been found to regulate turnover and localization. Here we report a study of the cap structure of human short (<200 nt) RNAs (sRNAs), using sequencing of cDNA libraries prepared by enzymatic pretreatment of the sRNAs with cap-sensitive specificity, thin layer chromatographic (TLC) analyses of isolated cap structures and mass spectrometric analyses for validation of the TLC analyses. Processed versions of snoRNA and tRNA sequences of less than 50 nt were observed in capped sRNA libraries, indicating additional processing and recapping of these annotated sRNA biotypes. We report for the first time 2,7-dimethylguanosine in human sRNA cap structures and, surprisingly, we find multiple type 0 cap structures (mGpppC, 7mGpppG, GpppG, GpppA, and 7mGpppA) in RNA length fractions shorter than 50 nt. Finally, we find additional uncharacterized cap structures that await determination, pending the creation of the reference compounds needed for TLC analyses. These studies suggest the existence of novel biochemical pathways leading to the processing of primary and sRNAs and the modification of their RNA 5′ ends with a spectrum of chemical modifications.