Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
An unexpected layer of complexity in the genomes of humans and other vertebrates lies in the abundance of genes that do not appear to encode proteins but produce a variety of non-coding RNAs. In particular, the human genome is currently predicted to contain 5,000–10,000 independent gene units generating long (>200 nucleotides) noncoding RNAs (lncRNAs). While there is growing evidence that a large fraction of these lncRNAs have cellular functions, notably to regulate protein-coding gene expression, almost nothing is known about the processes underlying the evolutionary origins and diversification of lncRNA genes. Here we show that transposable elements (TEs), through their capacity to move and spread in genomes in a lineage-specific fashion, as well as their ability to introduce regulatory sequences upon chromosomal insertion, represent a major force shaping the lncRNA repertoire of humans, mice, and zebrafish. Not only do TEs make up a substantial fraction of mature lncRNA transcripts, they are also enriched in the vicinity of lncRNA genes, where they frequently contribute to their transcriptional regulation. Through specific examples we provide evidence that some TE sequences embedded in lncRNAs are critical for the biogenesis of lncRNAs and likely important for their function.
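The notion of TE coverage used above (the fraction of lncRNA exonic sequence derived from TEs) can be illustrated with a minimal sketch. It assumes half-open coordinates on a single chromosome and strand; the function names and the toy intervals are hypothetical and do not reproduce the annotation pipeline of the study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def te_coverage(exons, tes):
    """Return the fraction of exonic bases that overlap any TE annotation."""
    exons = merge_intervals(exons)
    tes = merge_intervals(tes)
    total = sum(end - start for start, end in exons)
    if total == 0:
        return 0.0
    covered = 0
    for e_start, e_end in exons:
        for t_start, t_end in tes:
            # Length of the intersection, or 0 when the intervals are disjoint.
            covered += max(0, min(e_end, t_end) - max(e_start, t_start))
    return covered / total


# Hypothetical two-exon lncRNA and three TE annotations.
exons = [(0, 100), (200, 300)]
tes = [(50, 120), (110, 210), (290, 400)]
coverage = te_coverage(exons, tes)
```

Merging the TE annotations first avoids counting a base twice when two annotated elements overlap, which would otherwise inflate the coverage estimate.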
The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers and is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domain and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post-Deep Brain Stimulation (DBS) treatment and compared them to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually exclusive exons. In the PD-regulated FUS and HNRNP A/B, the regulated regions included prion-like domains. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq data from the brains of PD patients and unaffected controls revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes, including U1, which showed both leukocyte and brain increases.
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases.
Long non-coding RNAs (lncRNAs) comprise a novel, fascinating class of RNAs with largely unknown biological functions. Parkinson's disease (PD) is the most frequent motor disorder, and deep brain stimulation (DBS) treatment alleviates the symptoms, but early disease biomarkers are still unknown and new targets for genetic interference are urgently needed. Using RNA-sequencing technology and a novel computational workflow for in-depth exploration of whole-transcriptome RNA-seq datasets, we detected and analyzed lncRNAs in sequenced libraries from PD patients' leukocytes pre- and post-treatment and from the brain, adding this full profile resource of over 7,000 lncRNAs to the few human tissue-derived lncRNA datasets that are currently available. Our study includes sample-specific database construction, detecting disease-derived changes in known and novel lncRNAs, exons and junctions and predicting corresponding changes in polyadenylation choices, protein domains and miRNA binding sites. We report widespread transcript structure variations at the splice junction and exon levels, including novel exons and junctions and alteration of lncRNAs, followed by experimental validation in PD leukocytes and two PD brain regions compared with controls. Our results suggest the involvement of lncRNAs in neurodegenerative diseases, and specifically PD. This comprehensive workflow will be of use to the increasing number of laboratories producing RNA-Seq data in a wide range of biomedical studies.
Long non-coding RNAs (lncRNAs) are transcripts that are 200 bp or longer, do not encode proteins, and potentially play important roles in eukaryotic gene regulation. However, the number, characteristics and expression inheritance pattern of lncRNAs in maize are still largely unknown.
By exploiting available public EST databases, maize whole genome sequence annotation and RNA-seq datasets from 30 different experiments, we identified 20,163 putative lncRNAs. Of these lncRNAs, more than 90% are predicted to be the precursors of small RNAs, while 1,704 are considered to be high-confidence lncRNAs. High confidence lncRNAs have an average transcript length of 463 bp and genes encoding them contain fewer exons than annotated genes. By analyzing the expression pattern of these lncRNAs in 13 distinct tissues and 105 maize recombinant inbred lines, we show that more than 50% of the high confidence lncRNAs are expressed in a tissue-specific manner, a result that is supported by epigenetic marks. Intriguingly, the inheritance of lncRNA expression patterns in 105 recombinant inbred lines reveals apparent transgressive segregation, and maize lncRNAs are less affected by cis- than by trans-genetic factors.
We integrate all available transcriptomic datasets to identify a comprehensive set of maize lncRNAs, provide a unique annotation resource of the maize genome and a genome-wide characterization of maize lncRNAs, and explore the genetic control of their expression using expression quantitative trait locus mapping.
Eukaryotic genomes generate a heterogeneous ensemble of mRNAs and long noncoding RNAs (lncRNAs). LncRNAs and mRNAs are both transcribed by Pol II and acquire 5′ caps and poly(A) tails, but only mRNAs are translated into proteins. To address how these classes are distinguished, we identified the transcriptome-wide targets of 13 RNA processing, export, and turnover factors in budding yeast. Comparing the maturation pathways of mRNAs and lncRNAs revealed that transcript fate is largely determined during 3′ end formation. Most lncRNAs are targeted for nuclear RNA surveillance, but a subset with 3′ cleavage and polyadenylation features resembling the mRNA consensus can be exported to the cytoplasm. The Hrp1 and Nab2 proteins act at this decision point, with dual roles in mRNA cleavage/polyadenylation and lncRNA surveillance. Our data also reveal the dynamic and heterogeneous nature of mRNA maturation, and highlight a subset of “lncRNA-like” mRNAs regulated by the nuclear surveillance machinery.
• Transcriptome-wide analysis shows dynamic assembly of ribonucleoprotein particles
• LncRNA and mRNA subclasses undergo distinct maturation and turnover pathways
• Transcript fate is determined during 3′ end formation
• Transcript classes overlap, with many “mRNA-like” lncRNAs and “lncRNA-like” mRNAs
A transcriptome-wide analysis shows that different classes of mRNAs and lncRNAs are characterized by distinct 3′ end formation and RNP complexes, explaining how cells distinguish among these otherwise similar RNAs.
Long non-coding RNAs (lncRNAs) represent a class of riboregulators that either directly act in long form or are processed to shorter miRNAs and siRNAs. Emerging evidence shows that lncRNAs participate in stress-responsive regulation. In this study, to identify the putative maize lncRNAs responsive to drought stress, 8449 drought-responsive transcripts were first uploaded to the Coding Potential Calculator website for classification as protein-coding or non-coding RNAs, and 1724 RNAs were identified as potential non-coding RNAs. A Perl script was written to screen these 1724 ncRNAs, and 664 transcripts were ultimately identified as drought-responsive lncRNAs. Of these 664 transcripts, 126 drought-responsive lncRNAs were highly similar to known maize lncRNAs; the remaining 538 transcripts were considered novel lncRNAs. Among the 664 lncRNAs identified as drought responsive, 567 were upregulated and 97 were downregulated in drought-stressed leaves of maize. Eight lncRNAs were identified as miRNA precursor lncRNAs, 62 were classified as both shRNA and siRNA precursors, and 279 were classified as siRNA precursors. The remaining 315 lncRNAs were classified as other lncRNAs that are likely to function as longer molecules. Among these 315 lncRNAs, 10 were identified as antisense lncRNAs and 7 could pair with 17 CDS sequences with near-perfect matches. Finally, RT-qPCR results confirmed that all selected lncRNAs could respond to drought stress. These findings extend the current view of lncRNAs as ubiquitous regulators under stress conditions.
Study on long non-coding RNAs (lncRNAs) has been promoted by high-throughput RNA sequencing (RNA-Seq). However, it is still not trivial to identify lncRNAs from the RNA-Seq data and it remains a challenge to uncover their functions.
We present a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate initially assembled transcripts. Possible partial transcripts and artefacts are filtered according to their quantified expression level. After that, novel lncRNAs are detected by further filtering out known transcripts and those with high protein-coding potential, using a newly developed program called lncRScan. We applied our pipeline to a mouse Klf1 knockout dataset and discussed the plausible functions of the novel lncRNAs we detected by differential expression analysis. We identified 308 novel lncRNA candidates, which have shorter transcript lengths, fewer exons, and shorter putative open reading frames than known protein-coding transcripts. Of the lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than the protein-coding ones, and 13 lncRNAs show significant differential expression between the wild-type and Klf1 knockout conditions.
Our method can predict a set of novel lncRNAs from RNA-Seq data. Some of the lncRNAs are shown to be differentially expressed between the wild-type and Klf1 knockout strains, suggesting that those novel lncRNAs can be given high priority in further functional studies.
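The filtering order described in this abstract (expression filter, removal of known transcripts, removal of transcripts with high coding potential) can be sketched as follows. This is a minimal illustration, not the logic of lncRScan itself: the `Transcript` fields, the `filter_novel_lncrnas` helper and every threshold are hypothetical assumptions, and the minimum length of 200 nt is borrowed from the common lncRNA definition rather than from this abstract.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    length: int          # spliced length in nucleotides
    expression: float    # abundance estimate from the assembler (assumed unit)
    longest_orf: int     # longest putative ORF in nucleotides
    coding_score: float  # score from any coding-potential tool (assumed scale)
    is_known: bool       # True if it matches an annotated transcript


def filter_novel_lncrnas(transcripts, min_expression=1.0, min_length=200,
                         max_orf=300, max_coding_score=0.0):
    """Keep transcripts that pass all three filters; thresholds are illustrative."""
    candidates = []
    for t in transcripts:
        # Step 1: drop weakly expressed, possibly partial or artefactual transcripts.
        if t.expression < min_expression:
            continue
        # Step 2: drop transcripts that are already annotated.
        if t.is_known:
            continue
        # Step 3: drop short transcripts and those with high coding potential.
        if t.length < min_length:
            continue
        if t.longest_orf > max_orf or t.coding_score > max_coding_score:
            continue
        candidates.append(t)
    return candidates


# Hypothetical assembled transcripts.
assembled = [
    Transcript("T1", 850, 5.2, 120, -0.8, False),   # passes all filters
    Transcript("T2", 900, 0.3, 90, -1.0, False),    # too weakly expressed
    Transcript("T3", 1500, 8.0, 1200, 2.5, False),  # likely protein coding
    Transcript("T4", 700, 4.0, 100, -0.5, True),    # already annotated
    Transcript("T5", 150, 6.0, 60, -0.9, False),    # shorter than 200 nt
]
novel = filter_novel_lncrnas(assembled)
novel_ids = [t.transcript_id for t in novel]
```

Applying the cheap expression and annotation filters before the coding-potential check keeps the most expensive step confined to the smallest set of transcripts.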
Long noncoding RNAs (lncRNAs) are a recently discovered class of non-protein coding RNAs, which have now increasingly been shown to be involved in a wide variety of biological processes as regulatory molecules. The functional role of many of the members of this class has been an enigma, except a few of them like Malat and HOTAIR. Little is known regarding the regulatory interactions between noncoding RNA classes. Recent reports have suggested that lncRNAs could potentially interact with other classes of non-coding RNAs including microRNAs (miRNAs) and modulate their regulatory role through interactions. We hypothesized that lncRNAs could participate as a layer of regulatory interactions with miRNAs. The availability of genome-scale datasets for Argonaute targets across human transcriptome has prompted us to reconstruct a genome-scale network of interactions between miRNAs and lncRNAs.
We used well characterized experimental Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) datasets and the recent genome-wide annotations for lncRNAs in public domain to construct a comprehensive transcriptome-wide map of miRNA regulatory elements. Comparative analysis revealed that in addition to targeting protein-coding transcripts, miRNAs could also potentially target lncRNAs, thus participating in a novel layer of regulatory interactions between noncoding RNA classes. Furthermore, we have modeled one example of miRNA-lncRNA interaction using a zebrafish model. We have also found that the miRNA regulatory elements have a positional preference, clustering towards the mid regions and 3′ ends of the long noncoding transcripts. We also further reconstruct a genome-wide map of miRNA interactions with lncRNAs as well as messenger RNAs.
This analysis suggests widespread regulatory interactions between noncoding RNA classes and suggests a novel functional role for lncRNAs. We also present the first transcriptome-scale study on miRNA-lncRNA interactions and the first report of a genome-scale reconstruction of a noncoding RNA regulatory interactome involving lncRNAs.
In recent years it has become increasingly clear that the mammalian transcriptome is highly complex and includes a large number of small non-coding RNAs (sncRNAs) and long noncoding RNAs (lncRNAs). Here we review the biogenesis pathways of the three classes of sncRNAs, namely short interfering RNAs (siRNAs), microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs). These ncRNAs have been extensively studied and are involved in pathways leading to specific gene silencing and the protection of genomes against viruses and transposons, for example. Also, lncRNAs have emerged as pivotal molecules for the transcriptional and post-transcriptional regulation of gene expression, which is supported by their tissue-specific expression patterns, subcellular distribution, and developmental regulation. Therefore, we also focus our attention on their role in differentiation and development. SncRNAs and lncRNAs play critical roles in defining DNA methylation patterns, as well as chromatin remodeling, thus having a substantial effect on epigenetics. The identification of some overlaps in their biogenesis pathways and functional roles raises the hypothesis that these molecules play concerted functions in vivo, creating complex regulatory networks where cooperation with regulatory proteins is necessary. We also highlight the implications of biogenesis and gene expression deregulation of sncRNAs and lncRNAs in human diseases like cancer.
sncRNAs; lncRNAs; miRNAs; siRNAs; piRNAs; gene expression regulation; epigenetic regulation
Long non-coding RNAs (lncRNAs), a key group of non-coding RNAs, have gained wide attention. Although lncRNAs have been functionally annotated and systematically explored in higher mammals, few have undergone systematic identification and annotation. Owing to their expression specificity, the known lncRNAs expressed in embryonic brain tissues remain limited. Given that a large number of lncRNAs are transcribed only in brain tissues, studies of lncRNAs in the developing brain are of special interest. Here, publicly available RNA-sequencing (RNA-seq) data from embryonic brain are integrated to identify thousands of embryonic brain lncRNAs with a customized pipeline. A significant proportion of the novel transcripts have not been annotated in available genomic resources. The putative embryonic brain lncRNAs are shorter, less spliced, and less conserved than known genes. The expression of putative lncRNAs is on average one tenth that of known coding genes, while comparable with that of known lncRNAs. Chromatin data show that putative embryonic brain lncRNAs are associated with active chromatin marks, comparable with known lncRNAs. Embryonic brain-expressed lncRNAs also show expression that is not evident in adult brain. Gene Ontology analysis of putative embryonic brain lncRNAs suggests that they are associated with brain development. The putative lncRNAs are linked to possible cis-regulatory roles in imprinting, even as they are themselves deemed to be imprinted lncRNAs. Re-analysis of one knockdown dataset suggests that four regulators are associated with lncRNAs. Taken together, the identification and systematic analysis of putative lncRNAs provide novel insights into uncharacterized mouse non-coding regions and their relationships with mammalian embryonic brain development.
Recent large-scale transcriptome analyses have revealed that transcription is spread throughout the mammalian genomes, yielding large numbers of transcripts, including long non-coding RNAs (lncRNAs) with little or no protein-coding capacity. Dozens of lncRNAs have been identified as biologically significant. In many cases, lncRNAs act as key molecules in the regulation of processes such as chromatin remodeling, transcription, and post-transcriptional processing. Several lncRNAs (e.g., MALAT1, HOTAIR, and ANRIL) are associated with human diseases, including cancer. Those lncRNAs associated with cancer are often aberrantly expressed. Although the underlying molecular mechanisms by which lncRNAs regulate cancer development are unclear, recent studies have revealed that such aberrant expression of lncRNAs affects the progression of cancers. In this review, we highlight recent findings regarding the roles of lncRNAs in cancer biology.
large non-coding RNA; cancer; disease; MALAT1; HOTAIR; ANRIL
The transcriptome of a cell is represented by a myriad of different RNA molecules with and without protein-coding capacities. In recent years, advances in sequencing technologies have allowed researchers to more fully appreciate the complexity of whole transcriptomes, showing that the vast majority of the genome is transcribed, producing a diverse population of non-protein coding RNAs (ncRNAs). Thus, the biological significance of ncRNAs has been largely underestimated. Amongst these multiple classes of ncRNAs, the long non-coding RNAs (lncRNAs) are apparently the most numerous and functionally diverse. A small but growing number of lncRNAs have been experimentally studied, and a view is emerging that these are key regulators of epigenetic gene regulation in mammalian cells. LncRNAs have already been implicated in human diseases such as cancer and neurodegeneration, highlighting the importance of this emergent field. In this article, we review the catalogs of annotated lncRNAs and the latest advances in our understanding of lncRNAs.
non-coding RNAs; regulation; long non-coding RNA; epigenetics
Comprehensive analysis of the mammalian transcriptome has revealed that long non-coding RNAs (lncRNAs) may make up a large fraction of cellular transcripts. Recent years have seen a surge of studies aimed at functionally characterizing the role of lncRNAs in development and disease. In this review, we discuss new findings implicating lncRNAs in controlling development of the central nervous system (CNS). The evolution of the higher vertebrate brain has been accompanied by an increase in the levels and complexities of lncRNAs expressed within the developing nervous system. Although a limited number of CNS-expressed lncRNAs are now known to modulate the activity of proteins important for neuronal differentiation, the function of the vast majority of neuronal-expressed lncRNAs is still unknown. Topics of intense current interest include the mechanism by which CNS-expressed lncRNAs might function in epigenetic and transcriptional regulation during neuronal development, and how gain and loss of function of individual lncRNAs contribute to neurological diseases.
cell fate; neurogenesis; embryonic stem cells; neural stem cells; transcription factors; epigenetics; long noncoding RNA; molecular scaffold
Long non-coding RNAs (lncRNAs) are emerging as potent regulators of cell physiology, and recent studies highlight their role in tumor development. However, while established protein-coding oncogenes and tumor suppressors often display striking patterns of focal DNA copy-number alteration in tumors, similar evidence is largely lacking for lncRNAs. Here, we report on a genomic analysis of GENCODE lncRNAs in high-grade serous ovarian adenocarcinoma, based on The Cancer Genome Atlas (TCGA) molecular profiles. Using genomic copy-number data and deep coverage transcriptome sequencing, we derived dual copy-number and expression data for 10,419 lncRNAs across 407 primary tumors. We describe global correlations between lncRNA copy-number and expression, and associate established expression subtypes with distinct lncRNA signatures. By examining regions of focal copy-number change that lack protein-coding targets, we identified an intergenic lncRNA on chromosome 1, OVAL, that shows narrow focal genomic amplification in a subset of tumors. While weakly expressed in most tumors, focal amplification coincided with strong OVAL transcriptional activation. Screening of 16 other cancer types revealed similar patterns in serous endometrial carcinomas. This shows that intergenic lncRNAs can be specifically targeted by somatic copy-number amplification, suggestive of functional involvement in tumor initiation or progression. Our analysis provides testable hypotheses and paves the way for further study of lncRNAs based on TCGA and other large-scale cancer genomics datasets.
Mammalian testis development and spermatogenesis play critical roles in male fertility and continuation of a species. Previous research into the molecular mechanisms of testis development and spermatogenesis has largely focused on the role of protein-coding genes and small non-coding RNAs, such as microRNAs and piRNAs. Recently, it has become apparent that large numbers of long (>200 nt) non-coding RNAs (lncRNAs) are transcribed from mammalian genomes and that lncRNAs perform important regulatory functions in various developmental processes. However, the expression of lncRNAs and their biological functions in post-natal testis development remain unknown. In this study, we employed microarray technology to examine lncRNA expression profiles of neonatal (6-day-old) and adult (8-week-old) mouse testes. We found that 8,265 lncRNAs were expressed above background levels during post-natal testis development, of which 3,025 were differentially expressed. Candidate lncRNAs were identified for further characterization by an integrated examination of genomic context, gene ontology (GO) enrichment of their associated protein-coding genes, promoter analysis for epigenetic modification, and evolutionary conservation of elements. Many lncRNAs overlapped or were adjacent to key transcription factors and other genes involved in spermatogenesis, such as Ovol1, Ovol2, Lhx1, Sox3, Sox9, Plzf, c-Kit, Wt1, Sycp2, Prm1 and Prm2. Most differentially expressed lncRNAs exhibited epigenetic modification marks similar to protein-coding genes and tend to be expressed in a tissue-specific manner. In addition, the majority of differentially expressed lncRNAs harbored evolutionary conserved elements. Taken together, our findings represent the first systematic investigation of lncRNA expression in the mammalian testis and provide a solid foundation for further research into the molecular mechanisms of lncRNAs function in mammalian testis development and spermatogenesis.
Ventricular septal defects (VSD) are the most common form of congenital heart disease, which is the leading non-infectious cause of death in children; nevertheless, the exact cause of VSD is not yet fully understood. Long non-coding RNAs (lncRNAs) have been shown to play key roles in various biological processes, such as imprinting control, circuitry controlling pluripotency and differentiation, immune responses and chromosome dynamics. Notably, a growing number of lncRNAs have been implicated in disease etiology, although an association with VSD has not been reported. In the present study, we conducted an integrated analysis of dysregulated lncRNAs, focusing specifically on the identification and characterization of lncRNAs potentially involved in the initiation of VSD. Comparison of the transcriptome profiles of cardiac tissues from VSD-affected and normal hearts was performed using a second-generation lncRNA microarray, which covers the vast majority of expressed RefSeq transcripts (29,241 lncRNAs and 30,215 coding transcripts). In total, 880 lncRNAs were upregulated and 628 were downregulated in VSD. Furthermore, our established filtering pipeline indicated an association of two lncRNAs, ENST00000513542 and RP11-473L15.2, with VSD. This dysregulation of the lncRNA profile provides a novel insight into the etiology of VSD and, furthermore, illustrates the intricate relationship between coding and ncRNA transcripts in cardiac development. These data may offer a background/reference resource for future functional studies of lncRNAs related to VSD.
To assess the global changes in and characteristics of the transcriptome of long noncoding RNAs (LncRNAs) in heart tissue, whole blood and plasma during heart failure (HF) and association with expression of paired coding genes.
Here we used microarray assay to examine the transcriptome of LncRNAs deregulated in the heart, whole blood, and plasma during HF in mice. We confirmed the changes in LncRNAs by quantitative PCR.
We revealed and confirmed a number of LncRNAs that were deregulated during HF, which suggests a potential role of LncRNAs in HF. Strikingly, the patterns of expression of LncRNA differed between plasma and other tissue during HF. LncRNA expression was associated with LncRNA length in all samples but not in plasma during HF, which suggests that the global association of LncRNA expression and LncRNA length in plasma could be biomarkers for HF. In total, 32 LncRNAs all expressed in the heart, whole blood and plasma showed changed expression with HF, so they may be biomarkers in HF. In addition, sense-overlapped LncRNAs tended to show consistent expression with their paired coding genes, whereas antisense-overlapped LncRNAs tended to show the opposite expression in plasma; so different types of LncRNAs may have different characteristics in HF. Interestingly, we revealed an inverse correlation between changes in expression of LncRNAs in plasma and in heart, so circulating levels of LncRNAs may not represent just passive leakage from the HF heart but also active regulation or release of circulatory cells or other cells during HF.
We reveal stable expression of LncRNAs in plasma during HF, which suggests a newly described component in plasma. The distinct expression patterns of circulatory LncRNAs during HF indicate that LncRNAs may actively respond to stress and thus serve as biomarkers of HF diagnosis and treatment.
Mammalian genomes are extensively transcribed, producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs are characterized by different genomic architectures relative to their associated protein-coding genes. Our study aimed at linking lncRNA architecture with the dynamic patterns of their expression using a model of differentiating human neuroblastoma cells.
LncRNA expression was studied in a 120-hour time course of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), a compound used for the treatment of neuroblastoma. A custom microarray chip was utilized to interrogate expression levels of 9,267 lncRNAs in the course of differentiation. We categorized lncRNAs into 19 architecture classes according to their position relative to protein-coding genes. For each architecture class, the dynamics of lncRNA expression were studied in association with their protein-coding partners. This allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in the introns and downstream of the protein-coding genes were characterized by negative correlation modes. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by their constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction, followed by transcriptional repression. Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development.
This is the first report detailing dynamical changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
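The positive and negative correlation modes reported above can be illustrated by correlating the time course of a lncRNA with that of its protein-coding partner. The sketch below assumes Pearson correlation and an arbitrary cutoff of 0.5; the abstract does not state which correlation measure or threshold was used, and the expression values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_mode(lncrna, gene, threshold=0.5):
    """Label a lncRNA/gene pair; the threshold is an illustrative assumption."""
    r = pearson(lncrna, gene)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "uncorrelated"


# Hypothetical expression values at successive time points of RA treatment.
lncrna_course = [1.0, 2.0, 3.0, 4.0]
partner_up = [2.0, 4.0, 6.0, 8.0]    # rises together with the lncRNA
partner_down = [8.0, 6.0, 4.0, 2.0]  # falls while the lncRNA rises

mode_up = correlation_mode(lncrna_course, partner_up)
mode_down = correlation_mode(lncrna_course, partner_down)
```

With only a handful of time points, such coefficients are noisy, so a per-class summary across many lncRNA/gene pairs is more informative than any single pair.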
Long non-coding RNAs (lncRNAs) are emerging as important regulators of cell physiology, but it is yet unknown to what extent lncRNAs have evolved to be targeted by microRNAs. Comparative genomics has previously revealed widespread evolutionarily conserved microRNA targeting of protein-coding mRNAs, and here we applied a similar approach to lncRNAs.
We used a map of putative microRNA target sites in lncRNAs where site conservation was evaluated based on 46 vertebrate species. We compared observed target site frequencies to those obtained with a random model, at variable prediction stringencies. While conserved sites were not present above random expectation in intergenic lncRNAs overall, we observed a marginal over-representation of highly conserved 8-mer sites in a small subset of cytoplasmic lncRNAs (12 sites in 8 lncRNAs at 56% false discovery rate, P = 0.10).
Evolutionary conservation in lncRNAs is generally low but patch-wise high, and these patches could, in principle, harbor conserved target sites. However, while our analysis efficiently detected conserved targeting of mRNAs, it provided only limited and marginally significant support for conserved microRNA-lncRNA interactions. We conclude that conserved microRNA-lncRNA interactions could not be reliably detected with our methodology.
Long non-coding RNA; lncRNA; microRNA; Comparative genomics
Long non-coding RNAs (lncRNAs), representing a large proportion of non-coding transcripts across the human genome, are evolutionarily conserved and biologically functional. At least one-third of the phenotype-related loci identified by genome-wide association studies (GWAS) are mapped to non-coding intervals. However, the relationships between phenotype-related loci and lncRNAs are largely unknown. Utilizing the 1000 Genomes data, we compared single-nucleotide polymorphisms (SNPs) within the sequences of lncRNA and protein-coding genes as defined in the Ensembl database. We further annotated the phenotype-related SNPs reported by GWAS at lncRNA intervals. Because prostate cancer (PCa) risk-related loci were enriched in lncRNAs, we then performed a meta-analysis of two existing GWAS for discovery and an additional sample set for replication, revealing PCa risk-related loci at lncRNA regions. The SNP density in regions of lncRNA was similar to that in protein-coding regions, but they were less polymorphic than surrounding regions. Among the 1998 phenotype-related SNPs identified by GWAS, 52 loci were located directly in lncRNA intervals, a 1.5-fold enrichment compared with the entire genome. More than a 5-fold enrichment was observed for eight PCa risk-related loci in lncRNA genes. We also identified a new PCa risk-related SNP, rs3787016, in an lncRNA region at 19q13 (per allele odds ratio = 1.19; 95% confidence interval: 1.11–1.27) with a P value of 7.22 × 10⁻⁷. lncRNAs may be important for interpreting and mining GWAS data. However, the catalog of lncRNAs needs to be better characterized in order to fully evaluate the relationship of phenotype-related loci with lncRNAs.
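The fold-enrichment figure can be made concrete with a short arithmetic sketch. It assumes that enrichment is the fraction of phenotype-related SNPs falling in lncRNA intervals divided by a background fraction expected by chance; the abstract does not state the background, so the value below is back-calculated from the reported numbers rather than taken from the study, and the function names are hypothetical.

```python
def fold_enrichment(hits, total, background_fraction):
    """Observed fraction of hits divided by the fraction expected by chance."""
    if total <= 0 or not 0 < background_fraction <= 1:
        raise ValueError("total must be positive and background in (0, 1]")
    return (hits / total) / background_fraction


def implied_background_fraction(hits, total, enrichment):
    """Background fraction that would produce the stated enrichment."""
    if total <= 0 or enrichment <= 0:
        raise ValueError("total and enrichment must be positive")
    return (hits / total) / enrichment


# 52 of 1998 phenotype-related SNPs in lncRNA intervals, 1.5-fold enrichment.
background = implied_background_fraction(52, 1998, 1.5)
print(f"observed fraction: {52 / 1998:.4f}")
print(f"implied background fraction: {background:.4f}")
print(f"fold enrichment: {fold_enrichment(52, 1998, background):.2f}")
```

Under this reading, about 2.6% of the GWAS SNPs lie in lncRNA intervals, which corresponds to a chance expectation of roughly 1.7%.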
A growing body of evidence shows that long non-coding RNAs (lncRNAs) are involved in more human diseases than previously realized. However, no information is currently available about lncRNAs in cardiac fibroblasts. The expression profile of lncRNAs was analyzed in Ang II-treated cardiac fibroblasts using lncRNA arrays. The analysis showed that 282 of 4376 detected lncRNAs demonstrated >2-fold differential expression in response to treatment with Ang II (100 nM) for 24 h. Among them, 22 lncRNAs showed a greater than 4-fold change. Meanwhile, Ang II also induced widespread expression changes in protein-coding genes in cardiac fibroblasts. Quantitative real-time PCR confirmed the changes of six lncRNAs (AF159100, BC086588, MRNR026574, MRAK134679, NR024118, AX765700) and mRNAs (IL6, RGS2, PRG4, TIMP1, Cdkn1c, TIMP3, Col I, Col III and Fibronectin) in cardiac fibroblasts. Bioinformatic analysis implicated the process of cell proliferation. Further studies revealed that the down-regulation of lncRNA-NR024118 by Ang II was time-dependent: the level of NR024118 was lowest at 24 h and had recovered by 48 h. Ang II also dynamically down-regulated the expression of Cdkn1c in cardiac fibroblasts. Ang II at concentrations from 10⁻⁹ M to 10⁻⁶ M induced a decrease of NR024118 and Cdkn1c in cardiac fibroblasts. In conclusion, the expression profile of lncRNAs was significantly altered in Ang II-treated cardiac fibroblasts, and Ang II dynamically regulated the expression of lncRNA-NR024118 and Cdkn1c in cardiac fibroblasts, indicating a potential role of NR024118 in cardiac fibroblasts.
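The fold-change thresholds used to call differential expression can be sketched as below. This is an illustration only: it assumes a symmetric fold change computed from positive control and treated intensities, the function names are hypothetical, and the toy values are not data from the study.

```python
def fold_change(control, treated):
    """Symmetric fold change (always >= 1) between two positive intensities."""
    if control <= 0 or treated <= 0:
        raise ValueError("intensities must be positive")
    ratio = treated / control
    return ratio if ratio >= 1 else 1 / ratio


def count_changed(pairs, threshold):
    """Number of (control, treated) pairs changed by more than `threshold`-fold."""
    return sum(1 for control, treated in pairs if fold_change(control, treated) > threshold)


# Toy (control, treated) intensities for five hypothetical lncRNAs.
toy_pairs = [(100, 250), (100, 90), (100, 20), (100, 450), (100, 200)]
print(count_changed(toy_pairs, 2))  # up- or down-regulated more than 2-fold
print(count_changed(toy_pairs, 4))  # up- or down-regulated more than 4-fold

# Share of detected lncRNAs passing the 2-fold cutoff in the abstract.
print(f"{282 / 4376:.1%}")
```

By this arithmetic, the 282 differentially expressed lncRNAs amount to about 6.4% of the 4376 detected.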
Angiotensin II; cardiac fibroblasts; long non-coding RNA
Neurodegenerative diseases in general and specifically late-onset Alzheimer’s disease (LOAD) involve a genetically complex and largely obscure ensemble of causative and risk factors accompanied by complex feedback responses. The advent of “high-throughput” transcriptome investigation technologies such as microarray and deep sequencing is increasingly being combined with sophisticated statistical and bioinformatics analysis methods complemented by knowledge-based approaches such as Bayesian Networks or network and graph analyses. Together, such “integrative” studies are beginning to identify co-regulated gene networks linked with biological pathways and potentially modulating disease predisposition, outcome, and progression. Specifically, bioinformatics analyses of integrated microarray and genotyping data in cases and controls reveal changes in gene expression of both protein-coding and small and long regulatory RNAs; highlight relevant quantitative transcriptional differences between LOAD and non-demented control brains; and demonstrate reconfiguration of functionally meaningful molecular interaction structures in LOAD. These may be measured as changes in connectivity in “hub nodes” of relevant gene networks (Zhang et al., 2013). We illustrate here the open analytical questions in the transcriptome investigation of neurodegenerative disease studies, proposing “ad hoc” strategies for the evaluation of differential gene expression and hints for a simple analysis of the non-coding RNA (ncRNA) part of such datasets. We then survey the emerging role of long ncRNAs (lncRNAs) in the healthy and diseased brain transcriptome and describe the main current methods for computational modeling of gene networks. We propose accessible modular and pathway-oriented methods and guidelines for bioinformatics investigations of whole transcriptome next generation sequencing datasets.
We finally present methods and databases for functional interpretations of lncRNAs and propose a simple heuristic approach to visualize and represent physical and functional interactions of the coding and non-coding components of the transcriptome. Integrating coding and ncRNA analyses into a single functional view is of utmost importance for current and future analyses of neurodegenerative transcriptomes.
neurodegenerative diseases; bioinformatics and computational biology; next-generation sequencing; non-coding RNA; biological networks
Long non-coding RNAs (lncRNAs) are key regulatory molecules involved in a variety of biological processes and human diseases. However, the pathological effects of lncRNAs on primary varicose great saphenous veins (GSVs) remain unclear. The purpose of the present study was to identify aberrantly expressed lncRNAs involved in the prevalence of GSV varicosities and predict their potential functions. Using a microarray with 33,045 lncRNA and 30,215 mRNA probes, 557 lncRNAs and 980 mRNAs that differed significantly in expression between the varicose great saphenous veins and control veins were identified in six pairs of samples. These lncRNAs were sub-grouped, and mRNAs expressed at different levels were clustered into several pathways, six of which focused on metabolic pathways. Quantitative real-time PCR replication of nine lncRNAs was performed in 32 subjects, validating six lncRNAs (AF119885, AK021444, NR_027830, G36810, NR_027927, uc.345-). A coding-non-coding gene co-expression network revealed that four of these six lncRNAs may be correlated with 11 mRNAs, and pathway analysis revealed that they may be correlated with another 8 mRNAs associated with metabolic pathways. In conclusion, aberrantly expressed lncRNAs for GSV varicosities were systematically screened and validated, and their functions were predicted. These findings provide novel insight into the physiology of lncRNAs and the pathogenesis of varicose veins for further investigation. These aberrantly expressed lncRNAs may serve as new therapeutic targets for varicose veins. The Human Ethics Committee of Shanghai East Hospital, Tongji University School of Medicine approved the study (NO.: 2011-DF-53).
Hepatitis C virus (HCV) infection is one of the main causes of hepatocellular carcinoma (HCC), and the prevalence of HCV-associated HCC is on the rise worldwide. It is particularly important and helpful to identify potential markers for screening and early diagnosis of HCC among high-risk individuals with chronic hepatitis C, and to identify target molecules for the prevention and treatment of HCV-associated HCC. Small non-coding RNAs, mainly microRNAs (miRNAs), and long non-coding RNAs (lncRNAs), which are longer than 200 nucleotides, are likely to play important roles in a variety of biological processes, including the development and progression of HCC. For the most part, their underlying mechanisms of action remain unknown. In recent years, with advances in high-resolution microarrays and the application of next-generation sequencing techniques, a significant number of non-coding RNAs (ncRNAs) associated with HCC, particularly HCC caused by HCV infection, have been found to be differentially expressed and to be involved in the pathogenesis of HCV-associated HCC. In this review, we focus on recent studies of ncRNAs, especially miRNAs and lncRNAs related to HCV-induced HCC. We summarize those ncRNAs aberrantly expressed in HCV-associated HCC and highlight the potential uses of ncRNAs in early detection, diagnosis and therapy of HCV-associated HCC. We also discuss the limitations of recent studies, and suggest future directions for research in the field. miRNAs, lncRNAs and their target genes may represent new candidate molecules for the prevention, diagnosis and treatment of HCC in patients with HCV infection. Studies of the potential uses of miRNAs and lncRNAs as diagnostic tools or therapies are still in their infancy.
MicroRNA; Long non-coding RNAs; Non-coding RNAs; Hepatitis C virus; Hepatocellular carcinoma
Intronic and intergenic long noncoding RNAs (lncRNAs) are emerging gene expression regulators. The molecular pathogenesis of renal cell carcinoma (RCC) is still poorly understood, and in particular, limited studies are available for intronic lncRNAs expressed in RCC.
Microarray experiments were performed with custom-designed arrays enriched with probes for lncRNAs mapping to intronic genomic regions. Samples from 18 primary RCC tumors and 11 nontumor adjacent matched tissues were analyzed. Meta-analyses were performed with microarray expression data from three additional human tissues (normal liver, prostate tumor and kidney nontumor samples), and with large-scale public data for epigenetic regulatory marks and for evolutionarily conserved sequences.
A signature of 29 intronic lncRNAs differentially expressed between RCC and nontumor samples was obtained (false discovery rate (FDR) <5%). A signature of 26 intronic lncRNAs significantly correlated with the RCC five-year patient survival outcome was identified (FDR <5%, p-value ≤0.01). We identified 4303 intronic antisense lncRNAs expressed in RCC, of which 22% were significantly (p <0.05) cis correlated with the expression of the mRNA in the same locus across RCC and three other human tissues. Gene Ontology (GO) analysis of those loci pointed to 'regulation of biological processes' as the main enriched category. A module map analysis of the protein-coding genes significantly (p <0.05) trans correlated with the 20% most abundant lncRNAs identified 51 enriched GO terms (p <0.05). We determined that 60% of the expressed lncRNAs are evolutionarily conserved. At the genomic loci containing the intronic RCC-expressed lncRNAs, a strong association (p <0.001) was found between their transcription start sites and genomic marks such as CpG islands, RNA Pol II binding, and histone methylation and acetylation.
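Selecting a signature at an FDR below 5% can be sketched with the Benjamini-Hochberg procedure. This is an assumption made for illustration: the abstract does not say which FDR method was used, and the function names and toy p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_signature(names, pvalues, fdr=0.05):
    """Names whose adjusted p-value is below the chosen FDR level."""
    adjusted = benjamini_hochberg(pvalues)
    return [name for name, q in zip(names, adjusted) if q < fdr]


# Toy p-values for four hypothetical lncRNA probes (not data from the study).
toy_names = ["probe_a", "probe_b", "probe_c", "probe_d"]
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(toy_pvalues))
print(select_signature(toy_names, toy_pvalues))
```

All four toy probes pass here because their adjusted values stay below 0.05; with thousands of probes, as on a real array, the correction becomes far more stringent for modest raw p-values.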
Intronic antisense lncRNAs are widely expressed in RCC tumors. Some of them are significantly altered in RCC in comparison with nontumor samples. The majority of these lncRNAs are evolutionarily conserved and possibly modulated by epigenetic modifications. Our data suggest that these RCC lncRNAs may contribute to the complex network of regulatory RNAs playing a role in renal cell malignant transformation.
Renal cell carcinoma (RCC); Unspliced intronic long noncoding RNAs; Antisense lncRNAs; Microarray analysis; Molecular markers; Gene expression correlation; Histone methylation; Histone acetylation; Evolutionary lncRNA conservation
Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants.
To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full-length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein-coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA-dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci.
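One common first step in such a pipeline is a filter on transcript length and open reading frame (ORF) size. The sketch below is not the authors' pipeline, whose criteria are not given in the abstract; it assumes a transcript is a lncRNA candidate when it is at least 200 nucleotides long and its longest complete forward-strand ORF is shorter than 100 codons. Both thresholds and all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(sequence):
    """Length in codons (ATG included, stop excluded) of the longest ORF.

    Only the three forward frames are scanned, and only ORFs that end in
    a stop codon are counted; both are simplifying assumptions.
    """
    seq = sequence.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        current = None  # codons since the opening ATG, or None outside an ORF
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


def is_lncrna_candidate(sequence, min_length=200, max_orf_codons=100):
    """True if the transcript is long and lacks a long ORF (assumed cutoffs)."""
    return len(sequence) >= min_length and longest_orf_codons(sequence) < max_orf_codons


# Toy sequences (not maize cDNAs).
coding_like = "ATG" + "AAA" * 150 + "TAA"
noncoding_like = "A" * 250
print(longest_orf_codons(coding_like), is_lncrna_candidate(coding_like))
print(longest_orf_codons(noncoding_like), is_lncrna_candidate(noncoding_like))
```

A real pipeline would typically add further evidence, such as similarity searches against known proteins, before calling a transcript noncoding, which is consistent with the abstract's use of a separate SVM tool in parallel.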
Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA-mediated mechanisms.