Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non–TE sequences, than their intronic TEs, or than random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
An unexpected layer of complexity in the genomes of humans and other vertebrates lies in the abundance of genes that do not appear to encode proteins but produce a variety of non-coding RNAs. In particular, the human genome is currently predicted to contain 5,000–10,000 independent gene units generating long (>200 nucleotides) noncoding RNAs (lncRNAs). While there is growing evidence that a large fraction of these lncRNAs have cellular functions, notably to regulate protein-coding gene expression, almost nothing is known on the processes underlying the evolutionary origins and diversification of lncRNA genes. Here we show that transposable elements, through their capacity to move and spread in genomes in a lineage-specific fashion, as well as their ability to introduce regulatory sequences upon chromosomal insertion, represent a major force shaping the lncRNA repertoire of humans, mice, and zebrafish. Not only do TEs make up a substantial fraction of mature lncRNA transcripts, they are also enriched in the vicinity of lncRNA genes, where they frequently contribute to their transcriptional regulation. Through specific examples we provide evidence that some TE sequences embedded in lncRNAs are critical for the biogenesis of lncRNAs and likely important for their function.
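The TE contribution to lncRNA sequence reported above (for example, ∼30% in human) is in essence an interval-overlap calculation. The following is a minimal sketch of such a calculation, not the authors' pipeline: the function names, the half-open coordinate convention, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Half-open [start, end) coordinates on a single chromosome (an assumed convention).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def te_fraction(exons: List[Interval], tes: List[Interval]) -> float:
    """Fraction of exonic bases that are covered by TE annotations."""
    exons = merge_intervals(exons)
    tes = merge_intervals(tes)
    total = sum(end - start for start, end in exons)
    if total == 0:
        return 0.0
    covered = 0
    for exon_start, exon_end in exons:
        for te_start, te_end in tes:
            overlap = min(exon_end, te_end) - max(exon_start, te_start)
            if overlap > 0:
                covered += overlap
    return covered / total


# Hypothetical two-exon transcript with two TE insertions.
example_exons = [(100, 300), (500, 700)]
example_tes = [(250, 350), (600, 650)]
print(te_fraction(example_exons, example_tes))
```

Merging intervals first avoids double counting where exons of alternative isoforms or nested TE annotations overlap. The nested loop is quadratic and is only suitable for small inputs; a genome-scale analysis would use a sweep over sorted intervals or a dedicated interval tool.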
Long noncoding RNAs (lncRNAs) are noncoding transcripts longer than 200 nucleotides, which show evidence of pervasive transcription and participate in a plethora of cellular regulatory processes. Although several noncoding transcripts have been functionally annotated as lncRNAs within the genome, not all have been proven to fulfill the criteria for a functional regulator, and further analyses are needed before they can be included in a functional cohort. LncRNAs are being classified and reclassified in an ongoing annotation process, and the challenge is fraught with ambiguity as new evidence of their biogenesis and functional implications comes to light. In our effort to understand the complexity of this still enigmatic biomolecule, we have developed a new database entitled “LncRBase” where we have classified and characterized lncRNAs in human and mouse. It is an extensive resource of human and mouse lncRNA transcripts belonging to fourteen distinct subtypes, with a total of 83,201 entries for mouse and 133,361 entries for human: among these, we have newly annotated 8,507 mouse and 14,813 human non coding RNA transcripts (from UCSC and H-InvDB 8.0) as lncRNAs. We have especially considered protein coding gene loci which act as hosts for non coding transcripts. LncRBase includes different lncRNA transcript variants of protein coding genes. LncRBase provides information about the genomic context of different lncRNA subtypes, their interaction with small non coding RNAs (ncRNAs) viz. piwi interacting RNAs (piRNAs) and microRNAs (miRNAs) and their mode of regulation, via association with diverse other genomic elements. Adequate knowledge about genomic origin and molecular features of lncRNAs is essential to understand their functional and behavioral complexities. Overall, LncRBase provides a thorough study on various aspects of lncRNA origin and function and a user-friendly interface to search for lncRNA information.
LncRBase is available at http://bicresources.jcbose.ac.in/zhumur/lncrbase.
Long non-coding RNAs (lncRNAs) are transcripts that are 200 bp or longer, do not encode proteins, and potentially play important roles in eukaryotic gene regulation. However, the number, characteristics and expression inheritance pattern of lncRNAs in maize are still largely unknown.
By exploiting available public EST databases, maize whole genome sequence annotation and RNA-seq datasets from 30 different experiments, we identified 20,163 putative lncRNAs. Of these lncRNAs, more than 90% are predicted to be the precursors of small RNAs, while 1,704 are considered to be high-confidence lncRNAs. High confidence lncRNAs have an average transcript length of 463 bp and genes encoding them contain fewer exons than annotated genes. By analyzing the expression pattern of these lncRNAs in 13 distinct tissues and 105 maize recombinant inbred lines, we show that more than 50% of the high confidence lncRNAs are expressed in a tissue-specific manner, a result that is supported by epigenetic marks. Intriguingly, the inheritance of lncRNA expression patterns in 105 recombinant inbred lines reveals apparent transgressive segregation, and maize lncRNAs are less affected by cis- than by trans-genetic factors.
We integrate all available transcriptomic datasets to identify a comprehensive set of maize lncRNAs, provide a unique annotation resource of the maize genome and a genome-wide characterization of maize lncRNAs, and explore the genetic control of their expression using expression quantitative trait locus mapping.
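The distinction drawn above between cis- and trans-genetic factors can be illustrated with a simple distance rule. This is only a sketch under stated assumptions: the abstract does not give the criterion actually used, so the 1 Mb window, the `Locus` type, and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    chrom: str
    position: int  # base-pair coordinate


def classify_eqtl(transcript: Locus, marker: Locus, cis_window: int = 1_000_000) -> str:
    """Label an eQTL 'cis' if the marker lies near the transcript, else 'trans'.

    The 1 Mb default window is an assumed, illustrative threshold.
    """
    same_chrom = transcript.chrom == marker.chrom
    if same_chrom and abs(transcript.position - marker.position) <= cis_window:
        return "cis"
    return "trans"


# Hypothetical lncRNA locus and three hypothetical associated markers.
lncrna = Locus("chr1", 5_000_000)
for marker in (Locus("chr1", 5_400_000), Locus("chr1", 90_000_000), Locus("chr7", 5_000_000)):
    print(marker, classify_eqtl(lncrna, marker))
```

Under a rule of this kind, a finding that lncRNAs are less affected by cis- than by trans-genetic factors corresponds to most associated markers falling outside the window or on other chromosomes.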
Long non-coding RNAs (lncRNAs) represent a class of riboregulators that either directly act in long form or are processed to shorter miRNAs and siRNAs. Emerging evidence shows that lncRNAs participate in stress responsive regulation. In this study, to identify the putative maize lncRNAs responsive to drought stress, 8449 drought responsive transcripts were first uploaded to the Coding Potential Calculator website for classification as protein coding or non-coding RNAs, and 1724 RNAs were identified as potential non-coding RNAs. A Perl script was written to screen these 1724 ncRNAs and 664 transcripts were ultimately identified as drought-responsive lncRNAs. Of these 664 transcripts, 126 drought-responsive lncRNAs were highly similar to known maize lncRNAs; the remaining 538 transcripts were considered as novel lncRNAs. Among the 664 lncRNAs identified as drought responsive, 567 were upregulated and 97 were downregulated in drought-stressed leaves of maize. 8 lncRNAs were identified as miRNA precursor lncRNAs, 62 were classified as both shRNA and siRNA precursors, and 279 were classified as siRNA precursors. The remaining 315 lncRNAs were classified as other lncRNAs that are likely to function as longer molecules. Among these 315 lncRNAs, 10 are identified as antisense lncRNAs and 7 could pair with 17 CDS sequences with near-perfect matches. Finally, RT-qPCR results confirmed that all selected lncRNAs could respond to drought stress. These findings extend the current view on lncRNAs as ubiquitous regulators under stress conditions.
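The counts reported above partition the same 664 drought-responsive lncRNAs in three different ways, and the partitions can be checked against one another. The short sketch below does this using only numbers quoted in the abstract; the variable and function names are illustrative.

```python
TOTAL_LNCRNAS = 664

# Three partitions of the 664 drought-responsive lncRNAs, as reported in the abstract.
partitions = {
    "novelty": {"similar to known maize lncRNAs": 126, "novel": 538},
    "direction": {"upregulated": 567, "downregulated": 97},
    "class": {
        "miRNA precursors": 8,
        "shRNA and siRNA precursors": 62,
        "siRNA precursors": 279,
        "other lncRNAs": 315,
    },
}


def partition_is_consistent(groups: dict, total: int = TOTAL_LNCRNAS) -> bool:
    """Return True if the group sizes add up to the reported total."""
    return sum(groups.values()) == total


for name, groups in partitions.items():
    print(name, sum(groups.values()), partition_is_consistent(groups))
```

All three partitions sum to 664, so the reported categories are mutually exclusive and exhaustive at the level of counts.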
The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domain and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post-Deep Brain Stimulation (DBS) treatment and compared them to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD-regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq data from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes, including U1, which showed both leukocyte and brain increases.
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases.
Long non-coding RNAs (lncRNAs) comprise a novel, fascinating class of RNAs with largely unknown biological functions. Parkinson's-disease (PD) is the most frequent motor disorder, and Deep-brain-stimulation (DBS) treatment alleviates the symptoms, but early disease biomarkers are still unknown and new future genetic interference targets are urgently needed. Using RNA-sequencing technology and a novel computational workflow for in-depth exploration of whole-transcriptome RNA-seq datasets, we detected and analyzed lncRNAs in sequenced libraries from PD patients' leukocytes pre and post-treatment and the brain, adding this full profile resource of over 7,000 lncRNAs to the few human tissues-derived lncRNA datasets that are currently available. Our study includes sample-specific database construction, detecting disease-derived changes in known and novel lncRNAs, exons and junctions and predicting corresponding changes in Polyadenylation choices, protein domains and miRNA binding sites. We report widespread transcript structure variations at the splice junction and exons levels, including novel exons and junctions and alteration of lncRNAs followed by experimental validation in PD leukocytes and two PD brain regions compared with controls. Our results suggest lncRNAs involvement in neurodegenerative diseases, and specifically PD. This comprehensive workflow will be of use to the increasing number of laboratories producing RNA-Seq data in a wide range of biomedical studies.
The study of long non-coding RNAs (lncRNAs) has been advanced by high-throughput RNA sequencing (RNA-Seq). However, identifying lncRNAs from RNA-Seq data is still not trivial, and uncovering their functions remains a challenge.
We present a computational pipeline for detecting novel lncRNAs from RNA-Seq data. First, genome-guided transcriptome reconstruction is used to generate initially assembled transcripts. Possible partial transcripts and artefacts are filtered according to the quantified expression level. After that, novel lncRNAs are detected by further filtering out known transcripts and those with high protein-coding potential, using a newly developed program called lncRScan. We applied our pipeline to a mouse Klf1 knockout dataset, and discussed the plausible functions of the novel lncRNAs we detected by differential expression analysis. We identified 308 novel lncRNA candidates, which have shorter transcripts, fewer exons, and shorter putative open reading frames than known protein-coding transcripts. Of the lncRNAs, 52 large intergenic ncRNAs (lincRNAs) show lower expression levels than the protein-coding ones and 13 lncRNAs show significant differential expression between the wild-type and Klf1 knockout conditions.
Our method can predict a set of novel lncRNAs from RNA-Seq data. Some of these lncRNAs are differentially expressed between the wild-type and Klf1 knockout strains, suggesting that they can be given high priority in further functional studies.
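The successive filters described above (expression level, known annotation, protein-coding potential) can be sketched as a simple chain of exclusions. This is not the lncRScan implementation: the `Transcript` fields, the thresholds, and the coding-potential score are hypothetical, and the 200 nt cutoff is the conventional lncRNA length definition rather than a parameter stated in this abstract.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Transcript:
    name: str
    length: int          # transcript length in nucleotides
    expression: float    # quantified expression level (hypothetical units)
    is_known: bool       # already present in a reference annotation
    coding_score: float  # hypothetical coding-potential score in [0, 1]


def select_novel_lncrnas(
    transcripts: Iterable[Transcript],
    min_length: int = 200,          # conventional lncRNA cutoff (assumed here)
    min_expression: float = 1.0,    # assumed threshold against partial transcripts/artefacts
    max_coding_score: float = 0.5,  # assumed threshold for "high coding potential"
) -> List[Transcript]:
    """Keep transcripts that pass every filter, in the order described in the text."""
    candidates = []
    for t in transcripts:
        if t.expression < min_expression:
            continue  # likely partial transcript or assembly artefact
        if t.is_known:
            continue  # already annotated, hence not novel
        if t.length <= min_length:
            continue  # too short to count as a long noncoding RNA
        if t.coding_score > max_coding_score:
            continue  # likely protein-coding
        candidates.append(t)
    return candidates


# Hypothetical assembled transcripts.
assembled = [
    Transcript("tx1", 850, 4.2, False, 0.10),   # passes every filter
    Transcript("tx2", 900, 0.2, False, 0.10),   # expression too low
    Transcript("tx3", 1200, 8.0, True, 0.05),   # already annotated
    Transcript("tx4", 1500, 6.0, False, 0.95),  # high coding potential
    Transcript("tx5", 150, 5.0, False, 0.10),   # too short
]
print([t.name for t in select_novel_lncrnas(assembled)])
```

Applying the cheap filters (expression, annotation lookup) before the coding-potential check mirrors the order in the text and keeps the more expensive step for the smallest candidate set.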
Long noncoding RNAs (lncRNAs) are a recently discovered class of non-protein coding RNAs, which have now increasingly been shown to be involved in a wide variety of biological processes as regulatory molecules. The functional role of many of the members of this class has been an enigma, except a few of them like Malat and HOTAIR. Little is known regarding the regulatory interactions between noncoding RNA classes. Recent reports have suggested that lncRNAs could potentially interact with other classes of non-coding RNAs including microRNAs (miRNAs) and modulate their regulatory role through interactions. We hypothesized that lncRNAs could participate as a layer of regulatory interactions with miRNAs. The availability of genome-scale datasets for Argonaute targets across human transcriptome has prompted us to reconstruct a genome-scale network of interactions between miRNAs and lncRNAs.
We used well characterized experimental Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation (PAR-CLIP) datasets and the recent genome-wide annotations for lncRNAs in the public domain to construct a comprehensive transcriptome-wide map of miRNA regulatory elements. Comparative analysis revealed that in addition to targeting protein-coding transcripts, miRNAs could also potentially target lncRNAs, thus participating in a novel layer of regulatory interactions between noncoding RNA classes. Furthermore, we have modeled one example of miRNA-lncRNA interaction using a zebrafish model. We have also found that the miRNA regulatory elements have a positional preference, clustering towards the mid regions and 3′ ends of the long noncoding transcripts. We further reconstruct a genome-wide map of miRNA interactions with lncRNAs as well as messenger RNAs.
This analysis suggests widespread regulatory interactions between noncoding RNA classes and suggests a novel functional role for lncRNAs. We also present the first transcriptome-scale study on miRNA-lncRNA interactions and the first report of a genome-scale reconstruction of a noncoding RNA regulatory interactome involving lncRNAs.
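The positional preference reported above (sites clustering towards the mid regions and 3′ ends of lncRNAs) can be assessed by normalizing each site position by transcript length and tallying regions. The sketch below assumes a split into equal thirds and uses made-up site coordinates; neither is taken from the study.

```python
from collections import Counter
from typing import Iterable, Tuple


def site_region(site_start: int, site_end: int, transcript_length: int) -> str:
    """Assign a site to the 5' end, mid region, or 3' end by its relative midpoint.

    Coordinates are transcript-relative and 0-based; the split into equal
    thirds is an assumption made for illustration.
    """
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    relative = ((site_start + site_end) / 2) / transcript_length
    if relative < 1 / 3:
        return "5' end"
    if relative < 2 / 3:
        return "mid"
    return "3' end"


def tally_regions(sites: Iterable[Tuple[int, int, int]]) -> Counter:
    """Count sites per region; each site is (start, end, transcript_length)."""
    return Counter(site_region(start, end, length) for start, end, length in sites)


# Hypothetical miRNA regulatory elements on lncRNAs of different lengths.
example_sites = [(0, 20, 300), (140, 160, 300), (280, 300, 300), (900, 922, 1000)]
print(tally_regions(example_sites))
```

Normalizing by transcript length makes sites on transcripts of very different lengths comparable. A real analysis would also compare the tallies with a null expectation, such as sites placed uniformly at random.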
Long non-coding RNAs (lncRNAs) have been regarded as the primary genetic regulators of several important biological processes. However, the biological functions of lncRNAs in radiation-induced lung damage remain largely unknown. The present study aimed to investigate the potential effects of lncRNAs on radiation-induced lung injury (RILI). Female C57BL/6 mice were exposed to 12 Gy single doses of total body irradiation (TBI). LncRNA microarray screening was conducted at 24 h post-irradiation (IR) to investigate the differentially-expressed lncRNAs during RILI. Following the subsequent bioinformatics analysis and reverse transcription-polymerase chain reaction (RT-PCR) validation, one of the verified differentially-expressed long intergenic radiation-responsive ncRNAs (LIRRs), LIRR1, was selected for further functional study. The normal human bronchial epithelial BEAS-2B cell line was used as the cell model. The recombinant eukaryotic expression vector for the lncRNA was designed, constructed and transfected using lipofectamine. RT-PCR, clonogenic and flow cytometry assays, immunofluorescence detection and western blot analysis were performed to reveal the role of the lncRNA in the radiosensitivity regulation of the RILI target cells. In lung tissues 24 h after 12 Gy TBI, six of the identified differentially-expressed LIRRs near the coding genes were validated using quantitative (q)PCR. The upregulation of two LIRRs was observed and confirmed using qPCR. LIRR1 was chosen for further functional study. Following the stable transfection of LIRR1, identified through G418 screening, increased radiosensitivity, evident cell cycle G1 phase arrest and increased γ-H2AX foci formation were observed in the bronchial epithelial BEAS-2B cell line subsequent to IR. 
LIRR1 overexpression also led to decreased expression of the KU70, KU80 and RAD50 DNA repair proteins, marked activation of p53, decreased mouse double minute 2 homolog (MDM2) expression, and substantially induced p21 and suppressed cyclin-dependent kinase 2 in BEAS-2B following IR. Subsequent to the use of Pifithrin-α, a specific inhibitor of p53 activation, increased MDM2 expression was observed in the LIRR1-overexpressing cells, suggesting that LIRR1 could mediate the DNA damage response (DDR) signaling in a p53-dependent manner. The present study provides a novel mechanism for RILI, using the concept of lncRNAs.
long non-coding RNA; radiation-induced lung injury; radiosensitivity; cell cycle; DNA damage response; p53
The liver is a vital organ with critical functions in metabolism, protein synthesis, and immune defense. Most of the liver functions are not mature at birth and many changes happen during postnatal liver development. However, it is unclear what changes occur in liver after birth, at what developmental stages they occur, and how the developmental processes are regulated. Long non-coding RNAs (lncRNAs) are involved in organ development and cell differentiation. Here, we analyzed the transcriptome of lncRNAs in mouse liver from perinatal (day −2) to adult (day 60) by RNA-Sequencing, with an attempt to understand the role of lncRNAs in liver maturation. We found around 15,000 genes expressed, including about 2,000 lncRNAs. Most lncRNAs were expressed at a lower level than coding RNAs. Both coding RNAs and lncRNAs displayed three major ontogenic patterns: enriched at neonatal, adolescent, or adult stages. Neighboring coding and non-coding RNAs showed the trend to exhibit highly correlated ontogenic expression patterns. Gene ontology (GO) analysis revealed that some lncRNAs enriched at neonatal ages have their neighbor protein coding genes also enriched at neonatal ages and associated with cell proliferation, immune activation related processes, tissue organization pathways, and hematopoiesis; other lncRNAs enriched at adolescent ages have their neighbor protein coding genes associated with different metabolic processes. These data reveal significant functional transition during postnatal liver development and imply the potential importance of lncRNAs in liver maturation.
Eukaryotic genomes generate a heterogeneous ensemble of mRNAs and long noncoding RNAs (lncRNAs). LncRNAs and mRNAs are both transcribed by Pol II and acquire 5′ caps and poly(A) tails, but only mRNAs are translated into proteins. To address how these classes are distinguished, we identified the transcriptome-wide targets of 13 RNA processing, export, and turnover factors in budding yeast. Comparing the maturation pathways of mRNAs and lncRNAs revealed that transcript fate is largely determined during 3′ end formation. Most lncRNAs are targeted for nuclear RNA surveillance, but a subset with 3′ cleavage and polyadenylation features resembling the mRNA consensus can be exported to the cytoplasm. The Hrp1 and Nab2 proteins act at this decision point, with dual roles in mRNA cleavage/polyadenylation and lncRNA surveillance. Our data also reveal the dynamic and heterogeneous nature of mRNA maturation, and highlight a subset of “lncRNA-like” mRNAs regulated by the nuclear surveillance machinery.
• Transcriptome-wide analysis shows dynamic assembly of ribonucleoprotein particles
• LncRNA and mRNA subclasses undergo distinct maturation and turnover pathways
• Transcript fate is determined during 3′ end formation
• Transcript classes overlap, with many “mRNA-like” lncRNAs and “lncRNA-like” mRNAs
A transcriptome-wide analysis shows that different classes of mRNAs and lncRNAs are characterized by distinct 3′ end formation and RNP complexes, explaining how cells distinguish among these otherwise similar RNAs.
To assess the global changes in and characteristics of the transcriptome of long noncoding RNAs (LncRNAs) in heart tissue, whole blood and plasma during heart failure (HF) and association with expression of paired coding genes.
Here we used microarray assay to examine the transcriptome of LncRNAs deregulated in the heart, whole blood, and plasma during HF in mice. We confirmed the changes in LncRNAs by quantitative PCR.
We revealed and confirmed a number of LncRNAs that were deregulated during HF, which suggests a potential role of LncRNAs in HF. Strikingly, the patterns of expression of LncRNA differed between plasma and other tissue during HF. LncRNA expression was associated with LncRNA length in all samples but not in plasma during HF, which suggests that the global association of LncRNA expression and LncRNA length in plasma could be biomarkers for HF. In total, 32 LncRNAs all expressed in the heart, whole blood and plasma showed changed expression with HF, so they may be biomarkers in HF. In addition, sense-overlapped LncRNAs tended to show consistent expression with their paired coding genes, whereas antisense-overlapped LncRNAs tended to show the opposite expression in plasma; so different types of LncRNAs may have different characteristics in HF. Interestingly, we revealed an inverse correlation between changes in expression of LncRNAs in plasma and in heart, so circulating levels of LncRNAs may not represent just passive leakage from the HF heart but also active regulation or release of circulatory cells or other cells during HF.
We reveal stable expression of LncRNAs in plasma during HF, which suggests a newly described component in plasma. The distinct expression patterns of circulatory LncRNAs during HF indicate that LncRNAs may actively respond to stress and thus serve as biomarkers of HF diagnosis and treatment.
Ventricular septal defects (VSD) are the most common form of congenital heart disease, which is the leading non-infectious cause of death in children; nevertheless, the exact cause of VSD is not yet fully understood. Long non-coding RNAs (lncRNAs) have been shown to play key roles in various biological processes, such as imprinting control, circuitry controlling pluripotency and differentiation, immune responses and chromosome dynamics. Notably, a growing number of lncRNAs have been implicated in disease etiology, although an association with VSD has not been reported. In the present study, we conducted an integrated analysis of dysregulated lncRNAs, focusing specifically on the identification and characterization of lncRNAs potentially involved in the initiation of VSD. Comparison of the transcriptome profiles of cardiac tissues from VSD-affected and normal hearts was performed using a second-generation lncRNA microarray, which covers the vast majority of expressed RefSeq transcripts (29,241 lncRNAs and 30,215 coding transcripts). In total, 880 lncRNAs were upregulated and 628 were downregulated in VSD. Furthermore, our established filtering pipeline indicated an association of two lncRNAs, ENST00000513542 and RP11-473L15.2, with VSD. This dysregulation of the lncRNA profile provides a novel insight into the etiology of VSD and, furthermore, illustrates the intricate relationship between coding and ncRNA transcripts in cardiac development. These data may offer a background/reference resource for future functional studies of lncRNAs related to VSD.
Long non-coding RNAs (lncRNAs) play an important role in carcinogenesis; however, knowledge of lncRNA expression in renal cell carcinoma is rudimentary. As a basis for biomarker development, we aimed to explore the lncRNA expression profile in clear cell renal cell carcinoma (ccRCC) tissue.
Microarray experiments were performed to determine the expression of 32,183 lncRNA transcripts belonging to 17,512 lncRNAs in 15 corresponding normal and malignant renal tissues. Validation was performed using quantitative real-time PCR in 55 ccRCC and 52 normal renal specimens. Computational analysis was performed to determine lncRNA-microRNA (MiRTarget2) and lncRNA-protein (catRAPID omics) interactions. We identified 1,308 dysregulated transcripts (expression change >2-fold; upregulated: 568, downregulated: 740) in ccRCC tissue. Among these, aberrant expression was validated using PCR: lnc-BMP2-2 (mean expression change: 37-fold), lnc-CPN2-1 (13-fold), lnc-FZD1-2 (9-fold), lnc-ITPR2-3 (15-fold), lnc-SLC30A4-1 (15-fold), and lnc-SPAM1-6 (10-fold) were highly overexpressed in ccRCC, whereas lnc-ACACA-1 (135-fold), lnc-FOXG1-2 (19-fold), lnc-LCP2-2 (2-fold), lnc-RP3-368B9 (19-fold), and lnc-TTC34-3 (314-fold) were downregulated. There was no correlation between lncRNA expression with clinical-pathological parameters. Computational analyses revealed that these lncRNAs are involved in RNA-protein networks related to splicing, binding, transport, localization, and processing of RNA. Small interfering RNA (siRNA)-mediated knockdown of lnc-BMP2-2 and lnc-CPN2-1 did not influence cell proliferation.
We identified many novel lncRNA transcripts dysregulated in ccRCC, which may be useful as novel diagnostic biomarkers.
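The dysregulation criterion quoted above (expression change >2-fold) can be expressed as a simple classification on the tumor-to-normal ratio. The sketch below assumes linear-scale, strictly positive expression values and uses hypothetical transcript names and numbers; it does not reproduce the study's microarray normalization or statistics.

```python
from typing import Dict, Tuple


def classify_change(tumor: float, normal: float, threshold: float = 2.0) -> str:
    """Classify a transcript as 'up', 'down', or 'unchanged' by fold change.

    Expression values are assumed to be on a linear scale and strictly positive.
    """
    if tumor <= 0 or normal <= 0:
        raise ValueError("expression values must be positive")
    fold = tumor / normal
    if fold > threshold:
        return "up"
    if fold < 1 / threshold:
        return "down"
    return "unchanged"


def summarize(expression: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Count transcripts per class; values are (tumor, normal) expression pairs."""
    counts = {"up": 0, "down": 0, "unchanged": 0}
    for tumor, normal in expression.values():
        counts[classify_change(tumor, normal)] += 1
    return counts


# Hypothetical transcripts with (tumor, normal) expression values.
example = {"lnc-A": (8.0, 2.0), "lnc-B": (1.0, 4.0), "lnc-C": (3.0, 2.0)}
print(summarize(example))
```

With the reported counts, the two dysregulated classes add up as expected: 568 upregulated plus 740 downregulated transcripts give the 1,308 dysregulated transcripts stated in the text.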
Electronic supplementary material
The online version of this article (doi:10.1186/s13148-015-0047-7) contains supplementary material, which is available to authorized users.
The majority of the human genome is transcribed, even though only 2% of transcripts encode proteins. Non-coding transcripts were originally dismissed as evolutionary junk or transcriptional noise, but with the development of whole genome technologies, these non-coding RNAs (ncRNAs) are emerging as molecules with vital roles in regulating gene expression. While shorter ncRNAs have been extensively studied, the functional roles of long ncRNAs (lncRNAs) are still being elucidated. Studies over the last decade show that lncRNAs are emerging as new players in a number of diseases including cancer. Potential roles in both oncogenic and tumor suppressive pathways in cancer have been elucidated, but the biological functions of the majority of lncRNAs remain to be identified. Accumulated data are identifying the molecular mechanisms by which lncRNA mediates both structural and functional roles. LncRNA can regulate gene expression at both transcriptional and post-transcriptional levels, including splicing and regulating mRNA processing, transport, and translation. Much current research is aimed at elucidating the function of lncRNAs in breast cancer and mammary gland development, and at identifying the cellular processes influenced by lncRNAs. In this paper we review current knowledge of lncRNAs contributing to these processes and present lncRNA as a new paradigm in breast cancer development.
long non-coding RNA; breast cancer; mammary gland development; gene regulation; epigenetics
In recent years it has become increasingly clear that the mammalian transcriptome is highly complex and includes a large number of small non-coding RNAs (sncRNAs) and long noncoding RNAs (lncRNAs). Here we review the biogenesis pathways of the three classes of sncRNAs, namely short interfering RNAs (siRNAs), microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs). These ncRNAs have been extensively studied and are involved in pathways leading to specific gene silencing and, for example, the protection of genomes against viruses and transposons. In addition, lncRNAs have emerged as pivotal molecules for the transcriptional and post-transcriptional regulation of gene expression, which is supported by their tissue-specific expression patterns, subcellular distribution, and developmental regulation. Therefore, we also focus our attention on their role in differentiation and development. SncRNAs and lncRNAs play critical roles in defining DNA methylation patterns, as well as in chromatin remodeling, thus having a substantial effect on epigenetics. The identification of some overlaps in their biogenesis pathways and functional roles raises the hypothesis that these molecules play concerted functions in vivo, creating complex regulatory networks where cooperation with regulatory proteins is necessary. We also highlight the implications of deregulated biogenesis and gene expression of sncRNAs and lncRNAs in human diseases such as cancer.
sncRNAs; lncRNAs; miRNAs; siRNAs; piRNAs; gene expression regulation; epigenetic regulation
The transcriptome of a cell is represented by a myriad of different RNA molecules with and without protein-coding capacities. In recent years, advances in sequencing technologies have allowed researchers to more fully appreciate the complexity of whole transcriptomes, showing that the vast majority of the genome is transcribed, producing a diverse population of non-protein coding RNAs (ncRNAs). Thus, the biological significance of ncRNAs has been largely underestimated. Amongst these multiple classes of ncRNAs, the long non-coding RNAs (lncRNAs) are apparently the most numerous and functionally diverse. A small but growing number of lncRNAs have been experimentally studied, and a view is emerging that these are key regulators of epigenetic gene regulation in mammalian cells. LncRNAs have already been implicated in human diseases such as cancer and neurodegeneration, highlighting the importance of this emergent field. In this article, we review the catalogs of annotated lncRNAs and the latest advances in our understanding of lncRNAs.
non-coding RNAs; regulation; long non-coding RNA; epigenetics
Recent large-scale transcriptome analyses have revealed that transcription is spread throughout the mammalian genomes, yielding large numbers of transcripts, including long non-coding RNAs (lncRNAs) with little or no protein-coding capacity. Dozens of lncRNAs have been identified as biologically significant. In many cases, lncRNAs act as key molecules in the regulation of processes such as chromatin remodeling, transcription, and post-transcriptional processing. Several lncRNAs (e.g., MALAT1, HOTAIR, and ANRIL) are associated with human diseases, including cancer. Those lncRNAs associated with cancer are often aberrantly expressed. Although the underlying molecular mechanisms by which lncRNAs regulate cancer development are unclear, recent studies have revealed that such aberrant expression of lncRNAs affects the progression of cancers. In this review, we highlight recent findings regarding the roles of lncRNAs in cancer biology.
large non-coding RNA; cancer; disease; MALAT1; HOTAIR; ANRIL
Long non-coding RNAs (lncRNAs), a key group of non-coding RNAs, have gained wide attention. Though some lncRNAs have been functionally annotated and systematically explored in higher mammals, few have undergone systematic identification and annotation. Owing to their expression specificity, the known lncRNAs expressed in embryonic brain tissues remain limited. Considering that a large number of lncRNAs are transcribed only in brain tissues, studies of lncRNAs in the developing brain are of special interest. Here, publicly available RNA-sequencing (RNA-seq) data from embryonic brain are integrated to identify thousands of embryonic brain lncRNAs using a customized pipeline. A significant proportion of the novel transcripts have not been annotated by available genomic resources. The putative embryonic brain lncRNAs are shorter, less spliced, and less conserved than known genes. The expression of putative lncRNAs is on average one tenth that of known coding genes, while comparable with that of known lncRNAs. From chromatin data, putative embryonic brain lncRNAs are associated with active chromatin marks, comparable with known lncRNAs. Embryonic brain expressed lncRNAs are also indicated to have expression though not evident in adult brain. Gene Ontology analysis of putative embryonic brain lncRNAs suggests that they are associated with brain development. The putative lncRNAs are shown to be related to possible cis-regulatory roles in imprinting, and are even deemed to be imprinted lncRNAs themselves. Re-analysis of one knockdown dataset suggests that four regulators are associated with lncRNAs. Taken together, the identification and systematic analysis of putative lncRNAs provide novel insights into uncharacterized mouse non-coding regions and their relationships with mammalian embryonic brain development.
Accumulating evidence highlights the potential role of long non-coding RNAs (lncRNAs) as biomarkers and therapeutic targets in solid tumors. However, the role of lncRNA expression in human breast cancer biology, prognosis and molecular classification remains unknown. Herein, we established the lncRNA profile of 658 infiltrating ductal carcinomas of the breast from The Cancer Genome Atlas project. We found lncRNA expression to correlate with the gene expression and chromatin landscape of human mammary epithelial cells (non-transformed) and the breast cancer cell line MCF-7. Unsupervised consensus clustering of lncRNA revealed four subgroups that displayed different prognoses. Gene set enrichment analysis for cis- and trans-acting lncRNAs showed enrichment for breast cancer signatures driven by master regulators of breast carcinogenesis. Interestingly, the lncRNA HOTAIR was significantly overexpressed in the HER2-enriched subgroup, while the lncRNA HOTAIRM1 was significantly overexpressed in the basal-like subgroup. Estrogen receptor (ESR1) expression was associated with distinct lncRNA networks in lncRNA clusters III and IV. Importantly, almost two thirds of the lncRNAs were marked by enhancer chromatin modifications (i.e., H3K27ac), suggesting that expressed lncRNA in breast cancer drives carcinogenesis through increased activity of neighboring genes. In summary, our study depicts the first lncRNA subtype classification in breast cancer and provides the framework for future studies to assess the interplay between lncRNAs and the breast cancer epigenome.
breast cancer; enhancers; expression profiling; lncRNA; RNA-Seq
Intron-derived long noncoding RNAs with snoRNA ends (sno-lncRNAs) are highly expressed from the imprinted Prader-Willi syndrome (PWS) region on human chromosome 15. However, sno-lncRNAs from other regions of the human genome or from other genomes have not yet been documented.
By exploring non-polyadenylated transcriptomes from human, rhesus and mouse, we have systematically annotated sno-lncRNAs expressed in all three species. In total, using available data from a limited set of cell lines, 19 sno-lncRNAs have been identified with tissue- and species-specific expression patterns. Although primary sequence analysis revealed that snoRNAs themselves are conserved from human to mouse, sno-lncRNAs are not. PWS region sno-lncRNAs are highly expressed in human and rhesus monkey, but are undetectable in mouse. Importantly, the absence of PWS region sno-lncRNAs in mouse suggests a possible reason why current mouse models fail to fully recapitulate pathological features of human PWS. In addition, an RPL13A region sno-lncRNA was specifically revealed in mouse embryonic stem cells, and its snoRNA ends were reported to influence lipid metabolism. Interestingly, the RPL13A region sno-lncRNA is barely detectable in human. We further demonstrated that the formation of sno-lncRNAs is often associated with alternative splicing of exons within their parent genes, and that species-specific alternative splicing leads to unique expression patterns of sno-lncRNAs in different animals.
Comparative transcriptomes of non-polyadenylated RNAs among human, rhesus and mouse revealed that the expression of sno-lncRNAs is species-specific and that their processing is closely linked to alternative splicing of their parent genes. This study thus further demonstrates a complex regulatory network of coding and noncoding parts of the mammalian genome.
lncRNA; sno-lncRNA; Alternative splicing; Species-specific; PWS
Comprehensive analysis of the mammalian transcriptome has revealed that long non-coding RNAs (lncRNAs) may make up a large fraction of cellular transcripts. Recent years have seen a surge of studies aimed at functionally characterizing the role of lncRNAs in development and disease. In this review, we discuss new findings implicating lncRNAs in controlling development of the central nervous system (CNS). The evolution of the higher vertebrate brain has been accompanied by an increase in the levels and complexities of lncRNAs expressed within the developing nervous system. Although a limited number of CNS-expressed lncRNAs are now known to modulate the activity of proteins important for neuronal differentiation, the function of the vast majority of neuronal-expressed lncRNAs is still unknown. Topics of intense current interest include the mechanism by which CNS-expressed lncRNAs might function in epigenetic and transcriptional regulation during neuronal development, and how gain and loss of function of individual lncRNAs contribute to neurological diseases.
cell fate; neurogenesis; embryonic stem cells; neural stem cells; transcription factors; epigenetics; long noncoding RNA; molecular scaffold
A growing body of evidence shows that long non-coding RNAs (lncRNAs) are involved in more human diseases than previously realized. However, no information is currently available about lncRNAs in cardiac fibroblasts. The expression profile of lncRNAs was analyzed in Ang II-treated cardiac fibroblasts using lncRNA arrays. The analysis showed that 282 of 4376 detected lncRNAs demonstrated >2-fold differential expression in response to treatment with Ang II (100 nM) for 24 h. Among them, 22 lncRNAs showed a greater than 4-fold change. Ang II also induced widespread expression changes in protein-coding genes in cardiac fibroblasts. Quantitative real-time PCR confirmed the changes of six lncRNAs (AF159100, BC086588, MRNR026574, MRAK134679, NR024118, AX765700) and of mRNAs (IL6, RGS2, PRG4, TIMP1, Cdkn1c, TIMP3, Col I, Col III and Fibronectin) in cardiac fibroblasts. Bioinformatic analysis implicated the process of cell proliferation. Further studies revealed that the down-regulation of lncRNA-NR024118 expression by Ang II was time-dependent: the level of NR024118 was lowest at 24 h and had recovered by 48 h. Ang II also dynamically down-regulated the expression of Cdkn1c in cardiac fibroblasts. Ang II at concentrations from 10⁻⁹ M to 10⁻⁶ M induced a decrease of NR024118 and Cdkn1c in cardiac fibroblasts. In conclusion, the expression profile of lncRNAs was significantly altered in Ang II-treated cardiac fibroblasts, and Ang II dynamically regulated the expression of lncRNA-NR024118 and Cdkn1c, indicating a potential role of NR024118 in cardiac fibroblasts.
Angiotensin II; cardiac fibroblasts; long non-coding RNA
Thousands of long noncoding RNAs (lncRNAs) have been reported in mammalian genomes. These RNAs represent an important subset of pervasive genes involved in a broad range of biological functions. Aberrant expression of lncRNAs is associated with many types of cancers. Here, in order to explore the potential lncRNAs involved in hepatocellular carcinoma (HCC) oncogenesis, we performed lncRNA gene expression profile analysis in 3 pairs of human HCC and adjacent non-tumor (NT) tissues by microarray.
Differentially expressed lncRNAs and mRNAs were detected by human lncRNA microarray containing 33,045 lncRNAs and 30,215 coding transcripts. Bioinformatic analyses (gene ontology, pathway and network analysis) were applied for further study of these differentially expressed mRNAs. By qRT-PCR analysis in nineteen pairs of HCC and adjacent normal tissues, we found that eight lncRNAs were aberrantly expressed in HCC compared with adjacent NT tissues, which is consistent with microarray data.
Genome-wide lncRNA and mRNA expression profiling identified 214 lncRNAs and 338 mRNAs that were abnormally expressed in all three HCC tissues (Fold Change ≥2.0, P<0.05 and FDR <0.05). An lncRNA-mRNA co-expression network was constructed, which may be used for predicting target genes of lncRNAs. Furthermore, we demonstrated for the first time that BC017743, ENST00000395084, NR_026591, NR_015378 and NR_024284 were up-regulated, whereas NR_027151, AK056988 and uc003yqb.1 were down-regulated in nineteen pairs of HCC samples compared with adjacent NT samples. Expression of seven lncRNAs was significantly correlated with that of their nearby coding genes. In conclusion, our results indicate that the lncRNA expression profile in HCC is significantly changed, and we identified a series of new hepatocarcinoma-associated lncRNAs. These results provide important insights into the role of lncRNAs in HCC pathogenesis.
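The selection criteria quoted above (Fold Change ≥2.0, P<0.05 and FDR <0.05) can be illustrated with a minimal Python sketch. The abstract does not state how the FDR was computed, so the Benjamini-Hochberg adjustment used here is an assumption, and the transcript names and expression values (`lnc_A`, `lnc_B`, ...) are hypothetical toy data rather than results from the study.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.
    Assumption: the abstract does not name its FDR procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_differential(records, min_fold=2.0, max_p=0.05, max_fdr=0.05):
    """records: list of (name, tumor_mean, normal_mean, p_value).
    Keeps transcripts passing all three thresholds, in either direction."""
    fdr = benjamini_hochberg([r[3] for r in records])
    hits = []
    for (name, tumor, normal, p), q in zip(records, fdr):
        log2_fc = math.log2(tumor / normal)
        if abs(log2_fc) >= math.log2(min_fold) and p < max_p and q < max_fdr:
            direction = "up" if log2_fc > 0 else "down"
            hits.append((name, direction, round(2 ** abs(log2_fc), 2)))
    return hits

# Hypothetical toy data: (name, tumor mean, non-tumor mean, p-value).
records = [
    ("lnc_A", 8.0, 2.0, 0.001),  # 4-fold up, significant
    ("lnc_B", 1.0, 4.0, 0.004),  # 4-fold down, significant
    ("lnc_C", 3.0, 2.9, 0.5),    # essentially unchanged
    ("lnc_D", 6.0, 2.0, 0.2),    # 3-fold up, but not significant
]
hits = select_differential(records)
print(hits)
```

Working on the log2 scale treats up- and down-regulation symmetrically, so a 4-fold decrease passes the same threshold as a 4-fold increase.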
Long non-coding RNAs (lncRNAs) are key regulatory molecules involved in a variety of biological processes and human diseases. However, the pathological effects of lncRNAs on primary varicose great saphenous veins (GSVs) remain unclear. The purpose of the present study was to identify aberrantly expressed lncRNAs involved in the prevalence of GSV varicosities and predict their potential functions. Using a microarray with 33,045 lncRNA and 30,215 mRNA probes, 557 lncRNAs and 980 mRNAs that differed significantly in expression between the varicose great saphenous veins and control veins were identified in six pairs of samples. These lncRNAs were sub-grouped, and the differentially expressed mRNAs were clustered into several pathways, six of which were metabolic pathways. Quantitative real-time PCR replication of nine lncRNAs was performed in 32 subjects, validating six lncRNAs (AF119885, AK021444, NR_027830, G36810, NR_027927, uc.345-). A coding-non-coding gene co-expression network revealed that four of these six lncRNAs may be correlated with 11 mRNAs, and pathway analysis revealed that they may be correlated with another 8 mRNAs associated with metabolic pathways. In conclusion, aberrantly expressed lncRNAs for GSV varicosities were systematically screened and validated here, and their functions were predicted. These findings provide novel insight into the physiology of lncRNAs and the pathogenesis of varicose veins for further investigation. These aberrantly expressed lncRNAs may serve as new therapeutic targets for varicose veins. The Human Ethics Committee of Shanghai East Hospital, Tongji University School of Medicine approved the study (No. 2011-DF-53).
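A coding-non-coding co-expression network of the kind mentioned above can be sketched as follows. The abstracts do not specify the correlation measure or cutoff, so Pearson correlation with an absolute threshold of 0.9 is an assumption here, and all names and expression profiles (`lnc_X`, `gene_P`, ...) are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)

def coexpression_edges(lnc_expr, mrna_expr, min_abs_r=0.9):
    """Link each lncRNA to every mRNA whose profile is strongly
    (positively or negatively) correlated across the same samples.
    The 0.9 cutoff is an illustrative assumption."""
    edges = []
    for lnc, lnc_values in lnc_expr.items():
        for gene, gene_values in mrna_expr.items():
            r = pearson(lnc_values, gene_values)
            if abs(r) >= min_abs_r:
                edges.append((lnc, gene, round(r, 3)))
    return edges

# Hypothetical expression profiles across six samples.
lnc_expr = {"lnc_X": [1, 2, 3, 4, 5, 6]}
mrna_expr = {
    "gene_P": [2, 4, 6, 8, 10, 12],  # rises with lnc_X
    "gene_Q": [6, 5, 4, 3, 2, 1],    # falls as lnc_X rises
    "gene_R": [3, 1, 4, 1, 5, 9],    # only loosely related
}
edges = coexpression_edges(lnc_expr, mrna_expr)
print(edges)
```

With only a handful of samples, as in the studies above, such correlations are noisy, so any edge would normally be treated as a hypothesis about a target gene rather than as evidence of regulation.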
Neurodegenerative diseases in general and specifically late-onset Alzheimer’s disease (LOAD) involve a genetically complex and largely obscure ensemble of causative and risk factors accompanied by complex feedback responses. The advent of “high-throughput” transcriptome investigation technologies such as microarray and deep sequencing is increasingly being combined with sophisticated statistical and bioinformatics analysis methods complemented by knowledge-based approaches such as Bayesian Networks or network and graph analyses. Together, such “integrative” studies are beginning to identify co-regulated gene networks linked with biological pathways and potentially modulating disease predisposition, outcome, and progression. Specifically, bioinformatics analyses of integrated microarray and genotyping data in cases and controls reveal changes in gene expression of both protein-coding and small and long regulatory RNAs, highlight relevant quantitative transcriptional differences between LOAD and non-demented control brains, and demonstrate reconfiguration of functionally meaningful molecular interaction structures in LOAD. These may be measured as changes in connectivity in “hub nodes” of relevant gene networks (Zhang et al., 2013). We illustrate here the open analytical questions in the transcriptome investigation of neurodegenerative disease studies, proposing “ad hoc” strategies for the evaluation of differential gene expression and hints for a simple analysis of the non-coding RNA (ncRNA) part of such datasets. We then survey the emerging role of long ncRNAs (lncRNAs) in the healthy and diseased brain transcriptome and describe the main current methods for computational modeling of gene networks. We propose accessible modular and pathway-oriented methods and guidelines for bioinformatics investigations of whole transcriptome next generation sequencing datasets.
We finally present methods and databases for functional interpretations of lncRNAs and propose a simple heuristic approach to visualize and represent physical and functional interactions of the coding and non-coding components of the transcriptome. Integrating coding and ncRNA analyses into a unified functional view is of utmost importance for current and future analyses of neurodegenerative transcriptomes.
neurodegenerative diseases; bioinformatics and computational biology; next-generation sequencing; non-coding RNA; biological networks