Gene profiling of diffuse large B cell lymphoma (DLBCL) has revealed broad gene expression deregulation compared to normal B cells. While many studies have interrogated well-known, annotated genes in DLBCL, none has yet performed a systematic analysis to uncover novel unannotated long non-coding RNAs (lncRNAs) in DLBCL. In this study we sought to uncover these lncRNAs by examining RNA-seq data from primary DLBCL tumors, and we performed supporting analyses to identify potential roles of these lncRNAs in DLBCL.
We performed a systematic analysis of novel lncRNAs from the polyadenylated transcriptome of 116 primary DLBCL samples. RNA-seq data were processed using a de novo transcript assembly pipeline to discover novel lncRNAs in DLBCL. Systematic functional, mutational, cross-species, and co-expression analyses using numerous bioinformatics tools and statistical methods were performed to characterize these novel lncRNAs.
We identified 2,632 novel, multi-exonic lncRNAs expressed in more than one tumor, two-thirds of which are not expressed in normal B cells. Long-read single-molecule sequencing supports the splicing structure of many of these lncRNAs. More than one-third of the novel lncRNAs are differentially expressed between the two major DLBCL subtypes, ABC and GCB. Novel lncRNAs are enriched at DLBCL super-enhancers, and a fraction of them are conserved between human and dog lymphomas. Transposable elements (TEs) overlap the exonic regions of the novel lncRNAs; this overlap is particularly significant in the last exon, suggesting potential usage of cryptic TE polyadenylation signals. We identified highly co-expressed protein-coding genes for at least 88% of the novel lncRNAs. Functional enrichment analysis of the co-expressed genes predicts a potential function for about half of the novel lncRNAs. Finally, systematic structural analysis of candidate point mutations (SNVs) suggests that such mutations frequently stabilize lncRNA structures instead of destabilizing them.
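The candidate criteria above (unannotated, multi-exonic, expressed in more than one tumor) can be illustrated with a minimal filter. This is a sketch under stated assumptions, not the authors' pipeline: the `Transcript` record, the 1.0 FPKM expression cut-off and the transcript IDs are hypothetical, and real lncRNA pipelines usually add length and coding-potential filters that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    exon_count: int
    overlaps_annotation: bool          # True if it overlaps a known gene model
    fpkm_by_sample: dict[str, float]   # tumor sample -> expression (FPKM)


def passes_novel_lncrna_filter(tx: Transcript,
                               min_fpkm: float = 1.0,   # assumed expression cut-off
                               min_samples: int = 2) -> bool:
    """Keep unannotated, multi-exonic transcripts expressed in more than one tumor."""
    if tx.overlaps_annotation:
        return False                   # not novel
    if tx.exon_count < 2:
        return False                   # mono-exonic transcripts are dropped
    n_expressed = sum(1 for v in tx.fpkm_by_sample.values() if v >= min_fpkm)
    return n_expressed >= min_samples


candidates = [
    Transcript("tx1", 3, False, {"T1": 2.5, "T2": 1.2, "T3": 0.0}),
    Transcript("tx2", 1, False, {"T1": 9.0, "T2": 8.0}),   # mono-exonic
    Transcript("tx3", 2, False, {"T1": 4.0, "T2": 0.1}),   # expressed in one tumor only
    Transcript("tx4", 4, True, {"T1": 6.0, "T2": 7.0}),    # overlaps an annotated gene
]
kept = [t.transcript_id for t in candidates if passes_novel_lncrna_filter(t)]
print(kept)  # ['tx1']
```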
Discovery of these 2,632 novel lncRNAs in DLBCL significantly expands the lymphoma transcriptome and our analysis identifies potential roles of these lncRNAs in lymphomagenesis and/or tumor maintenance. For further studies, these novel lncRNAs also provide an abundant source of new targets for antisense oligonucleotide pharmacology, including shared targets between human and dog lymphomas.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0230-7) contains supplementary material, which is available to authorized users.
Long non-coding RNAs (lncRNAs) are emerging as molecules that significantly impact many cellular processes and have been associated with almost every human cancer. Compared to protein-coding genes, lncRNA genes are often associated with transposable elements, particularly with endogenous retroviral elements (ERVs). ERVs can have potentially deleterious effects on genome structure and function, so these elements are typically silenced in normal somatic tissues, albeit with varying efficiency. The aberrant regulation of ERVs associated with lncRNAs (ERV-lncRNAs), coupled with the diverse range of lncRNA functions, creates significant potential for ERV-lncRNAs to impact cancer biology.
We used RNA-seq analysis to identify and profile the expression of a novel lncRNA in six large cohorts, including over 7,500 samples from The Cancer Genome Atlas (TCGA).
We identified the tumor-specific expression of a novel lncRNA, which we have named Endogenous retroViral-associated ADenocarcinoma RNA or ‘EVADR’, by analyzing RNA-seq data derived from colorectal tumors and matched normal control tissues. Subsequent analysis of TCGA RNA-seq data revealed the striking association of EVADR with adenocarcinomas, which are tumors of glandular origin. Moderate to high levels of EVADR were detected in 25 to 53% of colon, rectal, lung, pancreas and stomach adenocarcinomas (mean = 30 to 144 FPKM), and EVADR expression correlated with decreased patient survival (Cox regression; hazard ratio = 1.47, 95% confidence interval = 1.06 to 2.04, P = 0.02). In tumor sites of non-glandular origin, EVADR expression was detectable only at very low levels and in fewer than 10% of patients. For EVADR, a MER48 ERV element provides an active promoter to drive its transcription. Genome-wide, MER48 insertions are associated with nine lncRNAs, but none of the MER48-associated lncRNAs other than EVADR was consistently expressed in adenocarcinomas, demonstrating the specific activation of EVADR. The sequence and structure of the EVADR locus are highly conserved among Old World monkeys and apes but not in New World monkeys or prosimians, where the MER48 insertion is absent. Conservation of the EVADR locus suggests a functional role for this novel lncRNA in humans and our closest primate relatives.
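As a worked consistency check of the reported survival statistic, one can assume (the abstract does not say) that the 95% confidence interval is a Wald interval on the log-hazard scale. Under that assumption the standard error of log(HR) and the two-sided p-value can be recovered from the hazard ratio and its interval:

```python
import math

Z_95 = 1.959964  # two-sided 95% standard-normal quantile


def se_log_hr_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(HR), assuming a Wald CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def wald_p_value(hr: float, se_log_hr: float) -> float:
    """Two-sided Wald p-value for H0: HR = 1, i.e. log(HR) = 0."""
    z = math.log(hr) / se_log_hr
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))


se = se_log_hr_from_ci(1.06, 2.04)
p = wald_p_value(1.47, se)
print(round(se, 3), round(p, 3))  # 0.167 0.021
```

The recovered p-value (about 0.021) agrees with the reported P = 0.02, so the hazard ratio, interval and p-value are mutually consistent under the Wald assumption.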
Our results describe the specific activation of a highly conserved ERV-lncRNA in numerous cancers of glandular origin, a finding with diagnostic, prognostic and therapeutic implications.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0142-6) contains supplementary material, which is available to authorized users.
Thousands of long noncoding RNAs (lncRNAs) have been reported in mammalian genomes. These RNAs represent an important subset of pervasively transcribed genes involved in a broad range of biological functions. Aberrant expression of lncRNAs is associated with many types of cancers. Here, to explore lncRNAs potentially involved in hepatocellular carcinoma (HCC) oncogenesis, we performed microarray-based lncRNA expression profiling in three pairs of human HCC and adjacent non-tumor (NT) tissues.
Differentially expressed lncRNAs and mRNAs were detected with a human lncRNA microarray containing 33,045 lncRNAs and 30,215 coding transcripts. Bioinformatic analyses (Gene Ontology, pathway and network analysis) were applied to further study these differentially expressed mRNAs. qRT-PCR analysis in nineteen pairs of HCC and adjacent normal tissues showed that eight lncRNAs were aberrantly expressed in HCC compared with adjacent NT tissues, consistent with the microarray data.
Genome-wide lncRNA and mRNA expression profiling identified 214 lncRNAs and 338 mRNAs that were abnormally expressed in all three HCC tissues (fold change ≥ 2.0, P < 0.05 and FDR < 0.05). A lncRNA-mRNA co-expression network was constructed, which may be used to predict target genes of lncRNAs. Furthermore, we demonstrated for the first time that BC017743, ENST00000395084, NR_026591, NR_015378 and NR_024284 were up-regulated, whereas NR_027151, AK056988 and uc003yqb.1 were down-regulated in nineteen pairs of HCC samples compared with adjacent NT samples. Expression of seven lncRNAs was significantly correlated with that of their nearby coding genes. In conclusion, our results indicate that the lncRNA expression profile in HCC is significantly altered, and we identified a series of new hepatocarcinoma-associated lncRNAs. These results provide important insights into the role of lncRNAs in HCC pathogenesis.
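The selection criteria above (fold change ≥ 2.0, P < 0.05 and FDR < 0.05) can be illustrated with a short sketch. It assumes that "FDR" means Benjamini-Hochberg adjusted p-values and that fold changes are linear tumor/non-tumor ratios. Neither detail is stated in the abstract, and the example values are made up.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_differential(fold_changes: list[float], pvalues: list[float],
                        min_fc: float = 2.0, alpha: float = 0.05,
                        max_fdr: float = 0.05) -> list[int]:
    """Indices with |fold change| >= min_fc, P < alpha and BH FDR < max_fdr."""
    fdr = benjamini_hochberg(pvalues)
    hits = []
    for i, (fc, p, q) in enumerate(zip(fold_changes, pvalues, fdr)):
        magnitude = fc if fc >= 1 else 1 / fc   # a ratio of 0.4 counts as 2.5-fold down
        if magnitude >= min_fc and p < alpha and q < max_fdr:
            hits.append(i)
    return hits


fcs = [3.0, 0.4, 1.5, 2.5, 0.45]          # tumor / non-tumor ratios (toy values)
ps = [0.001, 0.004, 0.0001, 0.03, 0.2]
print(select_differential(fcs, ps))       # [0, 1, 3]
```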
A novel class of transcripts, long non-coding RNAs (lncRNAs), has recently emerged as a key player in several biological processes, and important roles for these molecules have been reported in a number of complex human diseases, such as autoimmune diseases, neurological disorders, and various cancers. However, the aberrant lncRNAs implicated in myasthenia gravis (MG) remain unknown. The aim of the present study was to explore the abnormal expression of lncRNAs in peripheral blood mononuclear cells (PBMCs) and examine mRNA regulatory relationship networks among MG patients with or without thymoma.
Microarray assays were performed, and the most prominent differences in lncRNA and mRNA expression were verified by RT-PCR. lncRNA functions were annotated through their target genes using Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The potential regulatory relationships between the lncRNAs and target genes were analyzed using the ‘cis’ and ‘trans’ models. The most prominent lncRNAs were organized into a TF-lncRNA-gene network using Cytoscape software.
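A common form of ‘cis’ target prediction pairs each lncRNA with protein-coding genes that lie nearby on the same chromosome and are co-expressed with it. The sketch below is illustrative only. It assumes a 100 kb window and a |r| ≥ 0.8 Pearson cut-off, and these thresholds, the `Locus` record and the gene names are hypothetical rather than taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical gene record: 1-based inclusive coordinates plus expression per sample."""
    name: str
    chrom: str
    start: int
    end: int
    expression: list[float]


def gap_bp(a: Locus, b: Locus) -> int:
    """Distance between two loci on the same chromosome (0 if they overlap)."""
    if a.end < b.start:
        return b.start - a.end
    if b.end < a.start:
        return a.start - b.end
    return 0


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                     # constant profile: correlation undefined
    return sxy / math.sqrt(sxx * syy)


def cis_targets(lnc: Locus, coding: list[Locus],
                window: int = 100_000, min_abs_r: float = 0.8) -> list[str]:
    """Coding genes within `window` bp of the lncRNA that are also co-expressed with it."""
    return [g.name for g in coding
            if g.chrom == lnc.chrom
            and gap_bp(lnc, g) <= window
            and abs(pearson(lnc.expression, g.expression)) >= min_abs_r]


lnc = Locus("lnc-A", "chr1", 1_000_000, 1_005_000, [1, 2, 3, 4, 5])
genes = [
    Locus("GENE1", "chr1", 1_050_000, 1_060_000, [2, 4, 6, 8, 10]),  # near, r = 1
    Locus("GENE2", "chr1", 1_300_000, 1_310_000, [2, 4, 6, 8, 10]),  # too far
    Locus("GENE3", "chr2", 1_000_000, 1_010_000, [2, 4, 6, 8, 10]),  # other chromosome
    Locus("GENE4", "chr1", 950_000, 990_000, [5, 1, 4, 2, 3]),       # near, r = -0.3
]
print(cis_targets(lnc, genes))  # ['GENE1']
```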
The lncRNA and mRNA expression profile analysis revealed subsets of differentially expressed genes in MG patients with or without thymoma. A total of 12 prominently dysregulated lncRNAs, such as lncRNA oebiotech_11933, were verified by real-time PCR. Several GO terms, including cellular response to interferon-γ, platelet degranulation, chemokine receptor binding and cytokine interactions, emerged as important in MG pathogenesis. The ‘cis’ analysis revealed the chromosomal locations of some lncRNAs and their associated co-expressed genes. The ‘trans’ analysis revealed that some TFs (e.g., CTCF, TAF1 and MYC) regulate lncRNA and gene expression. The prominent lncRNAs in each group were implicated in the regulation of the TF-lncRNA-target gene network.
The results of the present study provide a perspective on lncRNA expression in MG. We identify a subset of aberrant lncRNAs and mRNAs as potential biomarkers for the diagnosis of MG. The GO and KEGG pathway analysis provides an annotation to determine the functions of these lncRNAs. The results of the ‘cis’ and ‘trans’ analyses provide information concerning the modular regulation of lncRNAs.
Electronic supplementary material
The online version of this article (doi:10.1186/s12920-015-0087-z) contains supplementary material, which is available to authorized users.
Long non-coding RNAs (lncRNAs); Myasthenia gravis (MG); Thymoma; Cis; Transcription factors
The continually increasing human lifespan is accompanied by a rising incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means of identifying novel biomarkers and is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events, and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events and to integrate them with protein-domain and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from leukocytes of Parkinson's disease (PD) patients before and after Deep Brain Stimulation (DBS) treatment, and compared them to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually exclusive exons. The PD-regulated FUS and HNRNP A/B transcripts included regulated regions within their prion-like domains. We also present a workflow to identify and analyze long non-coding RNAs (lncRNAs) from RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each showing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq data from the brains of PD patients and unaffected controls revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes, including U1, which showed increases in both leukocytes and brain.
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease-related transcriptome modifications in PD and other neurodegenerative diseases.
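Sequence complementarity between a lncRNA and a microRNA is often screened by canonical seed matching. The sketch below searches a lncRNA sequence for sites complementary to miRNA nucleotides 2 to 8 (a 7mer-m8 site). It is an illustrative assumption, not the AltAnalyze method; the miRNA shown is hsa-let-7a-5p and the target sequence is made up.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement_dna(seq: str) -> str:
    """Reverse complement written in the DNA alphabet (accepts RNA or DNA input)."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def seed_sites(mirna: str, target: str) -> list[int]:
    """0-based start positions in `target` that pair with miRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]                     # nucleotides 2..8 (1-based)
    site = reverse_complement_dna(seed)           # sequence the target must contain, 5'->3'
    target = target.upper().replace("U", "T")
    width = len(site)
    return [i for i in range(len(target) - width + 1) if target[i:i + width] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"                 # hsa-let-7a-5p
toy_lncrna = "AAACTACCTCAAAGGCTACCTCTT"           # made-up sequence
print(reverse_complement_dna(let7a[1:8]))         # CTACCTC
print(seed_sites(let7a, toy_lncrna))              # [3, 15]
```

Seed matching alone over-predicts sites; dedicated target-prediction tools add context and conservation scores on top of it.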
Long non-coding RNAs (lncRNAs) comprise a novel, fascinating class of RNAs with largely unknown biological functions. Parkinson's disease (PD) is the most frequent motor disorder, and deep brain stimulation (DBS) treatment alleviates its symptoms, but early disease biomarkers are still unknown and new targets for genetic interference are urgently needed. Using RNA-sequencing technology and a novel computational workflow for in-depth exploration of whole-transcriptome RNA-seq datasets, we detected and analyzed lncRNAs in sequenced libraries from PD patients' leukocytes before and after treatment and from brain tissue, adding this full profile of over 7,000 lncRNAs to the few human tissue-derived lncRNA datasets currently available. Our study includes sample-specific database construction, detection of disease-related changes in known and novel lncRNAs, exons and junctions, and prediction of corresponding changes in polyadenylation choices, protein domains and miRNA binding sites. We report widespread transcript structure variations at the splice junction and exon levels, including novel exons and junctions, and alterations of lncRNAs, followed by experimental validation in PD leukocytes and two PD brain regions compared with controls. Our results suggest that lncRNAs are involved in neurodegenerative diseases, and specifically in PD. This comprehensive workflow will be of use to the increasing number of laboratories producing RNA-Seq data in a wide range of biomedical studies.
Pancreatic ductal adenocarcinoma (PDAC) is known for its aggressiveness and lack of effective therapeutic options. Thus, improved knowledge of the molecular changes associated with pancreatic cancer is urgently needed to explore novel avenues for the diagnosis and treatment of this dismal disease. While there is mounting evidence that long noncoding RNAs (lncRNAs) transcribed from intronic and intergenic regions of the human genome may play different roles in the regulation of gene expression in normal and cancer cells, their expression pattern and biological relevance in pancreatic cancer are currently unknown. In the present work we investigated the relative abundance of a collection of lncRNAs in patients' pancreatic tissue samples, aiming to identify gene expression profiles correlated with pancreatic cancer and metastasis.
A custom 3,355-element spotted cDNA microarray interrogating protein-coding genes and putative lncRNAs was used to obtain expression profiles from 38 clinical samples of tumor and non-tumor pancreatic tissues. Bioinformatics analyses were performed to characterize the structure and conservation of lncRNAs expressed in pancreatic tissues, and to identify expression signatures correlated with tissue histology. Strand-specific reverse transcription followed by PCR, and qRT-PCR, were employed to determine the strandedness of lncRNAs and to validate the microarray results, respectively.
We show that subsets of intronic/intergenic lncRNAs are expressed across tumor and non-tumor pancreatic tissue samples. Enrichment of promoter-associated chromatin marks and over-representation of conserved DNA elements and stable secondary structure predictions suggest that these transcripts are generated from independent transcriptional units and that at least a fraction is under evolutionary selection, and thus potentially functional.
Statistically significant expression signatures comprising protein-coding mRNAs and lncRNAs that correlate with PDAC or with pancreatic cancer metastasis were identified. Interestingly, loci harboring intronic lncRNAs differentially expressed in PDAC metastases were enriched in genes associated with the MAPK pathway. Orientation-specific RT-PCR showed that intronic transcripts are expressed in the sense orientation, the antisense orientation, or both, relative to protein-coding mRNAs. Differential expression of a subset of intronic lncRNAs (from the PPP3CB, MAP3K14 and DAPK1 loci) in metastatic samples was confirmed by real-time PCR.
Our findings reveal sets of intronic lncRNAs expressed in pancreatic tissues whose abundance is correlated to PDAC or metastasis, thus pointing to the potential relevance of this class of transcripts in biological processes related to malignant transformation and metastasis in pancreatic cancer.
pancreatic cancer; molecular markers; noncoding RNAs; intronic transcription; metastasis; MAPK pathway; cDNA microarrays
Glioblastoma multiforme (GBM) is one of the most common brain tumors and was the first cancer studied by The Cancer Genome Atlas (TCGA). GBM is a very aggressive disease with one of the worst 5-year survival rates among all human cancers. Molecular analysis has revealed several gene expression-based intrinsic subtypes of GBM, including the mesenchymal subtype. Long non-coding RNAs (lncRNAs) have not been well studied in GBM, and their function is largely unknown. A custom RNA-seq pre-processing pipeline was applied to GBM primary tumor samples to extract lncRNA expression with respect to the mesenchymal subtype. We used the STAR aligner in combination with the GENCODE lncRNA annotation to derive lncRNA expression calls, and then used featureCounts to quantify them. The counts were normalized using the DESeq2 rlog transformation. After alignment with STAR, 101 samples passed the 75% alignment filter, representing all gene expression subtypes. Using featureCounts, we obtained expression calls for 6,667 lncRNAs. After removing lncRNAs with variance below 0.01, 4,433 lncRNAs with expression counts across all samples remained. Among the top 10 most variable lncRNAs, a lncRNA of unknown function was significantly negatively correlated with the mesenchymal subtype (p = 9.4E-05). Next, we used k-means clustering on the 4,000 most variable coding genes to divide them into co-expressed clusters, or metagenes. We correlated the novel lncRNA with these metagenes to infer its function. The most positively correlated metagene was enriched in neurogenesis, neuron differentiation and neuron apoptosis. We identified a potential role for a novel lncRNA that is strongly related to the mesenchymal subtype and crucial for normal neurogenesis. We hypothesize that this lncRNA is essential for neurogenesis and must be removed for glioblastomas to undergo neural-to-mesenchymal transition (NMT).
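The variance filtering and metagene-correlation steps described above can be sketched in plain Python. Assumptions, none stated in the abstract: expression values are already normalized (for example rlog values), "variance" is the sample variance, cluster assignments come from a prior k-means run that is not reimplemented here, and a metagene is the mean profile of its member genes. Gene names and values are toy data.

```python
import math
from statistics import variance


def variance_filter(expr: dict[str, list[float]], min_var: float = 0.01) -> dict[str, list[float]]:
    """Drop features whose sample variance across samples is below min_var."""
    return {name: prof for name, prof in expr.items() if variance(prof) >= min_var}


def top_variable(expr: dict[str, list[float]], n: int) -> list[str]:
    """Names of the n features with the highest variance."""
    return sorted(expr, key=lambda name: variance(expr[name]), reverse=True)[:n]


def metagenes(expr: dict[str, list[float]], clusters: dict[str, int]) -> dict[int, list[float]]:
    """Mean expression profile of the member genes of each cluster."""
    members: dict[int, list[list[float]]] = {}
    for gene, cluster in clusters.items():
        members.setdefault(cluster, []).append(expr[gene])
    return {c: [sum(col) / len(col) for col in zip(*profiles)]
            for c, profiles in members.items()}


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def rank_metagenes(profile: list[float], metas: dict[int, list[float]]) -> list[tuple[int, float]]:
    """Metagenes ordered from most positively to most negatively correlated."""
    scored = [(c, pearson(profile, m)) for c, m in metas.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)


expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],
    "G3": [4.0, 3.0, 2.0, 1.0],
    "G4": [5.0, 5.0, 5.0, 5.0],   # zero variance, removed by the filter
}
kept = variance_filter(expr)
print(top_variable(kept, 1))                            # ['G2']
metas = metagenes(kept, {"G1": 0, "G2": 0, "G3": 1})    # toy k-means assignments
print(rank_metagenes([0.5, 1.0, 1.5, 2.0], metas))      # [(0, 1.0), (1, -1.0)]
```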
Although analysis pipelines have been developed to use RNA-seq to identify long non-coding RNAs (lncRNAs), inference of their biological and pathological relevance remains a challenge. As a result, most transcriptome studies of autoimmune disease have only assessed protein-coding transcripts.
We used RNA-seq data from 99 lesional psoriatic, 27 uninvolved psoriatic, and 90 normal skin biopsies, and applied computational approaches to identify and characterize expressed lncRNAs. We detect 2,942 previously annotated and 1,080 novel lncRNAs, which are expected to be skin-specific. Notably, over 40% of the novel lncRNAs are differentially expressed in psoriasis lesions versus uninvolved or normal skin, whereas the proportions of differentially expressed transcripts among protein-coding mRNAs and previously annotated lncRNAs are lower. We find that many lncRNAs, in particular those that are differentially expressed, are co-expressed with genes involved in immune-related functions, and that novel lncRNAs are enriched for localization in the epidermal differentiation complex. We also identify distinct tissue-specific expression patterns and epigenetic profiles for novel lncRNAs, some of which are shown to be regulated by cytokine treatment in cultured human keratinocytes.
Together, our results implicate many lncRNAs in the immunopathogenesis of psoriasis, and our results provide a resource for lncRNA studies in other autoimmune diseases.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0570-4) contains supplementary material, which is available to authorized users.
Long noncoding RNAs (lncRNAs) have crucial roles in cancer biology. We performed a genome-wide analysis of lncRNA expression in hepatoblastoma tissues to identify novel targets for further study of hepatoblastoma. Hepatoblastoma and normal liver tissue samples were obtained from hepatoblastoma patients. The genome-wide analysis of lncRNA expression in these tissues was performed using a 4×180K lncRNA microarray and SurePrint G3 Human lncRNA chips. Quantitative RT-PCR (qRT-PCR) was performed to confirm these results. Differentially expressed lncRNAs and mRNAs were identified through fold-change filtering. Gene Ontology (GO) and pathway analyses were performed using the standard enrichment computation method. Associations between lncRNAs and adjacent protein-coding genes were determined through complex transcriptional loci analysis. We found that 2,736 lncRNAs were differentially expressed in hepatoblastoma tissues: 1,757 were upregulated more than two-fold relative to normal tissues and 979 were downregulated. Moreover, in hepatoblastoma there were 420 matched lncRNA-mRNA pairs for 120 differentially expressed lncRNAs and 167 differentially expressed mRNAs. The co-expression network analysis predicted 252 network nodes and 420 connections between 120 lncRNAs and 132 coding genes. Within this co-expression network, 369 pairs were positively and 51 pairs negatively correlated. Lastly, qRT-PCR data verified six upregulated and downregulated lncRNAs in hepatoblastoma, as well as endothelial cell-specific molecule 1 (ESM1) mRNA. Our results demonstrated that the expression of these aberrant lncRNAs could respond to hepatoblastoma development. Further study of these lncRNAs could provide useful insight into hepatoblastoma biology.
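The network statistics above follow directly from the edge list. The 252 nodes are the 120 lncRNAs plus the 132 coding genes, and the 420 edges split into 369 positive and 51 negative correlations. A minimal sketch of building and summarizing such a network, assuming Pearson correlation with a hypothetical |r| ≥ 0.99 cut-off and toy data:

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def coexpression_edges(lnc_expr: dict[str, list[float]],
                       mrna_expr: dict[str, list[float]],
                       min_abs_r: float = 0.99) -> list[tuple[str, str, float]]:
    """All lncRNA-mRNA pairs whose |Pearson r| reaches the (assumed) cut-off."""
    edges = []
    for lnc, lp in lnc_expr.items():
        for gene, gp in mrna_expr.items():
            r = pearson(lp, gp)
            if abs(r) >= min_abs_r:
                edges.append((lnc, gene, r))
    return edges


def summarize(edges: list[tuple[str, str, float]]) -> dict[str, int]:
    """Node and edge counts, split by correlation sign (names assumed disjoint)."""
    nodes = {lnc for lnc, _, _ in edges} | {gene for _, gene, _ in edges}
    positive = sum(1 for _, _, r in edges if r > 0)
    return {"nodes": len(nodes), "edges": len(edges),
            "positive": positive, "negative": len(edges) - positive}


lncs = {"lncA": [1, 2, 3, 4], "lncB": [4, 3, 2, 1]}
mrnas = {"M1": [2, 4, 6, 8], "M2": [1, 3, 2, 4]}
print(summarize(coexpression_edges(lncs, mrnas)))
# {'nodes': 3, 'edges': 2, 'positive': 1, 'negative': 1}
```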
Recent advances in sequencing technology have opened a new era in RNA studies. Novel types of RNAs such as long non-coding RNAs (lncRNAs) have been discovered by transcriptomic sequencing and some lncRNAs have been found to play essential roles in biological processes. However, only limited information is available for lncRNAs in Drosophila melanogaster, an important model organism. Therefore, the characterization of lncRNAs and identification of new lncRNAs in D. melanogaster is an important area of research. Moreover, there is an increasing interest in the use of ChIP-seq data (H3K4me3, H3K36me3 and Pol II) to detect signatures of active transcription for reported lncRNAs.
We have developed a computational approach to identify new lncRNAs from two tissue-specific RNA-seq datasets prepared with the poly(A)-enrichment and Ribo-Zero methods, respectively. We identified 462 novel lncRNA transcripts, which we combined with 4,137 previously published lncRNA transcripts into a curated dataset. We then used 61 RNA-seq and 32 ChIP-seq datasets to improve the annotation of the curated lncRNAs with regard to transcriptional direction, exon regions, classification, expression in the brain, possession of a poly(A) tail, and presence of conventional chromatin signatures. Furthermore, we used 30 time-course RNA-seq datasets and 32 ChIP-seq datasets to investigate whether the lncRNAs reported by RNA-seq have active transcription signatures. More than half of the reported lncRNAs did not have chromatin signatures related to active transcription. To clarify this issue, we conducted RT-qPCR experiments and found that ~95.24% of the selected lncRNAs were truly transcribed, regardless of whether they were associated with active chromatin signatures.
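One way to test whether an annotated lncRNA carries an active-transcription signature is to intersect its promoter and gene body with ChIP-seq peaks. The sketch below is illustrative only. The ±1 kb promoter window, the half-open coordinates, and the rule "H3K4me3 at the promoter plus either Pol II at the promoter or H3K36me3 over the body" are assumptions, not the criteria used in the study.

```python
from dataclasses import dataclass

Interval = tuple[int, int]   # half-open [start, end) on a single chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def any_overlap(region: Interval, peaks: list[Interval]) -> bool:
    return any(overlaps(region, peak) for peak in peaks)


@dataclass
class LncRNA:
    name: str
    start: int
    end: int
    strand: str   # "+" or "-"

    def promoter(self, flank: int = 1000) -> Interval:
        """Window of +/- flank bp around the transcription start site."""
        tss = self.start if self.strand == "+" else self.end - 1
        return (max(0, tss - flank), tss + flank + 1)


def chromatin_signatures(lnc: LncRNA, h3k4me3: list[Interval],
                         h3k36me3: list[Interval], pol2: list[Interval]) -> dict[str, bool]:
    prom = lnc.promoter()
    return {
        "H3K4me3_promoter": any_overlap(prom, h3k4me3),
        "PolII_promoter": any_overlap(prom, pol2),
        "H3K36me3_body": any_overlap((lnc.start, lnc.end), h3k36me3),
    }


def has_active_signature(sig: dict[str, bool]) -> bool:
    # Hypothetical rule: promoter mark plus evidence of engaged Pol II or elongation.
    return sig["H3K4me3_promoter"] and (sig["PolII_promoter"] or sig["H3K36me3_body"])


k4 = [(9_500, 10_500)]
k36 = [(12_000, 14_000)]
pol2: list[Interval] = []
lnc1 = LncRNA("lnc1", 10_000, 15_000, "+")
lnc2 = LncRNA("lnc2", 50_000, 52_000, "-")
print(has_active_signature(chromatin_signatures(lnc1, k4, k36, pol2)))  # True
print(has_active_signature(chromatin_signatures(lnc2, k4, k36, pol2)))  # False
```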
In this study, we discovered a large number of novel lncRNAs, which suggests that many remain to be identified in D. melanogaster. For the lncRNAs that are known, we improved their characterization by integrating a large number of sequencing datasets (93 sets in total) from multiple sources (lncRNAs, RNA-seq and ChIP-seq). The RT-qPCR experiments demonstrated that RNA-seq is a reliable platform to discover lncRNAs. This set of curated lncRNAs with improved annotations can serve as an important resource for investigating the function of lncRNAs in D. melanogaster.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-2457-0) contains supplementary material, which is available to authorized users.
Long non-coding RNA; Active transcription; ChIP-seq; RNA-seq; Drosophila melanogaster
Long non-coding RNAs (lncRNAs) play an important role in carcinogenesis, but knowledge of lncRNA expression in renal cell carcinoma is rudimentary. As a basis for biomarker development, we aimed to explore the lncRNA expression profile in clear cell renal cell carcinoma (ccRCC) tissue.
Microarray experiments were performed to determine the expression of 32,183 lncRNA transcripts belonging to 17,512 lncRNAs in 15 corresponding normal and malignant renal tissues. Validation was performed using quantitative real-time PCR in 55 ccRCC and 52 normal renal specimens. Computational analysis was performed to determine lncRNA-microRNA (MiRTarget2) and lncRNA-protein (catRAPID omics) interactions. We identified 1,308 dysregulated transcripts (expression change >2-fold; upregulated: 568, downregulated: 740) in ccRCC tissue. Among these, aberrant expression was validated by PCR: lnc-BMP2-2 (mean expression change: 37-fold), lnc-CPN2-1 (13-fold), lnc-FZD1-2 (9-fold), lnc-ITPR2-3 (15-fold), lnc-SLC30A4-1 (15-fold), and lnc-SPAM1-6 (10-fold) were highly overexpressed in ccRCC, whereas lnc-ACACA-1 (135-fold), lnc-FOXG1-2 (19-fold), lnc-LCP2-2 (2-fold), lnc-RP3-368B9 (19-fold), and lnc-TTC34-3 (314-fold) were downregulated. There was no correlation between lncRNA expression and clinicopathological parameters. Computational analyses suggested that these lncRNAs are involved in RNA-protein networks related to splicing, binding, transport, localization, and processing of RNA. Small interfering RNA (siRNA)-mediated knockdown of lnc-BMP2-2 and lnc-CPN2-1 did not influence cell proliferation.
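The abstract does not state how the qPCR fold changes were computed. A common choice is the Livak 2^-ΔΔCt method, sketched below under the assumptions of near-100% amplification efficiency and a single reference gene. Decreases are expressed as negative folds (a ratio of 0.25 becomes -4), mirroring how the abstract lists downregulation as an "n-fold" number. The Ct values are made up.

```python
def fold_change_ddct(ct_target_case: float, ct_ref_case: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression 2^-ddCt (Livak method, ~100% efficiency assumed)."""
    d_ct_case = ct_target_case - ct_ref_case    # normalize to the reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_case - d_ct_ctrl
    return 2.0 ** (-dd_ct)


def as_signed_fold(ratio: float) -> float:
    """Express a ratio as +n-fold (up) or -n-fold (down)."""
    return ratio if ratio >= 1 else -1.0 / ratio


up = fold_change_ddct(22.0, 18.0, 27.0, 18.0)     # tumor vs normal, toy Ct values
down = fold_change_ddct(26.0, 18.0, 24.0, 18.0)
print(as_signed_fold(up), as_signed_fold(down))   # 32.0 -4.0
```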
We identified many novel lncRNA transcripts dysregulated in ccRCC that may be useful as novel diagnostic biomarkers.
Electronic supplementary material
The online version of this article (doi:10.1186/s13148-015-0047-7) contains supplementary material, which is available to authorized users.
Long intergenic non-coding RNAs (lncRNAs) represent an emerging and under-studied class of transcripts that play a significant role in human cancers. Due to the tissue- and cancer-specific expression patterns observed for many lncRNAs it is believed that they could serve as ideal diagnostic biomarkers. However, until each tumor type is examined more closely, many of these lncRNAs will remain elusive.
Here we characterize the lncRNA landscape in lung cancer using publicly available transcriptome sequencing data from a cohort of 567 adenocarcinoma and squamous cell carcinoma tumors. Through this compendium we identify over 3,000 unannotated intergenic transcripts representing novel lncRNAs. By comparing both adenocarcinomas and squamous cell carcinomas with matched controls, we discover 111 differentially expressed lncRNAs, which we term lung cancer-associated lncRNAs (LCALs). A pan-cancer analysis of 324 additional tumor and adjacent normal pairs enables us to identify a subset of lncRNAs that display enriched expression specific to lung cancer, as well as a subset that appear to be broadly deregulated across human cancers. Integration of exome sequencing data reveals that expression levels of many LCALs are significantly associated with the mutational status of key oncogenes in lung cancer. Functional validation, using both knockdown and overexpression, shows that the most differentially expressed lncRNA, LCAL1, plays a role in cellular proliferation.
Our systematic characterization of publicly available transcriptome data provides the foundation for future efforts to understand the role of LCALs, develop novel biomarkers, and improve knowledge of lung tumor biology.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0429-8) contains supplementary material, which is available to authorized users.
Mammalian genomes are extensively transcribed, producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs have different genomic architectures in relation to their associated protein-coding genes. Our study aimed to link lncRNA architecture with the dynamic patterns of lncRNA expression, using a model of differentiating human neuroblastoma cells.
LncRNA expression was studied over a 120-hour time course of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), the compound used for the treatment of neuroblastoma. A custom microarray chip was used to interrogate expression levels of 9,267 lncRNAs over the course of differentiation. We categorized lncRNAs into 19 architecture classes according to their position relative to protein-coding genes. For each architecture class, the expression dynamics of lncRNAs were studied in association with their protein-coding partners. This allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in introns and downstream of protein-coding genes were characterized by negative correlation modes. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction followed by transcriptional repression. Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development.
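The architecture classes are defined relative to a protein-coding partner gene. The sketch below assigns a lncRNA to one of a handful of simplified classes (antisense, intronic, sense-overlapping, bidirectional promoter, downstream, intergenic). The study used 19 classes whose exact definitions are not given in the abstract, so the class names, the 1 kb promoter window and the 10 kb downstream window here are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Half-open [start, end) span on one chromosome, with strand and exons."""
    start: int
    end: int
    strand: str                                   # "+" or "-"
    exons: list[tuple[int, int]] = field(default_factory=list)

    @property
    def tss(self) -> int:
        return self.start if self.strand == "+" else self.end - 1


def _overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def classify(lnc: Feature, gene: Feature,
             promoter_window: int = 1_000, downstream_window: int = 10_000) -> str:
    """Simplified architecture class of a lncRNA relative to one protein-coding gene."""
    same_strand = lnc.strand == gene.strand
    if _overlap((lnc.start, lnc.end), (gene.start, gene.end)):
        if not same_strand:
            return "antisense"
        if any(_overlap(e, x) for e in lnc.exons for x in gene.exons):
            return "sense-overlapping"
        return "intronic"
    # Non-overlapping: measure gaps relative to the gene's own orientation.
    if gene.strand == "+":
        upstream_gap, downstream_gap = gene.start - lnc.end, lnc.start - gene.end
    else:
        upstream_gap, downstream_gap = lnc.start - gene.end, gene.start - lnc.end
    if (not same_strand and upstream_gap >= 0
            and abs(lnc.tss - gene.tss) <= promoter_window):
        return "bidirectional-promoter"
    if 0 <= downstream_gap <= downstream_window:
        return "downstream"
    return "intergenic"


gene = Feature(10_000, 20_000, "+", [(10_000, 11_000), (19_000, 20_000)])
print(classify(Feature(12_000, 13_000, "+", [(12_000, 13_000)]), gene))  # intronic
print(classify(Feature(8_000, 9_500, "-", [(8_000, 9_500)]), gene))      # bidirectional-promoter
print(classify(Feature(25_000, 26_000, "+", [(25_000, 26_000)]), gene))  # downstream
```

Classifying per lncRNA-gene pair like this is what allows expression correlations to be summarized separately for each architecture class.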
This is the first report detailing dynamic changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
The GENCODE project has collected over 10,000 human long non-coding RNA (lncRNA) genes. However, the vast majority of them remain to be functionally characterized. Computational investigation of potential functions of human lncRNA genes is helpful to guide further experimental studies on lncRNAs.
In this study, based on expression correlation between lncRNAs and protein-coding genes across 19 human normal tissues, we used the hypergeometric test to functionally annotate a single lncRNA or a set of lncRNAs with significantly enriched functional terms among the protein-coding genes that are significantly co-expressed with the lncRNA(s). The functional terms include all nodes in the Gene Ontology (GO) and 4,380 human biological pathways collected from 12 pathway databases. We successfully mapped 9,625 human lncRNA genes to GO terms and biological pathways, and then developed the first ontology-driven user-friendly web interface named lncRNA2Function, which enables researchers to browse the lncRNAs associated with a specific functional term, the functional terms associated with a specific lncRNA, or to assign functional terms to a set of human lncRNA genes, such as a cluster of co-expressed lncRNAs. The lncRNA2Function is freely available at http://mlg.hit.edu.cn/lncrna2function.
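The enrichment step can be written as an upper-tail hypergeometric test. Given N background genes, K of which carry a functional term, and n genes co-expressed with the lncRNA, of which k carry the term, the p-value is P(X ≥ k). A minimal sketch with toy gene sets follows; correction for testing many terms, which a real analysis would need, is omitted.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_enrichment_p(coexpressed: set[str], term_genes: set[str],
                      background: set[str]) -> float:
    """Enrichment p-value of one functional term among a lncRNA's co-expressed genes."""
    N = len(background)
    K = len(term_genes & background)
    n = len(coexpressed & background)
    k = len(coexpressed & term_genes & background)
    return hypergeom_upper_tail(k, N, K, n)


background = {f"g{i}" for i in range(20)}      # toy background of 20 coding genes
term = {"g0", "g1", "g2", "g3", "g4"}           # genes annotated with one term
coexpressed = {"g0", "g1", "g2", "g10"}         # genes co-expressed with a lncRNA
print(round(term_enrichment_p(coexpressed, term, background), 4))  # 0.032
```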
lncRNA2Function is an important resource for further investigating the functions of a single human lncRNA, or for functionally annotating a set of human lncRNAs of interest.
Ventricular septal defects (VSD) are the most common form of congenital heart disease, which is the leading non-infectious cause of death in children; nevertheless, the exact cause of VSD is not yet fully understood. Long non-coding RNAs (lncRNAs) have been shown to play key roles in various biological processes, such as imprinting control, the circuitry controlling pluripotency and differentiation, immune responses and chromosome dynamics. Notably, a growing number of lncRNAs have been implicated in disease etiology, although an association with VSD has not been reported. In the present study, we conducted an integrated analysis of dysregulated lncRNAs, focusing specifically on the identification and characterization of lncRNAs potentially involved in the initiation of VSD. The transcriptome profiles of cardiac tissues from VSD-affected and normal hearts were compared using a second-generation lncRNA microarray, which covers the vast majority of expressed RefSeq transcripts (29,241 lncRNAs and 30,215 coding transcripts). In total, 880 lncRNAs were upregulated and 628 were downregulated in VSD. Furthermore, our established filtering pipeline indicated an association of two lncRNAs, ENST00000513542 and RP11-473L15.2, with VSD. This dysregulation of the lncRNA profile provides novel insight into the etiology of VSD and, furthermore, illustrates the intricate relationship between coding and non-coding RNA transcripts in cardiac development. These data may offer a background/reference resource for future functional studies of lncRNAs related to VSD.
Long non-coding RNAs (lncRNAs), a key group of non-coding RNAs, have gained wide attention. Although lncRNAs have been functionally annotated and systematically explored in higher mammals, relatively few have undergone systematic identification and annotation. Owing to their expression specificity, the known lncRNAs expressed in embryonic brain tissues remain limited. Because a large number of lncRNAs are transcribed only in brain tissues, lncRNAs in the developing brain are of special interest. Here, publicly available RNA-sequencing (RNA-seq) data from embryonic brain are integrated to identify thousands of embryonic brain lncRNAs using a customized pipeline. A significant proportion of the novel transcripts have not been annotated by available genomic resources. The putative embryonic brain lncRNAs are shorter, less spliced and less conserved than known genes. On average, putative lncRNAs are expressed at about one tenth the level of known coding genes, comparable to known lncRNAs. Chromatin data show that putative embryonic brain lncRNAs are associated with active chromatin marks, to a degree comparable with known lncRNAs. Embryonic brain-expressed lncRNAs are also indicated to be expressed in adult brain, though not prominently. Gene Ontology analysis suggests that putative embryonic brain lncRNAs are associated with brain development. The putative lncRNAs may play cis-regulatory roles in imprinting, and some are themselves considered imprinted lncRNAs. Re-analysis of one knockdown dataset suggests that four regulators are associated with these lncRNAs. Taken together, the identification and systematic analysis of putative lncRNAs provide novel insights into uncharacterized mouse non-coding regions and their relationship with mammalian embryonic brain development.
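A common ingredient of lncRNA identification pipelines is a length and coding-potential filter. The abstract does not state the criteria of its customized pipeline. The sketch below therefore assumes two widely used ones: a transcript length above 200 nt, and a longest ATG-initiated open reading frame shorter than 300 nt (100 codons). Both thresholds and both function names are illustrative assumptions. Real pipelines usually add dedicated coding-potential predictors and an overlap check against annotated genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG-initiated ORF
    on the given strand, scanning all three forward frames.
    ORFs that run off the end without a stop codon are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def is_candidate_lncrna(seq: str, min_len: int = 200, max_orf_nt: int = 300) -> bool:
    """Hypothetical filter: long enough, and no ORF long enough to suggest coding."""
    return len(seq) > min_len and longest_orf_nt(seq) < max_orf_nt


coding_like = "ATG" + "AAA" * 150 + "TAA" + "A" * 10
print(longest_orf_nt(coding_like), is_candidate_lncrna(coding_like))
```

Only the given strand is scanned here. For unstranded RNA-seq assemblies, the reverse complement would also need to be checked.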
Long non-coding RNAs (lncRNAs), which have emerged as important modulators of gene expression, are involved in human malignancies. Down-regulation of the lncRNA growth arrest specific transcript 5 (GAS5) has been reported in several cancers; however, the mechanism of lncRNA GAS5 in stomach cancer is poorly understood. In this study, we found that lncRNA GAS5 was expressed at lower levels in stomach cancer tissues than in their normal counterparts. lncRNA GAS5 was shown to interact with Y-box binding protein 1 (YBX1), and lncRNA GAS5 knockdown accelerated YBX1 protein turnover without affecting YBX1 transcription. lncRNA GAS5 down-regulation reduced the YBX1 protein level, which decreased YBX1-transactivated p21 expression and abolished G1-phase cell cycle arrest in stomach cancer. These results delineate a novel mechanism by which lncRNA GAS5 suppresses stomach carcinogenesis, and the lncRNA GAS5/YBX1/p21 pathway we discovered may provide useful targets for developing lncRNA-based therapies for stomach cancer.
Long non-coding RNAs (lncRNAs) are pervasively transcribed in the genome. They have important regulatory functions in chromatin remodeling and gene expression. Dysregulated lncRNAs have been studied in cancers, but their role in esophageal squamous cell carcinoma (ESCC) remains largely unknown. We conducted lncRNA expression screening and a genome-wide analysis of lncRNA and coding gene expression in primary tumor and adjacent normal tissue from four ESCC patients. We combined experimental and bioinformatics approaches to understand the functions of lncRNAs in esophageal carcinogenesis.
An lncRNA array was used to profile coding and non-coding RNA expression. R and Bioconductor packages (limma and RedeR) were used for differential expression and co-expression network analysis. These analyses were followed by independent confirmation and functional studies of the inferred onco-lncRNA ESCCAL-1 using quantitative real-time polymerase chain reaction, small interfering RNA-mediated knockdown, and apoptosis and invasion assays in vitro.
The global coding and lncRNA gene expression pattern distinguishes ESCC from adjacent normal tissue. A co-expression network of the differentially expressed coding and lncRNA genes in ESCC was constructed, from which lncRNA function may be inferred. LncRNA ESCCAL-1 is one such example: a predicted novel onco-lncRNA, it is overexpressed in 65% of an independent ESCC patient cohort (n = 26). Moreover, knockdown of ESCCAL-1 expression increases esophageal cancer cell apoptosis and reduces invasion in vitro.
Our study uncovered the landscape of ESCC-associated lncRNAs. Systematic analysis of the co-expression network of coding genes and lncRNAs increases our understanding of lncRNAs in biological networks. ESCCAL-1 is a novel putative onco-lncRNA in esophageal cancer development.
Electronic supplementary material
The online version of this article (doi:10.1186/s12885-015-1179-z) contains supplementary material, which is available to authorized users.
Long noncoding RNA; lncRNA; Co-expression network; Esophageal cancer; ESCC; ESCCAL-1; Cancer initiatome
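The co-expression step in the ESCC study can be illustrated with a minimal Python sketch. The study itself used R/Bioconductor (limma for differential expression, RedeR for the network), and the code below is not that pipeline. It assumes a Pearson correlation cut-off of |r| ≥ 0.9, an arbitrary illustrative threshold, to link each lncRNA to protein-coding genes across samples. The coding neighbours of an lncRNA in the resulting network are then the basis for inferring its function ("guilt by association"). All function names and the toy profiles are hypothetical.

```python
from itertools import product
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(lnc: dict[str, list[float]], coding: dict[str, list[float]],
                       r_min: float = 0.9) -> list[tuple[str, str, float]]:
    """Edges (lncRNA, coding gene, r) whose absolute correlation is >= r_min."""
    edges = []
    for (lid, lx), (gid, gx) in product(lnc.items(), coding.items()):
        r = pearson(lx, gx)
        if abs(r) >= r_min:
            edges.append((lid, gid, r))
    return edges


def neighbours(edges: list[tuple[str, str, float]], lnc_id: str) -> list[str]:
    """Coding genes linked to one lncRNA: the input for functional inference."""
    return sorted(g for lid, g, _ in edges if lid == lnc_id)


# Toy profiles across four samples.
lnc = {"L1": [1, 2, 3, 4]}
coding = {"A": [2, 4, 6, 8], "B": [4, 3, 2, 1], "C": [1, 3, 2, 4]}
print(neighbours(coexpression_edges(lnc, coding), "L1"))
```

Keeping strongly negative correlations (gene B) is a design choice. Some analyses keep only positive edges.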
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules on a specific biological function are poorly understood. Identifying the protein-coding genes co-expressed with lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by the co-expressed protein-coding genes of a single lncRNA or of multiple lncRNAs. Protein-coding genes co-expressed with lncRNAs were first identified in publicly available human RNA-Seq datasets: 241 datasets covering 6560 individuals and representing 28 tissue types/cell lines. The combinatorial effects of lncRNAs on a given GO annotation or KEGG pathway are then assessed by enrichment analysis that considers multiple lncRNAs simultaneously, in one or several user-selected datasets. In addition, the software provides a graphical overview of pathways modulated by lncRNAs, as well as a dedicated tool to display the networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also allows users to upload their own lncRNA and protein-coding gene expression profiles to investigate lncRNA combinatorial effects. It will be updated annually with more human RNA-Seq datasets. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and be a valuable resource for the community.
Database URL: http://www.bio-bigdata.com/Co-LncRNA/
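The abstract does not describe Co-LncRNA's exact rule for combining several lncRNAs. As one hypothetical illustration of a "simultaneous analysis" step, the sketch below pools the co-expressed protein-coding genes of several query lncRNAs across user-selected datasets. It keeps genes supported by at least `min_support` (lncRNA, dataset) pairs. The resulting gene set would then go into an enrichment test against GO or KEGG annotations. The data layout, function name and support rule are all assumptions.

```python
from collections import Counter


def combined_gene_set(coexpr: dict[tuple[str, str], set[str]],
                      lncrnas: list[str], datasets: list[str],
                      min_support: int = 2) -> set[str]:
    """coexpr maps (lncRNA, dataset) -> co-expressed protein-coding genes.
    Returns genes co-expressed with the query lncRNAs in at least
    `min_support` (lncRNA, dataset) pairs."""
    counts: Counter[str] = Counter()
    for lnc in lncrnas:
        for ds in datasets:
            # Missing pairs (lncRNA not profiled in a dataset) contribute nothing.
            counts.update(coexpr.get((lnc, ds), set()))
    return {gene for gene, c in counts.items() if c >= min_support}


coexpr = {
    ("L1", "D1"): {"A", "B"},
    ("L2", "D1"): {"B", "C"},
    ("L1", "D2"): {"B"},
}
print(combined_gene_set(coexpr, ["L1", "L2"], ["D1", "D2"]))
```

Raising `min_support` favours genes shared by several lncRNAs or datasets. Setting it to 1 reduces to a plain union.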
Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (∼30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, their intronic TEs, or random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ∼30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ∼35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in a given cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs is subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
An unexpected layer of complexity in the genomes of humans and other vertebrates lies in the abundance of genes that do not appear to encode proteins but produce a variety of non-coding RNAs. In particular, the human genome is currently predicted to contain 5,000–10,000 independent gene units generating long (>200 nucleotides) noncoding RNAs (lncRNAs). While there is growing evidence that a large fraction of these lncRNAs have cellular functions, notably to regulate protein-coding gene expression, almost nothing is known about the processes underlying the evolutionary origins and diversification of lncRNA genes. Here we show that transposable elements, through their capacity to move and spread in genomes in a lineage-specific fashion and their ability to introduce regulatory sequences upon chromosomal insertion, represent a major force shaping the lncRNA repertoire of humans, mice, and zebrafish. Not only do TEs make up a substantial fraction of mature lncRNA transcripts, they are also enriched in the vicinity of lncRNA genes, where they frequently contribute to their transcriptional regulation. Through specific examples we provide evidence that some TE sequences embedded in lncRNAs are critical for the biogenesis of lncRNAs and likely important for their function.
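TE coverage of lncRNA sequence, as in the ∼30% figure reported for human, amounts to an interval-overlap computation between exon and TE annotations. A minimal sketch follows. It assumes half-open genomic coordinates on a single chromosome and ignores strand, and the function names are illustrative. Genome-scale analyses would normally run such overlaps with dedicated interval tools over full repeat annotations.

```python
def merge(intervals: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged: list[list[int]] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged


def te_coverage(exons: list[tuple[int, int]], tes: list[tuple[int, int]]) -> float:
    """Fraction of exonic bases overlapped by at least one TE."""
    ex, te = merge(exons), merge(tes)
    covered = 0
    i = j = 0
    # Two-pointer sweep over the two sorted, non-overlapping interval lists.
    while i < len(ex) and j < len(te):
        lo = max(ex[i][0], te[j][0])
        hi = min(ex[i][1], te[j][1])
        if lo < hi:
            covered += hi - lo
        # Advance whichever interval ends first.
        if ex[i][1] < te[j][1]:
            i += 1
        else:
            j += 1
    total = sum(e - s for s, e in ex)
    return covered / total if total else 0.0


exons = [(0, 100), (200, 300)]
tes = [(50, 150), (250, 260), (280, 400)]
print(te_coverage(exons, tes))
```

Merging first ensures that overlapping TE annotations are not double-counted.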
Long non-coding RNAs (lncRNAs) are longer than 200 nucleotides and pervasively expressed across the genome. An increasing number of studies indicate that lncRNA transcripts play integral regulatory roles in cellular growth, division, differentiation and apoptosis. Deregulated lncRNAs have been observed in a variety of human cancers, including hepatocellular carcinoma (HCC). We determined the expression profiles of 90 lncRNAs in 65 paired HCC tumor and adjacent non-tumor tissues, and 55 lncRNAs were expressed in over 90% of samples. Eight lncRNAs were significantly down-regulated in HCC tumor compared with non-tumor tissues (p < 0.05), but none reached statistical significance after Bonferroni correction for multiple comparisons. Within tumor tissues, carrying more aberrant lncRNAs (6–7) was associated with a borderline significant reduction in survival (HR = 8.5, 95% CI: 1.0–72.5). The predictive accuracy for HCC survival, measured by the AUC, was 0.93 when using seven deregulated lncRNAs (likelihood ratio test p = 0.001). This was similar to that of a model combining the seven lncRNAs with tumor size and treatment (AUC = 0.96, sensitivity = 87%, specificity = 87%). These data suggest a potential association of deregulated lncRNAs with hepatocarcinogenesis and HCC survival.
long non-coding RNAs; deregulation; HCC; HBV; survival
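The HCC abstract notes that eight nominally significant lncRNAs (p < 0.05) did not survive Bonferroni correction. Bonferroni multiplies each p-value by the number of tests m, capping the result at 1. This is equivalent to testing each lncRNA at α/m. The sketch assumes m = 55 (the lncRNAs expressed in over 90% of samples) purely for illustration, since the abstract does not state which set was tested. With that m, the per-test threshold is 0.05/55 ≈ 9.1 × 10⁻⁴. The p-values used are made up.

```python
def bonferroni(pvals: list[float], alpha: float = 0.05) -> tuple[list[float], list[bool]]:
    """Bonferroni-adjusted p-values (p * m, capped at 1) and significance flags."""
    m = len(pvals)
    adjusted = [min(1.0, p * m) for p in pvals]
    return adjusted, [q <= alpha for q in adjusted]


# Hypothetical example: a nominal p of 0.01 among m = 55 tests.
pvals = [0.01] + [0.5] * 54
adjusted, significant = bonferroni(pvals)
print(adjusted[0], significant[0])
```

A nominal p of 0.01 becomes 0.55 after adjustment, which illustrates how a handful of nominal hits can all disappear.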
Recent genome-wide expression profiling studies have uncovered a large number of novel long non-protein-coding RNA transcripts (lncRNAs). In general, these transcripts show low but tissue-specific expression, and their nucleotide sequences are often poorly conserved. However, several studies have shown that lncRNAs can play important roles in normal tissue development and regulate cellular pluripotency as well as differentiation. Moreover, lncRNAs are implicated in the control of multiple molecular pathways leading to gene expression changes, and thus ultimately modulate cell proliferation, migration and apoptosis. Consequently, deregulation of lncRNA expression contributes to carcinogenesis and is associated with human diseases, e.g., neurodegenerative disorders such as Alzheimer’s disease. Here, we will focus on some major challenges of lncRNA research, especially loss-of-function studies. We will delineate strategies for lncRNA gene targeting in vivo and briefly discuss important considerations and pitfalls when investigating lncRNA functions in knockout animal models. Finally, we will highlight future opportunities for lncRNA research by applying the concept of cross-species comparison, which might contribute to novel disease biomarker discovery and identify lncRNAs as potential therapeutic targets.
functional genomics; genetically engineered mouse models (GEMM); long intergenic RNA (lincRNA); metastasis; metastasis-associated lung adenocarcinoma transcript 1 (MALAT1); HOX transcript antisense RNA (HOTAIR)
An increasing number of long noncoding RNAs (lncRNAs) have been discovered with recent advances in RNA-sequencing technologies. lncRNAs play key roles in diverse biological processes and are involved in developmental regulation. However, knowledge of how genome-wide lncRNA expression is developmentally regulated is still limited. Here we performed a whole-genome identification of lncRNAs, followed by global expression profiling of these lncRNAs during Drosophila melanogaster development. We combined bioinformatic prediction of lncRNAs with stringent filtering of protein-coding transcripts and experimental validation to define a high-confidence set of Drosophila lncRNAs. We identified 1,077 lncRNAs in transcriptomes containing 43,967 transcripts; of these, 646 lncRNAs are novel. In vivo expression profiling of these lncRNAs in 27 developmental processes revealed that lncRNA expression is much more temporally restricted than that of protein-coding genes. Remarkably, 21% and 42% of lncRNAs were significantly upregulated at the late embryonic and larval stages, respectively, the critical times for developmental transition. The results highlight the developmental specificity of lncRNA expression and reflect the regulatory significance of a large subclass of lncRNAs for the onset of metamorphosis. The systematic annotation and expression analysis of lncRNAs during Drosophila development form the foundation for future functional exploration.
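The Drosophila abstract does not say how temporal restriction was quantified. One common way to express such restriction is the specificity index τ, usually applied across tissues and here applied across developmental stages. The sketch below uses it as an assumed, illustrative metric, not the authors' method. Values near 1 indicate expression confined to one stage, and values near 0 indicate uniform expression. Expression values are usually log-transformed before computing τ.

```python
def tau(expr: list[float]) -> float:
    """Specificity index tau over conditions (here: developmental stages).
    0 means uniform expression; 1 means expression restricted to one stage."""
    n = len(expr)
    if n < 2:
        raise ValueError("need at least two conditions")
    peak = max(expr)
    if peak <= 0:
        raise ValueError("gene is not expressed in any condition")
    # Each stage contributes how far it falls below the peak stage.
    return sum(1 - x / peak for x in expr) / (n - 1)


# Hypothetical profiles over four stages.
print(tau([5, 5, 5, 5]), tau([10, 0, 0, 0]), tau([10, 5, 0, 0]))
```

Comparing the τ distributions of lncRNAs and protein-coding genes would be one way to test the claim that lncRNAs are more temporally restricted.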
Long non-coding RNAs (lncRNAs) are novel transcripts that may play important roles in cancer. Our study aimed to characterize the lncRNA profile of larynx squamous cell carcinoma (LSCC) and to determine its clinical significance. The global lncRNA expression profile in LSCC tissues was measured by lncRNA microarray. Differentially expressed lncRNAs were identified. The levels of AC026166.2-001 and RP11-169D4.1-001 in 87 LSCC samples and paired adjacent normal tissues were analyzed by real-time quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR). The clinical significance of these lncRNAs in laryngeal cancer was analyzed, and survival was estimated by the Kaplan–Meier method and the log-rank test. A receiver operating characteristic (ROC) curve was constructed to assess diagnostic value. In the lncRNA expression profile of tumor samples, 684 lncRNAs were upregulated and 747 were downregulated (fold-change >2.0). Of these, AC026166.2-001 and RP11-169D4.1-001 were distinctly dysregulated: AC026166.2-001 showed lower expression and RP11-169D4.1-001 higher expression in cancer tissues. We verified that both AC026166.2-001 and RP11-169D4.1-001 were expressed at lower levels in cervical lymph nodes than in paired laryngeal cancer tissues and paired normal tissues. RP11-169D4.1-001 levels were positively correlated with lymph node metastasis (P = 0.007). In the survival analysis, decreased levels of AC026166.2-001 and RP11-169D4.1-001 were associated with poorer prognosis. The areas under the ROC curve were 0.65 and 0.67, respectively, with ΔCt cut-off points of 11.23 and 10.53, respectively. AC026166.2-001 and RP11-169D4.1-001 may act as novel biomarkers and potential therapeutic targets in LSCC, and both could be independent prognostic factors for survival in LSCC.
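The ROC analysis above reports an AUC and a ΔCt cut-off for each lncRNA. A minimal sketch of both quantities follows. The AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half. The abstract does not say how the cut-offs were chosen, so Youden's index (sensitivity + specificity − 1) is assumed here. Scores are oriented so that higher values indicate the positive class. Because a higher ΔCt means lower expression, the sign of ΔCt may need flipping depending on the direction of dysregulation. The data are made up.

```python
def auc(pos: list[float], neg: list[float]) -> float:
    """ROC AUC as P(score_pos > score_neg) + 0.5 * P(tie)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: list[float], neg: list[float]) -> tuple[float, float]:
    """Threshold t maximizing sensitivity + specificity - 1,
    calling a sample positive when its score is >= t."""
    best_t, best_j = None, -1.0
    for t in sorted(set(pos) | set(neg)):
        sens = sum(p >= t for p in pos) / len(pos)
        spec = sum(q < t for q in neg) / len(neg)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


cases, controls = [3, 4, 5], [1, 2, 3.5]
print(auc(cases, controls), youden_cutoff(cases, controls))
```

The pairwise AUC is O(n·m). Large cohorts would use a rank-based formula instead, but the result is the same.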
Background: Recent research indicates that long non-coding RNAs (lncRNAs) represent a new family of RNAs of fundamental importance for controlling transcription and translation. Moreover, there is increasing evidence that lncRNAs are also important in tumourigenesis. Valid expression profiling by quantitative PCR requires suitable, stably expressed normalisers to achieve reliable and reproducible data. However, no systematic analysis of suitable references for lncRNA studies in human glioma has yet been performed.
Methods: In this study, we investigated the expression stability of 90 lncRNAs in 30 tissue specimens of human diffuse astrocytoma (WHO-Grade II), anaplastic astrocytoma (WHO-Grade III) and glioblastoma (WHO-Grade IV), both individually and in comparison with normal white matter. Our identification procedure included a rigorous bioinformatic selection process, so that only highly abundant, uniformly expressed lncRNAs were included in further analysis. Additionally, lncRNAs were classified according to their stability value using the NormFinder algorithm.
Results: We identified 24 appropriate normalisers for studies in diffuse astrocytoma, 22 for studies in anaplastic astrocytoma and 12 for studies in glioblastoma. Across all three glioma entities, 7 lncRNAs showed stable expression levels. Including normal brain tissue left only 4 suitable lncRNAs.
Conclusions: Our findings indicate that 4 lncRNAs (HOXA6as, H19 upstream conserved 1 and 2, Zfhx2as and BC200) are suitable as normalisers in glioma and normal brain. These lncRNAs may thus be regarded as universal references for the accurate normalisation of lncRNA expression profiling in gliomas of WHO-Grades II-IV, alone and in combination with brain tissue. This enables valid longitudinal studies, e.g. of glioma before and after malignant progression, to identify changes in lncRNA expression that may drive malignant transformation.
long non-coding RNA; lncRNA; Glioma; References; qPCR; Profiling.
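With the reference lncRNAs named in the glioma study (HOXA6as, H19 upstream conserved 1 and 2, Zfhx2as and BC200), a target lncRNA's qPCR signal would typically be normalised as a ΔCt against the mean Ct of the references. Averaging Ct values is equivalent to using the geometric mean of the reference quantities. The sketch assumes the comparative Ct (2^−ΔΔCt) method with near-100% amplification efficiency. The abstract selects references with NormFinder but does not prescribe this normalisation formula, and the Ct values below are made up.

```python
from statistics import mean


def delta_ct(target_ct: float, reference_cts: list[float]) -> float:
    """ΔCt of a target against the mean Ct of several reference genes."""
    return target_ct - mean(reference_cts)


def relative_expression(target_sample: float, refs_sample: list[float],
                        target_calib: float, refs_calib: list[float]) -> float:
    """2^-ΔΔCt fold change of a sample relative to a calibrator
    (e.g. glioma vs normal white matter); assumes ~100% PCR efficiency."""
    ddct = delta_ct(target_sample, refs_sample) - delta_ct(target_calib, refs_calib)
    return 2 ** (-ddct)


# Hypothetical Ct values for one target and four reference lncRNAs.
print(delta_ct(25, [20, 22, 21, 21]))
print(relative_expression(24, [20, 20, 20, 20], 26, [20, 20, 20, 20]))
```

A lower ΔCt in the sample than in the calibrator yields a fold change above 1. In this example, a 2-cycle difference gives a 4-fold increase.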