In recent years, the concept of the functional genome has been re-written to include a multitude of newly discovered classes of ncRNA transcripts
[42],
[43],
[44],
[45]. Although the functional significance of long non-coding RNAs has long been recognized
[46],
[47], the abundance and scale of lncRNA expression changes in cancer are just beginning to come to light. For this reason, charting the transcriptional landscape of lncRNAs across human tissue and cancer types is a key step in understanding lncRNA functional significance in cancer.
Here, we present the first multi-tissue, cross-cancer lncRNA expression profiling study. Large-scale expression profiling datasets, such as SAGE, represent a valuable resource for investigating the expression patterns of polyadenylated lncRNAs. While this approach excludes non-polyadenylated lncRNAs, it nonetheless facilitates the simultaneous profiling of thousands of polyadenylated lncRNAs in a wide range of human tissues and cancers. Using 272 SAGE libraries, representing 26 non-malignant human tissues, 19 human cancer types and 9 cancer cell lines, we have produced a first-generation atlas of cross-cancer lncRNA expression profiles as a resource for this fast-growing area of cancer research. Current estimates of the number of lncRNAs encoded in the human genome vary widely, ranging from ~7,000 to 23,000 or more
[7]. These estimates rival the abundance of the estimated 20,000+ protein-coding genes. Our analysis showed that lncRNAs are distributed on all 22 autosomes and sex chromosomes, yet the distribution pattern did not correlate with either protein-coding genes or miRNAs (,
Figure S3).
Examination of 72 SAGE libraries of normal human tissues revealed lncRNA expression in brain, breast, esophagus, gall bladder, heart, liver, lung, lymph node, muscle, peritoneum, placenta, prostate, retina, spinal cord, stomach, thyroid, vascular tissue, embryonic stem cells and white blood cells. We find extensive and highly differential patterns of lncRNA expression in normal human tissues ( and ), corroborating a previous report of tissue-specific ncRNA patterns
[34]. For example, the lncRNA NCRNA00116 was highly expressed in the contractile tissues, namely heart (TPM = 349) and muscle (TPM = 399). The lncRNAs ENSG00000230658 and ENSG00000235621 showed very high expression in placenta (TPM = 888) and esophagus (TPM = 820), respectively, but low or undetectable expression in other tissues, which may indicate a tissue-specific role for these transcripts. The brain-associated and putative tumor suppressor lncRNA maternally expressed 3 (
MEG3)
[48], displayed the highest expression in brain in our dataset (TPM = 677), but showed low level expression in other tissue types (). Collectively, these data suggest some lncRNAs may function in a tissue-specific manner.
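As an illustration of how expression values of this kind can be screened for tissue-enriched transcripts, the following is a minimal sketch rather than the procedure used in this study. It assumes that TPM denotes SAGE tags per million, and the function names, the two thresholds and the toy profile are all hypothetical.

```python
def tags_per_million(tag_count, library_total):
    """Scale a raw SAGE tag count to tags per million (assumed meaning of TPM)."""
    if library_total <= 0:
        raise ValueError("library_total must be positive")
    return tag_count * 1_000_000 / library_total


def enriched_tissue(tpm_by_tissue, min_tpm=100.0, max_other=10.0):
    """Return the single tissue in which a transcript is enriched, else None.

    A transcript is called enriched when its highest TPM reaches min_tpm and
    every other tissue stays at or below max_other. Both cut-offs are
    illustrative assumptions, not values taken from the study.
    """
    if not tpm_by_tissue:
        return None
    top = max(tpm_by_tissue, key=tpm_by_tissue.get)
    others = [v for t, v in tpm_by_tissue.items() if t != top]
    if tpm_by_tissue[top] >= min_tpm and all(v <= max_other for v in others):
        return top
    return None


# Toy profile with invented values, for demonstration only.
toy_profile = {"brain": 650.0, "liver": 4.0, "lung": 0.0}
print(enriched_tissue(toy_profile))
```

In practice the cut-offs would need to be chosen with library depth in mind, since a low TPM in a shallow library may simply reflect undersampling.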
Only ~1% of the lncRNAs were ubiquitously expressed across all tissues examined. These constantly expressed lncRNAs are reminiscent of the expression patterns of “housekeeping” protein-coding genes
[49]. The eleven lncRNAs in were expressed in at least 90% of the 272 SAGE libraries in our dataset, suggesting that these transcripts may participate in common biological processes. However, the absolute expression level varied between tissues, sometimes by hundreds of TPM (). This suggests certain lncRNAs may be required at different cellular levels in different tissues or under different conditions, much like many constitutively expressed protein-coding genes
[50],
[51],
[52]. The concept of lncRNAs functioning as constitutively expressed regulators has been previously proposed. For example, the lncRNA
XIST is critical for female development due to its functional role in X-chromosome inactivation
[47],
[53]. Concordantly, a number of the most highly and frequently expressed lncRNAs in our dataset have prior associations with key biological processes, including
NEAT1, a structural scaffold for paraspeckle formation
[14],
[54],
MALAT1, which regulates alternative splicing
[31], and small nucleolar RNA host gene 6 (
SNHG6), which hosts a snoRNA that functions in RNA modification
[55]. These findings suggest that lncRNAs may be critical to normal tissue maintenance and function.
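The 90% criterion for ubiquitous expression can be sketched as follows. This is a minimal example under stated assumptions: a transcript is counted as expressed in a library when its TPM exceeds a detection cut-off (zero by default), and the function names and toy profiles are hypothetical rather than drawn from our dataset.

```python
def fraction_expressed(tpm_values, min_tpm=0.0):
    """Fraction of libraries in which TPM exceeds the detection cut-off."""
    if not tpm_values:
        raise ValueError("at least one library is required")
    detected = sum(1 for value in tpm_values if value > min_tpm)
    return detected / len(tpm_values)


def ubiquitous_transcripts(profiles, min_fraction=0.9, min_tpm=0.0):
    """Names of transcripts detected in at least min_fraction of libraries.

    profiles maps a transcript name to its list of per-library TPM values.
    """
    return sorted(
        name
        for name, values in profiles.items()
        if fraction_expressed(values, min_tpm) >= min_fraction
    )


# Toy profiles across ten libraries, invented for demonstration only.
toy_profiles = {
    "lncA": [5.0] * 9 + [0.0],   # detected in 9 of 10 libraries
    "lncB": [5.0] * 5 + [0.0] * 5,  # detected in 5 of 10 libraries
}
print(ubiquitous_transcripts(toy_profiles))
```

Because the filter only asks whether a transcript is detected, it is consistent with the observation that absolute expression levels can still differ widely between tissues.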
In this cross-cancer type analysis, we found that lncRNAs aberrantly expressed in a specific cancer may also be altered in other cancers. For example, while
MEG3 is highly expressed in normal brain tissues, its expression was strongly decreased in our brain cancer datasets, and strikingly so in gall bladder, retinal and prostate cancers, consistent with the proposed tumor suppressor role for
MEG3 [48],
[56],
[57]. In another example, miR155 host gene (
miR155HG), a lncRNA processed to the miRNA
miR-155, was highly overexpressed in B-cell lymphoma, consistent with previous reports
[16], but was also upregulated in esophageal and gall bladder cancers.
Long non-coding RNAs are also implicated in the regulation of embryogenesis
[58],
[59],
[60]. Fetal lncRNAs reactivated in cancers may represent critical regulators of pluripotency or cellular growth. For example, the lncRNA urothelial cancer associated 1 (
UCA1) has demonstrated roles in embryonic development and is implicated in bladder cancer, supporting this concept
[61]. In our datasets, we found several lncRNAs with low expression in normal tissues, but with high expression in both embryonic stem cells and cancer (
Table S12). While these reactivated fetal lncRNAs represented mostly uncharacterized examples,
H19, a well-studied lncRNA with associations in both mammalian development and cancer
[53], was also detected in our dataset. Interestingly,
NEAT1, which is constitutively and highly expressed in normal tissues
[34],
[62], with the exception of embryonic stem cells, was downregulated in lung, liver, esophageal and retinal cancers (retinoblastoma).
Since genomic amplifications and deletions are key mechanisms of gene deregulation in cancer, we investigated changes in lncRNA expression in genomic regions frequently altered in breast, brain and lung cancer. Comparison of the significantly (p<0.05) deregulated lncRNAs common between brain, breast and lung cancer tissues revealed eight lncRNAs that were differentially regulated (≥2-fold) compared to normal tissue. Intriguingly, three of these lncRNAs - ENSG00000226380, ENSG00000230937 and ENSG00000253288 - were located on 7q32.3, 1q32.2, and 8q24.23, respectively, in regions completely devoid of protein-coding genes. As with protein-coding genes and miRNAs, differential lncRNA expression may be driven by similar mechanisms of disruption, including copy number gain/loss or aberrant methylation patterns. Indeed, high-level amplification of lncRNA-containing loci such as cytoband 19p12 has been reported in breast cancer
[63], while high-level amplification of 12p13.2 (which contains a number of lncRNA loci) has been reported in breast cancer, glioblastoma, astrocytoma, and squamous cell lung cancer
[64],
[65],
[66],
[67]. Likewise, aberrant expression of a number of lncRNAs has been tied to altered methylation patterns
[68],
[69]. However, the mechanisms driving aberrant lncRNA expression remain mostly unknown.
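The selection of lncRNAs that are significantly (p<0.05) and at least 2-fold deregulated in all three cancer types can be sketched as below. This is a hedged illustration only: the p-values are taken as given inputs because the statistical test is not described in this section, the pseudocount is an assumption added to keep ratios defined when a TPM is zero, and all function names are hypothetical.

```python
import math


def log2_fold_change(tumor_tpm, normal_tpm, pseudocount=1.0):
    """log2 ratio of tumor to normal TPM, with an assumed pseudocount."""
    return math.log2((tumor_tpm + pseudocount) / (normal_tpm + pseudocount))


def is_deregulated(tumor_tpm, normal_tpm, p_value, min_fold=2.0, alpha=0.05):
    """True when the change is significant and at least min_fold either way."""
    change = abs(log2_fold_change(tumor_tpm, normal_tpm))
    return p_value < alpha and change >= math.log2(min_fold)


def shared_across_cancers(calls_by_cancer):
    """Transcripts called deregulated in every cancer type provided.

    calls_by_cancer maps a cancer type to the set of deregulated transcripts.
    """
    call_sets = list(calls_by_cancer.values())
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Toy calls with invented transcript names, for demonstration only.
toy_calls = {
    "brain": {"lncA", "lncB"},
    "breast": {"lncB", "lncC"},
    "lung": {"lncB"},
}
print(shared_across_cancers(toy_calls))
```

Note that the pseudocount shrinks fold changes for lowly expressed transcripts, so a transcript near the 2-fold boundary on raw TPM may not pass this version of the filter.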
While lncRNAs have been documented for nearly three decades, the magnitude and diversity of lncRNA expression have only recently been appreciated. It is estimated that lncRNAs in the human genome number in the tens of thousands, effectively doubling the number of potential gene targets in cancer gene expression networks. Large-scale, cross-tissue and cross-cancer studies are crucial to understanding the regulation of lncRNA expression and how these novel transcripts integrate with our current understanding of the mammalian transcriptome. Moreover, a deeper understanding of lncRNA expression will not only expand the number of potential cancer gene targets, but also facilitate development of novel anti-cancer therapies, such as gene regulation mediated by antisense RNAs
[70] or targeting lncRNA-protein interactions
[28].