The analysis of full-length enriched cDNA libraries has been of vital importance in improving our understanding of the mammalian transcriptome. In this regard, however, unspliced noncoding cDNAs are often viewed with skepticism because they can arise as truncation artifacts of cDNA library construction. Here, we have shown that such artifacts cluster within very long, functionally important ncRNAs such as Air and Xist, and, rather than summarily dismissing these cDNAs as worthless, we have employed a strategy that uses them to identify long ncRNAs genome-wide. The resulting list of 66 candidate ENORs (itself almost certainly an underestimate) potentially expands several-fold the number of known mouse ncRNAs larger than 10 Kb in size, which, to date, includes only a few examples such as Xist, Air, Kcnq1ot1, and Ube3a-ats, most of which were successfully detected with our methods. In the past, such macro ncRNAs have been discovered experimentally on an ad hoc basis, and it has not been possible to systematically identify large ncRNAs by bioinformatics means, since most existing tools are limited to the discovery of smaller ncRNAs with conserved primary sequences and/or secondary structures [44]. Our strategy offers a solution to this problem.
Expression studies produced a number of interesting observations. First, in silico analysis indicated that some ENORs cluster together within the genome and are coexpressed. For example, ENOR22 and ENOR23 are located within 2,300 Kb of each other on Chromosome 7 and are specifically expressed in the CNS. One possible explanation for this coexpression is that these regions share a common chromatin domain. Second, we found that the majority of ENOR transcripts were predominantly nuclear, similar to functional ncRNAs such as Xist and Tsix. ncRNAs like these are increasingly being recognized as important in altering chromatin structure [45, 46], and it is tempting to speculate that the ENOR transcripts might also function in this way. Third, qRT-PCR studies of the ENOR28 and ENOR31 loci indicated that the current ENOR boundaries almost certainly underestimate the actual transcribed regions. This is not surprising, since the boundaries were estimated using internally primed transcript coordinates, and it reflects the fact that our discovery pipeline was not designed to capture transcription start and end sites. Lastly, despite the possible existence of multiple macro ncRNAs in ENOR28 and ENOR31, expression correlation between the individual cDNAs was extremely high (average R = 0.96). This indicates that even if separate transcripts arise from each region, they appear to be under the influence of similar regional promoters, enhancers, or chromatin domains. Fluorescence in situ hybridization studies might prove useful to visualize the ENOR transcripts and their surrounding chromatin structure (via the use of histone-specific antibodies), and may also directly demonstrate in which specific cell types and subcellular compartments ENORs are localized. For instance, knowing exactly which groups of neurons in the brain express ENOR28 and ENOR31 transcripts might provide indirect information as to their function. Understanding how the expression of these transcripts is regulated will also be important. For instance, finer mapping of transcript copy number by qRT-PCR using more primer pairs might better define the relevant transcriptional start sites and promoter regions.
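To make the "average R" statistic concrete, the following is a minimal sketch of how a mean pairwise correlation between cDNA expression profiles could be computed. It assumes that R denotes the Pearson correlation coefficient and that each cDNA is measured across the same set of tissues; the function names and the example values are hypothetical and are not taken from this study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles):
    """Average Pearson R over all unordered pairs of expression profiles."""
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical expression values across five tissues (illustrative only).
example_profiles = {
    "cDNA_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "cDNA_B": [2.0, 4.0, 6.0, 8.0, 10.0],
    "cDNA_C": [1.5, 2.5, 3.5, 4.5, 5.5],
}

if __name__ == "__main__":
    print(round(mean_pairwise_correlation(example_profiles), 3))
```

Averaging over all unordered pairs gives each cDNA equal weight; a high mean, as in this toy case where the profiles are linearly related, is consistent with shared regulation but cannot by itself distinguish one long transcript from several coregulated ones.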
Macro ncRNAs can function in a variety of ways, and some clues to the possible function of the ENORs can be gleaned from their association with antisense transcription, candidate imprinting domains, and miRNAs. Antisense transcripts exert regulatory effects in a number of ways, as mentioned earlier. Some of these effects (e.g., RNA interference and translation regulation) can be mediated by small miRNAs and siRNAs, and it is unclear if longer antisense transcripts, such as those identified in this study, are required to function in certain regulatory contexts. Of course, long antisense transcripts might be processed into smaller functional RNAs, although there has been no evidence that Xist or Air, for instance, work in this manner. Macro ncRNAs can also regulate genomic imprinting. Ube3a-ats, Kcnq1ot1, and Air have all been implicated in the imprinting control of their antisense transcripts. These three ncRNAs are themselves imprinted, a fact correctly predicted by the methods we used here. These same methods suggest that a further nine ENORs might represent potentially imprinted ncRNAs, which, if confirmed, would add substantially to the number of imprinted ncRNAs currently characterized. Finally, in silico analysis detected overlap between ENORs and more than 5% of known mouse miRNAs, suggesting that one of the possible functions of some of these regions may be to act as miRNA host genes. Given a recent report indicating that many mammalian miRNAs are still to be discovered [47], the possibility exists that more ENORs will be associated with novel miRNAs in the future.
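As an illustration of the kind of in silico overlap analysis mentioned above, the sketch below computes the fraction of annotated features (for example, miRNAs) that overlap at least one genomic region (for example, an ENOR). It is only a sketch under stated assumptions: coordinates are treated as closed intervals on named chromosomes, strand is ignored, and all names and coordinates are hypothetical rather than drawn from this study.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def fraction_overlapping(features, regions):
    """Fraction of features overlapping any region on the same chromosome.

    Both arguments are lists of (chromosome, start, end) tuples.
    """
    if not features:
        raise ValueError("no features supplied")
    hits = sum(
        1
        for (f_chrom, f_start, f_end) in features
        if any(
            f_chrom == r_chrom and overlaps(f_start, f_end, r_start, r_end)
            for (r_chrom, r_start, r_end) in regions
        )
    )
    return hits / len(features)


# Hypothetical coordinates (illustrative only).
example_regions = [("chr7", 1000, 50000), ("chr12", 200, 900)]
example_mirnas = [
    ("chr7", 1200, 1280),    # inside the first region
    ("chr7", 60000, 60080),  # same chromosome, outside any region
    ("chr12", 850, 930),     # partially overlaps the second region
    ("chr1", 10, 90),        # chromosome with no regions
]

if __name__ == "__main__":
    print(fraction_overlapping(example_mirnas, example_regions))
```

The nested scan is quadratic in the worst case, which is adequate for tens of regions and a few hundred miRNAs; a sorted sweep or an interval tree would be the natural replacement for larger annotation sets.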
Lacking any direct evidence of ENOR function, we also acknowledge the possibility that some of these regions do not play any functional role as RNAs. It has been shown, for instance, that expression of the yeast noncoding RNA Srg1 is necessary for the repression of its downstream gene, Ser3, but this appears to be due to the act of Srg1 transcription (causing Ser3 promoter interference) rather than any direct action of the Srg1 RNA itself [48]. Meanwhile, Wyers et al. found that intergenic transcripts in yeast are rapidly degraded by a specific nuclear quality control pathway and are therefore likely to be nonfunctional [49]. Another recent report, in which megabase deletions of noncoding DNA were engineered and failed to produce any detectable phenotype in mice [50], suggests that large noncoding regions of the genome may not have function. It should be noted, however, that the regions targeted in this deletion study lacked evidence of transcription, in direct contrast to the regions we have characterized. A suggestion has also been made that many noncoding transcripts simply represent useless by-products of “leaky transcription” [51]. Based upon our expression studies of ENOR28 and ENOR31, transcripts from both these regions appear to be clearly expressed in brain (estimated at 1–8 copies/cell based upon our previous work [52], which is similar to Air and to most mRNAs [53]), suggesting that in these cases, at least, transcripts are controlled. To demonstrate the importance (or otherwise) of the ENORs, it will ultimately be necessary to test their function directly. This, together with efforts to better understand the gene structure, expression, and regulation of individual transcripts within each region, is the challenge that lies ahead.