Long noncoding RNAs (lncRNAs) were discovered in eukaryotes more than thirty years ago. Recent advances in genomics have led to the discovery that lncRNAs are transcribed pervasively across the genome [2–5]. An increasing number of reports identify lncRNAs whose expression is modulated during cell differentiation or in disease states. However, biological functions for the vast majority of them are yet to be determined. Here, we propose two ways to identify lncRNAs that have biological functions: to identify lncRNAs with dedicated pre-initiation complexes (PICs), and to focus on those whose transcription is highly regulated.
Recent advances in technologies such as next-generation deep sequencing and tiled microarrays have enabled genome-wide analyses of the eukaryotic transcriptome. One of the unexpected findings from these analyses is that a very large number of long non-coding RNAs (lncRNAs) are transcribed from eukaryotic genomes, from budding yeast to humans [2,4–6]. Indeed, current estimates are that more than 80% of eukaryotic genomes are transcribed, meaning that thousands of lncRNAs are transcribed in eukaryotic cells. lncRNAs are usually defined as RNA transcripts longer than 200 nucleotides that do not have the potential to encode protein. High-resolution analyses of lncRNAs revealed that they can arise in a variety of genomic contexts: intergenically or intragenically, and in the sense or antisense orientation. One of the most important questions right now is how many of the thousands of identified lncRNAs play biological roles. Although some researchers believe that the vast majority of lncRNAs are products of stochastic transcription, it has become clear that some lncRNAs have regulatory roles in gene expression.
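The working definition and the genomic contexts described above can be made concrete with a small sketch. This is an illustration only, not a published annotation pipeline: the `Transcript` record, the `coding` flag (assumed to come from some upstream coding-potential call), and the example coordinates are all hypothetical, and a single chromosome is assumed.

```python
from dataclasses import dataclass
from typing import List

MIN_LNCRNA_LENGTH = 200  # length cutoff quoted in the text


@dataclass(frozen=True)
class Transcript:
    name: str
    start: int    # 0-based start coordinate (inclusive)
    end: int      # end coordinate (exclusive)
    strand: str   # "+" or "-"
    coding: bool  # hypothetical flag from an upstream coding-potential call


def is_lncrna(t: Transcript) -> bool:
    """Longer than the cutoff and without protein-coding potential."""
    return (t.end - t.start) > MIN_LNCRNA_LENGTH and not t.coding


def genomic_context(lnc: Transcript, genes: List[Transcript]) -> str:
    """Label a lncRNA relative to the first annotated gene it overlaps."""
    for gene in genes:
        overlaps = lnc.start < gene.end and gene.start < lnc.end
        if overlaps:
            orientation = "sense" if lnc.strand == gene.strand else "antisense"
            return f"intragenic, {orientation}"
    return "intergenic"


# Made-up coordinates, for illustration only.
genes = [Transcript("GENE1", 1000, 3000, "+", coding=True)]
candidates = [
    Transcript("t1", 2500, 3400, "-", coding=False),
    Transcript("t2", 5000, 5600, "+", coding=False),
    Transcript("t3", 7000, 7150, "+", coding=False),  # below the length cutoff
]
for t in candidates:
    if is_lncrna(t):
        print(t.name, genomic_context(t, genes))
```

In this toy input, `t1` overlaps `GENE1` on the opposite strand, `t2` overlaps no gene, and `t3` is excluded by the length cutoff.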
There have been several examples in which transcription of an antisense lncRNA leads to down-regulation of its cognate mRNA, and the underlying mechanisms have been reported for many of these cases. For instance, the lncRNA transcript can recruit histone-modifying enzymes to specific genomic loci, thereby creating repressive transcriptional environments. In mammals, the lncRNAs HOTAIR and Xist work in this manner to down-regulate the HOX genes and to inactivate one of the X-chromosomes, respectively [7,9–11]. In the budding yeast Saccharomyces cerevisiae, transcription of the antisense lncRNA at the PHO84 gene coincides with recruitment of the lysine deacetylase (KDAC) Hda1, which suppresses PHO84 mRNA transcription, suggesting potential evolutionary conservation of lncRNA-mediated gene regulation mechanisms [12,13]. Whether the lncRNA at PHO84 directly recruits Hda1 is still not clear. It should be noted that, in some cases, the act of lncRNA transcription, rather than the lncRNA product, plays the regulatory role.
These examples in both humans and yeast show that lncRNAs are expressed from an array of contexts across all eukaryotes, and can work through various mechanisms to regulate gene expression, potentially underlying disease pathophysiology. Despite this, functional roles have still not been assigned to the vast majority of lncRNAs. Therefore, establishing a method to systematically identify (or at least enrich for) lncRNAs or lncRNA transcription events that play biological roles would be a very significant step forward. We propose two ways to systematically enrich for lncRNA transcripts or transcription events that likely play biological roles: (1) Identify lncRNAs that have dedicated pre-initiation complexes (PICs). (2) Identify lncRNAs whose transcription is highly regulated.
As far as we know, the vast majority, if not all, of lncRNAs are transcribed by RNA polymerase II (Pol II). Initiation of Pol II transcription absolutely depends on ordered targeting of general transcription factors (GTFs), such as TFIIB and TFIID, to promoters, which leads to the formation of a PIC near the transcription initiation site. Therefore, all protein-coding genes that are either actively transcribed or poised to be transcribed have PICs at their promoters. The major source of lncRNAs in both budding yeast [2,4] and humans is divergent promoters, in which transcription of the mRNA and the lncRNA initiates from a shared nucleosome-depleted region (NDR), where the PIC forms (Figure 1). Because NDRs are typically small (less than 300 bp), the resolution afforded by conventional chromatin immunoprecipitation of GTFs followed by deep sequencing (ChIP-seq) cannot determine with high confidence whether the mRNA and lncRNA share a PIC or have distinct PICs. However, the recent development of ChIP-exo, a super-high-resolution ChIP-seq method, enabled genome-wide mapping of PICs at base-pair resolution. The initial report describing the ChIP-exo analyses of GTFs indeed revealed that a significant fraction of divergent promoters in the S. cerevisiae genome have two distinct PICs, one at each end of the NDR: one for the mRNA and another for the lncRNA (Figure 1). If a lncRNA has a dedicated PIC, it means that GTFs are targeted in an ordered fashion to assemble the PIC for the lncRNA, and that Pol II is recruited by that PIC only to transcribe the lncRNA. This suggests that the lncRNA represents a discrete transcription unit, and that there is a higher likelihood that cells “intend” to transcribe the lncRNA. On the other hand, if an mRNA-lncRNA pair shares a PIC, this suggests that the Pol II that transcribes the lncRNA may have been targeted by the PIC for mRNA transcription. In this case, the lncRNA may represent the product of erratic initiation of mRNA transcription, or transcriptional “noise”.
Therefore, it is likely that a systematic identification of lncRNAs that have dedicated PICs would enrich for lncRNAs and lncRNA transcription events that play biological roles. ChIP-exo and other super-high-resolution methods for identifying PIC locations are applicable to metazoan cells [18,19], making this strategy available across eukaryotes.
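As a rough sketch of how the shared-versus-dedicated distinction might be scored once PIC positions have been mapped at high resolution, the function below takes the boundaries of one NDR and a list of PIC peak positions and reports whether the peaks look like one PIC or two. The function name, the input format, and the 50 bp separation threshold are assumptions made for illustration; none of them are taken from the ChIP-exo studies cited above.

```python
from typing import List


def classify_pic(ndr_start: int, ndr_end: int,
                 peak_positions: List[int],
                 min_separation: int = 50) -> str:
    """Classify one divergent promoter from PIC peak positions in its NDR.

    min_separation is a hypothetical threshold (in bp) above which the
    outermost peaks are treated as two distinct PICs.
    """
    inside = sorted(p for p in peak_positions if ndr_start <= p <= ndr_end)
    if not inside:
        return "no PIC"
    if inside[-1] - inside[0] >= min_separation:
        return "distinct PICs"
    return "shared PIC"


# Made-up peak positions for a 250 bp NDR.
print(classify_pic(1000, 1250, [1010, 1240]))  # peaks near opposite NDR edges
print(classify_pic(1000, 1250, [1120, 1130]))  # peaks clustered together
print(classify_pic(1000, 1250, [900]))         # peak outside the NDR
```

Under the reasoning in the text, lncRNAs at promoters in the first class would be the ones to prioritize; a real analysis would also need to account for peak strength and strand information, which this sketch ignores.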
Another way we envision to identify lncRNAs that have biological roles is to focus on the transcripts or transcription events that are highly regulated. The rationale is intuitive: There is no reason to regulate transcription of a lncRNA if it is simply a consequence of “noise”. Two different approaches can be taken. One is to identify lncRNAs that are differentially expressed under different circumstances, such as different cell types, growth conditions and disease states. This strategy has been very successful, and a very large number of lncRNAs associated with specific developmental stages or diseases have been identified this way. Another approach is to systematically identify lncRNA regulators and their targets. This would involve the identification of lncRNAs that exhibit abnormal transcript levels when the lncRNA regulators are mutated. Of course, these mutations can cause abnormal transcription events, which can produce non-functional lncRNAs. However, most of the lncRNAs that are elevated in these mutants are detectable in wild-type cells. In addition, we have started to learn that substantial cellular resources are used to repress lncRNA transcription, as described below. It is therefore difficult to envision that all of these resources are used simply to suppress noise.
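The first approach, flagging lncRNAs whose levels differ between conditions, can be sketched as a simple fold-change filter. This is a minimal illustration under stated assumptions: the function names, the pseudocount, the two-fold threshold, and the expression values are hypothetical, and a real analysis would use replicates and a statistical test rather than a bare ratio.

```python
import math
from typing import Dict, List


def max_log2_fold_change(levels: Dict[str, float],
                         pseudocount: float = 1.0) -> float:
    """Largest log2 ratio between any two conditions for one transcript."""
    values = [v + pseudocount for v in levels.values()]
    return math.log2(max(values) / min(values))


def regulated_lncrnas(expression: Dict[str, Dict[str, float]],
                      min_log2_fc: float = 1.0) -> List[str]:
    """Names of transcripts whose level changes at least 2**min_log2_fc-fold."""
    return sorted(
        name for name, levels in expression.items()
        if max_log2_fold_change(levels) >= min_log2_fc
    )


# Made-up expression values (arbitrary units) in two growth conditions.
expression = {
    "lncA": {"rich": 3.0, "starved": 31.0},
    "lncB": {"rich": 10.0, "starved": 12.0},
}
print(regulated_lncrnas(expression))
```

The same filter applies to the second approach if the two "conditions" are a wild-type strain and a strain mutated for a candidate lncRNA regulator.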
The fact that lncRNAs generally initiate from NDRs suggests that their transcription is strongly affected by chromatin structure, much like that of mRNAs. One way the cell can regulate DNA accessibility is through ATP-dependent chromatin remodeling factors, which can mobilize nucleosomes or alter histone-DNA contacts. The first chromatin remodeling factor found to repress lncRNA transcription was Isw2 (Imitation Switch 2) in S. cerevisiae. Isw2 slides nucleosomes along DNA [23,24]. Genome-wide analysis of Isw2 targets revealed that Isw2 functions at NDRs at both the 5′ and 3′ ends of genes, and that Isw2 is targeted to and represses antisense lncRNAs from divergent promoters at the 3′ ends of genes [25,26]. More recently, mammalian esBAF, a Swi/Snf family chromatin remodeling factor that is required to maintain embryonic stem cell (ESC) renewal and pluripotency, was found to repress lncRNAs genome-wide. esBAF accomplishes this function by stabilizing nucleosomes at lncRNA initiation sites and maintaining a well-defined NDR.
Although mutagenesis or manipulation of candidate lncRNA regulators to identify highly regulated lncRNAs is feasible, more systematic, unbiased approaches can be taken. So far, large-scale genetic screens for lncRNA regulators have been performed in S. cerevisiae (see Figure 2 for a schematic drawing of the strategies).
Following pioneering work in which a genetic screen was performed to identify genes that repress initiation of intragenic cryptic transcription, a genome-wide screen was developed to identify genes that maintain proper relative transcription levels of mRNA and lncRNA pairs at bidirectional promoters. In this screen, a reporter was constructed from a divergent promoter sequence by fusing an mCherry fluorescent marker at the coding end and a YFP marker at the noncoding end, and the reporter was introduced into the yeast deletion mutant collection. This reporter allowed for the systematic identification of mutants in which the relative levels of transcription on the coding and non-coding sides of the divergent promoter are skewed compared to wild-type cells. This screen identified many genes involved in chromatin remodeling and chromatin assembly, including subunits of the Swr1, Rsc, Ino80, and Isw2 chromatin remodeling factors. It also identified three genes encoding subunits of the histone chaperone Chromatin Assembly Factor 1 (CAF-1) complex: CAC1, CAC2, and CAC3. Consistent with the design of the genetic screen, NET-seq analysis of CAF-1 mutants revealed a genome-wide increase in lncRNA transcription relative to mRNA transcription at divergent promoters, making CAF-1 the first factor identified to have a genome-wide role in dictating transcriptional direction. Interestingly, the elevated lncRNA transcription in CAF-1 mutants is dependent on histone H3 K56 acetylation and the Swi/Snf chromatin remodeling complex, demonstrating that multiple chromatin regulators act on lncRNAs for proper control.
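The readout of such a two-color reporter can be summarized as the noncoding-to-coding fluorescence ratio of each mutant relative to wild type. The sketch below illustrates that logic only; it is not the scoring used in the original screen, and the function names, the two-fold cutoff, the strain labels, and the fluorescence values are all hypothetical.

```python
import math
from typing import Dict, List


def direction_skew(mutant: Dict[str, float],
                   wild_type: Dict[str, float]) -> float:
    """log2 of the mutant YFP/mCherry ratio divided by the wild-type ratio.

    Positive values mean relatively more signal from the noncoding side.
    """
    mutant_ratio = mutant["yfp"] / mutant["mcherry"]
    wild_type_ratio = wild_type["yfp"] / wild_type["mcherry"]
    return math.log2(mutant_ratio / wild_type_ratio)


def screen_hits(mutants: Dict[str, Dict[str, float]],
                wild_type: Dict[str, float],
                min_abs_skew: float = 1.0) -> List[str]:
    """Mutants whose coding/noncoding balance is skewed in either direction."""
    return sorted(
        name for name, signal in mutants.items()
        if abs(direction_skew(signal, wild_type)) >= min_abs_skew
    )


# Made-up fluorescence values (arbitrary units).
wild_type = {"mcherry": 100.0, "yfp": 50.0}
mutants = {
    "mutant_a": {"mcherry": 100.0, "yfp": 200.0},  # noncoding side elevated
    "mutant_b": {"mcherry": 90.0, "yfp": 45.0},    # same balance as wild type
}
print(screen_hits(mutants, wild_type))
```

Using a ratio rather than the YFP signal alone means that a mutant that changes both sides of the promoter equally, like `mutant_b` here, is not scored as a hit.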
Although this genetic screen was not specifically designed to identify lncRNAs with biological roles, it is possible that some of the elevated transcripts have biological roles: For example, the increased level of lncRNAs proximal to bidirectional promoters might be a cellular signal of chromatin disruption, which may be required to target chromatin repair factors to the sites of chromatin disruption. For the purpose of identifying functional lncRNA transcripts and transcription events, this genetic screen can also be modified to use different divergent promoters: For example, one can use divergent promoters that form distinct PICs for mRNA and lncRNA, and/or use the promoters that are highly regulated by the environment or growth conditions.
Our lab has also developed a genome-wide screen using reconstituted RNA interference (RNAi) as a tool to identify repressors of antisense lncRNAs (ASlncRNAs). The rationale behind the design of the screen is that a global increase in ASlncRNAs in the presence of RNAi would cause growth defects: The increase would result in large-scale formation of mRNA:ASlncRNA hybrids, which would be processed by RNAi, causing global destabilization of mRNAs and lncRNAs (Figure 2). This screen identified 408 putative lncRNA repressors, including the chromatin remodeling factors Swr1, Isw2, Rsc, and Ino80, which were all confirmed to be repressors of ASlncRNAs by RNA-seq. This result suggests that the cell devotes a much larger amount of resources than previously thought to repressing lncRNAs. Isw2, Rsc and Ino80 were shown to be targeted to the initiation sites of ~45% (814) of the ASlncRNAs that are repressed by these factors, suggesting that these remodeling factors directly repress transcription of a large number of ASlncRNAs. Most importantly, de-repression of 259 ASlncRNAs in chromatin-remodeling factor mutants is associated with a significant decrease in the level of the mRNAs they overlap. This suggests that chromatin-remodeling factors maintain the levels of these 259 mRNAs by repressing transcription of the overlapping ASlncRNAs. It is likely that this number represents a gross underestimation of functional ASlncRNA transcription events, as only one growth condition (logarithmic growth in rich medium) was used for RNA analyses. Given the presence of ~400 putative uncharacterized ASlncRNA repressors, it is likely that the mechanism to regulate mRNA levels through ASlncRNA control is utilized far more commonly across the genome than currently appreciated.
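The kind of comparison that links ASlncRNA de-repression to a drop in the overlapping mRNA can be sketched as a filter over paired fold changes. This is a simplified illustration, not the analysis used in the study: the function name, the two-fold threshold, and the log2 fold-change values (mutant versus wild type) are assumptions, and the statistical testing implied by "significant" is omitted.

```python
from typing import Dict, List, Tuple


def anticorrelated_pairs(log2_fc: Dict[str, float],
                         overlaps: List[Tuple[str, str]],
                         min_change: float = 1.0) -> List[Tuple[str, str]]:
    """Pairs in which the ASlncRNA goes up and the overlapping mRNA goes down.

    log2_fc maps transcript name to log2(mutant / wild type);
    overlaps lists (ASlncRNA, mRNA) pairs that overlap on opposite strands.
    """
    hits = []
    for aslncrna, mrna in overlaps:
        if log2_fc[aslncrna] >= min_change and log2_fc[mrna] <= -min_change:
            hits.append((aslncrna, mrna))
    return hits


# Made-up fold changes for three overlapping pairs.
log2_fc = {
    "as1": 2.5, "mrna1": -1.4,  # ASlncRNA up, mRNA down
    "as2": 3.0, "mrna2": 0.1,   # ASlncRNA up, mRNA unchanged
    "as3": 0.2, "mrna3": -2.0,  # mRNA down without ASlncRNA change
}
overlaps = [("as1", "mrna1"), ("as2", "mrna2"), ("as3", "mrna3")]
print(anticorrelated_pairs(log2_fc, overlaps))
```

Only the first pair passes both conditions, which mirrors the distinction drawn in the text between ASlncRNAs that are merely de-repressed and those whose de-repression coincides with reduced mRNA levels.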
Despite the increasing number of reports that identify lncRNAs in the context of development and disease [20,21], functions (if any) of the vast majority of lncRNAs remain to be identified. It is possible that lncRNA products have biological roles: Given that lncRNAs can provide information about the sequence and the location of transcription, they can be tools for cells to target histone-modifying enzymes, chromatin repair factors or other regulators to specific sites. It is also possible that the act of lncRNA transcription itself provides biological functions: It may facilitate co-transcriptional chromatin assembly or chromatin reorganization, or interfere with other DNA-dependent processes. By focusing on lncRNAs that have dedicated PICs and those whose expression levels are highly regulated, one would likely be able to systematically enrich for lncRNA transcripts or transcription events that play important biological roles.
We thank the members of the Tsukiyama lab for helpful discussions. This work was supported by grant R01 GM058465 (T.T.) and predoctoral fellowship F31 GM101944 (E.A.A.).