Several studies have shown that promoters of protein-coding genes are origins of pervasive non-coding RNA transcription and can initiate transcription in both directions. However, only recently have researchers begun to elucidate the functional implications of this bidirectionality and non-coding RNA production. Increasing evidence indicates that non-coding transcription at promoters influences the expression of protein-coding genes, revealing a new layer of transcriptional regulation. This regulation acts at multiple levels, from modifying local chromatin to enabling regional signal spreading and more distal regulation. Moreover, the bidirectional activity of a promoter is regulated at multiple points during transcription, giving rise to diverse types of transcripts.
In the past decade, genomic research has focused on protein-coding genes. However, recent studies have revealed a myriad of non-coding transcripts in different organisms [1, 2]. Although protein-coding transcripts derive from less than 2% of the human genome, more than 70% of the genome is transcribed. This unexpected level of transcriptome complexity has led to the suggestion that non-coding RNAs (ncRNAs) comprise a previously hidden layer of genomic programming and that ncRNA-directed regulatory circuits underpin complex genetic phenomena in eukaryotes. A large portion of reported ncRNA transcription occurs in the proximity of protein-coding genes, particularly at promoters but also at the ends of coding regions and intragenically. In this review, we focus on the generation and functional consequences of non-coding transcription initiating from bidirectional promoters.
Divergent, bidirectional transcription of protein-coding genes has been recognized for many years, and the more recent sequencing and annotation of whole genomes has revealed that bidirectional organization of coding genes is common [6, 7]. Further technological advancements over the last few years, however, have revealed an abundance of previously unobserved non-coding transcripts near the promoters of protein-coding genes in different organisms, including bacteria [8–11], yeast [12–15], fruit fly, mouse [17, 18], human and plants.
To understand the diversity of ncRNAs found near promoters, one must consider the technologies used to detect them as they bias our accounts of ncRNAs. The study of transcripts at promoters in turn provides information about the frequency of bidirectional transcription, the control of directionality in bidirectional promoters, and the importance of transcriptional regulation mediated by promoter-associated transcription. Here we comment on the main features of promoter-associated transcripts and technologies for detecting them (detailed further in Table 1 and Box 1, respectively).
Enabled by the development of new genomic technologies, an abundance of new transcripts has been described. These technologies interrogate gene expression at multiple steps, from the initiation of transcription to the final transcripts produced.
The extent of ncRNA transcription has been investigated using different approaches, including measurements of steady-state RNA levels [13, 14, 17, 18, 21–24], RNA polymerase activity , and nascent RNAs associated with RNA polymerase II . These studies have resulted in a catalog of non-coding transcripts near protein-coding genes and have revealed both protein/non-coding and non-coding/non-coding bidirectional transcription (Figure 1).
A large body of evidence for promoter bidirectionality derives from studies that have used expression profiling of steady-state RNA levels, targeting either all transcripts or specific subpopulations of different lengths and stabilities. A widely used approach in higher eukaryotes is to specifically profile short RNAs that are otherwise difficult to distinguish from overlapping long transcripts. This has revealed short transcripts of varying sizes located on both strands near promoters [17, 18, 21–23] (Figure 1, Table 1). Transcripts also differ in their stability. As transcript abundance in a cell is dependent on the rates of both synthesis and decay [27, 28], some transcripts are not detected by RNA profiling of wild-type cells due to their short lifetimes. Manipulation of the RNA degradation pathways has been used to uncover these so-called cryptic unstable transcripts (CUTs) that usually initiate from shared promoters [13, 14, 24]. In addition to profiling transcripts with specific size and stability, bidirectional transcription has been studied using protocols that capture 5’ termini of transcripts and reveal transcription start sites (TSSs) [29, 30].
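The dependence of transcript abundance on both synthesis and decay can be illustrated with a minimal kinetic sketch. It assumes a constant synthesis rate and first-order decay, a simplification that the text does not itself specify. The function name and all rate values are hypothetical and chosen only for illustration.

```python
def steady_state_abundance(synthesis_rate, decay_rate):
    """Steady-state level N* of a transcript under dN/dt = synthesis_rate - decay_rate * N.

    Setting dN/dt = 0 gives N* = synthesis_rate / decay_rate.
    """
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")
    return synthesis_rate / decay_rate


# Hypothetical rates in arbitrary units: identical synthesis, different stability.
stable_mrna = steady_state_abundance(synthesis_rate=1.0, decay_rate=0.05)
unstable_ncrna = steady_state_abundance(synthesis_rate=1.0, decay_rate=2.0)

# Impairing degradation (assumed here to lower the decay rate 20-fold)
# raises the steady-state level of the same unstable transcript.
unstable_ncrna_decay_mutant = steady_state_abundance(synthesis_rate=1.0, decay_rate=0.1)

print(stable_mrna, unstable_ncrna, unstable_ncrna_decay_mutant)
```

Under these assumed rates, two transcripts that are synthesized equally often differ 40-fold in steady-state level, which is consistent with the observation that unstable transcripts become detectable only when degradation pathways are manipulated.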
Evidence for pervasive bidirectional initiation from promoters also derives from studies that directly measure transcription. These studies have been performed either by detecting all nascent transcripts (that is, those attached to RNA polymerase) or by targeting only transcripts that are being elongated. Global run-on sequencing (GRO-seq), a method that measures actively elongating RNAs (Box 1), has revealed RNA polymerase-mediated transcription in two directions at 55% of human promoters, with only a small bias in orientation . Promoter bidirectionality has also been shown in yeast by sequencing nascent transcripts that co-purify with RNA polymerase II (NET-seq, Box 1) .
A systematic application of these diverse technologies will provide the comprehensive information necessary to characterize non-coding transcription at promoters in terms of transcript type and number. Clearly, some of the non-coding transcripts detected reflect distinct biological entities, that is, transcripts with characteristic biogenesis and degradation mechanisms (e.g. CUTs [13, 14] and PROMPTs). Other transcripts may not represent independent transcription events but rather post-transcriptional processing products of longer ncRNAs generated from the same genomic locus. Furthermore, some of the observed diversity is likely due to differences in detection techniques. Therefore, the capacity of each technology should be taken into account when comparing results. For example, NET-seq cannot identify nascent transcripts close to the TSS as they are too short to map uniquely to the genome, while in GRO-seq these transcripts are elongated during the run-on reaction, allowing them to be mapped. In another example, when profiling steady-state transcript abundance for specific sizes, one should note that the different transcript lengths detected could be arbitrary subpopulations from a continuum of sizes. Nevertheless, the diversity of the applied approaches shows that transcriptional initiation at most, if not all, promoters occurs in a bidirectional manner, even if the final transcription products are mainly observed in one orientation.
By studying different steps of the transcription process, and measuring specific subpopulations of transcripts, one can obtain complementary insights into the activity and regulation of bidirectional transcription. Thus, recent genomic developments substantiate the pervasiveness and illustrate the complexity of bidirectional transcription. These observations, coupled with the anticipated functional implications of ncRNAs for genome regulation, underscore the need for further investigation of transcriptional initiation at promoters.
A promoter can be defined as a region of DNA that directs the transcription of a downstream unit [31, 32]. To understand how bidirectional transcription works, it is important to study the role that chromatin structure plays in the assembly of the transcription machinery and how chromatin is dynamically regulated [33–36].
Active promoters generally contain an 80 to 300 bp nucleosome-depleted region (NDR) [34, 37, 38] flanked by two well-positioned nucleosomes (Figure 2). These nucleosomes often contain the histone variant H2A.Z [39–41]. The association between NDRs and TSSs suggests that assembly of the transcription machinery occurs in NDRs [37, 38], and further studies have shown that transcripts often originate bidirectionally from these regions [13, 14]. The regular positioning of nucleosomes with respect to promoters is conserved from yeast to humans, apart from minor differences. In yeast, the TSS lies just inside the +1 nucleosome, whereas in metazoans this nucleosome is positioned further downstream, leaving the TSS accessible. This difference could allow distinct modes of transcriptional regulation. In yeast, the presence of the +1 nucleosome at the TSS could play a role in transcription initiation, while in metazoans the +1 nucleosome may contribute to transcription elongation by helping to pause RNA polymerase II. There are also gene-specific differences in the degree of nucleosome depletion at promoters [33, 36], which allow variations in gene regulation and transcriptional plasticity. NDRs are not only associated with the 5’ ends of genes, but are also found at their 3’ ends, where they may be involved in transcription termination, antisense transcription [13, 43], or gene looping, where start and termination sites interact during transcription [43, 44].
As the presence of an NDR is important for transcription initiation, the precise organization of chromatin is critical for dictating where transcription starts. Spurious transcription is suppressed via several mechanisms, of which one of the most intensively studied is histone modification [35, 46]. A well-known example is the suppression of intragenic cryptic transcription by histone deacetylation. This is catalyzed by the Rpd3S complex, which recognizes H3K36me produced by Set2 during RNA polymerase II elongation [35, 47, 48; Churchman, 2011]. Another mechanism involves limiting access of the transcription machinery by remodeling nucleosome distributions [49–51]. Chromatin remodeling plays an important role in promoter regulation by influencing the positioning of nucleosomes on the promoter [33, 36]. In these cases, regulation is effected by displacing nucleosomes from 5’ and 3’ NDRs or by modifying the size of the NDR. Chromatin remodeling also acts to repress spurious transcription at intragenic locations where transcription should not occur [33, 36, 54].
In summary, an NDR is a site for transcription initiation where the transcriptional machinery assembles, potentially in both orientations. Further regulatory steps define how far the polymerase can continue when transcribing in a bidirectional manner.
If evidence of pervasive transcription at promoters indicates that promoters are generally capable of initiating transcription in two directions, why do we detect productive elongation mostly in one orientation? Once the transcription machinery has assembled onto DNA, it has the option to start transcription in both directions. Data collected by measuring transcriptional activity at the start of transcription show that RNA polymerase assembly and initiation occur in almost equal proportions in both orientations . However, nascent sense transcripts are at least eight times more abundant than divergent transcripts at more than half of yeast promoters . This shows that transcription in the divergent orientation decreases even during the transcription process as RNA polymerase II moves further away from the promoter, which is probably due to post-recruitment regulation. Indeed, a large proportion of initiation events do not produce a final stable transcript [55, 56], reflecting possible regulation after RNA polymerase II recruitment. Although at any given moment 60% of RNA polymerase II present in a cell has initiated transcription, less than 10% will produce a stable sense transcript . Therefore, the relative amount of transcripts produced from a bidirectional promoter is controlled via several regulatory steps during the transcription cycle (initiation, elongation and termination).
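The comparison of sense and divergent nascent signal described above can be expressed as a simple directionality score. The following is a minimal sketch rather than the procedure used in the cited studies: the function names, the pseudocount, and the read counts are all hypothetical, and the eightfold threshold is taken from the figure quoted in the text.

```python
import math


def directionality_log2(sense, divergent, pseudocount=1.0):
    """log2 ratio of sense to divergent signal; positive values indicate sense bias.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((sense + pseudocount) / (divergent + pseudocount))


def fraction_sense_biased(promoters, fold=8.0, pseudocount=1.0):
    """Fraction of promoters whose sense signal is at least `fold` times the divergent signal."""
    if not promoters:
        return 0.0
    threshold = math.log2(fold)
    biased = sum(
        1
        for sense, divergent in promoters
        if directionality_log2(sense, divergent, pseudocount) >= threshold
    )
    return biased / len(promoters)


# Hypothetical (sense, divergent) nascent-read counts for four promoters.
counts = [(1599, 99), (400, 200), (1000, 10), (50, 60)]
print(fraction_sense_biased(counts))
```

Applying the same score to measurements taken at initiation and at increasing distances from the promoter would show whether the bias grows as RNA polymerase II moves away from the TSS, as the post-recruitment regulation model predicts.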
The progression of RNA polymerase II along a transcription unit is accompanied by a series of chromatin modifications [35, 46] and RNA polymerase II C-terminal domain (CTD) phosphorylation [57–59]. RNA polymerase II is assembled into the preinitiation complex (PIC) with its CTD hypophosphorylated. During early elongation the CTD becomes rapidly phosphorylated on Ser5 (Figure 2). In yeast, this phosphorylation helps to recruit machinery such as mRNA-capping enzymes and H3K4 methyltransferase, as well as the early termination complex (Nrd1-Nab3) for promoting termination of short transcripts. During the early elongation step, a 5' checkpoint has been hypothesized, at which the polymerase chooses between pause, termination, or commitment to productive elongation [57, 61] (Figure 2). This checkpoint could enable the regulation of promoter directionality by committing the already initiated transcription to termination. One factor that could be involved in this checkpoint is the prolyl isomerase Ess1. Ess1 is recruited by the CTD phosphorylated on Ser5 and enhances the dephosphorylation of Ser5P by Ssu72. This leads to the dissociation of the polymerase from the Nrd1-Nab3 termination complex and hence productive elongation. This checkpoint model is also in agreement with current data showing that promoters contain chromatin modifications indicative of bidirectional initiation, while marks of productive elongation are only seen in one orientation [18, 46, 64]. The regulation of transcription after RNA polymerase II recruitment is a well-characterized process, both in metazoans [64, 65] and in yeast [66, 67]. This regulation could be one of the sources of short RNAs detected on both sides of promoters [18, 22, 23], whose length may thus reflect the point at which non-productive transcription was terminated (Figure 2).
If RNA polymerase II passes the initial 5’ checkpoint and enters productive elongation, the level of Ser5P decreases, whereas the level of Ser2P increases. Ser2P is required to recruit machinery that cotranscriptionally modifies chromatin, polyadenylates, and terminates transcription  (Figure 2). Finally, the amount of divergent transcripts is also controlled at the post-transcriptional level by regulating transcript stability [13, 24]. Hence, the extent of bidirectional transcription from a promoter is regulated at several steps, allowing varying levels of divergent transcription. Transcription of two protein-coding genes from a bidirectional promoter demonstrates that production of full-length transcripts in both orientations is possible, even though the majority of promoters appear to display orientation preference. Mechanisms for the establishment and maintenance of this preference remain speculative (Box 2).
Although promoters are generally capable of actively initiating transcription in two directions, in most cases productive elongation is seen primarily in one orientation. Thus, a mechanism must exist to regulate a putative 5’ checkpoint and thus dictate this asymmetry after bidirectional initiation. This could include:
Given that several transcripts are produced from the same promoter, the promoter must act as a regulatory unit to couple their transcription. The effect of this transcription on gene activity (local or distal) could be mediated by either the transcription process itself or by the produced transcripts. The consequences of ncRNA transcription from a bidirectional promoter thus depend on transcript length, sequence and stability. Although there are an increasing number of case studies demonstrating that ncRNAs generated from bidirectional promoters have functional roles (see below), it is not clear which proportion is functional. Here we classify instances of bidirectional transcription into those affecting the bidirectional promoter, neighboring protein-coding genes, or more distal genes. To illustrate their potential functions, we utilize examples, where informative, of other promoter-associated transcripts with characterized effects.
Divergent organization of two protein-coding genes has been widely reported in different organisms [6, 68–71]. A study in humans indicated that the number of divergent gene pairs does not correlate with gene density, suggesting that maintaining their coupling by a shared promoter is beneficial. This is also supported by partial conservation of such pairs between species. Divergent pairs are mainly co-expressed, but some display opposite regulation [6, 70]. The anti-correlated regulation could be due to competition for the same pool of polymerases and associated factors, or the recruitment of chromatin modifiers (see mechanism for coding/non-coding pairs in Figure 3A and B). Bidirectional promoters often couple protein-coding genes involved in the same process [6, 70, 72], allowing for coordinated temporal and environmental responses.
Nucleosomes are a significant barrier for transcription ; hence, their exclusion permits local transcription. Transcription of a ncRNA from a shared promoter could activate protein-coding gene expression by displacing positioned nucleosomes, even when the RNAs are short  (Figure 3A). A study in Schizosaccharomyces pombe has shown that upstream transcription from the same strand of the fbp1+ locus induces local remodeling of chromatin and is a prerequisite for the activation of the downstream gene . Although in this case, transcription arises from the same strand as that of the protein-coding gene, divergent transcription could modify local chromatin in a similar manner.
Pervasive transcription around a promoter region could also act to create a reservoir of RNA polymerase, which would facilitate rapid activation of the protein-coding gene when required . In addition, transcription could promote upstream initiation through negative supercoiling of DNA , serve as a barrier to block the spread of repressive chromatin , or keep the promoter open to decrease stochastic variation of gene expression (minimizing its intrinsic transcriptional noise) .
Non-coding transcription can also be repressive, for instance by removing bound transcription factors from the protein-coding gene promoter or by competing for the same pool of general transcription factors (Figure 3A). In the case of the TPI1 gene, the unstable ncRNA and TPI1 mRNA appear to originate from different pre-initiation complexes co-regulated by the same transcription activators. These transcripts would therefore compete for the same polymerases and accessory factors.
Non-coding transcripts themselves can also act to influence protein-coding gene expression by modifying the surrounding chromatin (Figure 3B). In these cases, ncRNA transcripts cause either activation or repression of the bidirectionally-oriented gene. For example, same-strand short RNA transcription takes place at promoters of non-transcribing genes targeted by the Polycomb repressive complex-2 (PRC2), a complex that catalyzes histone trimethylation (H3K27me3) . The ncRNAs recruit PRC2 to downstream genes, leading to a block in RNA polymerase II elongation . It has also been shown that a ncRNA transcribed from the promoter of the human cyclin D1 (CCND1) gene represses its expression. This is accomplished by the recruitment of an RNA-binding protein that represses the activating effect of two histone acetyltransferases on CCND1 .
Divergent ncRNAs generated from a shared bidirectional promoter can extend to overlap an upstream gene, especially in compact genomes. For example, in yeast, 55% of stable ncRNAs initiating from a bidirectional promoter overlap an upstream ORF on the opposite strand, forming antisense transcripts  (Figure 3C). Antisense expression has been shown to repress the expression of sense protein-coding genes, for instance through transcriptional interference or histone modifications [79–83]. Possible regulatory mechanisms of overlapping antisense transcription have been extensively reviewed [2, 84, 85]. Here, we will focus on the consequences of antisense repression on protein-coding genes in the context of their initiation from bidirectional promoters.
One important consequence of bidirectional promoters is that the regulation of a single gene could spread to neighboring genes through the divergently transcribed ncRNA (Figure 3C). An example of a ncRNA that couples two tandem protein-coding genes is the transcript SUT719 in yeast, which initiates from a bidirectional promoter shared with GAL80 and extends to overlap the upstream SUR7 gene promoter . SUT719 is co-regulated with GAL80 via a Gal4 binding site in the bidirectional promoter. SUT719 expression inhibits SUR7 expression. Thus, the activation signal for GAL80 directly represses SUR7 expression through the production of an antisense transcript from the bidirectional promoter. This coupling of expression through local spreading of regulatory signals by ncRNAs from bidirectional promoters could be a general feature of the transcriptome, especially in species with a compact genome. In yeast, the ncRNA-mediated coupling of two tandem genes (such as SUR7 and GAL80) has been shown to decrease their co-expression genome-wide . Furthermore, regional signal spreading could extend to influence more distal locations. In fact, multiple examples in mammalian genomes have been described where at least three transcription units are connected to each other by overlapping transcripts or divergent promoters . Such ‘chaining’ of events could result in linking of up to 11 transcription units . Hence, the local organization of transcription units could have a significant impact on genome-wide transcriptional regulation, complexity and evolution.
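The 'chaining' of transcription units can be thought of as finding connected components in a graph whose nodes are transcription units and whose edges are overlaps or shared divergent promoters. The sketch below uses a union-find structure for this; it is an illustrative choice, not the method of the cited study. The GAL80, SUT719 and SUR7 links follow the example in the text, whereas `geneX` is a hypothetical unlinked unit added for contrast.

```python
def chain_transcription_units(units, links):
    """Group transcription units into chains (connected components).

    units: iterable of unit names.
    links: iterable of (unit_a, unit_b) pairs that overlap or share a
           divergent promoter; both names must appear in `units`.
    Returns a list of alphabetically sorted chains, largest chain first.
    """
    parent = {u: u for u in units}

    def find(u):
        # Walk up to the representative of u's set, halving the path on the way.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two sets

    chains = {}
    for u in parent:
        chains.setdefault(find(u), []).append(u)
    return sorted((sorted(c) for c in chains.values()), key=lambda c: (-len(c), c))


units = ["GAL80", "SUT719", "SUR7", "geneX"]  # geneX is hypothetical
links = [
    ("GAL80", "SUT719"),  # shared bidirectional promoter
    ("SUT719", "SUR7"),   # SUT719 extends over the SUR7 promoter
]
chains = chain_transcription_units(units, links)
print(chains)
```

With genome-wide annotations as input, the size of the largest component would correspond to the longest chain of linked transcription units.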
While local regulatory effects can occur independently of transcript stability, distal effects are typically thought to be more dependent on stable transcript species. However, it has been observed that distal genomic regions physically interact in 3D nuclear space [44, 87, 88]. This creates local environments where distal genomic regions are subject to similar influences, such as chromatin modifiers, transcription factors, or transcripts. In these environments, the act of ncRNA transcription may also affect genes distally located on the same or different chromosomes.
ncRNAs generated by bidirectional promoters could have enhancer-like functions to regulate distal genes (Figure 3D). A study of mouse cortical neurons discovered a novel class of bidirectional ncRNAs (enhancer RNAs) that are transcribed from enhancer domains . These bidirectionally transcribed ncRNAs are generally short and non-polyadenylated, and it has been suggested that their synthesis is important for activation of the corresponding ORFs . Moreover, it has been shown that the depletion of long intergenic non-coding RNAs (lincRNAs) initiating from a bidirectional promoter in humans decreases the expression of distal protein-coding genes, along with other lincRNAs .
Further evidence exists for long-range regulation of protein-coding genes by ncRNAs. Separate transfection of sense and antisense short RNAs transcribed from both strands either upstream or overlapping the first exon of c-MYC results in reduced expression of c-MYC mRNA in HeLa cells. In yeast, ectopic expression of antisense ncRNAs represses the expression of the Ty1 retrotransposons and the phosphate transporter PHO84. Although there is no direct evidence for bidirectionality of its promoter, HOTAIR, a 2.2 kb ncRNA transcribed in antisense orientation to the HOXC gene in humans, represses the distal HOXD gene cluster through an interaction with chromatin modifiers.
In summary, the consequences of transcription at bidirectional promoters range from local effects enabled by the transcription process itself, to distal effects mediated by the transcripts generated. Coupled transcription from bidirectional promoters is therefore used by the cell as an additional mechanism for the regulation of gene expression.
Much of the increased phenotypic complexity of higher eukaryotes is thought to arise from gene regulation rather than an increase in protein-coding gene numbers [93–95]. In addition to regulation mediated by distantly acting regulatory regions, expression of ncRNAs close to gene promoter regions provides a convenient means to locally regulate and fine-tune gene expression levels by exploiting shared chromatin, shared sequence, or sequence complementarity. While some of the non-coding transcription generated at protein-coding gene promoters may simply be transcriptional noise, there is mounting evidence that some of it is involved in gene regulation.
The interleaved, overlapping organization of transcripts has considerable consequences for research practices and genetics studies in general: mutating a region could affect not only the gene of interest, but also its associated non-coding transcripts and thus also nearby genes. The complex regulatory architecture also means that, when mapping a phenotype to a certain genomic locus, it is necessary to consider ncRNAs and bidirectional promoters as possible causes of phenotypic variability.
Several outstanding questions remain regarding the scope and functional consequences of promoter-associated non-coding transcription (Box 3). Further discovery and classification of these transcripts based on ncRNA features, subcellular localization, and correlation with protein-coding gene expression levels, promise to provide further insight into their functional consequences on cellular and organismal phenotypes.
We thank Raeka Aiyar, Wolfgang Huber, Julien Gagneur, Zhenyu Xu and Joël Savard for critical comments on the manuscript. This work was supported by grants to L.M.S. from the National Institutes of Health and the Deutsche Forschungsgemeinschaft. V.P. is supported by an EMBO postdoctoral Fellowship.