RNA polymerase II (RNAPII) transcription of DNA is an orchestrated process subject to regulation at numerous levels. When this process begins, RNAPII must bind to promoter DNA, initiate transcription, and transition to an elongation state compatible with passage through nucleosomes. These transitions require the concerted action of many protein complexes and are accompanied by changes in local chromatin structure, including covalent modification and ATP-dependent remodeling (1).
We have previously noted the presence of short RNAs in embryonic stem (ES) cell lines that were located near the transcription start sites (TSSs) of protein-coding genes and were not associated with known non-coding RNAs (2). To investigate these low-abundance RNAs further, we analyzed 8.4 million murine short RNA reads (3); 7.3 million were derived from ES cells and 1.1 million from differentiated cell types (4). Approximately 42,000 of these reads, defined here as TSS-associated RNAs (TSSa-RNAs), mapped uniquely within 1.5 kb of protein-coding gene TSSs (Table S1). Multiple RNAs were frequently associated with a single TSS. TSSa-RNAs were associated with over half of all mouse genes and were detected in all cell types examined (Figure S1). TSSa-RNAs were also found in Dicer−/− ES cells, suggesting that they are not Dicer products (Figure S1F). Sequenced TSSa-RNAs are most frequently 17 nucleotides (nt) long, with a mean length of 20 nt (Figure S2).
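The association step described above (reads mapping within 1.5 kb of a TSS, with one read possibly counted against several nearby TSSs) can be sketched as a sorted-interval lookup. This is a minimal illustration, not the authors' pipeline: it assumes a single chromosome, integer coordinates, a pre-sorted TSS list, and reads that have already passed a unique-mapping filter. The helper name `associate_reads` is hypothetical.

```python
from bisect import bisect_left, bisect_right

WINDOW_NT = 1500  # 1.5 kb association window, as stated in the text


def associate_reads(read_5p_positions, sorted_tss_positions, window=WINDOW_NT):
    """Associate each read with every TSS lying within `window` nt of its 5' end.

    Hypothetical helper: assumes one chromosome, integer coordinates, and reads
    that already passed a unique-mapping filter. Returns a dict
    {read_index: [tss_position, ...]} for reads with at least one TSS in range.
    """
    hits = {}
    for i, pos in enumerate(read_5p_positions):
        lo = bisect_left(sorted_tss_positions, pos - window)   # first TSS >= pos - window
        hi = bisect_right(sorted_tss_positions, pos + window)  # one past last TSS <= pos + window
        if hi > lo:
            hits[i] = sorted_tss_positions[lo:hi]
    return hits


tss = [10_000, 12_000, 50_000]
reads = [9_800, 11_200, 30_000]
print(associate_reads(reads, tss))  # {0: [10000], 1: [10000, 12000]}
```

Binary search keeps each lookup logarithmic in the number of TSSs, which matters when millions of reads are tested against tens of thousands of promoters.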
Figure 1 The distribution of TSSa-RNAs around TSSs shows divergent transcription. (A) Histogram of the distance from each TSSa-RNA to all associated gene TSSs (4). Counts of TSSa-RNA 5′ positions relative to gene TSSs are binned in 20 nucleotide windows.
TSSa-RNAs surround promoters in divergent orientations. Sense TSSa-RNAs map downstream of the associated gene TSS, overlapping genic transcripts and peaking in abundance between 0 and +50 nucleotides relative to the TSS. Surprisingly, 40% of TSSa-RNAs map upstream of the TSS in the anti-sense orientation relative to their associated genes, peaking between nucleotides −300 and −100. Sense and anti-sense TSSa-RNAs were associated with overlapping sets of 8,115 and 6,331 gene promoters, respectively (Table S2). This distribution does not depend on head-to-head gene pairs or on genes with multiple TSSs, and it is not seen in intergenic regions or at gene 3′ ends (Figures S3 and S4).
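The sense/anti-sense distinction and the 20-nt binning used for Figure 1A depend on expressing each read's 5′ end in gene coordinates, which flips for minus-strand genes. The sketch below shows one way to do this; it is an illustration under stated assumptions, not the published analysis. It assumes `read_5p` is already the 5′-end coordinate (for minus-strand reads, the rightmost aligned base), and the function names and toy reads are hypothetical.

```python
from collections import Counter

BIN_NT = 20  # bin width used for the Figure 1A histogram


def relative_position(read_5p, tss, gene_strand):
    """Signed distance from the TSS to a read's 5' end, in gene coordinates.

    Positive = downstream (direction of the gene), negative = upstream.
    """
    return read_5p - tss if gene_strand == "+" else tss - read_5p


def orientation(read_strand, gene_strand):
    """Sense if the read lies on the same strand as its gene, else anti-sense."""
    return "sense" if read_strand == gene_strand else "anti-sense"


def binned_profile(reads, bin_nt=BIN_NT):
    """Count reads per (orientation, bin lower edge).

    `reads` holds (read_5p, read_strand, tss, gene_strand) tuples. Floor
    division keeps negative (upstream) bins aligned, e.g. -250 -> -260.
    """
    counts = Counter()
    for read_5p, read_strand, tss, gene_strand in reads:
        rel = relative_position(read_5p, tss, gene_strand)
        counts[(orientation(read_strand, gene_strand), (rel // bin_nt) * bin_nt)] += 1
    return counts


# Hypothetical toy reads, not real data.
toy_reads = [
    (1_030, "+", 1_000, "+"),  # sense, 30 nt downstream
    (800, "-", 1_000, "+"),    # anti-sense, 200 nt upstream
    (4_970, "-", 5_000, "-"),  # sense on a minus-strand gene, 30 nt downstream
    (5_250, "+", 5_000, "-"),  # anti-sense on a minus-strand gene, 250 nt upstream
]
profile = binned_profile(toy_reads)
print(profile)
```

Under this convention, sense reads accumulate in positive bins and anti-sense reads in negative bins, which is the divergent pattern described above.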
A majority (59%) of ES cell genes with TSSa-RNAs have both sense and anti-sense TSSa-RNAs, indicating that individual TSSs produce both RNA subtypes (Figure S3E-F). Based on their orientation and position relative to TSSs, we hypothesize that sense and anti-sense TSSa-RNAs arise from divergent transcription, defined as non-overlapping transcription initiation events that proceed in opposite directions from the TSS. Given that TSSa-RNAs are present in all cell types examined in this study, divergent transcription is likely a common feature of mammalian TSSs.
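The 59% figure is a per-gene set comparison: a gene counts as "both" when it appears in the sense set and in the anti-sense set. A minimal sketch, assuming the denominator is every gene with at least one TSSa-RNA of either orientation; the gene identifiers below are hypothetical placeholders.

```python
def classify_genes(sense_genes, antisense_genes):
    """Partition TSSa-RNA associated genes by which orientations they carry."""
    sense, anti = set(sense_genes), set(antisense_genes)
    both = sense & anti
    associated = sense | anti  # assumed denominator: genes with any TSSa-RNA
    return {
        "both": len(both),
        "sense_only": len(sense - anti),
        "anti_sense_only": len(anti - sense),
        "fraction_both": len(both) / len(associated) if associated else 0.0,
    }


print(classify_genes({"GeneA", "GeneB", "GeneC"}, {"GeneB", "GeneC", "GeneD"}))
# {'both': 2, 'sense_only': 1, 'anti_sense_only': 1, 'fraction_both': 0.5}
```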
TSSa-RNAs are associated with genes expressed at varying levels in ES cells but are biased towards highly expressed genes. TSSa-RNAs were found at the majority of highly and moderately expressed genes (Figure S5), and 80% were associated with CpG island promoters (Table S2). In addition, the number of TSSa-RNA observations per gene correlates positively with gene expression level, and the sense:anti-sense ratio increases notably at the highest expression levels. This increase suggests that a fraction of the reads from the most highly expressed genes arises from mRNA turnover.
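The sense:anti-sense trend can be tabulated by binning genes on their log2 expression signal and pooling read counts per bin. This sketch uses the bin boundaries from the Figure 2 caption, assuming a non-overlapping medium bin of 9-12 and half-open intervals on a continuous scale; both are assumptions. Pooling (summing reads, then dividing) rather than averaging per-gene ratios is a design choice made here so that genes with few reads do not dominate. The toy values are hypothetical.

```python
from collections import defaultdict


def expression_bin(log2_signal):
    """Assign a log2 signal intensity to the Figure 2 bins.

    Assumes medium = 9-12 and half-open intervals:
    off < 5 <= low < 9 <= med < 13 <= high.
    """
    if log2_signal < 5:
        return "off"
    if log2_signal < 9:
        return "low"
    if log2_signal < 13:
        return "med"
    return "high"


def sense_antisense_ratio_by_bin(genes):
    """Pooled sense:anti-sense read ratio per expression bin.

    `genes` holds (log2_signal, n_sense_reads, n_antisense_reads) tuples.
    A bin with no anti-sense reads gets float('inf').
    """
    sense, anti = defaultdict(int), defaultdict(int)
    for signal, n_sense, n_anti in genes:
        b = expression_bin(signal)
        sense[b] += n_sense
        anti[b] += n_anti
    return {b: (sense[b] / anti[b] if anti[b] else float("inf")) for b in sense}


toy_genes = [(3.0, 1, 1), (10.5, 3, 2), (14.2, 12, 3), (13.0, 8, 2)]
print(sense_antisense_ratio_by_bin(toy_genes))  # {'off': 1.0, 'med': 1.5, 'high': 4.0}
```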
Figure 2 In ES cells, TSSa-RNA associated genes are primarily expressed. (A) ES cell expression data were separated into 4 bins based on log2 signal intensity: off = 1-4, low = 5-8, med = 9-12, and high ≥ 13 (9). Gene counts for each gene expression …
Whereas typical RNAPII transcripts have a significant bias towards G at their 5′ ends, TSSa-RNAs show a nearly random 5′ nucleotide distribution (4, Table S3). This difference strongly suggests that the 5′-most base of a TSSa-RNA does not represent the initial nucleotide transcribed by RNAPII.
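One simple way to quantify "nearly random" is a Pearson chi-square statistic comparing observed 5′-base counts with equal expected counts. This is an assumed test for illustration, not necessarily the comparison behind Table S3, and a uniform 25% null is a simplification; a background built from local genomic base composition would be a stricter control. The toy reads are hypothetical and far too few for the test to be meaningful.

```python
from collections import Counter

BASES = "ACGU"
CHI2_CRITICAL_DF3_ALPHA05 = 7.815  # chi-square critical value, 3 degrees of freedom, alpha = 0.05


def first_base_counts(reads):
    """Count the 5'-most base of each read (DNA-style T is converted to U)."""
    counts = Counter(r[0].upper().replace("T", "U") for r in reads if r)
    return [counts[b] for b in BASES]


def chi_square_vs_uniform(counts):
    """Pearson chi-square statistic against equal expected counts per base."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads to test")
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


toy_reads = ["GAUC", "GCCA", "GTTA", "ACGU"]
counts = first_base_counts(toy_reads)  # [1, 0, 3, 0] for A, C, G, U
stat = chi_square_vs_uniform(counts)   # 6.0
print(counts, stat, stat > CHI2_CRITICAL_DF3_ALPHA05)
```

A G-biased set of true transcript 5′ ends would give a large statistic; a set whose 5′ ends fall at effectively random positions would give a small one.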
Based on cloning frequency, a given TSSa-RNA sequence is estimated to be present at ~1 molecule per 10 cells (4). Because of this low abundance, we developed an enrichment procedure to determine the nature of the short RNA species surrounding TSSa-RNA associated genes (4). The sequenced 21 nt sense and anti-sense TSSa-RNAs associated with Rnf12 and Ccdc52, respectively, were not detected as unique species in ES cells. Instead, species between 20 and 90 nucleotides were detected at levels estimated to be greater than 10 molecules per cell (4). Similarly sized fragments were not found in HeLa cell RNA samples probed with the same sequences, demonstrating the specificity of the procedure. Northern analysis of two other TSSa-RNA associated genes showed similar results (Figures S6 and S7). We suggest that 20-90 nt transcripts are the dominant short RNA species from these two promoters and that TSSa-RNAs likely represent no more than 10% of the total associated transcripts.
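As a rough consistency check on that bound, take the two per-cell estimates above at face value and assume they apply to the same promoter: about 0.1 molecule per cell for one sequenced TSSa-RNA, and a total of more than 10 molecules per cell for the 20-90 nt species detected by Northern analysis.

$$
\frac{n_{\mathrm{TSSa}}}{n_{\mathrm{total}}} \approx \frac{0.1\ \text{molecule/cell}}{n_{\mathrm{total}}} < \frac{0.1}{10} = 0.01, \qquad n_{\mathrm{total}} > 10\ \text{molecules/cell}
$$

This covers a single TSSa-RNA sequence. Because several distinct TSSa-RNAs often map to one TSS, their combined share is correspondingly larger, so the result is best read as an order-of-magnitude estimate that sits within the stated upper bound.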
Figure 3 Transcripts from TSSa-RNA associated regions are primarily 20-90 nts long. (A) Map of the sense TSSa-RNA Rnf12 region. (B) Northern analysis for the Rnf12 sense TSSa-RNA using probe 1 in A. Lane 1 is a 10 bp ladder. Lanes 2-5 are detection controls with …
To further classify promoters that produce TSSa-RNAs, and by inference promoters that show evidence of divergent transcription, we examined their local chromatin environment using chromatin immunoprecipitation coupled with DNA sequencing (ChIP-seq) (3). TSSa-RNA associated promoters are enriched for bound RNAPII and H3K4me3-modified chromatin in ES cells. About 90% of TSSa-RNA associated genes have H3K4me3-modified nucleosomes at their promoters, compared with ~60% of all mouse genes. TSSa-RNA associated genes also show a ~3-fold enrichment of promoter-proximal RNAPII relative to all genes. In contrast, TSSa-RNA associated genes are depleted of the Polycomb component Suz12.
Figure 4 Relationship between TSSa-RNAs and chromatin structure. (A) Percentage of genes associated with indicated chromatin marks. A t-test gives p-values < 2.2e-16 for all marks. (B) Schematic of factor binding site mapping using forward and reverse ChIP-seq …
Composite profiles of ChIP-seq data were used to determine the positions of RNAPII and histone modifications relative to the TSS. These profiles revealed a striking correlation with the sense and anti-sense TSSa-RNA peaks. In such analyses, the midpoint between the forward- and reverse-strand ChIP-seq read maxima defines the average DNA binding site of a factor (3). At TSSa-RNA associated genes, two distinct RNAPII peaks are detectable, spaced several hundred base pairs apart. A sharp RNAPII peak just downstream of the TSS lies directly over the sense TSSa-RNA peak. A second RNAPII peak, upstream of the first, is more diffuse but again lies directly over the anti-sense TSSa-RNA peak. This co-occurrence with anti-sense TSSa-RNAs strongly suggests that the upstream RNAPII peak reflects divergent transcription rather than, as has been proposed (6), sense initiation upstream of the TSS.
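The midpoint rule works because ChIP-seq reads come from the ends of immunoprecipitated fragments: forward-strand 5′ ends pile up on one side of a bound factor and reverse-strand 5′ ends on the other, so the site sits between the two maxima. The sketch below applies the rule separately to a downstream and an upstream region, mirroring the two RNAPII peaks. It assumes TSS-relative composite coordinates with reverse-strand reads already converted to their 5′ (rightmost) coordinate, uses raw per-position counts with no smoothing (real profiles are usually smoothed or binned), and the region bounds and read positions are hypothetical.

```python
from collections import Counter


def strand_peak(read_5p_positions, lo, hi):
    """Position with the most read 5' ends inside [lo, hi] (ties: first seen)."""
    counts = Counter(p for p in read_5p_positions if lo <= p <= hi)
    if not counts:
        raise ValueError(f"no reads in [{lo}, {hi}]")
    return max(counts, key=counts.get)


def binding_site(forward_5p, reverse_5p, lo, hi):
    """Midpoint between forward- and reverse-strand read maxima in a region.

    Positions are composite TSS-relative coordinates (negative = upstream).
    """
    return (strand_peak(forward_5p, lo, hi) + strand_peak(reverse_5p, lo, hi)) / 2


# Hypothetical composite read positions, not real data.
forward = [-320, -320, -310, -80, -75, -75, -75, -70]
reverse = [-160, -160, -150, 90, 95, 95, 95, 100]
print(binding_site(forward, reverse, -100, 500))   # downstream site: 10.0
print(binding_site(forward, reverse, -600, -101))  # upstream site: -240.0
```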
Alignment of H3K4me3-modified nucleosomes with respect to the TSS shows peaks flanking the TSSa-RNA and RNAPII maxima, consistent with H3K4 methylation of the nucleosomes immediately upstream and downstream of TSSs. These flanking peaks suggest that divergently paused RNAPII complexes may recruit H3K4 methyltransferase activity to mark active promoter boundaries. In contrast to the dual RNAPII and H3K4me3 peaks surrounding TSSs, H3K79me2, a chromatin mark found over regions of RNAPII elongation, is enriched only in the direction of productive transcription. These observations suggest that although divergent transcription initiation is widespread, productive elongation by RNAPII occurs primarily in one direction, downstream of TSSs.
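The contrast between two-sided marks (RNAPII, H3K4me3) and a one-sided mark (H3K79me2) can be summarized with a directionality index: the log2 ratio of signal downstream versus upstream of the TSS. This metric is a hypothetical summary introduced for illustration, not the measure used in this study; the 2 kb flank and pseudocount of 1 are assumed parameters. A value near 0 indicates symmetric signal, and a positive value indicates signal biased towards the direction of the gene.

```python
import math


def directionality_index(rel_positions, flank=2000, pseudocount=1):
    """log2((downstream + pc) / (upstream + pc)) for reads within `flank` nt of the TSS.

    rel_positions are TSS-relative in gene coordinates (positive = downstream);
    reads exactly at 0 are ignored.
    """
    down = sum(1 for p in rel_positions if 0 < p <= flank)
    up = sum(1 for p in rel_positions if -flank <= p < 0)
    return math.log2((down + pseudocount) / (up + pseudocount))


symmetric = [-150, -120, 110, 140]                      # two-sided, like a flanking mark
one_sided = [300, 800, 1_200, 1_500, 1_900, 650, 900]  # downstream only
print(directionality_index(symmetric))  # 0.0
print(directionality_index(one_sided))  # 3.0
```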
Sense and anti-sense TSSa-RNAs with bound RNAPII are found at a surprisingly large number of mammalian promoters, suggesting that divergent initiation by RNAPII at TSSs is a general feature of transcription. Supporting this hypothesis, genome-wide nuclear run-on assays by Core et al. show that divergent transcripts arise from transcriptionally engaged RNAPII at many genes in human fibroblasts.
Because TSSa-RNAs do not represent the 5′ ends of transcripts, they likely mark regions of RNAPII pausing rather than initiation. Pausing has been observed at many genes, most notably Drosophila Hsp70, where it maintains RNAPII in a state poised for activation upon heat shock (7). RNAPII has been shown to pause 20-50 nt downstream of the TSS (7). The results presented here suggest that paused anti-sense RNAPII is also present upstream of many TSSs. As inferred from the co-localization of bound RNAPII and anti-sense short RNAs, paused anti-sense RNAPII is centered around 250 nt upstream of the TSS. Because chromatin marks associated with elongating RNAPII are found only downstream of TSSs (8), it appears that anti-sense RNAPII frequently does not elongate after TSSa-RNA production. This suggests the existence of an as-yet undefined mechanism that discriminates between the sense and anti-sense polymerases for productive elongation.
RNAPII initiation complex polarity at promoters is thought to be established by TFIID/TBP binding together with TFIIB (11). RNAPII/TFIIF binding and DNA unwinding by the TFIIH helicase then give rise to the open pre-initiation complex (7). The prevalence of divergently oriented RNAPII at most promoters suggests a more complex situation. We hypothesize that transcription factors first nucleate a sense-oriented pre-initiation complex at the TSS. Transcription by this complex generates at least two signals that could subsequently promote the formation of paused anti-sense polymerase upstream. First, the RNAPII carboxy-terminal domain and other initiation complex components can activate transcription when tethered to DNA, suggesting that the sense complex may promote anti-sense pre-initiation complex formation in the upstream region (12). Second, as RNAPII elongates the sense transcript, negative supercoiling of the DNA will occur upstream, perhaps promoting anti-sense initiation (13). This divergent transcription would structure chromatin and nascent RNA at the TSS for subsequent regulation.