To explore the range of locations where polymerase II accumulates across the genome, we performed chromatin immunoprecipitation (ChIP) from HeLa S3 cells and profiled the purified DNA using an oligonucleotide-tiled microarray interrogating the Encyclopedia of DNA Elements (ENCODE) regions [11] covering 471 known genes. Two antibodies were used, 8WG16 and 4H8, which recognize the hypophosphorylated form (PolIIa) and a phosphorylation-independent state of the carboxy-terminal domain (CTD) of polymerase II (PolII), respectively. Thus, the 4H8 antibody recognizes the total polymerase II population. Isolated DNA was amplified using a multiple displacement amplification (MDA) strategy (see Materials and methods) [12].
To identify sites of enrichment, we used a non-parametric approach generalizing the Wilcoxon signed-rank test [13]. Signals across 1,000 nucleotides were used to determine a p-value for each probe. Probes were filtered for uniqueness within the bandwidth. Probes with p-values below 10⁻⁴ were selected for further analysis because this threshold has a low false-positive rate as determined by PCR analysis (Figure ). With these parameters, the hypophosphorylation-specific anti-PolIIa antibody reveals 102 occupied sites, whereas the phosphorylation-independent antibody shows 550 sites (Table ).
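The windowed scoring described above can be illustrated with a minimal sketch. It applies a plain one-sided Wilcoxon signed-rank test (normal approximation) to the log ratios of all probes within a 1,000-nucleotide window centered on each probe. This is a simplified stand-in, not the generalized test of [13]; the function names, the log-ratio input, and the centered window are assumptions made for illustration.

```python
import math
from bisect import bisect_left, bisect_right


def signed_rank_pvalue(values):
    """One-sided Wilcoxon signed-rank p-value (normal approximation)
    for the alternative that the values are shifted above zero."""
    nonzero = [v for v in values if v != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    # Rank the absolute values; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, nonzero) if v > 0)
    mean = n * (n + 1) / 4
    # Variance without tie correction (a simplification of this sketch).
    variance = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability


def windowed_pvalues(positions, log_ratios, bandwidth=1000):
    """p-value per probe from all probes within bandwidth/2 of it.
    `positions` must be sorted in ascending order."""
    half = bandwidth / 2
    pvalues = []
    for pos in positions:
        lo = bisect_left(positions, pos - half)
        hi = bisect_right(positions, pos + half)
        pvalues.append(signed_rank_pvalue(log_ratios[lo:hi]))
    return pvalues


def enriched_probes(positions, pvalues, cutoff=1e-4):
    """Positions of probes whose p-value falls below the cutoff."""
    return [pos for pos, p in zip(positions, pvalues) if p < cutoff]
```

Because the normal approximation is poor for windows holding only a few probes, a real analysis would need an exact or permutation-based p-value in sparsely tiled regions.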
Figure 1 Enrichment of selected genomic regions in ChIP. (a) PolII ChIP; (b) PolIIa ChIP. By real-time PCR relative enrichment ratios, selected regions are found to be enriched more often when their p-values are below 10⁻⁴. These regions include both intra- and intergenic ...
Summary of RNA polymerase II locations
RNA polymerase II has distinct landscapes across each gene. Figure shows representative genes with polymerase enrichments. PolIIa is highly enriched at transcription initiation sites. On the other hand, PolII shows gene-specific landscapes with the strongest enrichments at exons within actively transcribed loci. Active genes reveal lower p-values across the gene compared with intergenic or inactive genes (compare Figure and ), indicating a relative absence of polymerase II from the nontranscribed regions. Some smaller genes with high exon density, such as SF1, reveal significant polymerase signal across almost the entire locus (Figure ). Distinct accumulations are observed with significant p-values around exons for both SF1 and KIAA1932. In the KIAA1932 gene, PolII is enriched at a subset of constitutively and alternatively spliced exons (Figure ). For some genes, RNA polymerase II is enriched at relatively few locations within the gene (Figure ).
Figure 2 RNA polymerase II shows a variety of gene-specific enrichment patterns. Graphs plot -10 log10(p-value) mapped to chromosome position, with significant values greater than 40 (p-values below 10⁻⁴) indicated by the rectangular blocks below the graph. Values are plotted at every ...
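The plotted score can be sketched as follows, assuming it is -10 log10(p-value), so that a score above 40 corresponds to a p-value below 10⁻⁴. The helper names are hypothetical, and grouping consecutive significant probes into one block is an assumption about how the rectangles might be drawn, not a description of the authors' plotting code.

```python
import math


def neg10_log10(pvalue):
    """Convert a p-value (must be > 0) to a -10*log10 score."""
    return -10.0 * math.log10(pvalue)


def significant_blocks(positions, pvalues, cutoff=40.0):
    """Group runs of consecutive probes scoring above `cutoff`
    into (first_position, last_position) blocks."""
    blocks = []
    current = None
    for pos, p in zip(positions, pvalues):
        if neg10_log10(p) > cutoff:
            if current is None:
                current = [pos, pos]  # open a new block
            else:
                current[1] = pos      # extend the open block
        elif current is not None:
            blocks.append(tuple(current))
            current = None
    if current is not None:
        blocks.append(tuple(current))
    return blocks
```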
An important question is whether the polymerase II sites are indicative of active transcription. We addressed this in multiple ways. First, microarray expression profiling of the mRNA with Affymetrix U133 Plus 2 chips confirms that many of the RNA polymerase II-associated genes are actively expressed in HeLa cells, as seen in a plot of mRNA expression level versus p-value in Figure . Genes with significant RNA polymerase II enrichment are biased towards genes with higher mRNA levels. Figure also shows that some genes have apparently high mRNA levels but no significant levels of PolII or PolIIa. This could reflect very low transcription levels combined with high mRNA stability. Second, we measured RNA from the same HeLa cells on the ENCODE tiled arrays. We observe that 34% of the PolII sites overlap with RNA signal (compared to approximately 8% expected at random) and 50% of the PolII locations are within 1 kb of some RNA signal (compared to 13% expected at random). Many sites where small pieces of RNA are synthesized, such as small exons, may be missed as a result of the spacing of the oligonucleotide probes and the imperfect nature of the probes. Third, many of the PolII and PolIIa sites overlap with annotated expressed sequence tags (ESTs) and mRNAs. Eighty-seven percent of the PolII-enriched and 88% of the PolIIa-enriched locations overlap with EST regions, compared to 31% and 44% expected at random, respectively. Lastly, reverse transcriptase PCR checks of KIAA1932 and DKC1 indicate that these genes are being expressed (data not shown). These data suggest that RNA polymerase II sites are biased towards regions of active transcription and that sites of RNA polymerase II enrichment are an indicator of transcription.
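The overlap comparisons above can be sketched as an interval calculation: the fraction of enrichment sites lying within a given distance of any feature (RNA signal, EST, or mRNA), set against the same fraction for randomly repositioned sites. How the random expectation was actually computed is not stated in the text, so the uniform repositioning within a single region used here is an assumption, as are all function names.

```python
import random
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_near(sites, features, slop=0):
    """Fraction of (start, end) sites within `slop` bases of any feature."""
    if not sites:
        return 0.0
    merged = merge_intervals(features)
    starts = [start for start, _ in merged]
    hits = 0
    for start, end in sites:
        # Merged features are disjoint and sorted, so only the last feature
        # starting at or before the padded site end can overlap the site.
        i = bisect_right(starts, end + slop) - 1
        if i >= 0 and merged[i][1] >= start - slop:
            hits += 1
    return hits / len(sites)


def random_expectation(sites, features, region, slop=0, trials=200, seed=0):
    """Mean fraction_near when each site is placed uniformly at random
    inside `region`, a (start, end) pair, keeping its length."""
    rng = random.Random(seed)
    region_start, region_end = region
    total = 0.0
    for _ in range(trials):
        shuffled = []
        for start, end in sites:
            length = end - start
            new_start = rng.randint(region_start, region_end - length)
            shuffled.append((new_start, new_start + length))
        total += fraction_near(shuffled, features, slop)
    return total / trials
```

Calling `fraction_near(sites, rna_signal)` and `fraction_near(sites, rna_signal, slop=1000)` would give the two kinds of observed percentages quoted above, with `random_expectation` supplying the matching background value under the stated assumption.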
Figure 3 Different RNA polymerase states show distinct exon biases. Pie charts represent the percentage of exons in each category at RNA polymerase enrichment locations. These include exons from enrichment locations that include more than one exon. PolIIa is ...
Levels of RNA polymerase II enrichment at internal exons can vary between genes. To examine whether these patterns are influenced by expression levels, two categories were created: genes with multiple PolII enrichments at internal exons, and genes with PolII at one or zero internal exons. When compared to the mRNA levels, there is no significant difference between the two categories, suggesting that the number of PolII sites across the gene does not vary significantly with RNA levels. Genes with observable PolII enrichment at internal exons are correlated with higher mRNA levels on the expression array. This is consistent with reports proposing the use of PolII ChIP to monitor gene expression [14]. Therefore, the number of PolII sites at internal exons may reflect different levels of transcription elongation control and not just the sensitivity of the experiment.
Distinct from the hypophosphorylation-specific antibody, the phosphorylation-independent antibody reveals diverse enrichment locations for PolII. In total, 74% of the identified PolII locations are near an annotated knownGene, RefSeq, or genscan exon as summarized in Table (see Additional data file 2 for a list of PolII genscan exon locations). Unlike PolIIa, PolII sites are distributed between the 5' and 3' ends of genes, with a slight bias towards terminal exons over initiating exons (Figure ). This probably reflects the stalling of PolII during the coupled processes of transcription termination and 3'-end processing [15]. For some genes, significant PolII signal is observed more than 1 kb past the terminal exon, which might indicate transcription of the longer pre-mRNA before 3'-end cleavage and polyadenylation [16]. Figure shows two representative genes with significant PolII enrichment past the terminal exon.
Figure 4 Low p-value PolII and PolIIa enrichments are biased towards higher mRNA levels. The plot depicts the observed intensity from Affymetrix U133 Plus 2 chips compared with different p-values of PolII (white) and PolIIa (gray). Some genes with no significant ...
Figure 5 PolII enrichment is not always within annotated gene boundaries. Views are from the UCSC Genome Browser, genome version HG16. PolIIa is in black and PolII is in blue, with four rows for each representing the data at different p-values: p < 10⁻⁵ ...
Most of the hypophosphorylated PolIIa locations at internal exons also overlap a transcription initiation site, as the internal exon in question is often the second exon in the gene. Only two enrichment sites overlap with an internal exon without also being near the first exon of a transcript. One of these is at a CpG island in the MCF2L gene and the other may be an alternative transcription initiation site as annotated in the HG17 assembly at the beginning of the ITGB4BP gene. To classify the remaining sites within introns or in intergenic regions, enrichment sites were compared to other gene databases. As summarized in Table , four PolIIa sites are in introns, but three of these are within resolution of annotated or predicted exons, leaving only one location not overlapping an exon of some kind. There are 28 hypophosphorylated polymerase sites not in a RefSeq gene region. After following a similar filtering approach, only 14 sites remain that are not near a putative exon. Thus, only 14% of PolIIa-enriched locations do not overlap with a known exon or actively transcribed region. Additional data file 2 lists PolIIa sites at predicted exons that are probably newly identified transcription initiation locations in HeLa cells. Figure shows two examples of PolII and RNA signal at new sites of transcription. From the pattern of enrichments it is probable that many of these predicted exons are real and are transcription initiation locations, given the observed strong bias of the 8WG16 antibody for transcription initiation locations in well annotated genes.
To determine the generality of these observations, all RNA polymerase II occupancy sites were compared with the known genes and RefSeq databases, version HG16. PolIIa is highly enriched for the first exons around transcription initiation sites (Figure ) representing 77 of 551 known genes in HG16 on the array (see Additional data file 1 for the entire lists).
Elongation control is a common transcriptional regulation mechanism believed to affect a wide range of functional gene classes [1]. In particular, RNA polymerase II pausing has been proposed to be associated with alternative splicing [2]. To determine if there is a bias for alternative exons, we counted all the annotated alternatively spliced exons in the knownGene database and determined the distribution of PolII enrichment locations on them. PolII is enriched at 57% of the annotated alternatively spliced exons of the active genes compared to 37% of annotated actively transcribed constitutively expressed exons. We also examined the distribution of all PolII p-values on different types of exons. Each exon was mapped to the smallest p-value of any ChIP-enriched site that overlaps the exon. Cassette exons are found to be significantly associated with smaller p-values than constitutively expressed exons according to the two-sample Kolmogorov-Smirnov test, with a two-sided p-value of less than 0.0035.
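The exon comparison above can be sketched in two steps: assign each exon the smallest p-value among the enrichment sites overlapping it, then compare the two resulting distributions with a two-sample Kolmogorov-Smirnov test. The sketch uses the asymptotic Kolmogorov series for the two-sided p-value, which is only approximate for small samples. The function names and the default value given to exons with no overlapping site are assumptions for illustration.

```python
import math


def min_pvalue_per_exon(exons, sites, default=1.0):
    """Map each (start, end) exon to the smallest p-value among the
    (start, end, pvalue) sites overlapping it; `default` if none overlap."""
    result = []
    for exon_start, exon_end in exons:
        overlapping = [p for start, end, p in sites
                       if start <= exon_end and end >= exon_start]
        result.append(min(overlapping) if overlapping else default)
    return result


def ks_two_sample(a, b):
    """Two-sample KS statistic D and asymptotic two-sided p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples, tracking the gap between empirical CDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0  # the series is unreliable here; p is essentially 1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)
```

With `cassette` and `constitutive` as the two lists returned by `min_pvalue_per_exon`, `ks_two_sample(cassette, constitutive)` yields the kind of two-sided p-value reported above.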
One attractive hypothesis is that sites of exon enrichment may reflect weaker splice sites where PolII stalls during splice site recognition. Using two different empirical methods to estimate splice site strength, no significant differences are observed between the exons overlapping PolII and those that do not [17]. Alternatively, some of the annotated constitutively expressed exons may actually be subject to alternative splicing decisions. Kampa et al. suggest, from their examination of expression on tiled arrays, that the levels of alternative splicing are much higher than commonly believed and annotated in the human genome [19]. Consistent with these findings, RNA polymerase II sites may be predicting which exons are being co-transcriptionally alternatively spliced.
To determine if there is any pattern for the 120 PolII enrichment sites that are in RefSeq introns, we compared these sites to the knownGene, genscan, geneid, and sgpGene databases and found 31 within resolution of putative exons. Of the remaining 89, 57 are in genes with PolII enrichment sites that also overlap exons, suggesting that they are actively transcribed genes. No clear intronic positional bias is observed.
In conclusion, we have identified new sites of RNA polymerase II accumulation across hundreds of genes in mammalian cells. The large majority of polymerase II-enriched locations are at actively transcribed exons with a bias towards annotated alternatively spliced exons. Many of the PolII sites at annotated constitutively expressed exons may be sites of alternative splicing. Whatever the eventual splicing decision, these observations suggest that events around exons slow transcription elongation. A recent study suggests that even general splicing factors may slow elongation [20]. Stalling of RNA polymerase II near exons may slow RNA synthesis until the competition among myriad splicing signals is resolved and the exon is defined [21]. These ChIP data identify where these states of RNA polymerase II are localizing across the ENCODE regions.
Across genes, these data are consistent with the hypothesis of transcriptional pausing at particular locations. Alternatively, it is possible that RNA polymerase II is rearranging during transcription such that the epitope is only accessible around exons. Thus, the conformation of polymerase II may be changing and not the transcription rate. Nonetheless, it is interesting that the majority of observable elongating polymerase II accumulates around exons, suggesting that a major feature of transcription elongation control is coupling to pre-mRNA processing.
These observations differ from those observed in intronless genes typically found in prokaryotes and yeast where a more uniform PolII enrichment is observed across genes [16]. What appears to be conserved is PolII accumulation in coding regions compared to intronic regions. These data highlight the complexity and gene-specific nature of transcription regulation not only at transcription initiation and termination locations but at specific exons. Together, these observations suggest that a major feature of transcription elongation control in mammalian cells is exon definition. Thus, these data provide new insights into the coordination of transcription and pre-mRNA processing in mammalian cells.