Using Gene Ontology classifications, we identified nearly 400 proteins that potentially regulate transcription. We then categorized these proteins according to the protein complex and the stage of the transcription cycle to which they likely belong. For this purpose, we divided the transcription cycle into four stages (Table S1): 1) “Orchestration”, representing sequence-specific transcriptional activators and repressors; 2) “Access”, including histones and chromatin regulators; 3) “Initiation”, representing the general transcription factors (GTFs) and their regulators; and 4) “Elongation”, including RNA Polymerase (Pol) II and regulators of transcription elongation and termination.
Scope and data display of genome-wide location analysis
Occupancy levels were examined at ~20,000 locations across the yeast genome, primarily at canonical locations where upstream activating sequences (UAS), transcriptional start sites (TSS), and the 3' ends of open reading frames (ORFs) reside. Two probes were also located 30 bp and 290 bp upstream of the annotated tRNA transcription start sites. Several hundred negative control probes were placed in intergenic regions between convergently transcribed genes for the purpose of centering the bulk data. Controls and validation of the accuracy and resolution of these arrays were described elsewhere (Venters and Pugh, 2009a). Importantly, that study demonstrated that the probes allow interrogation of entire promoter regions with a mapping uncertainty of ±40 bp. Thus, binding to UAS regions can be distinguished from binding to core promoter regions, and promoter regions can be distinguished from the 3’ ends of genes.
Occupancy levels reflect log2 fold enrichment over background (hybridization signal of the TAP-tagged regulator / hybridization signal for a set of samples lacking a TAP tag), and are reported for individual probes in Table S2 for cells grown at 25°C and in Table S3 for heat-shocked cells. Each normalized log2 dataset was centered by subtracting the median value in 344 control T-T regions, which represent intergenic regions between convergently transcribed genes. Thus, all values are relative to such regions. The occupancy level of each of the ~200 regulators at any gene can also be queried in a browser format at http://atlas.bx.psu.edu. Since a priori knowledge of where a regulator binds within a promoter region (if at all) was not available for all regulators, promoter occupancy was taken to be the higher value at either the UAS or the TSS probe (reported in Table S4). Based on an error model using an untagged control strain (Li et al., 2008), occupancy values meeting a 5% FDR cut-off were deemed significant. A total of 338,704 protein-DNA interactions were determined to be significant for the regulators examined (231,704 at mRNA promoters; 99,216 at the 3' ends of ORFs; and 7,783 at tRNA genes), representing 15% of all possible interactions.
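The normalization steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, raw intensities are assumed to be positive, and the significance cut-off uses a generic empirical FDR estimate (counting untagged-control values above each candidate threshold) in place of the Li et al. (2008) error model, whose details are not reproduced here.

```python
import math
from statistics import median


def log2_enrichment(tagged, background):
    """Per-probe log2(TAP-tagged signal / untagged background signal)."""
    return [math.log2(t / b) for t, b in zip(tagged, background)]


def center_on_controls(values, control_idx):
    """Subtract the median of the control probes (e.g. the T-T intergenic regions)."""
    offset = median(values[i] for i in control_idx)
    return [v - offset for v in values]


def promoter_occupancy(uas_value, tss_value):
    """Promoter occupancy is taken as the higher of the UAS and TSS probe values."""
    return max(uas_value, tss_value)


def empirical_fdr_threshold(tagged, control, fdr=0.05):
    """Lowest threshold whose estimated FDR is <= fdr (math.inf if none qualifies).

    Assumption: the FDR at threshold t is estimated as the number of
    untagged-control values >= t (scaled to the tagged sample size) divided by
    the number of tagged values >= t. This is a generic stand-in, not the
    error model cited in the text.
    """
    scale = len(tagged) / len(control)
    best = math.inf
    for t in sorted(set(tagged), reverse=True):
        n_tagged = sum(v >= t for v in tagged)
        n_control = sum(v >= t for v in control)
        if n_control * scale / n_tagged <= fdr:
            best = t
    return best
```

For example, `center_on_controls(log2_enrichment([4.0, 8.0, 2.0], [1.0, 1.0, 1.0]), [0, 2])` returns `[0.5, 1.5, -0.5]`: the log2 values 2, 3, and 1 are shifted by the control median of 1.5, so every value is expressed relative to the control regions.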
More diverse regulation of SAGA-dominated genes than TFIID-dominated genes
Approximately 93% of the 5,866 mRNA promoter regions had significant occupancy by at least ten regulators (including histones), and ~10% were significantly occupied by at least 75 proteins. As only about half of the candidate proteins were examined, we expect the actual number of bound proteins to be approximately twice that. Based on existing models of transcription initiation, it is reasonable to expect at least 75 (or 150) proteins occupying a single promoter, since GTFs, Pol II, Mediator, and TFIID/SAGA are commonly required for transcription, and they alone contribute over half of these proteins. Activators, chromatin regulators, and elongation regulators, many of which are large multisubunit complexes, would make up the remainder. Indeed, a putative gene of unknown function (YGR146C-A) had as many as 145 proteins detected. An important caveat is that these location profiling experiments on cell populations do not distinguish regulators that simultaneously co-occupy a genomic region from those that bind in a mutually exclusive manner or at different times in the transcription cycle or cell cycle.
[Figure: Promoter occupancy at 25°C for individual genes and groups of genes]
Occupancy levels at selected genes were quantified and displayed as a heat map. The number of bound regulators varied widely from gene to gene. For example, two highly transcribed genes from the glycolysis pathway, PGK1 and …, contained 57 and 26 bound complexes (98 and 42 proteins), respectively. Two lowly transcribed genes, HO and …, had fewer regulators bound, and those regulators that were bound had relatively low occupancy levels. Table S5 lists a number of example genes and the complexes present at each.
To assess commonalities among similarly regulated genes, we examined regulator occupancy at 124 highly active (>14 mRNA/hr) ribosomal protein (RP) genes, which are coordinately regulated and TFIID-dominated (Table S6). We also examined 190 of the most active TFIID-dominated/TATA-less genes, excluding RP genes, and compared them to the 41 most active SAGA-dominated/TATA-containing genes (a transcription frequency cut-off of 14 mRNA/hr was used for both groups). Similar trends were observed at other data thresholds (not shown). The percentage of genes in each group meeting the 5% FDR threshold for significant occupancy is shown as a heat map. In addition, the median occupancy level of each regulator (as a percent rank among values meeting the 5% FDR threshold) is indicated. Percent ranks, which range from 0 to 100, allowed occupancy to be compared (albeit imperfectly) between regulators; without this scaling, occupancy levels vary widely between regulators owing to differences in ChIP efficiency.
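The two summary statistics above (percentage of a gene group meeting the threshold, and median occupancy as a percent rank) can be illustrated with a short Python sketch. It assumes one common definition of percent rank (the share of a regulator's significant values lying strictly below a given value, scaled so the minimum maps to 0 and the maximum to 100); the text does not specify which definition was used, and the function names are hypothetical.

```python
from bisect import bisect_left
from statistics import median


def percent_rank(x, reference_sorted):
    """Percent rank of x within a sorted list of significant occupancy values.

    Assumed definition: 100 * (# reference values < x) / (n - 1), so the
    smallest reference value maps to 0 and the largest to 100.
    """
    n = len(reference_sorted)
    if n < 2:
        return 100.0
    return 100.0 * bisect_left(reference_sorted, x) / (n - 1)


def percent_significant(group_values, threshold):
    """Percentage of genes in a group whose occupancy meets the threshold."""
    return 100.0 * sum(v >= threshold for v in group_values) / len(group_values)


def median_percent_rank(group_values, significant_values, threshold):
    """Median percent rank of the significant occupancies within a gene group.

    Ranks are computed against all significant values of the same regulator,
    which places regulators with different ChIP efficiencies on a 0-100 scale.
    """
    ref = sorted(significant_values)
    hits = [v for v in group_values if v >= threshold]
    return median(percent_rank(v, ref) for v in hits) if hits else float("nan")
```

Because each regulator is ranked only against its own significant values, a regulator with weak ChIP efficiency and one with strong efficiency can both reach the top of the 0 to 100 scale, which is the point of the scaling.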
Compared with other TFIID-dominated/TATA-less genes, a higher percentage of RP genes were occupied by the sequence-specific regulators (Orchestration group) Rap1, Ifh1, and Sfp1, confirming previous reports (Marion et al., 2004; Rudra et al., 2005; Yu et al., 2003). The RP genes were comparatively depleted of the histone variant H2A.Z, along with the SWR1 complex, which is responsible for H2A.Z deposition.
SAGA-dominated/TATA-containing genes were occupied by a larger variety of regulators than TFIID-dominated/TATA-less genes (values reported in Table S4). The median fraction of SAGA-dominated genes occupied by each regulator was 24%, whereas for TFIID-dominated genes it was 18%. Moreover, SAGA-dominated genes tended to have higher occupancy levels of bound regulators (12% higher, as a percent rank, than for TFIID-dominated genes).
For each regulator, we also summarized whether it preferentially occupied active genes dominated by the SAGA or the TFIID pathway. For example, the NuA4 histone acetyltransferase and the TFIID-specific TAFs preferred the TFIID pathway, whereas other chromatin remodeling and histone modification complexes such as RSC, SWI/SNF, RAD6/BRE1, and RTT109 preferred the SAGA pathway. TAFs shared between SAGA and TFIID (Taf5 and Taf10), and the core transcription machinery, displayed no preference. However, certain TBP regulators such as Mot1 and TFIIA, and certain Pol II regulators such as elongation factors and Mediator, also tended towards the SAGA pathway. These findings fit well with the notion that SAGA-dominated genes tend to be highly regulated and more tailored to responding to different environmental cues than the housekeeping, less variably regulated TFIID class (Basehoar et al., 2004; Huisinga and Pugh, 2004; Tirosh and Barkai, 2008).
[Figure: Regulators that are biased towards either the TFIID- or SAGA-pathway]
Potential regulatory cross-talk in Pol II and III transcription
The Pol III transcription machinery transcribes 274 tRNA genes (Dieci et al., 2007). TFIIIC binds within each gene, immediately downstream of the TSS, and then recruits the TBP-containing TFIIIB complex just upstream of the TSS, which in turn recruits Pol III to the TSS. The core Pol III transcription machinery is largely distinct from that involved in Pol II transcription. Recent reports have found some unexpected regulators at tRNA genes, such as TFIIS and Nhp6a (Braglia et al., 2007; Ghavi-Helm et al., 2008). This led us to consider other regulators that may bind to tRNA genes, and so we screened the 202 datasets for regulators that occupy tRNA genes at 25°C.
[Figure: Regulators that occupy tRNA genes]
In addition to the known Pol III transcription machinery, a number of Pol II regulators stood out as having significant occupancy at tRNA genes. These regulators, which include Fkh1, Reb1, and Yap6, had binding sites enriched upstream of tRNA start sites. Most of them were also enriched at the RP genes. Their presence at genes that encode protein and RNA structural components of the translation machinery might provide a cross-polymerase mechanism to coordinate translation through the synthesis of ribosomes and tRNAs. The histone deacetylase Hda1 was also enriched at tRNA genes. Inasmuch as tRNA genes can exert a negative transcriptional effect on adjacent Pol II genes (Hull et al., 1994), deacetylation by Hda1 might generate hypoacetylated regions adjacent to tRNA genes, making them refractory to Pol II transcription. We also assembled a composite view of tRNA transcription regulator assembly for the regulators examined. An important caveat of this composite is that the indicated Pol II regulators may occupy only a small, but statistically enriched, subset of tRNA genes.
Functionally related regulators tend to work at the same genes
The analysis thus far indicates that some regulatory complexes prefer distinct PIC assembly pathways. We next addressed which regulators might be functionally linked to each other by determining their co-occupancy at genes. If the set of genes bound by a regulator is represented by a circle, then co-occupancy of two regulators is the overlap of two circles in a Venn diagram. We tabulated the results of >40,000 such Venn diagrams (see Figure S1 for a high-resolution version), with the percentage overlap of each represented as a color-coded pixel in a heat map (numerical values are presented in Table S7). Each Venn pair is reported as two reciprocal pixels in a 202 × 202 matrix, reflecting the percentage overlap with respect to each of the two individual datasets. For example, 26% of all genes bound by Reb1 were also bound by Sua7, whereas only 13% of all genes bound by Sua7 were also bound by Reb1. This matrix was then hierarchically clustered to reveal regulators that tend to display similar patterns of co-occupancy, and three clusters were examined further (denoted a, b, and c). The significance of each overlap (P-value) and the median transcription frequency of each overlapping set of genes were also determined pixel by pixel. In addition, if two co-occupying regulators belonged to the same stage of the transcription cycle, the corresponding pixel was color-coded by that stage. The purpose of this latter analysis was to assess whether stage-related regulators (represented by clustered pixels) tended to work at the same genes, or whether they were gene-specific.
Cluster a was relatively small and consisted of general transcription regulators (GTFs, in the Initiation class) and Pol II subunits; as expected, the overlapping sets of genes had high transcription frequencies. Co-occupancy of these regulators at a common set of genes was verified more directly by standard gene-based clustering (Figure S2A). Other components and regulators of the core transcription machinery occupied many cluster a genes, but also occupied genes bound by other regulators, causing them to cluster apart from the core transcription machinery. The lack of diverse regulators at cluster a genes suggests that the genes most highly occupied by GTFs and most highly transcribed may have fewer regulatory interactions in common than other genes. If many positive and negative regulators antagonize each other, this could result in a rich assortment of co-occupancy that produces a lower net transcriptional output. As such, genes with high levels of PICs but few other regulators may represent an “unattenuated”, and thus highly active, situation.
Cluster b was enriched with regulators in the Access class, including chromatin remodelers, chromatin-modifying regulators, and chromatin-binding proteins. Interestingly, in contrast to cluster a, cluster b genes tended to be lowly transcribed, and the cluster contained generally repressive chromatin remodelers and histone deacetylases, such as Isw1a/b, Isw2, Cyc8/Ssn6, Hos1, and Hda1. The Xbp1 and Rfx1/Crt1 sequence-specific repressors were also enriched in cluster b, suggesting that they may be functionally linked to these chromatin regulators. Thus, many chromatin regulators tend to occupy the same set of lowly expressed genes (also verified by gene-based clustering in Figure S2B). The expression of such genes might be rate-limited by the combined repressive action of these chromatin regulators. We observed a substantial intersection of the regulators in clusters a and b, and the corresponding genes tended to have an intermediate level of transcription. We interpret this overlap to mean that many genes may have both repressive and activating proteins bound, although not temporally resolved, with the net output being an intermediate level of transcription.
Cluster c had a large membership of regulators from all stages of the transcription cycle, and the corresponding genes tended to be lowly expressed. We interpret the large cluster size to reflect a tendency of many positive and negative regulators at all stages of the transcription cycle to work at the same genes, producing a relatively low transcriptional output. This contrasts with cluster a, where apparently unbridled PICs produce high transcriptional outputs.
The co-occupancy data are displayed as a Cytoscape network (Shannon et al., 2003) in Figure S3A, and those with stronger co-occupancies in Figure S3B. Many of the sequence-specific regulators (“Orchestration” class, shown in red) were located towards the periphery of the network, which likely reflects their gene-selective behavior and correspondingly smaller overlap with other regulators. Ume6, Hho1, and Asf1 stood out as having a high degree of co-occupancy, and this was verified by standard gene-based clustering (Figure S3C, cluster 2). Ume6 is a key regulator of early meiotic genes, repressing their transcription during vegetative growth (Strich et al., 1994). Asf1 is involved in chromatin assembly/disassembly during DNA replication and transcription (Schwabish and Struhl, 2006; Tyler et al., 1999). The presence of Asf1 at Ume6-repressed genes is consistent with the role of Asf1 in establishing silenced chromatin (Krawitz et al., 2002), perhaps by helping to assemble the repressive linker histone H1 (Hho1).
Further exploration of the occupancy data is presented in Figure S4 and Table S7, wherein we report the overlapping membership (measured as –log10 P-value) between regulator-occupied genes and genes that are in the upper or lower tenth percentile of a measured genome-wide property available in the public domain (e.g., expression changes in mutants, motif enrichment, and regulator occupancy levels). Over 2,300 relationships are presented for each regulator. The dataset provides a rich resource of empirical information about each regulator that remains essentially untapped here.
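The –log10 P-value for the overlap between regulator-bound genes and the upper or lower tenth percentile of a genome-wide property could be computed with a one-sided hypergeometric test, as in this Python sketch. The choice of the hypergeometric test, the tie handling, the underflow floor, and the function names are assumptions for illustration; the text does not state which test was used.

```python
import math


def extreme_decile(prop, top=True):
    """Genes in the upper (top=True) or lower tenth percentile of a property.

    `prop` maps gene -> measured value; ties are broken by sort order.
    """
    ranked = sorted(prop, key=prop.get, reverse=top)
    return set(ranked[: max(1, len(ranked) // 10)])


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def overlap_score(bound, prop, top=True):
    """-log10 P of observing at least the observed overlap by chance."""
    universe = set(prop)
    bound = bound & universe
    decile = extreme_decile(prop, top)
    p = hypergeom_sf(len(bound & decile), len(universe), len(decile), len(bound))
    return -math.log10(max(p, 1e-300))  # floor avoids log10(0) on float underflow
```

Exact integer binomial coefficients (`math.comb`) keep the tail sum precise even for genome-sized populations, at the cost of speed; a library survival function would be the usual choice when scoring millions of regulator-property pairs.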
Spatially distinct promoter organization of regulators associated with the SAGA and TFIID pathways
Understanding how the diverse repertoire of transcription regulators is organized in promoter regions can provide insight into the interplay of the transcription machinery and chromatin. With microarray probes situated at the distal (“UAS”) and proximal (“TSS”) ends of each promoter (230 bp center-to-center distance), we interpolated the location of each significantly bound regulator based on an occupancy-level weighting of each probe location (see Methods). The distribution of all bound locations relative to the nearest transcriptional start site was then plotted for each regulator as a heat map (Table S8), and the complexes that bind the distal versus the core promoter region were summarized. Regulatory proteins appeared to separate into distal and proximal promoter binding, with the dividing line around position -120 relative to the TSS. Because high-density ChIP-chip microarrays produced genome-wide maps similar to those from the low-density arrays described here (Figure S5), the distal and proximal occupancy domains are likely not an artifact of the microarray platform. As further support, a previous study using site-specific DNA cleavage probes tethered to the HIS4 promoter qualitatively agrees with the interpolated distribution profiles for Pol II and the GTFs (Miller and Hahn, 2006). Nonetheless, there are caveats. A regulator might not have a consensus location that resides between the two probes, or a regulator may not have any consensus location (relative to the TSS). In addition, patterns may be biased to reflect the most highly occupied locations.
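The occupancy-weighted interpolation between the two promoter probes might look like the following Python sketch. The Methods are not reproduced here, so several details are assumptions: log2 occupancies are converted back to linear weights, the probe centers sit at hypothetical positions of -250 and -20 bp (chosen only to match the stated 230 bp spacing), and the 20 bp histogram bin width is arbitrary. All function names are hypothetical.

```python
def interpolate_location(uas_occ, tss_occ, uas_pos=-250.0, tss_pos=-20.0):
    """Occupancy-weighted position (bp relative to the TSS) between two probes.

    uas_occ, tss_occ: median-centered log2 occupancy at the UAS and TSS probes.
    Assumptions: log2 values are converted back to linear weights, and the
    probe centers sit at hypothetical positions -250 and -20 (230 bp apart).
    """
    w_uas, w_tss = 2.0 ** uas_occ, 2.0 ** tss_occ
    return (w_uas * uas_pos + w_tss * tss_pos) / (w_uas + w_tss)


def promoter_domain(position, boundary=-120.0):
    """Label a location as distal or proximal using the ~-120 dividing line."""
    return "distal" if position < boundary else "proximal"


def histogram(positions, bin_width=20):
    """Count interpolated locations per bin (keyed by bin start, in bp)."""
    counts = {}
    for p in positions:
        start = int(p // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))
```

With equal occupancy at both probes the interpolated location is the midpoint (-135 bp under these assumed positions), and doubling the UAS signal (one log2 unit) pulls it to about -173 bp. A regulator bound at a single site outside the probe interval still maps to a point between the probes, which is the first caveat noted above.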
[Figure: Composite distribution of regulators across promoter regions]
Complexes favoring the SAGA pathway tended to reside in the distal promoter region, where SAGA resides. For example, Mot1, Mediator, histone H1, HOS1, RPD3, SSN6-TUP1, and SWI/SNF preferentially occupied the upstream promoter region. The SAGA pathway involves substantial chromatin regulation. Nearly three-quarters of the chromatin regulators we examined (29 of 41 complexes) tended to occupy the region closer to the -1 nucleosome than to the +1 nucleosome, which implicates substantial regulation of the -1 nucleosome. In contrast, NuA4, SWR1, and SPT10,21, which are more enriched at TFIID-dominated genes, tended to reside at the core promoter along with TFIID or in the adjacent downstream region near the +1 nucleosome. Taken together, the spatial arrangement of SAGA- versus TFIID-associated complexes suggests that these complementary PIC assembly pathways may have distinct spatial organizations at promoters.
Since the caveats associated with location mapping in promoter regions may be particularly problematic for elongation regulators, we mapped the locations of 20 regulators in the Elongation class by ChIP-seq (Figure S6). Pol II and the Pop2 subunit of CCR4-NOT were enriched across the promoter. Ess1, a prolyl isomerase that acts on the Pol II C-terminal heptad repeats, and Bye1, which interacts genetically with Ess1 (Wu et al., 2003), were enriched within 100 bp of the TSS, suggesting that they might enter the elongation complex at the promoter region. Most other complexes, including components of the CCR4-NOT, PAF, CTK1, and FACT complexes, were enriched from 200 to 600 bp downstream of the TSS, possibly reflecting entry into the elongation complex as Pol II transitions from a serine-5 phosphorylated state to serine-2 phosphorylation. These results are in line with related studies reported elsewhere (Mayer et al., 2010; Rahl et al., 2010). Some of these regulators were also detected in the promoter region, which could reflect lower levels of interaction or fewer genes having those interactions.
Mobilization of the transcription machinery in response to acute heat shock differs in the SAGA vs TFIID pathways
In yeast, the common environmental stress response, including the heat shock response (an abrupt shift from 25°C to 37°C), involves transient down-regulation of ~600 genes and up-regulation of ~300 genes within 15 minutes of the stress (Causton et al., 2001). While the promoter association and dissociation of a number of regulators in response to heat shock have been determined (Harbison et al., 2004; Lee et al., 2004; Zanton and Pugh, 2004), no study has comprehensively examined the genome-wide change in occupancy of the vast majority of regulatory complexes.
Changes in occupancy upon heat shock were examined at promoter regions and at the 3' ends of ORFs (Table S9), and were K-means clustered into seven groups. Within each region, regulators were separated by their classification into transcriptional stages, and within each stage, individual regulators (columns) were hierarchically clustered. A list of complexes that increase or decrease in occupancy at heat shock-induced or repressed genes can be found in Table S10.
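A minimal sketch of the clustering step: per-gene changes in occupancy (heat shock minus 25°C, in log2 units) are grouped with Lloyd's K-means algorithm in plain Python. Seeding the centers with the first k rows, the iteration cap, and all function names are simplifying assumptions; the text does not describe the initialization or software used, and a real analysis would typically use several random restarts.

```python
from statistics import fmean


def occupancy_change(heat_shock, basal):
    """Per-gene, per-regulator change in log2 occupancy (heat shock - 25 C)."""
    return [[h - b for h, b in zip(hs_row, b_row)]
            for hs_row, b_row in zip(heat_shock, basal)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100):
    """Lloyd's K-means; returns (labels, centers). Centers seeded with rows[:k]."""
    centers = [list(r) for r in rows[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: _sq_dist(r, centers[c]))
                      for r in rows]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its assigned genes.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [fmean(col) for col in zip(*members)]
    return labels, centers
```

Each row is one gene and each column one regulator (promoter and 3' ORF values could simply be concatenated per gene), so genes end up grouped by their joint pattern of gained and lost regulators rather than by any single factor.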
[Figure: Assembly and disassembly of the transcription machinery in response to heat shock]
Two clusters of heat shock-induced genes were evident. Modestly induced genes (cluster 1) tended to preferentially accumulate TFIID, SWR1, and RSC, which may reflect increased activity of housekeeping genes. Shifting cultures to 37°C actually increases growth rates, which likely requires increased activity of housekeeping genes. Accordingly, cluster 1 genes are moderately transcribed at 25°C (4 mRNA/hr). The most highly induced genes (cluster 2), which are enriched in SAGA-dominated stress-induced genes, showed the largest increase in occupancy of regulators such as SAGA, SWI/SNF, INO80, NuA4, Mediator, GTFs, NC2, Cyc8/Tup1, Pol II, Iws1/Spt6, FACT, Pcf11, COMPASS, Rad6/Bre1, and Set2. Some of these regulators act negatively, which may reflect their involvement in quenching the transient heat shock response. In addition, heat shock enrichment of the elongation regulatory complexes was most evident in the body of the gene, consistent with their roles in events linked to transcription elongation and mRNA transport.
One group of genes (cluster 3) stood out because Pol II and a variety of elongation and termination regulators (Iws1/Spt6, FACT, Pcf11, Rtt103, Spt2, and Bye1) displayed a strong increase in occupancy in the promoter region but not in the 3’ ORF region. These genes were lowly transcribed at 25°C (3 mRNA/hr), and their transcription remained unchanged during the heat shock response. Furthermore, the enrichment pattern for two negative regulators of transcription elongation, Bye1 and Spt2 (Kruger and Herskowitz, 1991; Roeder et al., 1985; Wu et al., 2003), mirrored the Pol II pattern, suggesting that these regulators may be involved in generating an elongation-refractory Pol II at the 5’ ends of these genes.
Cluster 4 stood out as having increased occupancy of the GTFs and Pol II at the 3’ ends of the genes, but little detectable RNA output. The basis for this requires further investigation; one possibility is heat shock-induced production of unstable antisense transcripts.
Occupancy changes at heat shock-repressed genes (clusters 5-8) tended to involve many of the same regulators as at induced genes, but in the opposite direction. However, the repressed genes of cluster 5 also acquired the Xbp1 repressor, three histone deacetylases (Rpd3, Hda1, and Hos1), the Jhd1 demethylase, and Isw1a/b in the promoter region. This cluster was enriched in ribosome biosynthesis and rRNA maturation genes, which are highly transcribed at 25°C (85 mRNA/hr). Thus, the transient shutdown of ribosome biosynthesis that accompanies heat shock appears to be an active process driven by an influx of co-repressor complexes, rather than simply a loss of the transcription machinery.