To study the relationship between transcriptional programs and chromatin organization, we examined the DNA-encoded nucleosome organization over the promoter regions of the gene sets in each of the three categories, both experimentally and using a computational model of nucleosome sequence preferences (ref. 22). These sequence preferences are represented by a probability distribution over nucleosome-length sequences, estimated from a large set of fully sequenced in vivo nucleosomes from S. cerevisiae. The model uses this distribution to compute the probability that each basepair in the genome is covered by a nucleosome, assuming an equilibrium among all competing nucleosome configurations. Using a cross-validation scheme, this sequence-based model was shown to be highly predictive of experimentally measured nucleosome organization, suggesting that nucleosome organization is partly encoded in cis by the DNA and that the model can reliably be used to examine DNA-encoded nucleosome organization (see ref. 22 for an overview and evaluation of this model).
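The equilibrium computation described above can be illustrated with a forward-backward sum over all non-overlapping nucleosome placements. The sketch below is not the model of ref. 22: it assumes a hypothetical input `weights` that gives a precomputed non-negative statistical weight for a nucleosome starting at each position (in the real model these would derive from the sequence-preference distribution), a fixed footprint defaulting to 147 bp, and no further parameters such as nucleosome concentration. A configuration's probability is taken to be proportional to the product of its nucleosomes' weights, and each basepair's occupancy is the total probability of the configurations that cover it.

```python
import numpy as np

def nucleosome_occupancy(weights, nuc_len=147):
    """Per-basepair occupancy under a simple equilibrium placement model.

    weights[s] is a hypothetical non-negative weight for a nucleosome that
    starts at basepair s; for a sequence of length N it has N - nuc_len + 1
    entries.
    """
    w = np.asarray(weights, dtype=float)
    n = len(w) + nuc_len - 1  # sequence length

    # F[i]: summed weight of all configurations on the prefix [0, i)
    F = np.zeros(n + 1)
    F[0] = 1.0
    for i in range(1, n + 1):
        F[i] = F[i - 1]  # basepair i-1 is uncovered (linker)
        if i >= nuc_len:
            F[i] += F[i - nuc_len] * w[i - nuc_len]  # nucleosome on [i-L, i)

    # B[i]: summed weight of all configurations on the suffix [i, n)
    B = np.zeros(n + 1)
    B[n] = 1.0
    for i in range(n - 1, -1, -1):
        B[i] = B[i + 1]  # basepair i is uncovered
        if i + nuc_len <= n:
            B[i] += w[i] * B[i + nuc_len]  # nucleosome on [i, i+L)

    Z = F[n]  # partition function: summed weight of every configuration
    starts = np.arange(len(w))
    p_start = F[starts] * w * B[starts + nuc_len] / Z

    # Occupancy of basepair i = sum of p_start[s] for s in [i-L+1, i]
    cum = np.concatenate(([0.0], np.cumsum(p_start)))
    idx = np.arange(n)
    lo = np.clip(idx - nuc_len + 1, 0, len(w))
    hi = np.clip(idx + 1, 0, len(w))
    return cum[hi] - cum[lo]
```

For example, with two possible 2-bp placements on a 3-bp sequence (`nucleosome_occupancy([1.0, 1.0], nuc_len=2)`), the three configurations (empty, left, right) are equally likely, so the middle basepair is covered with probability 2/3 and the ends with 1/3. Plain floating-point sums can overflow on chromosome-length inputs; a log-space version would be needed in practice.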
We used this model to compute the predicted occupancy over the nucleosome-depleted region of every promoter in each of the two species. We focused on the 200 basepairs upstream of the translation start site, since in vivo measurements of nucleosome occupancy showed that promoters exhibit a stereotyped depleted region of ~100-150 bp within this 200-bp window (refs. 4, 9-11, 14). We defined the occupancy over this promoter nucleosome-depleted region (henceforth termed “PNDR”) as the lowest average nucleosome occupancy across any 100-bp window within the 200 basepairs upstream of the translation start site. Other parameter choices that we tested (100-150 bp for the width of the least-occupied window and 200-400 bp for the overall length of the upstream region) yielded equivalent results. Thus, when the PNDR score of each gene is computed by the model, it provides a predicted measure of the degree to which the gene's promoter encodes an open (nucleosome-depleted) or closed (nucleosome-occupied) nucleosome organization.
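The PNDR definition above amounts to a minimum over sliding-window means. A minimal sketch, assuming `occupancy` is a per-basepair predicted occupancy array whose last element is the basepair immediately upstream of the translation start site (this orientation and the function name `pndr_score` are hypothetical):

```python
import numpy as np

def pndr_score(occupancy, upstream=200, window=100):
    """Lowest mean occupancy over any `window`-bp stretch of the upstream region.

    Assumes (hypothetically) that occupancy[-1] is the basepair immediately
    upstream of the translation start site, so occupancy[-upstream:] is the
    region that is scanned.
    """
    region = np.asarray(occupancy, dtype=float)[-upstream:]
    if len(region) < window:
        raise ValueError("upstream region is shorter than the window")
    # Sliding-window means; mode="valid" keeps only windows fully inside region
    means = np.convolve(region, np.ones(window) / window, mode="valid")
    return float(means.min())
```

Calling it with `upstream` between 200 and 400 and `window` between 100 and 150 corresponds to the alternative parameter choices mentioned above.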
To test whether the DNA sequences of promoters from a given gene set encode a relatively open or closed nucleosome organization, we compared, separately for each species, the PNDR scores of the gene set's promoters to the PNDR scores of all other promoters. Specifically, we ranked all promoters by their PNDR score and measured the relative ranking of the gene set's promoters using a normalized Mann-Whitney rank statistic, which equals the area under the curve (AUC; ref. 23) obtained by plotting, for every possible PNDR threshold, the fraction of the gene set's promoters above that threshold against the fraction of all other promoters above it. With this measure, a gene set in which every promoter has a higher PNDR score than every promoter outside the set (a relatively closed nucleosome organization) receives an AUC of 1; a gene set in which every promoter has a lower PNDR score than every promoter outside the set (a relatively open nucleosome organization) receives an AUC of 0; and a gene set composed of randomly selected promoters receives, on average, an AUC of 0.5.
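The normalized Mann-Whitney statistic can be computed directly as the fraction of (gene-set promoter, other promoter) pairs in which the gene-set promoter has the higher PNDR score, counting ties as one half; this equals the AUC described above. A sketch with a hypothetical function name, using an all-pairs comparison rather than a rank-based formula:

```python
import numpy as np

def pndr_auc(set_scores, other_scores):
    """Normalized Mann-Whitney statistic (AUC) of a gene set's PNDR scores.

    set_scores: PNDR scores of the gene set's promoters.
    other_scores: PNDR scores of all promoters outside the gene set.
    """
    x = np.asarray(set_scores, dtype=float)[:, None]
    y = np.asarray(other_scores, dtype=float)[None, :]
    # Fraction of pairs where the set promoter is more occupied; ties count 1/2
    return float((x > y).mean() + 0.5 * (x == y).mean())
```

For genome-scale inputs, the equivalent rank-sum form AUC = (R1 - n1(n1 + 1)/2) / (n1 n2), where R1 is the sum of the gene set's ranks (ties given average ranks), avoids building the n1 × n2 comparison matrix.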
The expression divergence of cellular respiration genes is accompanied by changes in the DNA-encoded nucleosome organization of their promoters
For each gene set from the three categories above, which are defined solely by their expression profiles, we then compared the predicted PNDR AUC in S. cerevisiae to that in C. albicans. Notably, in both species, the growth-related gene sets of category I, whose expression profiles are highly correlated with that of the CRP genes, have AUC scores significantly lower than all other gene sets (p < 10^-13 and p < 10^-9, Student's t-test, in S. cerevisiae and C. albicans, respectively), indicating that their promoters encode relatively open nucleosome architectures. Conversely, the condition-specific gene sets of category II, whose expression is anti-correlated with that of the CRP genes in both species, have AUC scores significantly higher than all other gene sets (p < 10^-5 and p < 10^-18, respectively), indicating that their promoters encode relatively closed nucleosome architectures. These results suggest that S. cerevisiae and C. albicans both preserve a system-level relationship between transcriptional programs and DNA-encoded nucleosome organization, whereby promoters of growth-related genes encode relatively open nucleosome organizations, while promoters of condition-specific genes encode relatively closed ones.
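The gene-set-level comparison reported above (the AUC scores of one category's gene sets versus those of all other gene sets) can be sketched with SciPy's two-sample Student's t-test. The data layout (parallel arrays of per-gene-set AUC values and category labels) and the function name are assumptions, and the text does not state whether the test was one- or two-sided; the sketch uses SciPy's default two-sided test.

```python
import numpy as np
from scipy import stats

def category_vs_rest(aucs, categories, target):
    """Student's t-test of one category's gene-set AUCs against all others.

    aucs: per-gene-set AUC values (hypothetical layout).
    categories: category label of each gene set, parallel to aucs.
    target: the category to compare against the rest.
    """
    aucs = np.asarray(aucs, dtype=float)
    in_cat = np.asarray(categories) == target
    # equal_var=True gives the classical pooled-variance Student's t-test
    result = stats.ttest_ind(aucs[in_cat], aucs[~in_cat], equal_var=True)
    return float(result.statistic), float(result.pvalue)
```

A negative statistic indicates that the target category's gene sets have lower AUCs (more open predicted promoter organization) than the remaining gene sets.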
In contrast to the largely conserved nucleosome organization of gene sets from the first two categories, the aerobic cellular respiration gene sets of category III show marked differences between the two species in the DNA-encoded nucleosome organization over their promoters. In C. albicans, aerobic respiration gene sets (category III) have AUC values significantly lower than all other gene sets (p < 0.005, Student's t-test), so their promoters encode relatively open nucleosome organizations, whereas in S. cerevisiae these gene sets have AUC values significantly higher than all other gene sets (p < 10^-6), so their promoters encode relatively closed nucleosome organizations. Notably, these changes in DNA-encoded nucleosome organization are coupled to the expression divergence of the category III gene sets between the two species, in a manner that may facilitate the transcriptional program of each species. Category III gene sets, whose expression is more highly correlated with that of growth-related genes in C. albicans than in S. cerevisiae, encode a relatively open nucleosome organization in C. albicans, in accordance with the trend observed for growth-related gene sets (category I), and a relatively closed nucleosome organization in S. cerevisiae, in accordance with the trend observed for gene sets whose expression is anti-correlated with that of growth-related genes (category II).
These results demonstrate that a global relationship between transcriptional programs and DNA-encoded nucleosome organization is remarkably conserved across two yeast species, even in the presence of expression divergence. Our results thus suggest a conserved design principle of transcriptional regulation in yeast, whereby the default repression of condition-specific genes (like the aerobic respiration genes of S. cerevisiae) is facilitated by the relatively closed nucleosome organization encoded over their promoters. In conditions that require activation of these genes, this repression is actively alleviated, presumably by the combined action of transcription factors and chromatin remodeling complexes. In contrast, for the growth-related genes most commonly used by the organism (like the aerobic respiration genes of C. albicans), nucleosome-mediated repression is alleviated by default through the encoding of relatively open nucleosome organizations over their promoters. We note that although this global trend is strong in our analysis, it clearly does not apply to every growth-related or condition-specific gene set, or to every gene within them, since some of these gene sets have intermediate AUC values.
The same behavior is evident when we pool, for each of the three categories, all genes from that category's gene sets into a single gene set. In both species, the average nucleosome occupancy predicted by the model across these promoters shows stronger nucleosome depletion (lower PNDR scores) in category I promoters than in category II promoters (p < 10^-6 and p < 10^-9, Student's t-test, in S. cerevisiae and C. albicans, respectively). However, the average nucleosome occupancy of the cellular respiration promoters of category III differs between the two species: in S. cerevisiae they encode the most closed nucleosome organization of the three categories, whereas in C. albicans they encode the most open. Indeed, category III promoters have a significantly more closed nucleosome organization in S. cerevisiae than in C. albicans: the average difference in predicted PNDR score between the two species is significantly greater than that obtained for randomly chosen sets of the same number of promoters (p < 10^-4, permutation test with 10,000 permutations).
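The permutation test described above can be sketched as follows. The input layout is an assumption: `diffs` is taken to hold, for every promoter in the comparison, the predicted PNDR score in S. cerevisiae minus that in C. albicans (for example, over orthologous promoter pairs), and `set_idx` indexes the category III promoters; both names are hypothetical. The test is one-sided, asking how often a random set of the same size has an average difference at least as large as the observed one.

```python
import numpy as np

def permutation_pvalue(diffs, set_idx, n_perm=10_000, seed=0):
    """One-sided permutation p-value for a large mean PNDR difference.

    diffs: hypothetical per-promoter PNDR difference (S. cerevisiae minus
        C. albicans).
    set_idx: indices of the promoters in the tested set (e.g. category III).
    """
    diffs = np.asarray(diffs, dtype=float)
    set_idx = np.asarray(set_idx)
    observed = diffs[set_idx].mean()
    rng = np.random.default_rng(seed)
    k = len(set_idx)
    # Null distribution: mean difference of random promoter sets of size k
    null = np.array([
        diffs[rng.choice(len(diffs), size=k, replace=False)].mean()
        for _ in range(n_perm)
    ])
    # Add-one correction so the estimated p-value is never exactly zero
    p = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return float(observed), float(p)
```

With 10,000 permutations and this add-one convention, the smallest attainable p-value is 1/10,001, just below 10^-4.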
The DNA-encoded nucleosome organization of cellular respiration promoters has diverged between S. cerevisiae and C. albicans