In parallel to the genetic code for protein synthesis, a second layer of information is embedded in all RNA transcripts in the form of RNA structure. RNA structure influences practically every step in the gene expression program [1]. Yet the nature of most RNA structures or effects of sequence variation on structure are not known. Here we report the initial landscape and variation of RNA secondary structures (RSS) in a human family Trio, providing a comprehensive RSS map of human coding and noncoding RNAs. We identify unique RSS signatures that demarcate open reading frames, splicing junctions, and define authentic microRNA binding sites. Comparison of native deproteinized RNA isolated from cells versus refolded purified RNA suggests that the majority of the RSS information is encoded within RNA sequence. Over 1900 transcribed single nucleotide variants (~15% of all transcribed SNVs) alter local RNA structure. We discover simple sequence and spacing rules that determine the ability of point mutations to impact RSS. Selective depletion of RiboSNitches versus structurally synonymous variants at precise locations suggests selection for specific RNA shapes at thousands of sites, including 3’UTRs, binding sites of miRNAs and RNA binding proteins genome-wide. These results highlight the potentially broad contribution of RNA structure and its variation to gene regulation.
RNA structure is critical for gene regulation and function. In the past, transcriptomes have been largely parsed by primary sequences and expression levels, but it is now becoming feasible to annotate and compare transcriptomes based on RNA structure. In addition to computational prediction methods, the recent advent of experimental techniques to probe RNA structure by deep sequencing has enabled genome-wide measurements of RNA structure, and provided the first picture of the structural organization of an eukaryotic transcriptome—the “RNA structurome”. With additional advances in method refinement and interpretation, structural views of the transcriptome should help to identify and validate regulatory RNA motifs that are involved in diverse cellular processes, and thereby increase understanding of RNA function.
The structures of RNA molecules are often important for their function and regulation [1-6], yet there are no experimental techniques for genome-scale measurement of RNA structure. Here, we describe a novel strategy termed Parallel Analysis of RNA Structure (PARS), which is based on deep sequencing fragments of RNAs that were treated with structure-specific enzymes, thus providing simultaneous in vitro profiling of the secondary structure of thousands of RNA species at single nucleotide resolution. We apply PARS to profile the secondary structure of the mRNAs of the budding yeast S. cerevisiae and obtain structural profiles for over 3000 distinct transcripts. Analysis of these profiles reveals several RNA structural properties of yeast transcripts, including the existence of more secondary structure over coding regions compared to untranslated regions, a three-nucleotide periodicity of secondary structure across coding regions, and a relationship between the efficiency with which an mRNA is translated and the lack of structure over its translation start site. PARS is readily applicable to other organisms and to profiling RNA structure in diverse conditions, thus enabling studies of the dynamics of secondary structure at a genomic scale.
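The three-nucleotide periodicity mentioned above can be illustrated with a minimal sketch. It assumes that per-nucleotide read counts are available from a double-strand-specific and a single-strand-specific enzyme treatment; the log-ratio score, the pseudocount, and the function names `structure_scores` and `mean_by_codon_position` are illustrative assumptions, not the published pipeline.

```python
import math

def structure_scores(paired_counts, unpaired_counts, pseudocount=1.0):
    """Per-nucleotide log2 ratio of paired-specific to unpaired-specific reads.

    Positive values suggest a paired base, negative values an unpaired base.
    The pseudocount (an assumption) avoids division by zero.
    """
    return [
        math.log2((p + pseudocount) / (u + pseudocount))
        for p, u in zip(paired_counts, unpaired_counts)
    ]

def mean_by_codon_position(scores, cds_start=0):
    """Average the scores separately for codon positions 1, 2 and 3."""
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i in range(cds_start, len(scores)):
        frame = (i - cds_start) % 3
        sums[frame] += scores[i]
        counts[frame] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Toy counts (made up for illustration): two codons with a repeating pattern.
paired = [7, 1, 3, 7, 1, 3]
unpaired = [1, 7, 3, 1, 7, 3]
scores = structure_scores(paired, unpaired)
print(mean_by_codon_position(scores))
```

Averaging such per-position means over many coding regions would show a periodic signal if one codon position is systematically more or less structured than the others.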
A new study exploits the time-dependence of formaldehyde cross-linking in the commonly used chromatin immunoprecipitation (ChIP) assay to infer the on and off rates for site-specific chromatin interactions.
Libraries of S. cerevisiae and E. coli promoter reporters measured under different conditions reveal scaling relationships between expression profiles across conditions and suggest that most changes in activity are due to global effects.
Between any two conditions, the activity of most promoters changes by a constant global scaling factor that depends only on the conditions and not on the promoter's identity. The value of the global scaling factor between any two conditions corresponds to the change in growth rate and magnitude of the condition-specific response. When specific groups of genes are activated, they also tend to change according to scaling factors, changing the degree to which the entire group is activated, while preserving the ratios between genes within the group. Altogether, a handful of scaling factors are sufficient for quantitatively describing genome-wide expression profiles across conditions.
Most genes change expression levels across conditions, but it is unclear which of these changes represents specific regulation and what determines their quantitative degree. Here, we accurately measured activities of ∼900 S. cerevisiae and ∼1800 E. coli promoters using fluorescent reporters. We show that in both organisms 60–90% of promoters change their expression between conditions by a constant global scaling factor that depends only on the conditions and not on the promoter's identity. Quantifying such global effects allows precise characterization of specific regulation—promoters deviating from the global scale line. These are organized into few functionally related groups that also adhere to scale lines and preserve their relative activities across conditions. Thus, only several scaling factors suffice to accurately describe genome-wide expression profiles across conditions. We present a parameter-free passive resource allocation model that quantitatively accounts for the global scaling factors. It suggests that many changes in expression across conditions result from global effects and not specific regulation, and provides means for quantitative interpretation of expression profiles.
gene expression; growth rate; modeling; promoter activity; transcription regulation
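The idea of a single global scaling factor between two conditions can be sketched as follows. This is a minimal illustration under stated assumptions: the median of log-ratios as the estimator, the `fold_tolerance` threshold, and the toy activities are all hypothetical choices, not the procedure used in the study.

```python
import math
import statistics

def fit_global_scale(activity_a, activity_b):
    """Estimate one factor s such that activity_b is roughly s * activity_a.

    The median of per-promoter log2 ratios is used so that a minority of
    specifically regulated promoters does not distort the estimate.
    """
    log_ratios = [math.log2(b / a) for a, b in zip(activity_a, activity_b)]
    return 2 ** statistics.median(log_ratios)

def deviating_promoters(activity_a, activity_b, fold_tolerance=1.5):
    """Indices of promoters that depart from the global scale line by more
    than fold_tolerance (a hypothetical threshold)."""
    scale = fit_global_scale(activity_a, activity_b)
    limit = math.log2(fold_tolerance)
    flagged = []
    for i, (a, b) in enumerate(zip(activity_a, activity_b)):
        deviation = math.log2(b / (scale * a))
        if abs(deviation) > limit:
            flagged.append(i)
    return flagged

# Toy activities (made up): four promoters halve, the fifth is induced.
condition_a = [10.0, 20.0, 40.0, 80.0, 15.0]
condition_b = [5.0, 10.0, 20.0, 40.0, 60.0]
print(fit_global_scale(condition_a, condition_b))
print(deviating_promoters(condition_a, condition_b))
```

In this toy case the promoters that follow the scale line define the global factor, and the remaining promoter is the candidate for specific regulation.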
RNA structural transitions are important in the function and regulation of RNAs. Here, we reveal a layer of transcriptome organization in the form of RNA folding energies. By probing yeast RNA structures at different temperatures, we obtained relative melting temperatures (Tm) for RNA structures in over 4000 transcripts. Specific signatures of RNA Tm demarcated the polarity of mRNA open reading frames, and highlighted numerous candidate regulatory RNA motifs in 3′ untranslated regions. RNA Tm distinguished non-coding from coding RNAs and identified mRNAs with distinct cellular functions. We identified thousands of putative RNA thermometers, and their presence is predictive of the pattern of RNA decay in vivo during heat shock. The exosome complex recognizes unpaired bases during heat shock to degrade these RNAs, coupling intrinsic structural stabilities to gene regulation. Thus, genome-wide structural dynamics of RNA can parse functional elements of the transcriptome and reveal diverse biological insights.
Genome-wide association studies (GWAS) are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a “black box” in order to promote changes in lifestyle and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction typically rank single nucleotide polymorphisms (SNPs) by the p-value of their association with the disease, and use the top-associated SNPs as input to a classification algorithm. However, the predictive power of such methods is relatively poor. To improve the predictive power, we devised BootRank, which uses bootstrapping in order to obtain a robust prioritization of SNPs for use in predictive models. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data and results in a more robust set of SNPs and a larger number of enriched pathways being associated with the different diseases. Finally, we show that combining BootRank with seven different classification algorithms improves performance compared to previous studies that used the WTCCC data. Notably, diseases for which BootRank results in the largest improvements were recently shown to have more heritability than previously thought, likely due to contributions from variants with low minor allele frequency (MAF), suggesting that BootRank can be beneficial in cases where SNPs affecting the disease are poorly tagged or have low MAF. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
Genome-wide association studies are widely used to search for genetic loci that underlie human disease. Another goal is to predict disease risk for different individuals given their genetic sequence. Such predictions could either be used as a “black box” in order to promote changes in lifestyle and screening for early diagnosis, or as a model that can be studied to better understand the mechanism of the disease. Current methods for risk prediction have relatively poor performance, one possible explanation being that they rely on a noisy ranking of the genetic variants given to them as input. To improve the predictive power, we devised BootRank, a ranking method less sensitive to noise. We show that BootRank improves the ability to predict disease risk of unseen individuals in the Wellcome Trust Case Control Consortium (WTCCC) data, and that combining BootRank with different classification algorithms improves performance compared to previous studies that used these data. Overall, our results show that improving disease risk prediction from genotypic information may be a tangible goal, with potential implications for personalized disease screening and treatment.
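The general idea of ranking SNPs robustly by bootstrapping can be sketched as below. This is not the BootRank implementation: the association statistic (difference in mean allele count between cases and controls), the aggregation by mean rank, and the names `association_score` and `bootstrap_rank` are assumptions made for illustration only.

```python
import random
from statistics import mean

def association_score(genotypes, labels, snp):
    """Stand-in association statistic (an assumption): absolute difference
    in mean allele count between cases (label 1) and controls (label 0)."""
    cases = [g[snp] for g, y in zip(genotypes, labels) if y == 1]
    controls = [g[snp] for g, y in zip(genotypes, labels) if y == 0]
    if not cases or not controls:
        return 0.0  # degenerate resample with only one class
    return abs(mean(cases) - mean(controls))

def bootstrap_rank(genotypes, labels, n_boot=200, seed=0):
    """Return SNP indices ordered by their summed rank over bootstrap
    resamples of individuals (lower summed rank = more robustly associated)."""
    rng = random.Random(seed)
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    rank_sums = [0.0] * n_snp
    for _ in range(n_boot):
        # Resample individuals with replacement.
        idx = [rng.randrange(n_ind) for _ in range(n_ind)]
        g = [genotypes[i] for i in idx]
        y = [labels[i] for i in idx]
        scores = [association_score(g, y, s) for s in range(n_snp)]
        order = sorted(range(n_snp), key=lambda s: -scores[s])
        for rank, s in enumerate(order):
            rank_sums[s] += rank
    return sorted(range(n_snp), key=lambda s: rank_sums[s])

# Toy data (made up): 8 individuals, 3 SNPs; only SNP 1 tracks case status.
genotypes = [[0, 2, 1], [1, 2, 0], [0, 2, 1], [1, 2, 0],
             [0, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(bootstrap_rank(genotypes, labels))
```

The top-ranked SNPs from such a procedure would then be passed to a classification algorithm, in place of a ranking computed once on the full data set.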
Nucleosome positioning is critical for gene expression and most DNA-related processes. Here, we review the dominant patterns of nucleosome positioning that have been observed, and summarize current understanding of their underlying determinants. The genome-wide pattern of nucleosome positioning is determined by the combination of DNA sequence, ATP-dependent nucleosome remodeling enzymes, and transcription factors including activators, components of the preinitiation complex, and elongating RNA polymerase II. These determinants influence each other such that the resulting nucleosome positioning patterns are likely to differ among genes and among cells within a population, with consequent effects on gene expression.
The core promoter is the region in which RNA polymerase II is recruited to the DNA and acts to initiate transcription, but the extent to which the core promoter sequence determines promoter activity levels is largely unknown. Here, we identified several base content and k-mer sequence features of the yeast core promoter sequence that are highly predictive of maximal promoter activity. These features are mainly located in the region 75 bp upstream and 50 bp downstream of the main transcription start site, and their associations hold for both constitutively active promoters and promoters that are induced or repressed in specific conditions. Our results unravel several architectural features of yeast core promoters and suggest that the yeast core promoter sequence downstream of the TATA box (or of similar sequences involved in recruitment of the pre-initiation complex) is a major determinant of maximal promoter activity. We further show that human core promoters also contain features that are indicative of maximal promoter activity; thus, our results emphasize the important role of the core promoter sequence in transcriptional regulation.
A single transcription factor can activate or repress expression by three different mechanisms: one that increases cell-to-cell variability in target gene expression (noise) and two that decrease noise.
The ability of cells to accurately control gene expression levels in response to extracellular cues is limited by the inherently stochastic nature of transcriptional regulation. A change in transcription factor (TF) activity results in changes in the expression of its targets, but the way in which cell-to-cell variability in expression (noise) changes as a function of TF activity, and whether targets of the same TF behave similarly, is not known. Here, we measure expression and noise as a function of TF activity for 16 native targets of the transcription factor Zap1 that are regulated by it through diverse mechanisms. For most activated and repressed Zap1 targets, noise decreases as expression increases. Kinetic modeling suggests that this is due to two distinct Zap1-mediated mechanisms that both change the frequency of transcriptional bursts. Notably, we found that another mechanism of repression by Zap1, which is encoded in the promoter DNA, likely decreases the size of transcriptional bursts, producing a unique transcriptional state characterized by low expression and low noise. In addition, we find that further reduction in noise is achieved when a single TF both activates and represses a single target gene. Our results suggest a global principle whereby at low TF concentrations, the dominant source of differences in expression between promoters stems from differences in burst frequency, whereas at high TF concentrations differences in burst size dominate. Taken together, we show that the precise amount by which noise changes with expression is specific to the regulatory mechanism of transcription and translation that acts at each gene.
In response to environmental changes, cells regulate the activity of transcription factors (TFs), which in turn change the expression of dozens of downstream target genes by binding to their promoters. The response of each target gene is determined by the interplay between TF concentration and the context in which TF binding sites occur in each target promoter. To examine the relationship between promoter sequence, mechanism of regulation, and response to TF activity, we measured expression of 16 target genes of a single TF in response to changes in TF concentration in single cells. We found that different native promoters that are all targets of the same TF exhibit diverse responses to changing TF levels in terms of both gene expression level and cell-to-cell variability (noise) in expression. Using computational modeling and mutations of specific promoter elements, we show that the molecular mechanisms of regulation can be inferred by measuring how noise changes with expression. These results show that a single TF can regulate transcription through multiple mechanisms, resulting in similar changes in mean expression but vastly different changes in cell-to-cell variability.
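The distinction between changing burst frequency and changing burst size can be made concrete with a standard bursty-expression approximation. The model below is an assumption introduced for illustration and is not the kinetic model fitted in the study: bursts arrive at rate k, each burst produces a geometrically distributed number of molecules with mean b, and molecules decay at rate γ.

```latex
% Assumed model (illustrative only): burst frequency k, mean burst size b,
% decay rate \gamma; n is the steady-state molecule count.
\begin{align}
\langle n \rangle &= \frac{k\,b}{\gamma}, \\
\sigma^2 &= \langle n \rangle\,(1 + b), \\
\eta^2 \equiv \frac{\sigma^2}{\langle n \rangle^2}
  &= \frac{1 + b}{\langle n \rangle}
   = \frac{\gamma}{k}\left(1 + \frac{1}{b}\right).
\end{align}
```

Under these assumptions, raising expression through k at fixed b lowers noise in proportion to 1/⟨n⟩, whereas changing expression through b at fixed k leaves the noise close to γ/k whenever b is much larger than 1. This is one way to read the observation that frequency-modulated targets show noise falling with expression, while a mechanism that reduces burst size can lower expression without a comparable rise in noise.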
Many genetic variants that are significantly correlated to gene expression changes across human individuals have been identified, but the ability of these variants to predict expression of unseen individuals has rarely been evaluated. Here, we devise an algorithm that, given training expression and genotype data for a set of individuals, predicts the expression of genes of unseen test individuals given only their genotype in the local genomic vicinity of the predicted gene. Notably, the resulting predictions are remarkably robust in that they agree well between the training and test sets, even when the training and test sets consist of individuals from distinct populations. Thus, although the overall number of genes that can be predicted is relatively small, as expected from our choice to ignore effects such as environmental factors and trans sequence variation, the robust nature of the predictions means that the identity and quantitative degree to which genes can be predicted is known in advance. We also present an extension that incorporates heterogeneous types of genomic annotations to differentially weigh the importance of the various genetic variants, and we show that assigning higher weights to variants with particular annotations such as proximity to genes and high regional G/C content can further improve the predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted from their cis genetic variation.
Variation in gene expression across different individuals has been found to play a role in susceptibility to different diseases. In addition, many genetic variants that are linked to changes in expression have been found to date. However, their joint ability to accurately predict these changes is not well understood and has rarely been evaluated. Here, we devise a method that uses multiple genetic variants to explain the variation in expression of genes across individuals. One important aspect of our method is its robustness, in that our predictions agree well between training and test sets. Thus, although the number of genes that could be explained is relatively small, the identity and quantitative degree to which genes can be predicted is known in advance. We also present an extension to our method that integrates different genomic annotations such as location of the genetic variant or its context to differentially weigh the genetic variants in our model and improve predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted by our method.
A full understanding of gene regulation requires an understanding of the contributions that the various regulatory regions make to gene expression. Although it is well established that sequences downstream of the main promoter can affect expression, our understanding of the scale of this effect and how it is encoded in the DNA is limited. Here, to measure the effect of native S. cerevisiae 3′ end sequences on expression, we constructed a library of 85 fluorescent reporter strains that differ only in their 3′ end region. Notably, despite being driven by the same strong promoter, our library spans a continuous twelve-fold range of expression values. These measurements correlate with endogenous mRNA levels, suggesting that the 3′ end contributes to constitutive differences in mRNA levels. We used deep sequencing to map the 3′UTR ends of our strains and show that determination of polyadenylation sites is intrinsic to the local 3′ end sequence. Following polyadenylation mapping with sequence analysis, we found that increased A/T content upstream of the main polyadenylation site correlates with higher expression, both in the library and genome-wide, suggesting that native genes differ by the encoded efficiency of 3′ end processing. Finally, we use single-cell fluorescence measurements at different promoter activation levels to show that 3′ end sequences modulate protein expression dynamics differently than promoters, by predominantly affecting the size of protein production bursts as opposed to the frequency at which these bursts occur. Altogether, our results lead to a more complete understanding of gene regulation by demonstrating that 3′ end regions have a unique and sequence-dependent effect on gene expression.
A basic question in gene expression is the relative contribution of different regulatory layers and genomic regions to the differences in protein levels. In this work we concentrated on the effect of 3′ end sequences. For this, we constructed a library of yeast strains that differ only by a native 3′ end region integrated downstream of a reporter gene driven by a constant inducible promoter. Thus, we could attribute all differences in reporter expression between the strains to the different 3′ end sequences. Interestingly, we found that despite being driven by the same strong, inducible promoter, our library spanned a wide and continuous range of expression levels of more than twelve-fold. As these measurements represent the sole effect of the 3′ end region, we quantify the contribution of these sequences to the variance in mRNA levels by comparing our measurements to endogenous mRNA levels. We follow with sequence analysis to find a simple sequence signature that correlates with expression. In addition, single-cell analysis reveals that 3′ end-mediated differences in expression have noise dynamics distinct from those of different levels of promoter activation, leading to a more complete understanding of gene expression that also incorporates the effect of these regions.
Despite much research, our understanding of the rules by which cis-regulatory sequences are translated into expression levels is still lacking. We devised a method for obtaining parallel and highly accurate expression measurements of thousands of fully designed promoters, and applied it to measure the effect of systematic changes to location, number, orientation, affinity and organization of transcription factor (TF) binding sites and of nucleosome disfavoring sequences. Our analyses reveal a clear relationship between expression and binding site number, and TF-specific dependencies of expression on the distance between sites and gene starts including a striking ~10bp periodic relationship. We also demonstrate the utility of our approach for measuring TF sequence specificities and sensitivity of TF sites to surrounding sequence context, and for profiling the activity of most yeast transcription factors. Our method is readily applicable for studying both the cis and trans effects of genotype on transcriptional, post-transcriptional, and translational control.
Fundamental aspects of embryonic and post-natal development, including maintenance of the mammalian female germline, are largely unknown. Here we employ a retrospective, phylogenetic-based method for reconstructing cell lineage trees utilizing somatic mutations accumulated in microsatellites, to study female germline dynamics in mice. Reconstructed cell lineage trees can be used to estimate lineage relationships between different cell types, as well as cell depth (number of cell divisions since the zygote). We show that, in the reconstructed mouse cell lineage trees, oocytes form clusters that are separate from hematopoietic and mesenchymal stem cells, both in young and old mice, indicating that these populations belong to distinct lineages. Furthermore, while cumulus cells sampled from different ovarian follicles are distinctly clustered on the reconstructed trees, oocytes from the left and right ovaries are not, suggesting a mixing of their progenitor pools. We also observed an increase in oocyte depth with mouse age, which can be explained either by depth-guided selection of oocytes for ovulation or by post-natal renewal. Overall, our study sheds light on substantial novel aspects of female germline preservation and development.
Many aspects of mammalian female germline development during embryogenesis and throughout adulthood are either unknown or under debate. In this study we applied a novel method for the reconstruction of cell lineage trees utilizing microsatellite mutations, accumulated during mouse life, in oocytes and other cells, sampled from young and old mice. Analysis of the reconstructed cell lineage trees shows that oocytes are clustered separately from bone-marrow derived cells, that oocytes from different ovaries share common progenitors, and that oocyte depth (number of cell divisions since the zygote) increases significantly with mouse age.
We recently reported the identification and characterization of DNA replication origins (Oris) in metazoan cell lines. Here, we describe additional bioinformatic analyses showing that the previously identified GC-rich sequence elements form origin G-rich repeated elements (OGREs) that are present in 67% to 90% of the DNA replication origins from Drosophila to human cells, respectively. Our analyses also show that initiation of DNA synthesis takes place precisely at 160 bp (Drosophila) and 280 bp (mouse) from the OGRE. We also found that in most CpG islands, an OGRE is positioned in opposite orientation on each of the two DNA strands and detected two sites of initiation of DNA synthesis upstream or downstream of each OGRE. Conversely, Oris not associated with CpG islands have a single initiation site. OGRE density along chromosomes correlated with previously published replication timing data. Ori sequences centered on the OGRE are also predicted to have high intrinsic nucleosome occupancy. Finally, OGREs predict G-quadruplex structures at Oris that might be structural elements controlling the choice or activation of replication origins.
DNA replication origins; DNA synthesis; G-quadruplex; nucleosome; CpG islands; transcription
We propose definitions and procedures for comparing nucleosome maps and discuss current agreement and disagreement on the effect of histone sequence preferences on nucleosome organization in vivo.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Evolution maintains organismal fitness by preserving genomic information. This is widely assumed to involve conservation of specific genomic loci among species. Many genomic encodings are now recognized to integrate small contributions from multiple genomic positions into quantitative dispersed codes, but the evolutionary dynamics of such codes are still poorly understood. Here we show that in yeast, sequences that quantitatively affect nucleosome occupancy evolve under compensatory dynamics that maintain heterogeneous levels of A+T content through spatially coupled A/T-losing and A/T-gaining substitutions. Evolutionary modeling combined with data on yeast polymorphisms supports the idea that these substitution dynamics are a consequence of weak selection. This shows that compensatory evolution, so far believed to affect specific groups of epistatically linked loci like paired RNA bases, is a widespread phenomenon in the yeast genome, affecting the majority of intergenic sequences in it. The model thus derived suggests that compensation is inevitable when evolution conserves quantitative and dispersed genomic functions.
Purifying selection is a major force in conserving genomic features. It pushes deleterious mutations to extinction while conserving the specific DNA sequence. Here we show that a large proportion of the yeast genome evolves under compensatory dynamics that conserve genomic properties while modifying the genomic sequence. Such compensatory evolution conserves the local G+C content of the genome, which influences nucleosome organization. Since purifying selection is too weak to eliminate every weakly deleterious mutation in nucleosome bound or unbound sequences, the local G+C content is frequently stabilized by compensatory G+C gaining and G+C losing mutations in proximal loci. Theoretical analysis shows that compensatory evolution is inevitable when natural selection is weak and the genomic feature is distributed over many loci. These results imply that sequence conservation may not always be equated with overall selection. They demonstrate that cycles of weakly deleterious substitutions followed by positive selection for corrective mutations, which were so far studied mostly in RNA coding genes, are observed broadly and profoundly affect genome evolution.
Long intergenic noncoding RNAs (lincRNAs) regulate chromatin states and epigenetic inheritance. Here we show that the lincRNA HOTAIR serves as a scaffold for at least two distinct histone modification complexes. A 5′ domain of HOTAIR binds Polycomb Repressive Complex 2 (PRC2) while a 3′ domain of HOTAIR binds the LSD1/CoREST/REST complex. The ability to tether two distinct complexes enables RNA-mediated assembly of PRC2 and LSD1, and coordinates targeting of PRC2 and LSD1 to chromatin for coupled histone H3 lysine 27 methylation and lysine 4 demethylation. Our results suggest that lincRNAs may serve as scaffolds by providing binding surfaces to assemble select histone modification enzymes, and thereby specify the pattern of histone modifications on target genes.
Active eukaryotic regulatory sites are characterized by open chromatin, and yeast promoters and transcription factor binding sites (TFBSs) typically have low intrinsic nucleosome occupancy. Here, we show that in contrast to yeast, DNA at human promoters, enhancers, and TFBSs generally encodes high intrinsic nucleosome occupancy. In most cases we examined, these elements also have high experimentally measured nucleosome occupancy in vivo. These regions typically have high G+C content, which correlates positively with intrinsic nucleosome occupancy, and are depleted for nucleosome-excluding poly-A sequences. We propose that high nucleosome preference is directly encoded at regulatory sequences in the human genome to restrict access to regulatory information that will ultimately be utilized in only a subset of differentiated cells.
Homopolymeric stretches of deoxyadenosine nucleotides (A’s) on one strand of double stranded DNA, referred to as poly(dA:dT) tracts or A-tracts, are overabundant in eukaryotic genomes. They have unusual structural, dynamic, and mechanical properties, and may resist sharp bending. Such unusual material properties, together with their overabundance in eukaryotes, raised the possibility that poly(dA:dT) tracts might function in eukaryotes to influence the organization of nucleosomes at many genomic regions. Recent genome-wide studies strongly confirm these ideas and suggest that these tracts play major roles in chromatin organization and genome function. Here we review what is known about poly(dA:dT) tracts and how they work.
The DNA of eukaryotic genomes is wrapped in nucleosomes, which strongly distort and occlude the DNA from access to most DNA-binding proteins. An understanding of the mechanisms that control nucleosome positioning along the DNA is thus essential to understanding the binding and action of proteins that carry out essential genetic functions. New genome-wide data on in vivo and in vitro nucleosome positioning greatly advance our understanding of several factors that can influence nucleosome positioning, including DNA sequence preferences, DNA methylation, histone variants and post-translational modifications, higher order chromatin structure, and the actions of transcription factors, chromatin remodelers and other DNA-binding proteins. We discuss how these factors function and ways in which they might be integrated into a unified framework that accounts for both the preservation of nucleosome positioning and the dynamic nucleosome repositioning that occur across biological conditions, cell types, developmental processes and disease.
Nucleosome organization is critical for gene regulation [1]. In living cells this organization is determined by multiple factors, including the action of chromatin remodellers [2], competition with site-specific DNA-binding proteins [3], and the DNA sequence preferences of the nucleosomes themselves [4-8]. However, it has been difficult to estimate the relative importance of each of these mechanisms in vivo [7,9-11], because in vivo nucleosome maps reflect the combined action of all influencing factors. Here we determine the importance of nucleosome DNA sequence preferences experimentally by measuring the genome-wide occupancy of nucleosomes assembled on purified yeast genomic DNA. The resulting map, in which nucleosome occupancy is governed only by the intrinsic sequence preferences of nucleosomes, is similar to in vivo nucleosome maps generated in three different growth conditions. In vitro, nucleosome depletion is evident at many transcription factor binding sites and around gene start and end sites, indicating that nucleosome depletion at these sites in vivo is partly encoded in the genome. We confirm these results with a micrococcal nuclease-independent experiment that measures the relative affinity of nucleosomes for ∼40,000 double-stranded 150-base-pair oligonucleotides. Using our in vitro data, we devise a computational model of nucleosome sequence preferences that is significantly correlated with in vivo nucleosome occupancy in Caenorhabditis elegans. Our results indicate that the intrinsic DNA sequence preferences of nucleosomes have a central role in determining the organization of nucleosomes in vivo.
Eukaryotic transcription occurs within a chromatin environment, whose organization plays an important regulatory role and is partly encoded in cis by the DNA sequence itself [1-6]. Here, we examine whether evolutionary changes in gene expression are linked to changes in the DNA-encoded nucleosome organization of promoters. We find that in aerobic yeast species, where cellular respiration genes are active under typical growth conditions, the promoter sequences of these genes encode a relatively open (nucleosome-depleted) chromatin organization. This nucleosome-depleted organization requires only DNA sequence information, is independent of any co-factors and of transcription, and is a general property of growth-related genes. In contrast, in anaerobic yeast species, where cellular respiration genes are inactive under typical growth conditions, respiration gene promoters encode relatively closed (nucleosome-occupied) chromatin organizations. Thus, our results suggest a previously unidentified genetic mechanism underlying phenotypic diversity, consisting of DNA sequence changes that directly alter the DNA-encoded nucleosome organization of promoters.