Relative to most regions of the genome, tandemly repeated DNA sequences display a greater propensity to mutate. A search for tandem repeats in the Saccharomyces cerevisiae genome revealed that the nucleosome-free region directly upstream of genes (the promoter region) is enriched in repeats. As many as 25% of all gene promoters contain tandem repeat sequences. Genes driven by these repeat-containing promoters show significantly higher rates of transcriptional divergence. Variations in repeat length result in changes in expression and local nucleosome positioning. Tandem repeats are variable elements in promoters that may facilitate evolutionary tuning of gene expression by affecting local chromatin structure.
The genomes of most organisms are not uniformly prone to change because they contain hotspots for mutating events. An abundant class of sequences that mutate at higher frequencies than the surrounding genome is composed of tandem repeats (TRs, also known as satellite DNA), DNA sequences repeated adjacent to one another in a head-to-tail manner (1). Errors during replication make TRs unstable, generating changes in the number of repeat units that are 100 to 10,000 times more frequent than point mutations (2). Variable TRs are often dismissed as nonfunctional “junk” DNA. However, some TRs located within coding regions (exons) have demonstrable functional roles. For example, TR copy numbers in genes such as FLO1 in Saccharomyces cerevisiae generate plasticity in adherence to substrates (3). In canines, variable repeats located in Alx-4 and Runx-2 confer variability to skeletal morphology, which may have facilitated the diversification of domestic dogs bred by humans (4). Thus, repeats located in coding regions may increase the evolvability of proteins.
There is also evidence that repeats influence expression of certain genes (5–7). To investigate the involvement of TRs in gene expression variation, we first mapped and classified all repeats in the S288C yeast genome (8) (data set S1). TRs are enriched in yeast promoters (table S1). Of the ~5700 promoters in the genome, 25% (1455) contain at least one TR. Many TRs in promoters consist of short, A/T-rich sequences (table S2, fig. S1, and data set S2). Comparison of orthologous regions in genomes of different S. cerevisiae strains showed that many of the TRs are variable (data set S1). For example, 24.1% of orthologous TR loci in promoters differ in the number of repeat units between the two fully sequenced strains, S288C and RM11 (8). To confirm this, we sequenced 33 randomly chosen promoter repeats in seven S. cerevisiae genomes (Fig. 1A, figs. S2 and S3, and data set S3). Twenty-five of the 33 TRs differed in the number of repeat units in at least one of the seven strains. The repeat variation frequency is 40-fold higher than the frequency of insertions and deletions (indels) and of point mutations in the surrounding nonrepetitive sequence (P < 10⁻¹⁵) (figs. S2 and S3).
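As a rough illustration of what mapping repeats in a promoter sequence involves, the following Python sketch scans a sequence for perfect tandem repeats. It is not the procedure used for data set S1 (that procedure is described in the supporting material); the function name, the `max_unit` and `min_copies` thresholds, and the toy sequence are all assumptions made for this example, and dedicated repeat finders additionally tolerate imperfect copies.

```python
def find_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Hypothetical helper: at each position, try unit lengths 1..max_unit,
    keep the longest run (ties go to the shortest unit), then jump past it.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this size
            copies = 1
            # Count how many times the unit recurs head-to-tail.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * k
                if best is None or span > best[2] * len(best[1]):
                    best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += len(best[1]) * best[2]  # skip the whole repeat tract
        else:
            i += 1
    return hits


# Toy sequence (assumed, not from the paper): a 13-unit TA tract with flanks.
toy_promoter = "GGC" + "TA" * 13 + "CGC"
for start, unit, copies in find_tandem_repeats(toy_promoter):
    print(start, unit, copies, "units,", len(unit) * copies, "nucleotides")
```

Comparing the `copies` value reported for orthologous loci in two strains would then give a simple measure of repeat number variation.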
To determine whether promoter TR variation affects gene expression, we compared repeat variability to expression divergence (ED), which represents how fast the transcriptional activity of each gene evolves (9–11). Promoters containing TRs showed significantly (P < 1.75 × 10⁻⁴) higher amounts of ED than did promoters lacking TRs when comparing yeast species (S. cerevisiae, S. paradoxus, S. mikatae, and S. kudriavzevii) (Fig. 1, B to D, and fig. S4A) and S. cerevisiae strains (S288C and RM11) (Fig. 1, E to G, and fig. S4, B and C). This difference was independent of factors known to affect transcriptional divergence, for example, the presence of TATA boxes (fig. S5). Only promoters containing variable numbers of repeat units between strains or species showed the elevated ED (Fig. 1, D and G). Furthermore, when variable TRs were binned into variable and highly variable (10% most variable) groups, highly variable repeats displayed even higher ED. Hence, ED correlates not merely with TRs in promoters but more specifically with repeat number variation.
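The comparison of ED between promoters with and without TRs is a two-group comparison. The statistical test behind the reported P value is given in the supporting material and is not reproduced here; as a generic stand-in, the Python sketch below uses a two-sided permutation test on the difference in group means. The function name, the number of permutations, and the ED values are hypothetical and serve only to show the logic.

```python
import random


def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation P value for a difference in group means.

    Hypothetical helper: labels are shuffled n_perm times and the P value
    is the share of shuffles at least as extreme as the observed difference.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up ED values for illustration only (not data from the paper).
ed_with_tr = [1.9, 2.4, 2.1, 2.8, 2.2, 2.6]
ed_without_tr = [1.2, 1.5, 1.1, 1.7, 1.4, 1.3]
print(permutation_test(ed_with_tr, ed_without_tr))
```

A permutation test makes no assumption about the shape of the ED distribution, which is why it is used here; with thousands of promoters per group, a rank-based test would be a common alternative.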
To directly test whether changes in promoter TRs affect transcriptional activity, we varied the TR repeat number in the promoters of yeast genes YHB1, MET3, and SDT1 (Fig. 2 and fig. S6A). For each construct, expression increased as the length of the TR increased from zero, until a certain size was reached, after which expression dropped off. To determine whether natural variation between strains corresponded to similar changes in gene expression, we cloned promoters of several strains into the respective promoters of strain S288C. These transformants indeed mirrored the expression-level patterns that we observed with the engineered TR strains (fig. S6, B to D). Moreover, TR-mediated changes in expression have functional consequences. SDT1 encodes a pyrimidine nucleotidase that confers resistance to the nucleotide analog 6-azauracil (6AU) (12), and we observed that strains with various SDT1 promoter TR constructs show differences in growth that match the SDT1 expression changes (fig. S7).
Because promoter TR length variation affects transcription, we speculated that it should be possible to exploit promoter TR instability to quickly select for changes in gene expression. We used strains with a relatively long SDT1 TR tract, because repeat mutation rates increase with increasing tract length (13), and selected for variants having higher gene expression. The SDT1 open reading frame was replaced with selectable markers, either URA3 or yellow fluorescent protein (YFP), for selection in SC-ura (medium on which yeast must express URA3 to grow) or selection by sorting with flow cytometry, respectively. After a few rounds of selection, both regimes yielded mutants that showed increased growth on medium lacking uracil or increased YFP fluorescence. These mutants showed significant (P < 1 × 10⁻¹⁵) changes in the length of the TR tract in the SDT1 promoter. The most common tract length yielded by both selections was 13 repeats (26 nucleotides), corresponding to a size close to that of the engineered TR constructs with the highest expression (Fig. 2B). Construction of a 13-unit TR revealed that it had the highest expression of all the engineered strains (Fig. 3A). When strains containing the long TR tract were grown without any selective pressure, most strains remained at the initial 48 units, and the few mutants that arose showed a broader TR size distribution. We were unable to obtain strains with higher expression or TR size changes when the repeat sequence in the promoter was replaced by a randomized (no repeat) sequence of the same length as the 48-unit repeat tract (Fig. 3B).
TRs can contain transcription factor (TF) binding sites, so variation in the tract length may result in the removal or addition of binding sites. Of the 1455 TR-containing promoters, 113 contain known TF binding sites located within the repeats. Many of these TRs overlap stress-response TF binding sites (table S3). Investigation of one of these TRs, located in the promoter of YKL107w, indicated that, in this case, changes in the TR number may affect transcription through variation in the number of binding sites of the oxidative-stress-responsive TF, Yap1 (fig. S8).
Most promoter TRs, however, do not overlap known TF sites, indicating that another mechanism underlies repeat-related expression differences. To investigate whether distance between promoter elements affects transcription, we replaced the SDT1 TR with different DNA sequences of the same length. These changes mostly led to severely reduced transcription, indicating that altered spacing alone is not sufficient to explain the effect of repeat variation on transcription (Fig. 4, A to C). Many promoter TRs are extremely A/T-rich, suggesting they may facilitate DNA melting. However, some of the constructs with the same high A/T content as the original repeat tract still show reduced transcription.
Most promoter TRs are located ~200 base pairs upstream from the translational start codon (Fig. 4D and fig. S9), corresponding to the nucleosome-free region of yeast promoters (14), suggesting a link between chromatin structure and TRs. We compared available genome-wide nucleosome positioning data with the positioning of TRs. Nucleosome density across promoter regions showed an inverse correlation with the presence of TRs (Fig. 4D and fig. S9A). Nucleosome depletion is especially pronounced around AT-rich repeats, which compose 80% of all repeats (fig. S9B). These results suggest either that TRs preferentially form in nucleosome-free regions or that nucleosomes cannot easily bind TRs. We also found that nucleosomes containing the histone variant H2AZ (15) tended to border the TRs (Fig. 4D and fig. S9). Although no simple rule governs nucleosome positioning, nucleosome positioning in the yeast genome is largely directed by DNA sequence (15–18), and a computational algorithm exists that uses DNA sequence information to predict where nucleosomes bind (16, 19). This algorithm predicts that TR regions in promoters are nucleosome-free (fig. S10), suggesting that promoter TR sequences intrinsically bind poorly to nucleosomes, presumably because the repeats affect biophysical properties of DNA (e.g., bendability) (20). In line with these predictions, an analysis of a transcriptome data set for a large group of chromatin remodeling mutants (21) showed that the expression of repeat-containing promoters is strongly influenced by regulators of chromatin structure and activity (fig. S11).
If repeats control nucleosome positioning, changing the repeat sequence should influence the local nucleosome structure. Deletion of the TR of SDT1 resulted in binding of nucleosomes to this region and also disturbed the positioning of downstream nucleosomes (Fig. 4E). Intermediate deletions of the TRs resulted in gradual changes in nucleosome occupancy (fig. S12), indicating that repeat variation directly affects the local chromatin structure. Replacing the repeat with a sequence predicted to have a high affinity for nucleosomes (19) resulted in a well-positioned nucleosome in the previously nucleosome-free region and in greatly reduced SDT1 expression (Fig. 4, F and G).
Although several molecular mechanisms may underlie the effect promoter TRs have on gene expression, our data indicate that repeat-dependent changes in DNA sequence and chromatin structure play a role. Local chromatin structure is known to affect transcriptional activity (22). Most repeats in promoters do not contain the hallmark dinucleotide periodicities that are associated with nucleosome-binding DNA (19). As a result, these repeat tracts may help to establish variable nucleosome-free DNA structures and influence nucleosome positioning in nearby regions. Moreover, because of the high A/T content of most promoter TRs, it remains to be tested whether these nucleosome-free TRs may allow DNA melting for loading of the RNA polymerase.
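The notion of a dinucleotide periodicity can be made concrete with a small calculation. The Python sketch below marks every position where one of a chosen set of dinucleotides begins and counts how often two marks lie a given distance apart. This is a simplified illustration, not the nucleosome prediction algorithm cited above (19); the function name, the choice of AA, TT, and TA as the dinucleotide set, and both toy sequences are assumptions made for this example.

```python
def dinucleotide_autocorrelation(seq, dinucs=("AA", "TT", "TA"), max_lag=20):
    """Count pairs of marked dinucleotides separated by each lag (in bp).

    Hypothetical helper: a position is marked when the dinucleotide
    starting there is in dinucs; counts[lag] is the number of marked
    pairs exactly lag positions apart.
    """
    seq = seq.upper()
    marks = [1 if seq[i:i + 2] in dinucs else 0 for i in range(len(seq) - 1)]
    counts = {}
    for lag in range(1, max_lag + 1):
        counts[lag] = sum(
            marks[i] * marks[i + lag] for i in range(len(marks) - lag)
        )
    return counts


# Toy sequences (assumed): AA spaced every 10 bp versus a plain TA repeat.
periodic = ("AA" + "GCGCGCGC") * 8
ta_repeat = "TA" * 13
print(dinucleotide_autocorrelation(periodic))
print(dinucleotide_autocorrelation(ta_repeat))
```

In the first toy sequence the counts are concentrated at lags of 10 and 20, the kind of spacing associated with nucleosome-binding DNA, whereas the TA repeat gives nonzero counts at every even lag and so shows no preferred 10-bp spacing.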
Changes within coding regions and in protein sequence underlie much biological adaptation and innovation. However, changes in the regulation of genes may be equally important (23, 24). Our results presented here are consistent with a role for TRs as ubiquitous and adjustable “evolutionary tuning knobs” (25) for transcription that mediate rapid evolution of gene expression. Genes that respond to changing environmental conditions would be particularly suited for such variable genetic elements. Indeed, genes driven by repeat-containing promoters show elevated responsiveness to changing environmental conditions (fig. S13).
A preliminary analysis of Homo sapiens promoters reveals a TR distribution comparable to that of yeast, suggesting that similar mechanisms are also at play in higher organisms (fig. S14).
We thank B. Calderon, B. (Sze Ham) Chan, and B. Breaux for their contributions; B. Stern, E. O’Shea, A. Murray, A. Rowat, and A. New for critique of the manuscript; and the anonymous reviewers for their useful suggestions. Research in the lab of K.J.V. is supported by NIH National Institute for General Medical Studies grant P50GM068763, Human Frontier Science Program (HFSP) award RGY79/2007, the Flemish Institute for Biotechnology (VIB), and the Fonds Wetenschappelijk Onderzoek Vlaanderen–Odysseus program. M.D.V. acknowledges the Ford Foundation and the Belgian American Education Foundation.
Supporting Online Material
Materials and Methods
Figs. S1 to S14
Tables S1 to S6
Data Sets S1 to S3