Trinucleotide repeat expansion underlies at least 17 neurologic diseases. In affected individuals, the expanded locus is characterized by dramatic changes in chromatin structure and in repeat tract length. Interestingly, recent studies show that several chromatin modifiers, including a histone acetyltransferase, a DNA methyltransferase, and the transcription factor CTCF, can modulate repeat instability. Here, we propose that the unusual chromatin structure of expanded repeats directly impacts their instability. We discuss several potential models for how this might occur, including a role for DNA repair-dependent epigenetic reprogramming in increasing repeat instability, and the capacity of epigenetic marks to alter sense and antisense transcription, thereby affecting repeat instability.
In 1991, a novel type of mutation—the expansion of trinucleotide repeats—was shown to cause two human neurological diseases: fragile X syndrome (FRAXA) and spinal and bulbar muscular atrophy (SBMA)1, 2. To date, 17 such diseases have been identified3. Normal individuals typically harbour fewer than 30 repeats, whereas patients can carry from 35 to several thousand repeats. The disease incidence can be as common as 1 in 4000 males for FRAXA and as rare as 1 in 50 000 for Friedreich ataxia (FRDA)3. Large tracts of trinucleotides can cause disease in several ways: by affecting gene expression, producing a toxic RNA species, or altering the function of the resultant protein3 (Table 1). Changes that impair gene expression or protein function are recessive (e.g. FRDA), whereas those that produce toxic RNAs or proteins are dominant (e.g. myotonic dystrophy (DM1) and Huntington disease (HD)).
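Because tract length is the quantity that separates normal from disease-associated alleles, it can help to see how it might be measured from a sequence. The following is only a minimal sketch: the function name `longest_tract` and the toy sequence are hypothetical, and it counts only uninterrupted (pure) runs of a repeat unit. It deliberately does not classify alleles, since disease thresholds differ between loci.

```python
import re


def longest_tract(seq: str, unit: str = "CAG") -> int:
    """Return the number of units in the longest uninterrupted run of `unit`.

    Hypothetical helper for illustration; interruptions (e.g. a CAA inside
    a CAG tract) end the run.
    """
    if not unit:
        raise ValueError("unit must be a non-empty string")
    # A lookahead lets a run be measured from every start position, so a
    # short early match cannot hide a longer run in another frame.
    pattern = re.compile(r"(?=((?:%s)+))" % re.escape(unit.upper()))
    best = max((len(m.group(1)) for m in pattern.finditer(seq.upper())), default=0)
    return best // len(unit)


# Toy allele: 40 pure CAG units flanked by arbitrary sequence (made up).
toy_allele = "GCC" + "CAG" * 40 + "TGA"
print(longest_tract(toy_allele, "CAG"))
```

Counting only pure runs is a design choice: as noted later in the text, repeat purity itself influences instability, so interrupted tracts are better reported separately than merged.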
Analyses of repeat tracts in patients and animal models have revealed that trinucleotide repeats often show increases in repeat number (expansions) as well as decreases (contractions) in both germline and somatic tissues. In addition, the rate of instability varies during development and between tissues within a single organism. This variability implies multiple mechanisms of instability, an idea that is supported by the identification of a number of stage-specific modifiers of repeat instability in mouse models (Figure 1). The bias of instability towards expansion in the germline and the embryo accounts for the increased severity of disease symptoms in subsequent generations. This phenomenon was termed ‘anticipation’ long before its molecular basis was discovered. In affected individuals, ongoing, expansion-biased repeat instability in somatic tissues, especially those associated with the disease phenotype, might decrease the age of onset of symptoms and accelerate disease progression4.
Instability of disease-causing trinucleotide repeats appears to be triggered by the folding of repeat sequences into abnormal secondary structures, including hairpins, slipped-strand structures, triplexes, and quadruplexes4. The mishandling of these aberrant structures by the DNA repair machinery is thought to cause changes in tract length.
Genetic studies in bacteria, yeast, flies, mammalian cells, and mice have revealed an astounding assortment of proteins whose mutation or knockdown significantly influences repeat stability4. Paradoxically, this wealth of possibilities has slowed the discovery of the major mechanisms that alter repeat stability.
To add to this complexity, the instability of repeats depends on internal influences, including the type of the repeat, its purity, and its length, as well as on the surrounding genomic context. Indeed, repeat tracts are typically much more unstable in mouse models generated through the insertion of large genomic fragments than in models that carry cDNAs, suggesting that the surrounding sequence influences repeat stability5. Examples of potential cis-acting elements include origins of replication4, transcription factor binding sites6, sense and antisense promoters7, neighboring GC-rich sequences8, and the local chromatin environment9, 10.
DNA is embedded in chromatin, which influences its metabolism (Box 1). Chromatin packing is influenced by histone tail modifications and DNA methylation at CpG dinucleotides. Recently, several studies have carefully catalogued the epigenetic marks associated with expanded GAA repeat tracts, showing that they are buried in heterochromatin6, 11, 12. In addition, several publications within the last two years have unearthed evidence that chromatin-modifying enzymes and epigenetic marks contribute to repeat instability9, 10, 13. No cause-and-effect experiment has been done, however, to directly measure the effect of altering local chromatin structure on repeat instability. Therefore, we acknowledge at the outset that it is not yet clear whether the epigenetic status of repeats is a principal driver of repeat instability, or merely a by-product of repeat expansion.
Approximately 146 bp of DNA is wrapped around a histone octamer to form the basic subunit of chromatin: the nucleosome. Chromosomes, however, are packaged into higher order structures, a process influenced by DNA methylation and covalent modification of histones. These epigenetic marks tend to be dynamic and vary between cells, between loci within a cell, and between different developmental stages at the same locus32. Chromatin structure is broadly categorized into euchromatin and heterochromatin. Euchromatin is a looser structure that contains a high proportion of histone H3 and H4 molecules with acetylated lysine residues in their N-terminal tails. Heterochromatin is more densely packaged and is enriched in H3K9me and methylation of cytosine residues at CpG dinucleotides. Whether a given region is euchromatic or heterochromatic does not depend on any individual histone modification, but rather on the collective influence of many different marks.
Chromatin regulates all DNA transactions including transcription. Transcription initiation and elongation require a complex interplay of nucleosome remodelling and chromatin-modifying enzymes70. NER and BER proteins are often unable to bind nucleosomes in the absence of accessory proteins that expose damaged DNA for recognition71, 72. Origins of replication in higher eukaryotes are chiefly defined by their chromatin context, which needs to be maintained after the passage of the replication fork73. DNA damage signalling in heterochromatin has specific requirements and kinetics that differ from euchromatic regions74. Finally, the repair of double-strand breaks is accompanied by an intricate interplay of chromatin remodelling enzymes, nucleosome eviction, and nucleosome modification, which allows accurate signalling, repair, and recovery from damage75. Clearly, these DNA transactions are heavily dependent on chromatin structure, and each of these processes affects triplet repeat instability.
Here, we summarize what is known about the chromatin environment of expanded repeat tracts and highlight the correlation between the epigenetic changes that occur during development and the timing of triplet repeat instability. To spark discussion, we speculate about how local chromatin structure might drive repeat instability throughout development.
Several studies have identified alterations in DNA methylation, histone modification, and chromatin structure around expanded repeat tracts that are not present at the corresponding wild type alleles (Table 2). Expanded CGG repeats at the fragile X mental retardation 1 (FMR1) locus in FRAXA patients provided the first indication that such repeats are associated with heterochromatin marks. The absence of FMR1 expression, which is responsible for the mental retardation phenotype in FRAXA, is caused by extensive DNA methylation within the CGG repeat, as well as in the flanking DNA14-16. More recently, chromatin immunoprecipitation analyses of specific modified histones in the FMR1 5′ region near the expanded repeat revealed high levels of other repressive marks, including methylated histone 3 at lysine 9 (H3K9me)17, and lower levels of two marks typically encountered in euchromatic regions: acetylated (Ac) H3 and H418. These data indicate that expanded CGG repeats, unlike their normal-length counterparts, carry epigenetic marks typical of heterochromatin, and display diminished levels of euchromatic marks.
Expanded CTG repeats elicit similar effects on surrounding chromatin structure in cell lines derived from patients with a severe form of myotonic dystrophy (DM1). These cells harbor over 1000 CTGs in the 3′ UTR of the dystrophia myotonica protein kinase (DMPK) gene that are largely resistant to digestion with DNase I and methylation-sensitive restriction enzymes19, 20, properties typical of heterochromatin with high levels of DNA methylation. In addition, the expanded locus contains histones enriched in H3K9me and depleted in H3Ac21. Thus, at least in the case of congenital DM1 cells, highly expanded CTG repeats, like expanded CGG repeats, appear to assume a heterochromatic conformation.
Expansion-induced changes in chromatin structure have been most extensively studied in the frataxin gene, where the expansion of a GAA repeat in the first intron gives rise to FRDA. As with CGG and CTG repeats, the region upstream of the expanded GAA repeat is, in several somatic tissues, more highly methylated at CpG sites than it is at wild type alleles6, 12. Moreover, decreased H3Ac and H4Ac levels and increased H3K9me levels surround expanded frataxin GAA repeats6, 11, 12.
It is not known what triggers or maintains heterochromatin at expanded triplet repeats22. In all cases, the repeat tract length must first reach a certain threshold. When the repeat nears this threshold, heterochromatin forms and spreads in a manner reminiscent of position effect variegation, the stochastic spreading of heterochromatin to adjacent euchromatic regions23. For instance, some individuals carrying FRAXA premutation alleles (harboring ~60-200 CGG repeats) display mosaic patterns of DNA methylation in different tissues24, whereas the full expansion (>200 CGGs) is associated with complete FMR1 promoter methylation25. The idea that repeats can induce heterochromatin spreading was directly tested in mice26. Whereas arrays of reporter genes were silenced only when they integrated near a heterochromatic region26, similar arrays, harboring 192 CTG or 200 GAA repeats, were silenced regardless of the transgene’s genomic location26. Thus, CGG, CTG, and GAA repeats, independent of their genomic location, seem to direct formation and spreading of heterochromatin.
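The FMR1 size classes quoted above can be written down as a simple lookup. This is a hedged sketch, not a diagnostic rule: the function name `classify_fmr1_allele` is hypothetical, the text gives the premutation range only approximately (~60-200 CGGs), and treating 60 and 200 as sharp, inclusive boundaries is an assumption made purely for illustration.

```python
def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Map an FMR1 CGG repeat count to the size classes named in the text.

    Assumption: the approximate range "~60-200" is treated as the closed
    interval 60..200; real boundaries are not this sharp.
    """
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_repeats > 200:
        # Associated in the text with complete FMR1 promoter methylation.
        return "full mutation"
    if cgg_repeats >= 60:
        # Associated in the text with mosaic methylation in some individuals.
        return "premutation"
    return "below premutation range"


for n in (30, 90, 450):
    print(n, classify_fmr1_allele(n))
```

The sketch captures only the length threshold. The mosaic methylation seen in some premutation carriers is a reminder that chromatin state near the threshold is stochastic rather than a deterministic function of repeat number.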
Several studies in FRAXA individuals indicate that instability can occur during early embryogenesis. Some individuals harbor two major FMR1 CGG repeat lengths—derivatives of the same allele—at high frequency in every examined tissue, suggesting that repeat instability arose during the first zygotic cell division27, 28. This possibility is supported by the existence of monozygotic twins who carry different allele lengths27, 29.
In support of these conclusions from patient studies, Savouret et al provided convincing evidence for instability of DM1-associated CTG repeats during early embryogenesis30. They examined the effects of knocking out the mismatch repair (MMR) gene Msh2 on intergenerational repeat instability in mice that carried >300 DMPK CTG repeats. Msh2 is central to the recognition of mismatches in the DNA31 and binds hairpins formed by CTG repeats4. Savouret et al observed ongoing instability in the germline of the transmitting parent, pointing to germline development as a period prone to repeat instability. However, if the effect of Msh2 deletion were solely confined to the parent’s germline, the genotype of the embryo should make no difference. This was not the case: whereas Msh2+/− offspring of Msh2+/− parents harbored mostly CTG expansions, Msh2−/− littermates were strongly biased toward contractions. Similarly, in a cross between Msh2−/− mice carrying the expanded allele and Msh2+/− mice, the Msh2−/− offspring had mostly contractions, but their Msh2+/− littermates displayed about equal proportions of contractions and expansions. This dependence on genotype indicates that some repeat instability must occur in the embryo. In addition, the lack of multiple repeat lengths in individual newborn mice indicates that embryonic instability must occur very early during development, probably just after fertilization30.
Vertebrate embryogenesis entails extensive epigenetic changes that are essential for resetting the developmental potential of the zygote32. This process is called reprogramming. In the mouse, the paternal chromatin is remodeled in the egg by histone exchange and acetylation and by elimination of DNA methylation via an active process that remains poorly understood32, 33. By contrast, the maternal genome is passively demethylated through replication during the first 3.5 days post fertilization. The methylation pattern is subsequently restored by de novo methyltransferases and maintained by Dnmt1 (DNA (cytosine-5-) methyltransferase 1). The temporal overlap of chromatin structure reprogramming and trinucleotide repeat instability raises the possibility of a cause-and-effect relationship between the two processes.
To probe the effects of reprogramming on repeat instability, Gorbunova et al attempted to model, in cultured cells, the demethylation that occurs during embryogenesis34. Treatments that reduced genome-wide DNA methylation levels and Dnmt1 activity stimulated CAG repeat contractions up to 1000-fold in a selection assay in mammalian cells and promoted expansions in unselected DM1 cell lines35. These data suggest that DNA demethylation provokes repeat instability, but the mechanism is unclear. Several possibilities have been eliminated, however, including homologous recombination, nonhomologous end joining, and transcription through the repeat tract36. Similar effects of CpG methylation have also been noted in bacteria: methylated CGG repeats and CAG repeats surrounded with methylated sequences are more stable than their unmethylated counterparts37. These studies provide a strong experimental link between DNA demethylation and repeat instability.
DNA demethylation during reprogramming could trigger repeat instability by engaging DNA repair pathways. Active demethylation in the paternal genome during early embryogenesis has been proposed to involve DNA repair via cytosine deaminases of the activation-induced cytosine deaminase (Aid)/apolipoprotein B editing complex (Apobec1) family38, 39 in combination with the glycosylase Mbd4 (methyl-CpG binding domain protein 4) and the nucleotide excision repair (NER) component Gadd45a (growth arrest and DNA-damage-inducible alpha)39, 40. Aid and Apobec1, which are expressed in pluripotent cells, can deaminate 5-methylcytosine to thymine, creating T-G mismatches38, 39. Removal of T and its replacement by C could occur via MMR, base excision repair (BER), or even NER. Because these repair pathways introduce nicks in the DNA, and because nicks are associated with instability41, we speculate that reprogramming might cause destabilization of repeats by generating nicks at nearby methylated CpGs. Thus, methylated CpG sites might serve as signals to recruit DNA repair factors to expanded repeat tracts, leading to further instability (Figure 2i). This scenario could account for the MMR dependence of repeat instability in the early embryo30. Although no glycosylase is known to cause trinucleotide repeat instability in the embryo, there is precedent for glycosylase-induced instability in somatic tissue: removing 7,8-dihydro-8-oxo-guanine-DNA glycosylase 1 (Ogg1) greatly diminishes somatic instability in an HD mouse model42.
Instability in the germline and during gametogenesis is well documented in humans and mice. FRAXA patients, for example, show a striking parent-of-origin effect. Expanded alleles are often observed in children whose asymptomatic mothers harbor premutation alleles. Male fetuses from these mothers carry the full mutation in all tested tissues except fetal testes, in which contractions were observed43. Similar analyses of female fetuses support the conclusion that CGG expansions occur in the female germline43.
Several studies have addressed the timing of germline instability. In HD patients, a substantial fraction of all instability of the CAG tract (mostly expansions) is generated pre-meiotically44. Similarly, in a DM1 mouse model, repeat instability increased with age; all instability was already present in spermatogonia, a cell type found at the earliest stage of spermatogenesis, and no further increase was apparent at later stages of gametogenesis45. In contrast, using a mouse model for HD, Kovtun and McMurray46 detected instability in males only after the development of spermatozoa. The basis for the difference is unclear.
Analogous studies in the female germline of spinocerebellar ataxia type 1 (SCA1) mouse models found that moderate to large contractions dominated transmissions from the maternal parent and the size of the contractions increased with age9, 47, 48. Thus, CAG repeat instability arises during both spermatogenesis and oogenesis.
As is the case during embryogenesis, instability of repeat tracts in the male and female germlines coincides with epigenetic changes to the genome. In the mouse, primordial germ cells are demethylated between days 8.5 and 13.5 of embryonic development, with the bulk of DNA demethylation occurring between E11.5 and E12.5, concurrent with substantial changes in histone modification and content32. DNA remethylation in the male germline starts at E1532. In the female germline, the genome is remethylated only after birth, during oocyte growth32. Thus, epigenetic changes occur throughout germline development, at all stages where repeat instability has been identified.
The rapid loss of methylation in the developing germline suggests that demethylation is active and possibly dependent on DNA repair, as proposed for embryogenesis49. But germline reprogramming has not been studied as extensively as early embryogenesis, where evidence for a role of DNA repair in reprogramming is rapidly accumulating33. Nonetheless, we suggest that a spike in DNA repair activity near the repeat tract during reprogramming might account for the increased instability (Figure 2i). Alternatively, as reprogramming decreases the density of heterochromatic regions, DNA repair enzymes might gain freer access to the repeat tract, leading to enhanced recognition of secondary structures, gratuitous repair, and subsequent instability (Figure 2ii).
To probe the effects of DNA methylation on repeat stability, we removed one copy of Dnmt1 in a SCA1 mouse model carrying 143 CAG repeats at the endogenous locus9. Both male and female Dnmt1+/− SCA1 mice passed an expansion to their progeny 3 to 4 times more frequently than did Dnmt1+/+ SCA1 mice. In addition, the effects in Dnmt1+/− mice were confined to the germline; repeat instability was not observed in the early embryo or in somatic tissues9.
Analyses of testes and ovaries from Dnmt1+/− SCA1 mice did not reveal any global differences in DNA methylation of repetitive elements relative to Dnmt1+/+ SCA1 mice9, suggesting that Dnmt1 deficiency had little effect on genome-wide methylation. At CpG sites 17 and 20 bp upstream of the repeat tract, however, ovaries displayed significantly reduced methylation levels, whereas testes showed significantly elevated levels. In the same region, testes of Dnmt1+/− mice also displayed variegated H3K9me2 levels, suggesting that the local chromatin environment underwent alterations. Collectively, these observations suggest that Dnmt1 deficiency enhances an expansion-biased process early on in the development of both the male and female germlines, at a time overlapping, and perhaps perturbing, the period of hypomethylation associated with reprogramming.
Studies of intergenerational instability in a fly model of CAG repeat instability suggest that histone acetylation is also involved13. Haploinsufficiency of CREB-binding protein (CBP), which encodes a histone acetyltransferase, increases triplet repeat instability13. By contrast, treatment with the histone deacetylase inhibitor trichostatin A (TSA) decreases triplet repeat instability13. Although genome-wide levels of acetylated histones differed between the CBP-mutant flies or TSA-treated flies and their wild type counterparts, changes in histone acetylation were not observed near the repeat tract, nor was transcription through the repeat affected13. These results implicate histone acetylation as a modifier of triplet repeat instability, but the effect, in this case, appears to be indirect.
Somatic instability of CGG repeats, which, unlike CAGs and GAAs, can be methylated, is low in patients harboring the full mutation (>200 CGGs)50, 51. By contrast, those FRAXA males who express FMR1 despite harboring a large expansion have unmethylated repeats with a high degree of instability in somatic tissues52, 53. One such individual displayed instability in some tissues and not in others, a pattern that correlated perfectly with methylation status of the repeat52. Cultured cells from FRAXA patients show the same absolute correlation: methylated CGG repeats are stable, whereas unmethylated repeats are unstable54-56. Importantly, in cultured primate cells, DNA methylation of plasmids containing CGG repeats stabilized the repeat tract, specifically implicating DNA methylation within or near repeat tracts as a key modifier of triplet repeat instability37.
In addition to epigenetic modifiers, transcription has emerged as an important player in triplet repeat instability. Transcription through a repeat tract can stimulate repeat instability in human cells and in fly models of CAG repeat diseases57. Lin et al7, 58 showed that transcription through a repeat of 95 CAGs caused substantial contractions of the repeat tract. Moreover, transcription-induced destabilization required MMR and transcription-coupled NER (TC-NER)57. Thus, heterochromatin might reduce repeat instability by lowering the rate of transcription through the repeat tract, thereby decreasing the potential for TC-NER (Figure 2iii). The absence of transcription through the highly expanded CGG repeats at the FMR1 locus could account for their surprising stability in somatic tissues.
Many expanded repeat loci, however, are more unstable than their shorter counterparts even though they are transcribed at reduced levels57. For these loci, another mechanism must be at work. We suggest that this mechanism might be antisense transcription. Indeed, a surprisingly high fraction of human genes are associated with antisense transcripts, including most wild type alleles of triplet repeat-associated genes59 (Table 3). At the FMR1 locus, heterochromatin appears to shut down both sense and antisense transcription60. By contrast, at the SCA8 and DM1 loci, the levels of antisense transcription are higher on the expanded alleles21, 61. At least at the DMPK locus, high levels of antisense transcription are associated with heterochromatin marks21.
At the DM1 locus, the CTG repeat tract lies near a CCCTC-binding factor (CTCF) binding site that, when bound by CTCF, appears to insulate the repeat from antisense transcription19. On expanded alleles, however, the CTCF site is methylated and CTCF cannot bind, allowing enhanced antisense transcription through the expanded repeat19. Thus, paradoxically, heterochromatinization allows for higher levels of antisense transcription through the repeat, which could promote instability via TC-NER (Figure 2iii).
Libby et al investigated the role of the CTCF binding site in triplet repeat instability in a mouse model for SCA710. In these mice, CTCF binding sites flank the 92 CAG repeats in the transgene. Mutation of the downstream CTCF site dramatically increased instability in the germline. Furthermore, the kidney, cortex, brainstem, and liver of mice mutant for the CTCF binding site also displayed a higher level of CAG instability compared to mice with a wild type binding site10. Intriguingly, one mouse that carried a wild type CTCF binding site showed elevated levels of instability only in its kidney. Further investigation showed an aberrantly methylated CTCF binding site in this tissue. DNA methylation prevented CTCF from binding to its target sequence in vitro. The authors concluded that CTCF binding, which is influenced by DNA methylation, contributes directly to preventing repeat instability. These results are akin to those for the expanded CTG repeat in DM1 cells, where methylation of the CTCF binding sites correlates with loss of CTCF binding and increased antisense transcription19, 21. In the SCA7 mice, however, the role of antisense transcription has not been analyzed. Taken together, these data provide a tantalizing link between CAG repeat instability, CG-rich sequences, DNA methylation, and perhaps antisense transcription through the locus.
Antisense transcription could promote repeat instability in other ways. Simultaneous sense and antisense transcription might lead to head-on collisions between converging RNA polymerases, which could amplify the problems that arise when a repeat is transcribed in a single direction (e.g. enhanced formation of secondary structure, persistence of RNA/DNA hybrids, and increased gratuitous TC-NER), thereby elevating instability.
Antisense transcription could also generate double-stranded RNA, which might trigger the RNA interference (RNAi) pathway. RNAi is a specialized pathway that allows post-transcriptional repression of specific mRNAs62. Double-stranded RNAs, such as microRNAs, are processed by the ribonuclease Dicer into 21-nucleotide fragments. Short RNAs are subsequently loaded into the RNA-induced silencing complex (RISC), which targets mRNAs for degradation or inhibits their translation62. In fission yeast, the RNAi pathway prevents the appearance of centromeric transcripts by facilitating formation and maintenance of centromeric heterochromatin62. Whether an analogous process operates in mammalian cells is not yet clear. Nonetheless, repeat-containing transcripts can be processed by Dicer and enter the RNAi pathway22, 63, 64. Thus, the RNAi pathway could promote heterochromatin formation at disease loci22, which could impact instability.
RNAs perform a variety of other functions that might be relevant to repeat instability: they can serve as a template for DNA synthesis during DNA repair65, guide programmed chromosome rearrangements66, and help remove T-G mismatches generated during reprogramming67. One of these functions might ultimately be found to play a role in repeat instability.
The instability of trinucleotide repeats presents a surprisingly complex puzzle. One key point of agreement is that instability follows from the capacity of unstable repeats to form secondary structures, which in turn engage a variety of DNA repair activities in an attempt to regenerate a normal Watson-Crick duplex. In the past decade, most of the effort geared towards elucidating the mechanisms of triplet repeat instability has focused on knocking out or knocking down individual candidate genes and observing the resultant instability phenotype. A high fraction of tested genes can modulate triplet repeat instability, including elements of the replication machinery, DNA damage checkpoint components, and proteins involved in MMR, NER, BER, recombination, and transcription4, 57.
The chromatin environment in which repeat tracts are embedded adds yet another layer of complexity. The distinctive heterochromatic structure at expanded repeats might prevent or promote instability depending on the developmental stage and nearby cis elements. Given the well-characterized links between chromatin structure and DNA metabolism, any discussion of the mechanisms of triplet repeat instability should be expanded to include the possible roles of epigenetics and chromatin.
In the future, it will be important to catalogue the changes in epigenetic marks that occur at expanded repeat tracts during development. To move beyond correlation to causation will undoubtedly prove challenging. In addition, a better understanding of the relationship between chromatin structure at repeat tracts and repeat instability might guide the design of new therapeutic approaches. For example, histone deacetylase inhibitors can reactivate gene expression at the FMR1 and FRDA loci68, 69. It will be interesting to test whether these molecules also alter the stability of repeat tracts. Drugs that can preferentially induce repeat contractions could lead to permanent improvements to patient health.
We thank the members of the Wilson laboratory for helpful discussion, Susan Gasser for support and F. Hamaratoğlu, H. Ferreira, S. Kueng, B. Pike, B. Towbin, M. Bühler, R. Waterland, and T. Punga for critical reading of the manuscript. V.D. is currently supported by a post-doctoral fellowship from the Terry Fox Foundation through The Canadian Cancer Society Research Institute. Work on triplet repeats in the Wilson laboratory is supported by NIH grant GM38219.