Trinucleotide expansion underlies several human diseases. Expansion occurs during multiple stages of human development in different cell types, and is sensitive to the gender of the parent who transmits the repeats. Repair and replication models for expansions have been described, but we do not know whether the pathway involved is the same under all conditions and for all repeat tract lengths, which differ among diseases. Currently, researchers rely on bacteria, yeast and mice to study expansion, but these models differ substantially from humans. We need now to connect the dots among human genetics, pathway biochemistry and the appropriate model systems to understand the mechanism of expansion as it occurs in human disease.
Expansions in simple DNA repeats underlie ~20 severe neuromuscular and neurodegenerative disorders1,2. Our understanding of the pathogenic mechanisms for trinucleotide repeat (TNR) expansion diseases has advanced substantially in recent years (recently reviewed in refs 3-6), but many aspects of the mutational mechanism remain enigmatic.
Repetitive sequences constitute 30% of the human genome and, in most species, alterations in the lengths of repetitive DNA during evolution create diversity7. However, the rapid alteration in TNR length observed in human expansion diseases is surprising. Mammals have developed systems for resisting rapid changes that could be deleterious. However, when longer than a crucial threshold length, these simple TNRs over-ride genomic safeguards and expand during most parent–child transmissions and during development of the progeny.
The changes in TNR length can be substantial. For TNRs in coding sequences, the repeats become unstable at ~29–35 units in length, and the changes in tract size are modest, typically ≤10 repeats per generation1,2 (TABLE 1). By contrast, unstable parent–child transmissions of TNRs in non-coding regions initiate from pre-mutation alleles of ~55–200 units and increase by 100–10,000 units per generation1,2 (TABLE 1). For both coding and non-coding alleles, as the repeat length grows beyond a threshold length, the size of the successive expansions and the likelihood of another unstable event increase. Also, the disease becomes more severe and has an earlier age of onset with each successive generation, a phenomenon known as anticipation1,2.
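The approximate length ranges quoted above can be written down as a small lookup, which makes the difference between coding and non-coding thresholds explicit. The following is only an illustrative sketch: the names `RepeatRange`, `RANGES` and `classify_allele` are hypothetical, the labels "normal", "pre-mutation" and "expanded" are a simplification, and the exact handling of the boundaries is an assumption, because the ranges in the text are approximate and vary among diseases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatRange:
    """Approximate range (in repeat units) in which a tract becomes unstable."""
    unstable_min: int
    unstable_max: int


# Approximate figures taken from the text; boundary handling is an assumption.
RANGES = {
    "coding": RepeatRange(unstable_min=29, unstable_max=35),
    "non-coding": RepeatRange(unstable_min=55, unstable_max=200),
}


def classify_allele(units: int, region: str) -> str:
    """Label an allele by repeat count relative to the approximate ranges."""
    if units < 0:
        raise ValueError("repeat count cannot be negative")
    if region not in RANGES:
        raise ValueError(f"unknown region: {region!r}")
    r = RANGES[region]
    if units < r.unstable_min:
        return "normal"
    if units <= r.unstable_max:
        return "pre-mutation"
    return "expanded"


if __name__ == "__main__":
    for units, region in [(20, "coding"), (32, "coding"), (120, "non-coding")]:
        print(units, region, classify_allele(units, region))
```

The same repeat count can therefore fall into different classes depending on where the tract lies: in this sketch, a 40-unit tract is labelled expanded in a coding sequence but normal in a non-coding region.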
Why some repeats expand more than others remains an important unresolved question. Most models for repeat expansion agree that expansion occurs through the formation of looped intermediates1,2, which are incorporated into DNA. However, thermodynamically, the differences in the physical properties of looped intermediates formed from different TNRs are subtle. For example, although the complementary sequences CAG and CTG are associated with short and long expansions, respectively, the free energy difference between CAG and CTG is only 1–2 kcal/mole for hairpins comprising a 30-repeat tract8,9. Neither the sequences nor the hairpin structure accounts for the differences in the expansion size. We now know that DNA repair and/or replication promote expansion in some manner. What we do not know is whether there are mechanistic differences between expansions of long and short repeats or whether the same pathways for expansion are used under all conditions — that is, in different cell types — during distinct developmental stages and in maternal and paternal transmission. These are important issues to address.
Here I focus on two long-standing questions: do long and short TNR expansions occur by the same mechanisms, and do they occur differently during human development? To address these questions, this Review is organized into two parts. In the first part, I examine the human genetic data of four TNR expansion diseases in which the TNR tract differs in length and position (in coding or non-coding regions) and discuss instability during development. This section lays a foundation for understanding the impact of cell type and parent-of-origin on long or short expansions in humans. The DNA replication and repair pathways that are thought to be relevant to expansion have been reviewed recently1,2. Thus, in the second part of this Review, I integrate genetic and biochemical results to discuss which replication and repair mechanisms fit the properties of expansion observed in humans during development. Finally, I consider directions for future research.
Any mechanism for expansion must explain the features of human disease. However, many of our concepts of expansion have been defined by studies of microsatellite instability in bacteria, yeast and mice. These systems differ substantially from humans and from each other in terms of DNA replication rate, the cell type in which expansion occurs and chromatin structure. How well an experimental system reproduces the features of human disease is an important consideration. If our ultimate aim is to understand the mutational mechanism in humans, the logical starting point is to assess the TNR dynamics in human disease.
In fragile X syndrome (FXS, also known as FMR1), CGG repeats in the non-coding region of fragile X mental retardation 1 (FMR1) expand from the pre-mutation allele length (55–200 units) to a full mutation (200–4,000 units) almost exclusively through maternal transmission (TABLE 1). In females, the probability of expansion increases with the length of the TNR, but expansion is almost certain if the pre-mutation allele is 90 units or longer10. Normal length and intermediate length FXS alleles are more prone to expand into the pre-mutation range during paternal transmission than during maternal transmission11. However, paternal transmission of a fully expanded disease allele (TABLE 1) results either in no change or in contraction in the CGG tract10, and the contraction frequency increases with the length of the transmitted allele10. Some somatic heterogeneity occurs in both males and females. For example, in one family, monozygotic twin brothers with FXS had CGG tracts that differed by ~100–500 units12.
The developmental timing and the cell type in which instability occurs provide insights into whether the expansion mechanism is replication or repair dependent. For FXS, the timing of expansion is not known precisely but is likely to occur in the maternal oocyte during arrest in meiotic prophase I13-16 (FIG. 1). In female carriers of a pre-mutation allele, expansion is already present in seven-cell pre-implantation embryos14,15. Developing oocytes arrest during the first meiotic division (which starts during embryogenesis) (FIG. 1), suggesting that the TNR alterations occur in quiescent cells and do not require replication (although some genetic selection cannot be excluded). Mice are good models for developmental aspects of TNR instability, and several mouse models for CGG expansion have been generated17,18. Although no animals exhibit large expansions (that is, expansions of thousands of units) through female transmission, the developmental timing of instability in a recent knock-in model is consistent with the timing observed in humans18.
Males who inherit a full mutation allele from their mother harbour the expanded allele in their somatic cells but do not transmit the expanded allele to their progeny10,19,20. This is because large CGG repeat tracts in their spermatogonia are shortened around weeks 13 to 17 of fetal development20 (FIG. 1). Some genetic selection among spermatogonia for cells with smaller TNRs cannot be ruled out19. However, during development in sons of mothers with pre-mutation alleles, only the full mutation is observed in the spermatogonia at 13 weeks of gestation, but pre-mutation alleles are evident at week 17 (ref. 20). These observations suggest that at least some of the large expanded alleles are actively shortened during cell proliferation between 13 and 17 weeks of human development20 (FIG. 1). Thus, large expansions during female transmission appear to be repair-dependent, whereas the shortening of long repeat tracts during male transmission appears to depend on replication.
The patterns of mutation in patients with myotonic dystrophy type 1 (DM1) are similar to those of FXS (TABLE 1). Full mutations are almost exclusively transmitted maternally21,22. Expansion can be observed as early as the two-cell stage in pre-implantation embryos, and the length of expansion correlates with maternal age, which implies that expansion occurs in quiescent oocytes before transmission21,22. In germ cells of DM1 males, the TNR tract contracts if a disease-length TNR tract is large23-26, and the contraction frequency during male transmission increases with allele length23-26. Smaller expansions of CTG tracts from the normal to pre-mutation allele length (55–200 units) are observed in the male germ cells (TABLE 1). A study of mice that harbour ~300 CTG repeats in a human dystrophia myotonica-protein kinase (DMPK) transgene pinpointed the timing of this expansion to spermatogonia, the premeiotic, proliferation phase of male germ cell development27 (FIG. 1). TNR tracts in patients with DM1 (ref. 25) and in mouse models28,29 display a high degree of somatic instability, which increases with age in terminally differentiated somatic cells.
Overall, shared patterns of mutation emerge in patients with DM1 and FXS. Surprisingly, the large expansions of these non-coding TNRs arise in non-dividing oocytes and in terminally differentiated somatic cells; the full mutation alleles tend to contract in the dividing male spermatogonia, and small gains and losses in the pre-mutation range can be observed in both spermatogonia and in somatic cells.
Do TNRs in coding sequences expand differently from those in non-coding regions during development? For Huntington’s disease (HD), growth from a pre-mutation repeat length allele (29–35 CAGs) to the disease range (>35 CAGs) and large expansions (>7 CAGs) occur almost exclusively through paternal transmission30-32 (TABLE 1). Offspring of affected mothers are more likely to show no change or to show contractions in CAG repeat size, even if the maternal allele is of a similar size to one that expands through paternal transmission33. In males with pre-mutation alleles, small gains or losses occur over a series of generations, but after the inherited allele approaches the disease threshold, expansion is favoured three- to 175-fold over contraction30-33. Similar patterns have been observed in an HD mouse model34.
The timing of expansion in the HD male germ line is not clear but seems to occur at multiple stages of germ cell development both in humans and mice35-38. The variations in CAG tract sizes of four patients with HD were significant in their mature sperm, but tract size variations in the testes (comprising dividing spermatogonia, differentiated spermatocytes and haploid spermatids) of three patients with HD were minimal35. Therefore, some expansions apparently occur late in germ cell development. Indeed, in the R6/1 mouse model for HD, expansion is most obvious in postmeiotic haploid cells36. In patients with HD31,35 and in transgenic mouse models for HD34, instability does not correlate with the age of the father, as would be expected if expansion occurred in spermatogonia that renew germ cells throughout adult life. However, in laser-captured testicular cells from two patients with HD, the majority of disease-length expansions were found in human primordial spermatogonia, and fewer disease-length expansions were detected in the postmeiotic cell populations (but the expansions in the latter population contained the longest TNRs)37. As suggested by Yoon et al.37, expansions apparently occur before meiosis in dividing spermatogonia37,38 or after meiosis is complete in differentiating germ cells.
Somatic heterogeneity is extensive in humans with HD30,31,39,40 and in HD mouse models41-44. Expansion in humans cannot be monitored with age, but in mice the size of the allele is stable from the eight-cell-stage embryo until roughly 6–11 weeks postnatal41,43, when expansion resumes in somatic tissues and continues throughout the lifetime of the animal. In mice, laser capture44 and fluorescence activated cell sorting (FACS) analyses45 have confirmed that expansion occurs in postmitotic neurons, and the degree of expansion differs substantially among somatic tissues and brain regions44. The expansions in post-mortem brain tissue from patients with HD are so substantial that, in addition to the inherited TNRs, the CAG tract length in somatic brain cells apparently influences the onset of disease46.
Expansions in coding CAGs in spinocerebellar ataxia type 1 (SCA1) and HD share properties (TABLE 1). Expansions in patients with SCA147 and in a mouse model48 occur most often through paternal transmission (up to 28 repeats have been observed), and offspring of affected mothers are more likely to show no change or to show contractions in CAG tract size47. Instability during transmission by female patients with SCA1 or in transgenic animals48 increases with maternal age and instability is not observed in oocytes from young female mice, which suggests that expansion occurs in arrested oocytes48. Somatic changes in TNR tract length occur in patients with SCA1, but there is little increase in heterogeneity with increasing tract length.
Thus, coding TNRs also have consistent patterns: the largest expansions are observed in non-dividing somatic cells, and smaller instabilities are observed under both dividing and non-dividing conditions.
Because TNR alleles are subject to genetic selection, it is difficult to define a ‘one size fits all’ explanation for expansion. That is, long coding TNRs are likely to be selected against if the encoded gene product is too dysfunctional for viability, whereas TNR expansions in untranslated or intronic regions are likely to be better tolerated. Nonetheless, the features of human disease suggest that coding and non-coding TNRs share mechanisms.
In terminally differentiated somatic cells, the sizes of the expansions are surprisingly comparable for coding and non-coding TNRs (TABLE 1). For example, somatic cells in patients with DM1 can harbour non-coding TNRs that are hundreds to thousands of units long. Similarly, in post-mortem brain samples from patients with HD, coding CAG expansions of up to thousands of units (increasing with age) have been observed in the striatum, which is the brain region that is most vulnerable to cell death in HD39,40. The cells harbouring the longest expansions are the most likely to die first. Thus, the frequency of large expansions observed in the brains of patients with HD may be higher, but cannot be measured. Also, in at least two mouse models of HD, animals maintain long CAG repeats of greater than 700 units in the coding sequence49. Thus, in terminally differentiated somatic cells, coding and non-coding TNRs are capable of expanding to roughly the same degree, which suggests similar repair-dependent mechanisms.
Non-coding TNRs can expand in oocytes, and the age-dependent changes in these quiescent cells imply a repair-dependent expansion mechanism. The underlying mechanism for TNR instability is more difficult to assess in male germ cells, as larger repeat tracts in either coding or non-coding repeats seem to be selected against26. Either pre-mutation alleles undergo successive, small expansions to the disease range or smaller expansions are the residual products of large deletions that occur when TNR tracts are long. In either case, the changes are detected during proliferation.
Thus, a mechanistic explanation must be found for the following three patterns of human mutation: the occurrence of the largest expansions in non-dividing cells; the occurrence of the longest TNR deletions in dividing cells; and the observation of smaller instabilities with or without cell division. In the following sections I discuss how the currently proposed repair and replication models of expansion fit with these three human mutation patterns. For expansion to occur, single-strand loops must form to provide the ‘extra’ DNA, and then the loops must be incorporated into duplex DNA. Therefore, I first consider how loops might form in the context of human development. I then discuss how loops might be incorporated into DNA, which may occur by distinct mechanisms from loop formation.
It is surprising and perhaps counter-intuitive that the largest expansions occur under non-dividing conditions. So, what is happening in meiotically arrested oocytes or in terminally differentiated neurons to generate large looped intermediates? Gains of 100–1,000 units in DM1 and FXS require a mechanism that opens 300–3,000 DNA base pairs (either as a few large loops or several smaller loops), which is energetically untenable under non-dividing conditions without some sort of DNA break.
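The figure of 300–3,000 base pairs follows directly from the repeat unit being three nucleotides long. As a minimal sketch of that conversion (the function name `units_to_base_pairs` is hypothetical and used only for illustration):

```python
BASES_PER_UNIT = 3  # a trinucleotide repeat unit is three nucleotides long


def units_to_base_pairs(units: int) -> int:
    """Convert a gain expressed in repeat units into base pairs of DNA."""
    if units < 0:
        raise ValueError("repeat count cannot be negative")
    return units * BASES_PER_UNIT


if __name__ == "__main__":
    # The range of gains quoted for DM1 and FXS.
    for gain in (100, 1_000):
        print(f"{gain} units = {units_to_base_pairs(gain)} bp")
```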
Double-strand breaks (DSBs) occur in quiescent human oocytes in the pachytene stage of meiosis13, and occur around the time that expansion is observed. However, repair of DSBs by homologous recombination, crossover or non-homologous end joining apparently does not influence expansion in humans50,51 or in mice28. Current evidence suggests that loop formation arises instead from repair of single-strand breaks (SSBs) during excision of damaged DNA bases.
SSBs are generated as intermediates during the process of removing chemically damaged bases from DNA by base excision repair (BER)52 (FIG. 2). In a mouse model of HD, age-dependent TNR expansion in somatic cells occurs concomitantly with the accumulation of oxidized DNA bases45, and in a mouse model of FXS, treatment with potassium bromate, a powerful oxidant, increases CGG expansion in vivo53. Also, in human HD fibroblasts, which do not normally expand their repeats in culture, exposure to peroxide can induce expansion45. Together these results imply that expansion arises during removal of oxidized DNA bases. In support of that idea, loss of 7,8-dihydro-8-oxoguanine DNA glycosylase (OGG1), the main DNA glycosylase responsible for removal of 7,8-dihydro-8-oxoguanine (8-oxoG) (the most common oxidized base in DNA), suppresses expansion in HD mice45. In fact, the stoichiometry of BER proteins correlates with the differential somatic instability in the striatum and cerebellum in a mouse model of HD54. These findings imply that BER is involved in expansion under non-dividing conditions, but how might it happen?
After removal of the oxidized base, BER is completed by DNA synthesis to restore the correct Watson–Crick base and ligation to reconnect the broken end (FIG. 2). In BER, gap-filling synthesis can occur using either a long patch or a short patch mechanism52. However, only the long patch repair pathway generates displaced single-strand flaps that are of sufficient size for loop formation45,52 (FIG. 2). Moreover, CAG tracts facilitate strand displacement45, and the DNA repair polymerase β (Pol β), which can fill in the gap, is stimulated by fork-like flap structures55. Normally, a flap generated by BER is removed by flap endonuclease 1 (FEN1), which has 5′ to 3′ exonuclease activity56-58 (FIG. 2). However, a looped intermediate, such as a hairpin, ‘hides’ the 5′ end of the DNA and prevents FEN1-mediated cleavage in vitro56,58 and, presumably, in vivo56,59.
In long-patch BER, Pol β59,60 or one of the replicative polymerases (polymerase δ or polymerase ε)52,60,61 typically incorporates two to 15 nucleotides into the repair patch, which is too small a number to generate a large expansion in a single step. However, oxidation of DNA bases occurs approximately 50,000 times per day52, providing ample opportunity for many SSBs. Repeated rounds of oxidation–repair–expansion at TNRs would promote a toxic oxidation cycle for loop formation and progressive expansion in many sequential steps45,62 (FIG. 2).
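To see why many sequential steps would be needed, the patch size can be compared with the size of a large expansion. The sketch below computes a lower bound on the number of oxidation–repair rounds, under the strong assumption that every nucleotide of every repair patch is retained as added repeat DNA; real rounds would be less efficient, so the true number could only be higher. The function name `min_repair_rounds` is hypothetical.

```python
import math

BASES_PER_UNIT = 3  # a trinucleotide repeat unit is three nucleotides long


def min_repair_rounds(gain_units: int, nt_per_patch: int) -> int:
    """Lower bound on repair rounds needed to add `gain_units` repeat units.

    Assumes (optimistically) that each round adds its whole patch,
    `nt_per_patch` nucleotides, to the repeat tract.
    """
    if gain_units < 0:
        raise ValueError("gain cannot be negative")
    if nt_per_patch <= 0:
        raise ValueError("patch size must be positive")
    return math.ceil(gain_units * BASES_PER_UNIT / nt_per_patch)


if __name__ == "__main__":
    # Patch sizes of 2 and 15 nucleotides bracket the range quoted in the text.
    for gain in (100, 1_000):
        fewest = min_repair_rounds(gain, 15)
        most = min_repair_rounds(gain, 2)
        print(f"gain of {gain} units: at least {fewest} to {most} rounds")
```

Under these assumptions, a gain of 1,000 units would take at least 200 rounds with 15-nucleotide patches and at least 1,500 rounds with two-nucleotide patches.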
However, there are problems with a BER-dependent model for large expansions. OGG1 is the only glycosylase that is known to promote TNR expansion in mice, but it is not the only glycosylase that removes 8-oxoG52,63, nor is 8-oxoG the only oxidized lesion possible within CNG (N = C, A, G) tracts52. Because G and C occur with equal frequency, oxidation is predicted to induce formation of 5-hydroxycytosine as well as 8-oxoG within CNG tracts52. Human endonuclease III-like protein 1 (NTH1)52 has a preference for removal of 5-hydroxycytosine and, theoretically, should be able to induce TNR expansion via the same SSB mechanism as OGG1 (ref. 52). However, loss of NTH1 does not reduce expansion of CAG tracts in an HD mouse model45. Moreover, other glycosylases, such as endonuclease VIII-like 1 (NEIL1), NEIL2 and NEIL3 (ref. 64) can serve as ‘back-ups’ for OGG1 in vivo65 and would also be expected to induce expansion if it occurs by BER. Enhanced oxidation might increase the demand for OGG1, or other proteins might cooperate with OGG1 to increase specificity for the TNR tract. Thus, the specific importance of OGG1 in promoting expansion is not yet fully understood.
Nucleotide excision repair (NER) also induces formation of single-strand flaps, and has been implicated in TNR instability. NER consists of two subpathways (recently reviewed in refs 66,67): global genome repair (GGR) is used for lesion correction throughout the genome, whereas transcription-coupled repair (TCR) corrects lesions within actively transcribing genes66,67 (FIG. 3). A specialized TCR-related process called differentiation associated repair (DAR) is less well understood but seems to promote lesion removal from the non-transcribed strand in neurons68. Although these subpathways have distinct lesion recognition steps, the repair steps are identical (FIG. 3).
A complex of xeroderma pigmentosum complementation group F (XPF) and excision cross complementing repair 1 (ERCC1) makes an incision 5′ to the lesion and the endonuclease XPG makes an incision 3′ to the lesion66,67. This excises a patch consisting of 10–20 nucleotides. In vivo, the 5′ incision by XPF–ERCC1 is made before the 3′ incision by XPG69, so ssDNA containing the TNR has the potential to form a looped intermediate, at least transiently (FIG. 3). In theory, either GGR or TCR can initiate flap formation in this way. However, in mice, loss of XPC — which is specific to GGR (FIG. 3) — has little effect on CAG expansion in HD, suggesting that GGR is unlikely to cause expansion70. By contrast, loss of Cockayne syndrome protein CSB (also known as ERCC6) — which is specific to TCR (FIG. 3) — suppresses contractions in human cells, suggesting that TCR is a more relevant mechanism for TNR instability71.
Indeed, in Drosophila melanogaster, transcriptional induction of a transgene (under the control of regulatory protein GAL4) that contained exon 10 of ataxin 3 (ATXN3, also known as SCA3) resulted in expansions and contractions of the TNR tract within this exon, and loss of XPG in this line reduced instability72. In human cell lines, using a selection system for contractions (which does not detect expansions), contraction of long CAG repeats within an intron of an active hypoxanthine guanine phosphoribosyl transferase (HPRT) minigene71,73 is reduced by small interfering RNA knockdown of ERCC1, XPG and CSB. XPG binding, but not its enzymatic activity, is required at the transcription bubble to stimulate the ATPase activity of CSB74, which could explain why both XPG and CSB are involved in TNR instability. However, XPG and XPF bind together at the opened transcriptional bubble66,67, and loss of XPG may also influence bubble stability, the extent of XPF–ERCC1 incision and, consequently, TNR flap formation.
There are also problems with TCR being the mechanism for expansion. The single-strand flap that forms in TCR66,67 is typically 10–20 nucleotides long and is too small to generate large loops in a single step. For TCR to be a plausible mechanism to generate loops (a few large loops or multiple single-strand loops), TNRs must either block XPG incision (thus allowing abnormal progression of a transcription complex and large strand displacements (FIG. 3)) or TNR tracts must have an unusually high frequency of TCR-derived incisions. The latter case implies that transcriptional pausing occurs frequently within or near TNR tracts. In support of this idea, CTG and CGG tracts at the human DM1 and FXS loci75 and GAA repeats at frataxin (FXN)76 alter or inhibit elongation by RNA polymerase.
There is also ‘crosstalk’ between NER and BER77,78. For example, CSB cooperates with components of BER machinery79-81 (including OGG1) to remove 8-oxoG from transcribed genes. Furthermore, the BER glycosylase NEIL1 is more efficient at removal of oxidation damage that is in bubble DNA (a classic substrate for NER) than elsewhere82. Alternative pathways provide additional mechanisms to ensure removal of oxidative DNA damage, but the role of hybrid pathways in flap formation is unknown.
The observations of small expansions in coding and non-coding TNRs in spermatogonia suggest that there are mechanisms for expansion that either depend on replication or that occur during replication.
Small increases and decreases of a few bases can occur at any repetitive sequence by classic polymerase slippage8,9,83-86 (FIG. 4). For example, of the HD pre-mutation alleles that change during transmission, 78% are increases of one to three CAG units and 22% are contractions of one CAG unit87. The DNA polymerase pauses as it encounters the TNR sequence84-86 (FIG. 4). ‘Stuttering’ and misalignment on the daughter strand generates an increase in length after the next round of replication83,84. On the template strand, the advancing helicase generates ssDNA behind it, which forms a hairpin as the polymerase passes over it (FIG. 4); deletion occurs after a second round of replication.
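The transmission figures quoted above already imply a net bias towards growth, even though each step is small. Because the text does not say how the increases are distributed between one, two and three units, the sketch below only brackets the mean change per changed transmission: the lower bound assumes every increase is one unit and the upper bound assumes every increase is three units. The names `P_INCREASE`, `P_CONTRACTION` and `mean_change_units` are hypothetical.

```python
P_INCREASE = 0.78      # fraction of changed alleles that gain 1-3 CAG units
P_CONTRACTION = 0.22   # fraction of changed alleles that lose 1 CAG unit
CONTRACTION_SIZE = 1   # units lost per contraction, as quoted in the text


def mean_change_units(increase_size: float) -> float:
    """Mean change (in CAG units) per changed transmission.

    `increase_size` is the assumed size of every increase; the text only
    says it lies between one and three units.
    """
    if not 1 <= increase_size <= 3:
        raise ValueError("increase size must lie between 1 and 3 units")
    return P_INCREASE * increase_size - P_CONTRACTION * CONTRACTION_SIZE


if __name__ == "__main__":
    lower = mean_change_units(1)  # every increase is a single unit
    upper = mean_change_units(3)  # every increase is three units
    print(f"mean change per changed transmission: "
          f"{lower:.2f} to {upper:.2f} CAG units")
```

Under either extreme the mean is positive (roughly 0.6 to 2.1 units), so repeated small changes of this kind would drift towards longer tracts, but only slowly.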
However, simple slippage fails to explain the expansion sizes of coding and non-coding TNRs in the disease range. Polymerase slippage is constrained by the thermodynamics of base-pairing, which limits the slippage size to a few bases8,9,88. Larger expansion could theoretically arise as the sum of many small replication slippage errors. However, in yeast and bacteria (which are good models for dividing cells), deletions of repetitive DNA and TNRs in the disease range are favoured roughly ten- to 100-fold over expansion during proliferation2,88-90, and long TNR tracts are difficult to sustain during active growth2,84,86,88-90. This is in sharp contrast to the sperm of patients with DM1 or HD, in which expansions are favoured roughly ten- to 100-fold over contractions23,30,32,33,35. In DM1, for example, 80% of the changes in pre-mutation length alleles are expansions, which can be as large as 300 repeats23. Therefore, the pattern of mutation in humans does not match the profiles of TNR changes that occur in dividing yeast and bacterial cells. Furthermore, there is no correlation between the rate of expansion and the rate of division in mammalian cells91. Therefore, if large or multiple small TNR loops occur during proliferation in spermatogonia, they are more likely to arise from a repair-dependent mechanism that occurs during cell proliferation.
In dividing yeast, there is strong evidence for a replication-dependent repair mechanism92. Two-dimensional gel analysis of replication intermediates provides evidence that DNA polymerase has difficulty traversing and stalls on long tracts of CAG repeats93, CGG repeats94 or GAA repeats95,96. In such cases, the replication fork can ‘back-up’97, and the polymerase uses the daughter on the opposite strand to synthesize enough DNA to ‘bootstrap’ synthesis through a TNR tract (FIG. 5A). TNR loops can form as the polymerase reverses to copy the daughter strand or if TNRs misalign during restart. Electron microscopy provides direct evidence that DNA polymerase spontaneously regresses when moving through CTG repeats, which results in the predicted four-way ‘chicken foot’ intermediate98 (FIG. 5A). Replication restart is consistent with the evidence that DNA polymerase inhibitors suppress expansion94,99 and that expansion does not depend on DSB repair pathways28, as discussed above. If the leading strand polymerase copies the daughter on the lagging strand, the size of the expansion would not be expected to exceed the length of the Okazaki fragment (typically 100–200 nucleotides in eukaryotes). This is consistent with the size of many expansions in the pre-mutation range (55–200 repeats) observed in spermatogonia of males with DM1 (ref. 23). An expansion mechanism reminiscent of replication restart has been reported recently for long GAA tracts in yeast96.
How then do large deletions in non-coding TNRs occur in dividing spermatogonia? Answering this question is more difficult because deletion cannot be easily distinguished from genetic selection in vivo. However, the TNR deletion bias observed in the spermatogonia of males with DM1 (ref. 23) or FXS19,20 is consistent with the mutation pattern observed in proliferating yeast or bacterial cells (FIG. 4). This observation implies that large deletions could occur by multiple slippage events that arise as the DNA polymerase copies the TNR tracts. If the DNA polymerase fails to traverse a repeat tract, despite repeated attempts, it may leave behind unrepaired single-strand loops that — in both bacteria100 and eukaryotes101 — can induce an SOS response. This response upregulates expression of specialized ‘translesion’ polymerases (TLPs) to bypass the block and complete replication (FIG. 5B). How TLPs bypass blocks is poorly understood. However, there is evidence for a ‘switching’ mechanism100-103: a TLP replaces a stalled replicative polymerase until a block is bypassed, then the replicative polymerase displaces the TLP and synthesis resumes (FIG. 5B). Because the large single-strand loops (generated by DNA polymerase attempting to restart replication) would occur in the template strand, they would be deleted in the next round of replication. The deletion is unlikely to perfectly restore the inherited TNR length. Thus, the residual deletion products may contribute to TNR size heterogeneity in spermatogonia and may even be taken as expansions.
The current replication and repair models, discussed above, explain how large and small extrahelical loops might form in distinct cell types but do not address how they are incorporated into DNA. The mechanisms of loop formation and loop incorporation may not be equivalent, and independent pathways for these two parts of expansion should be considered.
Normally, in dividing cells, small loops are removed by the mismatch repair (MMR) system in which a 5′ to 3′ exonuclease (EXO1) removes the loop, a DNA polymerase fills the gap, and the DNA duplex is faithfully resealed without mutation104 (FIG. 6, ‘normal repair’). In vitro, human cell extracts can catalyse error-free repair of CAG or CTG hairpins, in a manner similar to MMR105,106. However, for expansion to occur, uncorrected loops must be maintained to provide the ‘extra’ DNA that is eventually the expansion. Thus, at its most basic definition, expansion is a failure to remove loops (FIG. 6, ‘inhibition of repair’).
So, how are the uncorrected loops incorporated into duplex DNA? The bulk of the evidence suggests that TNR incorporation also involves MMR104,107. The R6/1 mouse model of HD carries the portion of the human huntingtin (HTT) gene that contains the CAG repeats41, and if these mice are crossed with mice lacking the MMR protein MutS homologue 2 (MSH2), expansion is abolished43,108. Similarly, in a DM1 mouse model, loss of the MSH2–MSH3 complex or its binding partner, PMS2, suppresses expansion109. In mice, expansion of the 5′-CTG-3′ repeat in the 3′ non-coding region of the human DMPK transgene110 and expansion of the 5′-CAG-3′ tract in the coding sequence of a truncated human HTT transgene111 both depend on MSH2–MSH3 but not on the other eukaryotic MMR complex, MSH2–MSH6. Apparently, MSH2–MSH3 has a dominant role in causing rather than correcting expansion mutations. Two current models for how this might occur are hotly debated: either an active MMR pathway aids loop incorporation or MSH2–MSH3 aids loop formation and TNR loop incorporation occurs by a non-canonical pathway.
The MSH proteins couple DNA binding with ATP hydrolysis to carry out the first steps of MMR104. MSH2–MSH3 and MSH2–MSH6 bind and hydrolyse ATP within two conserved Walker-type nucleotide-binding domains104. Mutations in these domains can inactivate ATP binding, ATP hydrolysis or both104. If ATP hydrolysis in the MSH2 subunit were essential for loop incorporation, then loss of this function would inhibit expansion. Indeed, crossing a DM1 mouse model carrying a highly unstable CTG tract (>300 units) with mice harbouring a G693A mutation in MSH2 (which is predicted to abolish ATPase activity) stops expansion112. In yeast, expression of the same msh2-G693A mutation suppresses large and small GAA deletions113. Downstream signalling by MSH2–MSH3 requires a complex containing the MutL homologue (MLH) proteins and PMS1 (ref. 113). In yeast, a mutation in PMS1 that is predicted to impair complex formation has similar effects to the MSH2 mutation113. Purified MSH2–MSH3 binds ATP114,115 and retains at least some ATP hydrolytic activity in vitro when bound to a CAG hairpin111,114. Collectively, these findings support the notion that the ATPase function of the MSH2 subunit and the MSH2–MSH3–MLH–PMS complex are necessary for TNR instability in yeast.
However, if it causes expansion, active MMR cannot be working properly. Canonical MMR removes small loops without mutation104 (FIG. 6, ‘normal repair’), whereas expansion requires the presence of an unrepaired loop precursor (FIG. 6, ‘inhibition of repair’). Indeed, the patterns of mutation during human development indicate that loops are not removed correctly during mitosis or meiosis. In an MMR-competent yeast strain, hairpins that are stabilized by hydrogen-bonding (for example, CAG, CTG or GAC repeats) escaped repair in vivo, whereas loops formed from random sequences or triplets without the capacity to form stable structures (for example, GTT repeats) were efficiently removed during meiosis without error116. Thus, in these strains, MMR was operational but the hairpin loops that were able to form stable structures were not removed. In human cell extracts, in vitro repair of loops in synthetic templates is similar whether or not MSH2 is present106, supporting the idea that MSH2 is required for loop formation rather than loop incorporation. Indeed, a model in which MSH2–MSH3 binds and stabilizes a CAG hairpin loop has been proposed36,111.
Therefore, a second scenario is that MSH2–MSH3 is essential for loop formation, but a non-canonical MMR pathway is used for loop incorporation. Several models are possible. TNR hairpins harbour mispaired bases in their stems that bind MSH2–MSH3 in vitro, but MSH2–MSH3 may fail to remove them efficiently by canonical MMR36,111. In this scenario, the bound MSH2–MSH3 acts as an ‘adaptor’ to recruit non-canonical machinery to complete repair (FIG. 6). In non-dividing cells, loop incorporation probably requires a nick on the opposite strand to the loop, loop melting and gap-filling synthesis, which leads to expansion. Enzymes that carry out these functions in the absence of a replication fork are unknown. Loops might be incorporated during the gap-filling synthesis step of BER if strand displacement occurs on the opposite strand from the hairpin. However, it could be envisioned that, in an alternative mechanism, MSH2 retains all or part of its ATPase activity (in agreement with the mouse data discussed above), but failure to couple catalytic activity to productive repair results in recruitment of a non-canonical pathway and expansion.
In summary, coding and non-coding TNRs in humans apparently share mechanisms of expansion. However, the cell type and its status of division impose constraints on which mechanisms are used. Shorter changes of premutation-length alleles can apparently occur in any cell type. Whether expansions, in some cases, are residual deletion products should be considered. However, longer expansions or deletions are observed under particular circumstances. In humans, the largest expansions of coding or non-coding alleles occur in quiescent cells and involve DNA repair-dependent mechanisms. Deletions of the largest alleles are most prominent in dividing cells. Thus, the biology of human disease not only provides insights into where and when repeat instability occurs but also suggests that expansion and deletion, of at least long alleles, occur by distinct mechanisms.
Many questions and puzzling features remain. However, to continue progress, there is a crucial need to draw conclusions from appropriate experimental systems that closely reflect the properties of the cell type in which expansion occurs. It will also be important to integrate the results of different experimental approaches, as they may lead to different conclusions. For example, the role of MMR in expansion is a work in progress that will benefit greatly from considering more than one set of data. The MSH2-G693A mouse model (ref. 112) provides compelling evidence for the importance of ATPase activity in expansion. However, the biochemistry of the MSH2-G693A mutant protein needs to be explored further: does it bind ATP, does the mutation influence the activity of MSH3 or might the ATPase activity of MSH3 compensate for the mutation in MSH2? In vitro repair assays are powerful systems for evaluating the enzymology of loop repair. However, cell extracts contain the machinery of many active DNA repair pathways, and discerning which ones are operating at the TNR loops is an important and challenging issue. Biochemical measurements of MMR and of other enzymes implicated in expansion need to be integrated with DNA repair assays to link enzymology with function.
In general, we are only beginning to think about the possibility of crosstalk among DNA repair pathways and their relationship with TNR expansion78. However, tantalizing pieces of evidence suggest that hybrid pathways might be important. For example, MSH2–MSH6 and the BER glycosylase, MUTYH, form a physical complex117. It can be speculated, though not yet demonstrated, that MSH2–MSH3 and OGG1 interact, which could explain the involvement of BER and MMR in TNR length changes. Future work that explores the interactions among the components of different repair pathways will be informative.
More attention needs to be paid to TLPs and their role in TNR instability. Interestingly, TLPs are involved in BER and NER (both of which are candidate pathways for expansion) and may also be involved in generating large deletions (in replication restart). In NER, the DNA damage binding proteins DDB1 and DDB2 (also known as XPE)118, which recognize cyclobutane pyrimidine dimers (CPDs) and UV-induced (6-4) pyrimidine photoproducts, and Pol η and Pol θ119,120 are TLPs. In BER, Pol λ serves as a back-up for OGG1 and has fivefold greater DNA synthesis fidelity than Pol β121,122. Its use may result in robust flap formation if oxidized bases block Pol β progression at TNRs122. The majority of evidence suggests that oxidized bases in random DNA sequences do not inhibit replicative DNA polymerases, but it will be important to test whether oxidized bases in CG-rich TNR tracts promote polymerase stalling. It should also be tested whether TLPs can promote strand displacement or enhance lengthening of a single-strand flap in either BER or TCR. The use of TLPs may be a common factor in large changes in TNR length and should be explored more thoroughly.
Finally, all of these mechanisms for expansion must operate within the context of chromatin, and there is growing interest in exploring how chromatin structure and epigenetic modifications influence expansion. For example, a study of the human ATXN7 locus in transgenic mice has established a link between binding of CCCTC-binding factor (CTCF, a regulatory protein implicated in DNA conformation and genomic imprinting) and regulation of repeat instability123. Currently, the links among epigenetic changes and expansion remain enigmatic, but the influence of genome locus, post-translational modification of histones and DNA methylation on TNR expansion will be key issues to explore4,5,124,125.
In conclusion, TNR expansions of ten to 10,000 units add a stunning 30 to 30,000 base pairs to DNA during transmission and during somatic growth, and both contribute to disease onset. Thus, blocking expansion at various developmental stages is likely to be beneficial. The inherited TNR tract determines whether an individual develops disease, but progressive somatic mutation may influence, at least in part, when disease occurs. The ability to modulate expansion raises hope that the severity of pathophysiology might be reduced or its onset delayed, thereby widening the therapeutic window for these deadly TNR diseases.
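The base-pair figures quoted above follow from each trinucleotide repeat unit being three bases long. As a minimal illustrative sketch (the function name and constant are hypothetical, chosen here only to show the arithmetic), the conversion can be written as:

```python
# Illustrative only: names are hypothetical and not taken from the article.
BASES_PER_REPEAT_UNIT = 3  # a trinucleotide repeat unit is three bases long


def added_base_pairs(repeat_units_gained: int) -> int:
    """Return the base pairs added by an expansion of the given number of units."""
    if repeat_units_gained < 0:
        raise ValueError("repeat_units_gained must be non-negative")
    return repeat_units_gained * BASES_PER_REPEAT_UNIT


# The two ends of the expansion range quoted in the text.
for units in (10, 10_000):
    print(f"{units:,} units -> {added_base_pairs(units):,} bp")
```

The same multiplication applies to deletions if the number of units lost is supplied instead of the number gained.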
I would like to thank J. Majka, V. Platt, W. Lang, E. Xun, C. Canaria and S. Bernstein for critical discussions and comments. This work is supported by US National Institutes of Health grants NS062384, NS40738, GM066359, NS060115, NS069177 and CA092584.
Competing interests statement
The author declares no competing financial interests.