Nucleotide repeat expansion disorders comprise a heterogeneous group of diseases that result from expansion of specific repetitive DNA microsatellite sequences. Pathogenic expansions can occur in coding or non-coding regions of genes and were initially believed to act in one of two dichotomous ways. In disorders such as Friedreich’s ataxia, expansions in non-coding regions cause transcriptional silencing or down-regulation of the associated gene and therefore act as recessively inherited, loss-of-function mutations. In contrast, in disorders such as Huntington’s disease, trinucleotide expansions in the protein-coding region introduce an abnormally long stretch of a single amino acid (often glutamine) into the associated protein, producing a dominantly inherited, gain-of-function mutation. In the nine known polyglutamine diseases, the mutant proteins accumulate in ubiquitin-positive inclusions and interfere with cellular homeostasis through several different mechanisms (for a recent review, see [1]).
Many nucleotide repeat expansion disorders, however, do not fit neatly into either category. In myotonic dystrophy type 1 (DM1), an expanded CTG repeat in the 3′ untranslated region (UTR) of DMPK causes disease in a dominantly inherited manner. After studies failed to reveal a significant role for DMPK haploinsufficiency in DM1 pathogenesis, evidence emerged supporting a toxic gain-of-function mechanism at the RNA level. Over the past 10 years, at least 7 other neurological disorders that likely share this pathogenic mechanism have been identified, each with its own nuances (See and the following recent reviews [2]). This review addresses how these nucleotide repeat expansions are thought to cause toxicity and dysfunction by affecting 1) transcriptional regulation, 2) mRNA splicing and metabolism, 3) RNA binding protein distribution, and 4) signal transduction and cellular homeostatic pathways, with an eye towards potential sites of therapeutic intervention.
A Primer on RNA Processing in Neurologic Disease
To explain how repeat expansions in a non-coding region of mRNA can lead to a multi-systemic disease and neuronal dysfunction, we first review recent advances in understanding how RNA participates in gene regulation, RNA processing, and protein translation. The human transcriptome is made up of protein-coding messenger RNAs (mRNAs) and multiple classes of non-coding RNAs, including ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), small nuclear and small nucleolar RNAs (snRNAs and snoRNAs, respectively), microRNAs (miRNAs), and a host of recently described RNA species whose functions are less clear (e.g. vault RNAs, Y RNAs, piRNAs, lincRNAs; see [4] for a detailed review). Furthermore, many genes are transcribed in both the sense direction (yielding a protein-encoding mRNA) and the antisense direction (usually producing a shorter non-coding transcript), and expression of the two is often coupled, so that increased production of the sense transcript is accompanied by a similar increase in the antisense transcript. The roles of antisense transcripts are still incompletely defined but likely include regulation of transcription, stability, and translation of the sense mRNA.
Messenger RNAs are initially transcribed as pre-mRNAs that contain a 5′ untranslated region (5′ UTR), a 3′ untranslated region (3′ UTR), and numerous non-coding intervening regions (introns) between the protein-coding regions (exons). Each pre-mRNA is then spliced: introns are removed and a variable subset of exons is joined into a single linear sequence for translation. Most pre-mRNAs give rise to several alternatively spliced mature mRNAs, with different isoforms favored in different tissues and cell types or in response to different environmental cues. The mature mRNA, which still contains the 5′ UTR and 3′ UTR, is then exported from the nucleus to the cytoplasm as part of a large ribonucleoprotein complex (RNP). In the cytoplasm, it associates with ribosomal components and is translated in a regulated fashion. From the beginning of transcription through splicing and translation, the mRNA is bound by numerous RNA binding proteins and non-coding RNAs that regulate its processing, stability, transport, and translation.
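The splicing step described above can be sketched in a few lines of Python. This is a toy model, not a description of the spliceosome: the sequences, segment names and the `Segment` and `splice` helpers are hypothetical, the UTRs are assumed to sit inside the first and last exons (where they typically reside in real transcripts), and the choice of exons is supplied by hand rather than by RNA binding proteins. It shows only that one pre-mRNA can yield different mature mRNAs depending on which exons are included.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """One region of a toy pre-mRNA: an exon or an intron."""
    name: str
    kind: str  # "exon" or "intron"
    seq: str

def splice(pre_mrna: list[Segment], included_exons: set[str]) -> str:
    """Drop every intron and join the chosen exons in their original 5'->3' order."""
    return "".join(
        seg.seq
        for seg in pre_mrna
        if seg.kind == "exon" and seg.name in included_exons
    )

# Hypothetical toy gene: three exons separated by two GU...AG introns (made-up sequences).
pre_mrna = [
    Segment("exon1", "exon", "AUGGCU"),
    Segment("intron1", "intron", "GUAAGUCCAG"),
    Segment("exon2", "exon", "GAAACC"),
    Segment("intron2", "intron", "GUGAGUUUAG"),
    Segment("exon3", "exon", "UGGUAA"),
]

full_length = splice(pre_mrna, {"exon1", "exon2", "exon3"})
exon2_skipped = splice(pre_mrna, {"exon1", "exon3"})
print(full_length)    # AUGGCUGAAACCUGGUAA
print(exon2_skipped)  # AUGGCUUGGUAA
```

Skipping `exon2` yields a shorter isoform from the same pre-mRNA; in the disorders discussed below, it is this kind of exon choice that is thought to shift when splicing regulators are sequestered.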
Both coding and non-coding RNAs, together with their associated binding proteins, participate in numerous cellular pathways. These pathways, which include RNA processing and the regulation of transcription and translation, are critical determinants of neuronal differentiation and plasticity. Perhaps not surprisingly, alterations in these pathways have now been found to contribute to a wide variety of neurologic and non-neurologic disorders, including a number of neurodegenerative diseases [4]. For example, mutations in two RNA binding proteins involved in RNA splicing, TAR DNA-binding protein of 43 kDa (TDP-43) and Fused in Sarcoma (FUS), can cause amyotrophic lateral sclerosis (ALS). Redistribution of TDP-43 from the nucleus into cytoplasmic aggregates is a common feature of sporadic ALS and frontotemporal dementia [6].
The RNA-dominant disorders discussed in this review can result from expansions of repetitive sequences in the 5′ UTR, the 3′ UTR, or intronic sequences of protein-encoding mRNAs, and can also occur as microsatellite expansions in non-protein-coding RNAs. In many cases, evidence suggests that the repeats are transcribed in both the sense and antisense directions. Once transcribed, pre-mRNAs containing these repeat expansions form complex secondary structures, including hairpin loops, that can alter their processing, transport, translation, and interactions with RNA binding proteins. Often these expanded repeat-containing RNAs accumulate and form aggregates with a subset of RNA binding proteins implicated in regulating RNA splicing and transcription. To date, most work in this field has focused on the idea that formation of these nuclear aggregates, termed RNA foci, drives pathogenesis via sequestration of specific RNA binding proteins. The resulting functional shortage of these proteins then leads to mis-regulated processing of numerous other RNA transcripts that do not contain repeats, resulting in altered patterns of mRNA splicing and hence in altered frequencies of protein isoforms. This “sequestration hypothesis” can explain many, but not all, aspects of neurodegeneration in these disorders ().
The sequestration hypothesis of RNA dominant disorders
[Figure: The sequestration hypothesis of RNA toxicity]
The most compelling evidence that sequestration of RNA-interacting proteins by expanded repeat-containing mRNA causes neurological disease comes from cellular and animal models of DM1 and from pathological samples from patients with type 1 and type 2 myotonic dystrophy. The following points have been established:
- An expanded CTG repeat placed into the 3′ untranslated region of an unrelated gene is sufficient to induce toxicity in mouse and Drosophila models, reproducing many aspects of the human disease [7, 8].
- In DM1 patients and various model systems, CUG repeat-containing mRNA forms aberrant RNA foci in the nucleus [9]. These foci contain members of the muscleblind-like family of RNA splicing proteins, MBNL1, 2 and 3 [10], and disrupt their normal nuclear distribution.
- Overexpression of MBNL1 in Drosophila and mouse models rescues the CTG repeat-induced muscle phenotype, and MBNL1 knockout mice recapitulate key aspects of the adult form of the human disease, suggesting that reduced MBNL1 splicing activity contributes directly to DM1 pathophysiology [11, 12].
- Consistent with a major role for MBNL proteins in DM1 pathogenesis, aberrant splicing occurs in several key genes in DM1 muscle and brain [5].
Perhaps the most convincing evidence for a primary role of MBNL sequestration in RNA pathogenesis came when the genetic cause of myotonic dystrophy type 2 (DM2) was found to be another non-coding repeat expansion, a CCTG expansion in an unrelated gene, zinc finger protein 9 (ZNF9). The expanded CCUG repeat-containing RNA sequesters MBNL1 in nuclear RNA foci that closely resemble those seen in DM1. Moreover, many of the same mis-splicing events are shared between the two disorders [13].
The sequestration model of RNA pathogenesis is not limited to CUG/CCUG repeat expansions. It may also apply to fragile X-associated tremor/ataxia syndrome (FXTAS), a recently described cause of late-onset gait disorder and tremor that may affect as many as 1 in 3,000 males [14]. FXTAS is caused by an expanded CGG repeat in the 5′ untranslated region of the FMR1 gene on the X chromosome. Normal alleles carry fewer than 50 CGG repeats. Expansion to more than 200 CGG repeats (a “full” mutation) leads to transcriptional silencing of FMR1 and causes fragile X syndrome, the most common inherited cause of intellectual disability. By contrast, patients with FXTAS carry between 50 and 200 CGG repeats. This “premutation”-range repeat is transcribed efficiently, and expression of the fragile X mental retardation protein (FMRP) is near normal. Intriguingly, premutation CGG repeat mRNA is actually translated quite poorly, but this is offset by a 5- to 8-fold increase in FMR1 mRNA levels [15].
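The repeat-length categories above can be written as a small classifier. The sketch below uses hypothetical helpers and only the thresholds quoted in this section (fewer than 50 repeats normal, 50 to 200 premutation, more than 200 full mutation); clinical laboratories use somewhat different cut-offs, including an intermediate zone, so it should not be read as a diagnostic rule. `longest_cgg_run` assumes a plain DNA string and counts only uninterrupted CGG units.

```python
import re

def longest_cgg_run(seq: str) -> int:
    """Number of CGG units in the longest uninterrupted CGG run of a DNA sequence."""
    runs = re.findall(r"(?:CGG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Bin an FMR1 CGG repeat count using the ranges quoted in the text (not clinical cut-offs)."""
    if cgg_repeats < 50:
        return "normal"
    if cgg_repeats <= 200:
        return "premutation (FXTAS range)"
    return "full mutation (fragile X syndrome range)"

# Hypothetical allele: 85 CGG repeats between arbitrary flanking bases.
allele = "TTT" + "CGG" * 85 + "TTT"
n = longest_cgg_run(allele)
print(n, classify_fmr1_allele(n))  # 85 premutation (FXTAS range)
```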
Pathologically, FXTAS is associated with neurodegeneration throughout the cortex and cerebellum. In addition, ubiquitin-positive inclusions accumulate in neuronal and glial nuclei in these brain regions. These inclusions contain the expanded FMR1 mRNA as well as a host of proteins. Experimental evidence now suggests that two RNA binding proteins, Pur α and hnRNP A2/B1, are sequestered to a degree that impairs their function [17]. Pur α is a ubiquitously expressed RNA and DNA binding protein that binds avidly to expanded rCGG repeats in vitro and in vivo. Moreover, Pur α localizes to inclusions in the Drosophila model of FXTAS and, when overexpressed, mitigates CGG repeat-mediated neurodegeneration. Similarly, hnRNP A2/B1 interacts directly with rCGG repeats and may recruit CUG-binding protein 1 (CUGBP1) into RNA inclusions [18]. Both CUGBP1 and hnRNP A2/B1 play multiple roles in RNA processing, including RNA splicing, and CUGBP1 has been implicated in the splicing abnormalities of myotonic dystrophy ([19] and see below). In support of a role for these two molecules in FXTAS, overexpression of either hnRNP A2/B1 or CUGBP1 rescues the phenotype of a Drosophila FXTAS model [18].
Abnormal activation of signaling cascades
Sequestration of MBNL into nuclear foci by expanded CUG repeat RNA does not explain certain aspects of the phenotype in myotonic dystrophy. In addition to the role of MBNL proteins, there appears to be accumulation and aberrant subcellular distribution of another family of splicing regulators, typified by CUG binding protein 1 (CUGBP1). This protein binds CUG RNA sequences but is not present in the nuclear CUG RNA foci. Overexpression of CUGBP1 recapitulates key features of the human disease, and its induction may contribute to muscle atrophy and aberrant differentiation in certain animal models of DM1 [21]. The toxic consequences of CUGBP1 redistribution are presumably mediated through altered splicing of a different set of mRNAs from those targeted by MBNL proteins [19]. CUGBP1 also regulates translation, so its effects may be more pleiotropic [23].
CUGBP1 is stabilized through phosphorylation by protein kinase C and is hyperphosphorylated in DM1 patient tissues and in some animal models of DM1 [24]. Precisely how an expanded CUG repeat leads to this phosphorylation is unknown. Some evidence suggests that expanded nucleotide repeats can activate protein kinase cascades by forming hairpins that trigger double-stranded RNA-dependent protein kinase (PKR) [25]. How important such signaling pathways are in other RNA-mediated disorders remains to be seen, but overexpression of CUGBP1 actually rescued CGG repeat-dependent toxicity in a Drosophila model of FXTAS, the opposite of what would be predicted from its effects in DM1. Thus, the relevant signaling pathways may be affected differently in each disorder.
Aberrant mRNA splicing
The mis-spliced mRNAs in DM1 offer insight into how a multi-systemic disease can arise from a single non-coding mutation. For example, mis-regulated splicing of ClC-1, a muscle chloride channel, leads to decreased expression of the mature channel in muscle fibers [20]. The resulting loss of chloride conductance produces the muscle hyperexcitability that underlies myotonia. Importantly, correcting the ClC-1 mis-splicing event alleviates myotonia in mouse models of DM1 [27]. Other mis-splicing events may also contribute to the cardiac abnormalities and muscle wasting seen in DM1.
Less is known about which mis-splicing events might underlie central nervous system dysfunction in DM1. To date, screens of a limited number of candidate genes have identified three mis-spliced genes in the brains of adult DM1 patients: exon 5 of the NMDA glutamate receptor subunit NMDAR1, multiple exons of microtubule-associated protein tau (MAPT), and exon 7 of the amyloid precursor protein (APP) [28]. The same mis-splicing events were recently reported in a mouse model of DM1 [29]. Of these three, the evidence best supports a direct role for MAPT mis-splicing in patients’ clinical symptoms: neurofibrillary tangles and intranuclear inclusions containing tau accumulate in adult DM1 brains at autopsy, suggesting that DM1 is partly a neurodegenerative disorder [30]. In addition, both APP and NMDAR1 are critically involved in synaptic plasticity and neuronal function, so their mis-splicing may explain some of the cognitive symptoms seen in DM1, especially in the congenital form of the disorder. These three mRNAs likely represent only a small subset of the transcripts mis-spliced in neurons in this disease, and identifying the others will be critical for understanding the effects of DM1 on the nervous system in adults and children.
Given the complexity of RNA splicing and processing in neurons, it is perhaps not surprising that all of the RNA-dominant disorders uncovered to date have some central nervous system effects, although these effects vary widely between disorders and are more subtle in some than in others. The specific effect of each repeat expansion on splicing and RNA processing in particular cell types and brain regions likely explains, at least in part, how a single mechanism can lead to such different phenotypes across these diseases. However, what drives the disease and tissue specificity of neurodegeneration in RNA-dominant diseases (and, for that matter, in protein-mediated neurodegenerative diseases) remains largely unknown.
Antisense transcripts and aberrant transcriptional regulation
One of the most intriguing recent developments in repeat expansion disorders is the emerging role of antisense transcripts in disease pathogenesis (). As discussed above, antisense transcripts are very common in the genome and likely play important roles in RNA stability and transcriptional activity. Toxicity associated with these transcripts is best established for spinocerebellar ataxia type 8 (SCA8). In this disorder, a CTG repeat expansion in a non-protein-coding gene is associated with disease. Unaffected individuals typically carry up to 50 repeats, whereas patients with SCA8 typically have between 70 and 250. This CTG repeat is transcribed as part of Ataxin8OS (Ataxin 8 Opposite Strand), a non-coding RNA expressed in the brain, including the cerebellum [31]. Initial work on SCA8 suggested that this CTG repeat acts through a dominant mechanism at the RNA level similar to that seen in DM1 [32]. In line with this hypothesis, a recent study demonstrated mis-splicing of a GABA transporter in the SCA8 cerebellum that depends on MBNL1 sequestration and results in cerebellar dysfunction [33]. However, there is also evidence that an antisense transcript spanning the expansion is produced [34]. This transcript contains a short open reading frame that would translate the repeat in the CAG direction into an expanded polyglutamine polypeptide. Moreover, nuclear inclusions of aggregated protein that immunostain for polyglutamine have been observed in the cerebellum of an SCA8 patient. It remains unclear, however, whether this CAG transcript and the resulting polyglutamine protein contribute significantly to SCA8 pathogenesis.
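The strand relationship behind this antisense polyglutamine reading frame can be made concrete. The sketch below reverse-complements a CTG repeat and translates it codon by codon. It assumes the standard genetic code, uses a hypothetical `CODON_TABLE` that holds only the few codons needed here, and assumes the reading frame starts at the first base of the repeat; in cells the frame would be set by the start codon of the short open reading frame. The same functions show why an antisense transcript through a CGG repeat, as described for FMR1 below, would read as CCG codons encoding proline.

```python
# Base-pairing rules for DNA, used to build the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical excerpt of the standard genetic code: only the codons used below.
CODON_TABLE = {
    "CAG": "Gln", "CAA": "Gln",  # glutamine
    "CTG": "Leu",                # leucine
    "CCG": "Pro",                # proline
    "CGG": "Arg",                # arginine
}

def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]

def translate_in_frame(dna: str) -> list[str]:
    """Translate complete codons starting at the first base; unknown codons raise KeyError."""
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]

sca8_sense = "CTG" * 10
sca8_antisense = reverse_complement(sca8_sense)
print(sca8_antisense[:12])                      # CAGCAGCAGCAG
print(set(translate_in_frame(sca8_antisense)))  # {'Gln'}

fmr1_antisense = reverse_complement("CGG" * 10)
print(set(translate_in_frame(fmr1_antisense)))  # {'Pro'}
```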
[Figure: Transcriptional dysregulation and antisense transcripts in repeat expansion diseases]
A similar mixed mechanism may be present in Huntington’s disease-like 2 (HDL-2), a rare disorder clinically similar to Huntington’s disease that results from a CTG repeat expansion in the junctophilin-3 gene. Pathologic samples from patients with HDL-2 show neurodegeneration in the striatum and RNA foci that contain expanded CUG repeat sequences and MBNL1 [35]. However, the same samples also contain ubiquitin-positive inclusions that stain for a polyglutamine protein [36]. Thus, an antisense transcript encoding a glutamine stretch may also be present in this disorder and contribute to pathogenesis.
In addition to possibly encoding toxic proteins, antisense transcripts may also modify the stability of the toxic mRNA. Two different antisense transcripts have been described in FXTAS: ASFMR1, which begins in the first exon of FMR1, extends through the CGG:CCG repeat, and has multiple splice variants; and FMR4, which overlaps only the very 5′ end of the sense transcript start site [37]. At least one isoform of ASFMR1 contains an open reading frame that would be predicted to translate the CCG repeat into a polyproline-containing protein, but to date there is no evidence that this actually occurs. Nonetheless, these transcripts may play important roles in RNA toxicity and in the stability of the sense transcript. Along these lines, one intriguing experiment in a fly model of FXTAS showed that expression of either a CGG or a CCG repeat RNA induces toxicity, but that co-expression of both leads to phenotypic rescue via an Argonaute 2-dependent (i.e., RNA interference) pathway [39]. In lymphoblasts derived from FXTAS patients, both the sense and antisense FMR1 transcripts are up-regulated, but toxicity could still result from an imbalance of sense and antisense expression in affected tissues.
The expanded nucleotide repeats also have important effects in cis at the gene level. In FXTAS, the expanded CGG repeat may contribute directly to increased transcription of FMR1 mRNA by affecting local chromatin structure [40]. Similarly, the more severe phenotype of congenital DM1 may result not only from a larger CTG expansion at the RNA level, but also from effects of the expansion on local chromatin and on the transcription of DMPK itself and of surrounding genes. In congenital DM1, unlike adult DM1, the DMPK locus becomes methylated. This methylation inhibits CTCF binding at sites surrounding the DMPK gene, leading to transcriptional activation of DMPK together with a nearby homeodomain gene, Six5 [42]. Because Six5 is expressed at high levels during early neuronal and muscular development [43], co-activation with Six5 could drive temporally aberrant expression of toxic DMPK mRNA during development, producing a developmental phenotype that is absent in adult-onset DM1. However, this hypothesis has not yet been tested in vivo.
In addition to altered mRNA splicing, significant changes in the transcriptional regulation of many other genes have been reported in both FXTAS and DM1 [44]. In DM1, some of these changes may still be explained by sequestration of MBNL1, although it remains unclear whether this reflects a direct role of MBNL1 in transcriptional regulation or an indirect effect through altered splicing of critical transcription factors [44]. In FXTAS, some of these effects may be mediated by Pur α, which can act as a transcriptional repressor [46].
Cellular and protein homeostasis in RNA-mediated disease
The culmination of the diverse pathogenic mechanisms discussed above is neuronal dysfunction and death. The degree of mechanistic overlap between RNA- and protein-mediated disorders is unknown, but early evidence suggests that they share some neurodegenerative pathways. For example, the intranuclear inclusions in FXTAS contain many proteins that are not thought to interact directly with mRNA, including nuclear lamins, which function in nuclear envelope formation and structure. Intriguingly, the nuclear envelope is structurally abnormal in cultured cells transfected with an expanded CGG repeat mRNA [48]. Beyond sequestration of specific proteins, the biological impact of the inclusions is also unclear. The inclusions in FXTAS are ubiquitinated, and both the neuronal inclusions in FXTAS and the RNA foci in DM1 contain components of the proteasome [28]. In addition, overexpression of chaperone proteins (e.g. HSP70) alleviates aspects of the FXTAS phenotype in a Drosophila model, and cells overexpressing an expanded CGG repeat-containing mRNA are more sensitive to impairment of the ubiquitin-proteasome system [45]. These results support a model in which aberrant mRNA-protein interactions generate degradation-resistant complexes that may act as a toxic species, interfering with a host of cellular processes in a manner similar to that proposed for polyglutamine proteins.
RNA toxicity in Polyglutamine disorders
Until recently, mRNA transcripts were not thought to contribute significantly to toxicity in repeat expansion disorders in which the repeat is translated into protein. However, a recent study challenges this view for at least some polyglutamine disorders. Spinocerebellar ataxia type 3, also known as Machado-Joseph disease, results from a CAG expansion that encodes an abnormally long polyglutamine stretch in the disease protein, ataxin-3. Using a genetic modifier screen in a Drosophila model of SCA3, Li et al. found that the Muscleblind protein family implicated in DM1 pathogenesis also modified the phenotype of expanded-repeat ataxin-3 [51]. Changing the repeat sequence from pure CAG to interrupted CAACAG (which still encodes glutamine) reduced toxicity in flies despite similar expression of the polyglutamine protein. Moreover, expression of a long, untranslatable CAG repeat alone in the 3′ UTR of a reporter gene caused slowly progressive neurodegeneration, which did not appear to be due to transcription of an antisense CUG-containing RNA. Taken together, these findings suggest that large CAG repeats could contribute to disease at the RNA level in some polyglutamine disorders. However, whether these findings are relevant to mammalian systems and to patients with polyglutamine disorders is still unknown. Because very large repeats were required to produce toxicity in the Drosophila model, this form of RNA toxicity is likely to be significant in humans only if there is dramatic somatic repeat instability in affected tissues, a possibility that needs to be investigated.
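The logic of the CAACAG experiment can be illustrated at the sequence level. In the sketch below (hypothetical helpers; the length of 60 codons is arbitrary and is not the construct used by Li et al.), a pure CAG repeat and a CAACAG-interrupted repeat of the same length both encode an uninterrupted polyglutamine stretch, because CAA and CAG are both glutamine codons in the standard genetic code, yet the interrupted version contains no CAG run longer than a single unit. It does not model RNA folding or protein binding; it only shows that the protein is unchanged while the repeated RNA sequence is disrupted.

```python
import re

GLN_CODONS = {"CAG", "CAA"}  # both glutamine codons in the standard genetic code

def is_polyglutamine(dna: str) -> bool:
    """True if the sequence is whole codons and every codon encodes glutamine."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return len(dna) % 3 == 0 and all(c in GLN_CODONS for c in codons)

def longest_pure_cag_run(dna: str) -> int:
    """Length, in CAG units, of the longest uninterrupted CAG stretch."""
    return max((len(m) // 3 for m in re.findall(r"(?:CAG)+", dna)), default=0)

pure = "CAG" * 60
interrupted = "CAACAG" * 30  # same length: 60 codons

print(is_polyglutamine(pure), is_polyglutamine(interrupted))          # True True
print(longest_pure_cag_run(pure), longest_pure_cag_run(interrupted))  # 60 1
```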
In some polyglutamine disorders, however, mRNA-mediated neurodegeneration seems unlikely to play a prominent role. For example, in spinal and bulbar muscular atrophy (SBMA, also known as Kennedy’s disease), the pathogenic expanded CAG repeat, which is translated into a polyglutamine tract, lies in the androgen receptor gene. There is clear evidence from Drosophila and mouse models that ligand-induced translocation of the disease protein to the nucleus is required for pathogenesis, precluding a primary role for mRNA toxicity [52]. Moreover, single non-conservative nucleotide changes distant from the CAG repeat in ATXN1, the disease gene in spinocerebellar ataxia type 1 (SCA1), can abrogate toxicity in a mouse model of the disease without affecting mRNA levels [53]. Lastly, an earlier study that formally tested for toxicity from untranslated CAG repeats failed to show an effect, although the tested repeat was smaller [54]. Thus, further research is needed to determine whether RNA-mediated toxicity from very long CAG repeats matters in human polyglutamine disorders, and what it implies for our understanding of protein-mediated neurodegeneration.
Therapeutic Development in RNA-Mediated Diseases
Because RNA-mediated disorders do not change the sequence or function of the protein encoded by the mutated gene, they may be particularly amenable to curative therapies. Moreover, in DM1 in particular, there is fairly strong evidence from animal models that haploinsufficiency for the associated protein, DMPK, does not cause significant dysfunction. Thus, therapeutics aimed at eliminating the toxic mRNA hold great promise (See ). Two recent papers using antisense technology in animal models of DM1 have provided proof of principle for this approach. In the first, Wheeler et al. used a morpholino oligonucleotide composed of 8 CAG repeats to block the interaction between expanded CUG repeat-containing mRNA and MBNL1 in a mouse model of DM1 [55]. Delivery of this morpholino into affected muscles eliminated RNA foci, corrected splicing abnormalities, and reversed clinical myotonia. In a second related paper, Mulders et al. used a modified antisense oligonucleotide in 2 mouse models of DM1 and also demonstrated resolution of RNA foci and splicing abnormalities, although myotonia persisted [56]. Importantly, non-specific toxicity was not observed in either study.
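The complementarity that lets a CAG oligonucleotide target expanded CUG repeats can be sketched as a simple string search. This is a simplification under stated assumptions: it treats the morpholino as an ordinary RNA sequence, ignores backbone chemistry, RNA secondary structure and hybridization energetics, and uses a hypothetical 100-repeat CUG segment and hypothetical helper names. It shows that an oligonucleotide of 8 CAG repeats is perfectly antisense to 8 CUG repeats, and that a long CUG repeat offers many overlapping binding registers, with room for several oligonucleotides side by side.

```python
# Watson-Crick pairs for RNA.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def rna_reverse_complement(rna: str) -> str:
    """Sequence that base-pairs antiparallel with `rna`, written 5'->3'."""
    return "".join(RNA_PAIRS[base] for base in reversed(rna))

def binding_sites(target: str, oligo: str) -> list[int]:
    """0-based positions in `target` where `oligo` would pair with every base."""
    site = rna_reverse_complement(oligo)
    return [i for i in range(len(target) - len(site) + 1) if target.startswith(site, i)]

oligo = "CAG" * 8         # 24-nt CAG oligonucleotide (treated here as plain RNA)
expanded = "CUG" * 100    # hypothetical expanded CUG repeat segment

sites = binding_sites(expanded, oligo)
print(len(sites), sites[:3])        # 93 [0, 3, 6]
print(len(expanded) // len(oligo))  # 12 oligos fit end to end without overlap
```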
[Figure: Therapeutic development in RNA-mediated diseases]
A second line of therapeutic development involves identifying small molecules that directly interfere with the interaction between expanded nucleotide repeats and their cognate RNA binding protein partners. Three different groups have used rational compound-screening approaches to identify inhibitors of the interaction between expanded CUG repeats and MBNL1 [57]. Warf et al. went on to show that their compound, pentamidine, can reverse splicing defects in cell culture and in a mouse model of DM1 [58]. Similar chemical studies have targeted CAG and CCUG repeat RNAs, suggesting that this approach may be more broadly applicable to other RNA-dominant disorders [59]. Other recent studies have used high-throughput drug screens in cell culture or invertebrate models of DM1. One such screen in a Drosophila model of DM1 identified 10 compounds, most of them FDA-approved for other indications, that rescue the semi-lethal phenotype seen in this model with pan-neuronal expression of the expanded repeat [61].