Expansion of tandem repeat sequences is responsible for more than 20 human diseases. Several cis elements and trans factors involved in repeat instability (expansion and contraction) have been identified. However, no comprehensive model explaining large intergenerational or somatic changes in the length of the repeating sequences exists. Several lines of evidence, accumulated from different model studies, indicate that transcription through repeat sequences is an important factor promoting their instability. The persistent interaction between transcription template DNA and nascent RNA (RNA•DNA hybrids, R loops) was shown to stimulate genomic instability. Recently, we demonstrated that cotranscriptional RNA•DNA hybrids are preferentially formed at GC-rich trinucleotide and tetranucleotide repeat sequences in vitro as well as in human genomic DNA. Additionally, we showed that cotranscriptional formation of RNA•DNA hybrids at CTG•CAG and GAA•TTC repeats stimulates instability of these sequences in both E. coli and human cells. Our results suggest that persistent RNA•DNA hybrids may also be responsible for other downstream effects of expanded trinucleotide repeats, including gene silencing. Considering the extent of transcription through the human genome as well as the abundance of GC-rich and/or non-canonical DNA structure forming tandem repeats, RNA•DNA hybrids may represent a common mutagenic conformation. Hence, R loops are potentially attractive therapeutic targets in diseases associated with genomic instability.
Approximately 50% of the human genome is occupied by repetitive sequences.1,2 The simplest—tandem microsatellite repeats—comprise 1–6 base pairs repeated in a head-to-tail orientation.3 The number of repeated units can range anywhere from a few to hundreds or even thousands of copies. These sequences, considered in the past as molecular junk, turned out to play an important role in evolution and adaptation of organisms to different conditions.4,5 Additionally, nearly 20 years ago, increase in length of trinucleotide repeats was found to be a mutation responsible for spinobulbar muscular atrophy, fragile X syndrome and myotonic dystrophy type 1.6–10
Since then, more than 20 human syndromes, as well as pathologies in plants and animals, have been attributed to repeat expansions.11 These unusual mutations, termed dynamic mutations due to their unstable, progressive character and non-Mendelian inheritance, occur predominantly at GC-rich repeat sequences.12
Several cis elements and trans factors that affect repeat instability have been identified thus far; however, no comprehensive model explaining the molecular mechanism of repeat expansion exists.13 A large body of evidence, from both in vitro and in vivo studies in different model systems, indicates that intrinsic DNA properties play a central role in repeat expansions and contractions (instability).13,14 Tandem repeated CTG•CAG, CGG•CCG, GAA•TTC and CCTG•CAGG sequences (the center dot separates the two complementary DNA strands) can adopt non-canonical DNA structures such as stable hairpins, triplexes, tetraplexes and slipped-strand structures. The propensity of DNA to form unusual secondary structures is directly correlated with the instability of these sequences; hence all DNA metabolic processes leading to transient separation of DNA strands, such as replication, recombination, repair and transcription, have stimulatory effects on repeat instability.11–15 Small changes to the structural potential of the repeat tract, e.g., introducing a single interruption affecting repeat homogeneity or increasing the superhelical density of the DNA, alter the stability of the repeats.16–18
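The center-dot notation can be checked mechanically: each pair names a repeat unit on one strand and the corresponding unit on the complementary strand, both written 5' to 3'. The following is a minimal illustrative sketch in Python; the function and variable names are hypothetical and are not taken from the cited studies.

```python
# Illustrative sketch: confirm that each "X•Y" repeat pair named in the text
# is a motif plus its reverse complement (both strands written 5'->3').
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def rotations(motif):
    """All cyclic rotations of a motif; a tandem repeat has no fixed frame."""
    return {motif[i:] + motif[:i] for i in range(len(motif))}


def is_complementary_pair(pair):
    """True if the two motifs in 'X•Y' describe complementary repeat strands."""
    top, bottom = pair.split("•")
    return reverse_complement(top) in rotations(bottom)


# Repeat pairs listed in the paragraph above.
PAIRS = ["CTG•CAG", "CGG•CCG", "GAA•TTC", "CCTG•CAGG"]
RESULTS = {pair: is_complementary_pair(pair) for pair in PAIRS}
```

Comparing against all cyclic rotations is a design choice made for this sketch: in a long tandem array the choice of reading frame for the repeat unit is arbitrary, so a check on the literal motif alone could reject a valid pair written in a different frame.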
In addition to DNA structure, RNA structure of expanded CUG and CCUG repeats plays a critical role in the pathogenesis of myotonic dystrophy type 1 and 2 (DM1 and DM2).19–21 In these diseases expanded RNA repeats have toxic gain-of-function consequences.22 Aberrant protein structures have also been implicated in some of the repeat disorders. The beta sheet dominant conformation adopted by expanded polyglutamine domains has been postulated to be responsible for the assembly of these proteins into insoluble beta-sheet-rich fibrillar aggregates and for their accumulation as inclusion bodies.23,24 Formation of the protein aggregates is associated with CAG•CTG repeat expansion diseases including Huntington's disease and several spinocerebellar ataxias.11 Considering the central role of structure of major biochemical cellular components in the pathogenesis of repeat expansion diseases, these diseases may be called “diseases of structure”.
Our recent data suggest that another type of structure, the cotranscriptional RNA•DNA hybrid or R loop, contributes to the instability of repeating sequences.25 An R loop is a structure in which RNA is partially or completely hybridized with one DNA strand, rendering the complementary DNA strand single stranded. We demonstrated that in vitro transcribed GC-rich repeating sequences form stable, ribonuclease A resistant and ribonuclease H sensitive structures. These structures could be detected, using electrophoretic assays or bisulfite modification, in plasmids as well as in human genomic DNA. More importantly, we showed that formation of stable RNA•DNA hybrids stimulates instability of CTG•CAG repeats in vivo in E. coli and human cells.25
Instability of tandem repeats is often associated solely with the errors that occur during DNA replication. However, neither DNA replication nor homologous recombination can account for repeat expansion and contraction detected in slowly or non-dividing cells (e.g., neurons in the striatum or non-proliferating sperm precursors). Several lines of evidence, accumulated from prokaryotic, yeast, mammalian cell culture and animal model studies indicate that transcription through repeating sequences is an important factor promoting their instability.26–34 In terminally differentiated cells, which no longer replicate DNA, transcription may be the most robust instigator of repeat instability.
A direct link between transcription and trinucleotide repeat instability was initially discovered in a prokaryotic model system.29 An increase in CTG•CAG repeat instability was observed upon the stimulation of gene expression using an inducible lacZ promoter. The magnitude of this effect depended on the length, orientation and sequence of the repeating tract. Additionally, mutations in the components of the nucleotide excision repair (NER) pathway affected instability of transcribed repeating sequences.35,36 Studies in eukaryotic organisms including yeast with dinucleotide GT•AC repeats34 and transgenic flies with CAG•CTG repeats37 revealed that transcriptional activity was positively correlated with instability of the repeats. In a transgenic mouse model of Huntington's disease the level of repeat instability was associated with active transcription of the transgene.30 Interestingly, recent studies exploring the effect of transcription on GAA•TTC repeats demonstrated that transcription may have a dose-dependent effect on instability, whereby very high levels of gene expression shift the tendency of instability toward contractions and moderate (close to physiological) levels of gene expression tend toward expansions. It is likely that different mechanisms underlie the repeat contractions and expansions.31,38
Unfortunately, none of the aforementioned studies employed methods to detect or quantify relatively rare instability events, nor methods that would allow analysis of the effects of various potential modulators of repeat instability. The development of a highly sensitive hypoxanthine phosphoribosyltransferase (HPRT) based genetic assay to study factors influencing CTG•CAG repeat instability in mammalian cells uncovered several novel pathways affecting stability of these sequences.28,39 In this system, the HPRT gene is divided into two exons separated by an intronic sequence containing 95 CTG•CAG repeats (rCAG transcript). This long repeat tract interferes with splicing, thus rendering human FLAH25 cells HPRT negative.28 Contraction of the repeat tract below a threshold of 39 CTG•CAG repeats permits correct splicing of the HPRT transcript, thus allowing the cells to grow in hypoxanthine-aminopterin-thymidine (HAT) selection media. The number of cells surviving under selective conditions corresponds to the level of repeat instability (contractions to <39 repeats). Although this system is limited to CTG•CAG repeats and detects only contraction events, its sensitivity is several orders of magnitude higher than that of traditional biochemical approaches.28 Another genetic assay, based on splicing inhibition of the orotidine 5'-phosphate decarboxylase gene (URA3) and 5-fluoroorotic acid (5-FOA) selection, has been recently developed to analyze the frequency of large expansions of GAA•TTC repeats in yeast.40
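The logic of such a selection-based readout can be summarized in a short Python sketch. This is an illustration only, under the assumption that instability is expressed as surviving colonies per viable cell plated; the function names are hypothetical, any counts passed to them would be placeholders rather than data from the cited studies, and only the two repeat numbers come from the text.

```python
# Illustrative sketch of summarizing a selection-based contraction assay.
# Only the two constants below come from the text; everything else is assumed.
STARTING_REPEATS = 95        # CTG•CAG repeats in the reporter intron
CONTRACTION_THRESHOLD = 39   # tracts shorter than this permit correct splicing


def survives_hat_selection(repeat_count):
    """A cell is expected to grow in HAT medium only after contraction below the threshold."""
    return repeat_count < CONTRACTION_THRESHOLD


def contraction_frequency(resistant_colonies, viable_cells_plated):
    """Fraction of plated viable cells that gave rise to HAT-resistant colonies."""
    if viable_cells_plated <= 0:
        raise ValueError("viable_cells_plated must be positive")
    return resistant_colonies / viable_cells_plated


def fold_change(treated_frequency, control_frequency):
    """Ratio of contraction frequencies, e.g. knockdown versus control cells."""
    if control_frequency <= 0:
        raise ValueError("control_frequency must be positive")
    return treated_frequency / control_frequency
```

For example, `fold_change(contraction_frequency(50, 10**6), contraction_frequency(20, 10**6))` compares two made-up colony counts obtained from the same number of plated cells. Because the starting tract fails the survival test, the readout reports contractions only, which matches the limitation noted above.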
Transcription itself cannot change the length of repeat tracts; however, transient separation of DNA strands increases the likelihood of the formation of non-canonical DNA structures in transcribed regions. These secondary structures, including hairpins/slipped-strand structures in the case of CTG•CAG repeat tracts, and triplex or quadruplex conformations of GAA•TTC or CGG•CCG repeats, may impede the progress of RNA polymerase II (RNAP II) during subsequent rounds of transcription.38,41–43 It is likely that these non-canonical structures are stabilized by specific binding proteins, e.g., components of the mismatch repair system (MMR) that bind CTG and CAG hairpins,44,45 since secondary structure itself may not be sufficient to impede the transcriptional machinery. According to one of the most recent models of transcription-induced repeat instability, stalling of RNAP II triggers a response from the transcription-coupled nucleotide excision repair (TC-NER) machinery.26 Repair proteins can access hairpin structures after the removal of the stalled RNAP II. As a result of repair, expansions or contractions can be generated. In addition to MMR and TC-NER, components of the proteasome pathway and proteins dealing with the stalled RNAP II machinery have been shown to affect transcriptionally-induced repeat instability in human cells.26–28,46 This model does not take into account the role of the nascent repeat-containing RNA. Recently, we have conducted experiments indicating that cotranscriptional RNA•DNA hybrid formation at short tandem repeats contributes to the instability of these sequences.25
Transcription is associated with continuous formation of short RNA•DNA hybrids. These transient interactions, spanning only a few base pairs in direct proximity to RNA polymerase, dissociate rapidly with the progression of the transcription machinery. In contrast to these transient RNA•DNA hybrids, persistent R loops encompass larger DNA regions and remain stable after removal of the transcriptional complex. Transient as well as persistent RNA•DNA hybrids can be also formed in the course of many vital biological processes including genomic DNA replication, telomere replication by telomerase, replication of mitochondrial DNA and the replication of retroviruses by reverse transcription.47
It has been demonstrated that cotranscriptional persistent RNA•DNA hybrids exist in vivo in different organisms. In prokaryotic cells, they were first identified to cause growth defects in strains carrying mutations in the topoisomerase gene. This phenotype was rescued by overexpression of RNase H.48–50 Interestingly, mutations in topoisomerase I, leading to increased DNA negative supercoiling stimulated trinucleotide repeat instability in a transcription-dependent manner.16,17 Cotranscriptional R loops have also been studied in Saccharomyces cerevisiae, chicken DT40 cells, and in highly repetitive immunoglobulin class switch recombination sequences in mammalian cells.51–56 In the majority of cases persistent R loops represented a threat to the genome since their presence was shown to stimulate genomic instability.57,58 Formation of RNA•DNA hybrids can facilitate DNA damage via direct lesions to single stranded DNA at the non-template strand. Cotranscriptional R loops may also block progression of the replication fork (R loop structures alone or by the collision between replication machinery with stalled transcriptional complexes) ultimately leading to double-strand break formation (reviewed in ref. 57 and 58).
The first indication that R loops can be formed at trinucleotide repeat sequences came from the work of Grabczyk et al.38 who demonstrated that transcribed GAA•TTC repeats can form persistent RNA•DNA hybrids.
The GAA•TTC repeats from the FXN gene and the expanded ATTCT•AGAAT pentanucleotide sequence in spinocerebellar ataxia type 10 are intriguing exceptions from the GC-rich expandable tandem repeat sequences.11,59,60 Homozygous expansion of the GAA•TTC repeat sequence within the FXN gene is responsible for the most common hereditary ataxia—Friedreich's ataxia.60 More recently, expansion of the GAA•TTC sequence has been also associated with severe growth abnormalities in Arabidopsis thaliana.61 Although lacking high GC-content, the GAA•TTC polypurine•polypyrimidine sequences are capable of adopting stable triple helical conformations. A plethora of structures, both intramolecular and intermolecular, formed by long GAA•TTC repeats have been described.11,62 An elegant study by Grabczyk et al. showed that GAA•TTC repeats, when transcribed in vitro and in E. coli, can form persistent RNA•DNA hybrids that are sensitive to RNase H digestion.38 Additionally, these extensive RNA•DNA hybrids were tightly linked to RNA polymerase arrest. In the proposed model, the wave of negative supercoiling associated with transcription progression stimulated formation of an R-R•Y triplex which in turn allowed for the nascent RNA to interact with the single-stranded DNA template strand. This cooperation between a non-canonical DNA structure and an RNA•DNA hybrid conformation leads to transcription stalling.38
We extended these studies and first demonstrated that other repeat sequences including CTG•CAG and CCTG•CAGG can form RNase A resistant and RNase H sensitive R loop structures in vitro.25 Also, CGG•CCG repeats expanded in the FMR1 gene are capable of forming R loops (K. Reddy, R.P. Bowater, and C.E. Pearson, personal communication). No RNA•DNA hybrids could be detected in long (AAT•ATT)90 repeats when transcribed by either SP6 or T7 polymerase.25
To demonstrate the effects of RNA•DNA hybrids on repeat instability in E. coli we analyzed changes in repeat length after transformation of plasmids harboring different repeating sequences into RNase HI deficient strains (E. coli FB2 rnhA1).25 RNase HI is the main enzyme in E. coli responsible for removal of RNA•DNA hybrids, and the strategy of downregulating RNase H activity represents a very sensitive approach to analyze downstream effects of RNA•DNA hybrid formation. These experiments showed that CTG•CAG repeats were significantly more unstable in the RNase H deficient strain. The instability was dependent on the orientation of the repeats (rCUG versus rCAG transcript) and more importantly on active transcription through the repeats. Similarly, (GAA•TTC)115 (rGAA transcript) tracts were unstable in the RNase HI mutant strain (Fig. 1), confirming that the R loops formed by these repeats in E. coli cells can stimulate their instability.38 Parallel to the CTG•CAG repeats, the increase in instability of the (GAA•TTC)115 tract detected in rnhA1 strain is statistically significant only in the presence of transcription, excluding the role of RNase H in Okazaki fragment processing as the major contributor to repeat instability. These results are in agreement with previous yeast studies showing that eliminating RNase H activity had no effect on instability of non-transcribed CTG•CAG repeats. Interestingly, the (CAA•TTG)90 tract, which has little or no capacity to form non-canonical structures and does not stimulate R loop formation in vitro,63,64 is stably propagated in the RNase HI deficient strain in the presence or absence of transcription (Fig. 1).
Furthermore, we used the mammalian HPRT assay described above to demonstrate that knockdown of either RNase H1 or H2 destabilizes CTG•CAG repeats.25 Similar to the data obtained in the prokaryotic model, a 2–3 fold increase in instability was dependent on active transcription, implicating RNA•DNA hybrids. Deficiency of either RNase H1 or H2 caused a comparable increase in CTG•CAG instability, suggesting that in mammalian cells both enzymes can participate in R loop removal.
Involvement of the transcription template in interactions with nascent RNA renders the non-template strand unpaired, hence detectable by single-strand specific chemical modifications.65 We used bisulfite conversion of genomic DNA both before and after RNase H treatment to identify unpaired cytosine residues vulnerable to the conversion to uracil.25 In a human cell line carrying a transcribed tract of 68 CTG•CAG repeats (rCAG transcript), we identified a strong bias towards the modification of cytosines in the non-template DNA strand. After incubation with RNase H both template and non-template strands were modified equally by bisulfite. The extent of modification was also significantly greater in the repeat region when compared to GC-rich flanking sequences. Characteristically, the pattern of bisulfite modification corresponded to the presence of several small hairpins/slipped-strand structures rather than a large unpaired region indicating that the non-template strand adopts non-canonical structures partially limiting chemical modifications. These bisulfite modification experiments demonstrated that persistent RNA•DNA hybrids can exist in transcribed CTG•CAG repeats in human genomic DNA.25
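The reasoning behind the strand bias can be illustrated with a toy Python model. It assumes an idealized R loop in which every non-template cytosine is unpaired and every template cytosine is protected by the RNA•DNA hybrid; real data, as described above, show a more complex pattern because the displaced strand folds into small hairpins. All names and the four-unit tract are hypothetical.

```python
# Toy model of bisulfite footprinting: only unpaired cytosines are converted
# to uracil. This is an idealization, not the analysis used in the cited work.
def bisulfite_convert(strand, unpaired_positions):
    """Convert C to U at the given 0-based unpaired positions; paired C is protected."""
    return "".join(
        "U" if base == "C" and index in unpaired_positions else base
        for index, base in enumerate(strand)
    )


def converted_fraction(original, converted):
    """Fraction of cytosines in the original strand that were converted to U."""
    total_c = original.count("C")
    if total_c == 0:
        return 0.0
    changed = sum(1 for o, c in zip(original, converted) if o == "C" and c == "U")
    return changed / total_c


# rCAG transcript: the non-template strand reads CAG, the template strand CTG.
NON_TEMPLATE = "CAG" * 4
TEMPLATE = "CTG" * 4

# Idealized R loop: the whole non-template strand is displaced and unpaired,
# while the whole template strand is hybridized to RNA.
non_template_converted = bisulfite_convert(NON_TEMPLATE, set(range(len(NON_TEMPLATE))))
template_converted = bisulfite_convert(TEMPLATE, set())

STRAND_BIAS = (
    converted_fraction(NON_TEMPLATE, non_template_converted),
    converted_fraction(TEMPLATE, template_converted),
)
```

Passing a smaller set of unpaired positions for the non-template strand would mimic partial protection by hairpin stems, which is one way to think about the interrupted modification pattern reported in the text.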
Downregulation of RNase H enzymes is a common strategy employed to increase the half-life of RNA•DNA hybrids, allowing for detection of their effects in vivo.57 It is therefore plausible that spatial and temporal differences in the expression of RNase H and other proteins affecting RNA•DNA hybrid stability (e.g., topoisomerases) could contribute to the cell-specific patterns of instability observed in several repeat expansion diseases. Additionally, specific changes in gene expression related to a disease process could exacerbate these effects. Recent gene expression analyses of a Friedreich's ataxia model cell line showed that decreased frataxin level was associated with significant downregulation of RNase H2 and topoisomerase 2 expression.66 Lower expression of these proteins in frataxin deficient cells may affect somatic instability of the GAA•TTC repeats.
Results of numerous studies, conducted in different systems, show that a combination of several factors determines the propensity of a given sequence to adopt an RNA•DNA hybrid conformation. Tandem repeat sequences expanded in human diseases appear to fulfill a majority of these requirements and are intrinsically prone to form RNA•DNA hybrids (Fig. 2). They are highly GC-rich; very frequently the regions immediately upstream and downstream of the repeats are GC-rich as well (Fig. 2), and long stretches of C/G residues that may serve as nucleation sites for RNA•DNA hybrids are also frequently found adjacent to the repeats.67–69 In addition, the reiterative character of these sequences facilitates re-annealing of the RNA transcript to the DNA template in the case of RNA•DNA hybrid dissociation. Transiently single stranded DNA regions can adopt non-canonical, intramolecular hairpin structures, hence increasing the probability of RNA•DNA hybrid formation by decreasing the likelihood of re-association of the complementary DNA strands. Triple-helical structures formed by GAA•TTC repeats have also been shown to facilitate R loop formation.38
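The GC content of the repeat units discussed in this review can be compared with a few lines of Python. This is a minimal sketch with hypothetical names; it considers only the repeat unit itself, not the flanking sequences, and GC fraction is identical for a unit and its complementary strand.

```python
# Illustrative sketch: GC fraction of repeat units mentioned in the text.
def gc_fraction(motif):
    """Fraction of G or C bases in a repeat unit."""
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    return sum(base in "GC" for base in motif) / len(motif)


# One strand of each repeat pair named in this review.
REPEAT_UNITS = ["CTG", "CGG", "CCTG", "GAA", "ATTCT", "AAT", "CAA"]
GC_BY_UNIT = {unit: gc_fraction(unit) for unit in REPEAT_UNITS}

# Units ordered from most to least GC-rich.
RANKED_UNITS = sorted(REPEAT_UNITS, key=GC_BY_UNIT.get, reverse=True)
```

The ranking places CGG, CCTG and CTG at the top, whereas GAA and ATTCT fall in the lower range, consistent with their description above as exceptions among the expandable repeats; GC content of the unit alone is clearly not the only determinant, since GAA•TTC forms R loops while CAA•TTG, with the same GC fraction, does not.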
Persistent RNA•DNA hybrids may also be responsible for additional downstream effects of expanded trinucleotide repeats other than their instability. RNA polymerase stalling at the repeats has been recurrently demonstrated and could be attributed to RNA•DNA hybrid formation.38,51,70 Stalled polymerase can induce instability as described above but may also contribute to gene silencing as observed in mutated FMR1 or FXN genes. Using a cell line harboring the luciferase gene containing either 50 or ~700 intronic GAA•TTC repeats, we showed that siRNA-mediated downregulation of RNase H1 and H2 (by at least 85% as determined using real-time PCR) decreases the expression of the (GAA•TTC)700-containing reporter gene. No effect on expression of the luciferase gene harboring (GAA•TTC)50 insert was observed (Fig. 3). The degree of transcription suppression by downregulation of RNase H1 and H2 is moderate (~20%) likely due to already low levels of transcription in the (GAA•TTC)700 cells.
Correlation between the reduced expression of the luciferase cell line harboring (GAA•TTC)700 tract and decreased levels of RNase H1 and H2 brings forth an attractive but speculative model. Perhaps, cellular attempts of transcription through the expanded GAA•TTC repeats lead to persistent RNA•DNA hybrids and consequently contribute to RNAP II stalling within the repeat region. Such unusual structures have been shown to attract DNA modifying enzymes to initiate the cascade of epigenetic changes leading to the silencing of the gene.71 Experiments to test this hypothetical scenario are currently underway.
Shortly after the connection between DNA structure and repeat instability was demonstrated, the first attempt to control stability of trinucleotide repeats by manipulating DNA structure and/or DNA metabolism was undertaken.72–76 Two major, apparently contradictory, therapeutic strategies towards expanded DNA repeats can be distinguished: (i) to stabilize repeat sequences and prevent germline and somatic expansions and (ii) to destabilize the already expanded pathogenic sequences.
Targeting expanded repeat sequences and forcing their contractions (or at least preventing their further expansion) would represent an important therapeutic advance for these diseases. Various compounds including DNA intercalators, DNA methylating and alkylating agents, drugs affecting DNA replication and topoisomerase inhibitors have been evaluated.77 Although some of them demonstrated a clear potential to induce DNA instability or regulate expression of affected genes, these compounds were frequently strong mutagens and toxic agents with pleiotropic effects. A more repeat-specific approach was employed to induce expression of the FXN gene using GAA•TTC repeat targeting polyamides.78 High affinity binding of these polyamides interfered with formation of a non-canonical structure of the GAA•TTC repeats in vitro. Perhaps a similar effect on the conformation of the GAA•TTC tract resulted in partial restoration of FXN transcription in cultured cell lines derived from Friedreich's ataxia patients. The effect of modified polyamides containing the GAA•TTC binding module conjugated to a DNA alkylating agent (chlorambucil) on repeat instability is currently being investigated (Napierala, et al. manuscript in preparation). Another example of a repeat-directed approach aimed at stimulating repeat contractions is the use of CAG and CTG specific zinc finger nucleases (ZFN).79 Double-strand breaks induced by the engineered enzymes in the expanded CAG•CTG tracts are repaired predominantly by repeat contractions or insertion of extrachromosomal DNA fragments into the repeat tract.79
Although the first RNA•DNA duplexes were described more than 50 years ago80 and biophysical as well as biochemical properties of several of these hybrid nucleic acids have been analyzed,47,81–83 research specifically devoted to discovering compounds that preferentially recognize R loop structures is very limited. Only a handful of compounds have been demonstrated to bind with high affinity to RNA•DNA duplexes.47 Compounds like ethidium bromide, ellipticine, paromomycin and ribostamycin were shown to interact with RNA•DNA hybrids via intercalation or groove binding. Interestingly, ethidium bromide, one of the most efficient intercalators of RNA•DNA hybrids, was demonstrated to reduce the CTG•CAG repeat expansion rate in Dmt-D transgenic mouse kidney cells; however, its effect on DNA conformation versus RNA•DNA hybrids or any other secondary DNA metabolic pathway was not evaluated.73 As with DNA, RNA•DNA hybrids may also be affected by compounds or oligonucleotides that interact with the single-stranded non-template DNA strand of the R loop. Such compounds are of therapeutic interest especially due to their potential as antiviral drugs targeting HIV-1 reverse transcription and in anticancer therapy by targeting telomerase.47
More extensive work towards development of efficient, selective molecules capable of targeting RNA•DNA hybrids, perhaps demonstrating sequence specificity, and evaluation of their potential as modulators of repeat instability is required. A rational selection of structure specific compounds using such methods as competition dialysis and thermal denaturation has already proved to be effective in uncovering compounds that increase efficiency of transcription through the GAA repeat tract within the FXN gene.84,85 More recently, a surface plasmon resonance method has been applied to study binding of small molecule ligands to non-canonical DNA structures.86 Similar approaches may result in discovery of new potent compounds targeting specific nucleic acid structures and affecting repeat instability.
A significant portion of the human genome, including several genes mutated in the repeat expansion diseases, undergoes bidirectional transcription.87,88 Hence, plenty of opportunities exist for interactions between DNA and RNA. Consequences of these interactions may be damaging to the genome, especially when the appropriate cis conditions (R loop formation-prone DNA sequences) and trans factors (lower expression of proteins protecting against the R loops) converge.
Cotranscriptional R loops are an emerging class of nucleic acid conformations implicated in genetic instability. Perhaps their importance as one of the sources of DNA damage is still underestimated. Recent data suggest that RNase H sensitive RNA•DNA hybrids may be associated with the pathogenesis of other neurological diseases such as ataxia telangiectasia (AT) and spinocerebellar ataxia with axonal neuropathy (SCAN1).89,90 Therefore, finding new strategies to prevent formation or to facilitate removal of persistent RNA•DNA interactions can be of therapeutic significance for several human conditions.
This work was supported by the National Ataxia Foundation, the Friedreich's Ataxia Research Alliance and in part by National Institute of Neurological Disorders and Stroke grant 1R21NS064827-01 to M.N.
Previously published online: www.landesbioscience.com/journals/rnabiology/article/12745