In addition to the canonical double helix, DNA can fold into various other inter- and intramolecular secondary structures. Although many such structures were long thought to be in vitro artefacts, bioinformatics demonstrates that DNA sequences capable of forming these structures are conserved throughout evolution, suggesting the existence of non-B-form DNA in vivo. In addition, genes whose products promote formation or resolution of these structures are found in diverse organisms, and a growing body of work suggests that the resolution of DNA secondary structures is critical for genome integrity. This Review focuses on emerging evidence relating to the characteristics of G-quadruplex structures and the possible influence of such structures on genomic stability and cellular processes, such as transcription.
The right-handed double helical structure of B-form DNA (B-DNA) has been known since 1953 (REF. 1). However, it has become increasingly clear that DNA can adopt a variety of alternative conformations based on particular sequence motifs and interactions with various proteins. These non-B-form secondary structures, which include G-quadruplex structures (G4 structures) (FIG. 1) as well as Z-DNA, cruciforms and triplexes (BOX 1), were originally characterized in vitro using biophysical techniques (for example, circular dichroism2). Accumulating evidence now points towards the existence of these structures under physiologically relevant conditions, and all of them are hypothesized, or even known, to have functional roles in vivo. The current wealth of genomic data — which is enabling the evolutionary comparison of motifs that can adopt non-B-form secondary structures in vitro — and the use of structure-specific antibodies, structure-binding ligands and clever experimental techniques are driving progress in this field.
G-quadruplex (G4) structures are only one of many (ten or more) non-B-form DNA secondary structures analysed to date127. Brief descriptions of three well-studied structures are provided below.
In contrast to standard B-form DNA (B-DNA), Z-DNA is a left-handed helix128 (see the figure, part a). Z-DNA motifs (that is, sequences that form Z-DNA in vitro) are tracts of alternating purines and pyrimidines, which occur about once every 3,000 bp in metazoans129. Negative supercoiling stabilizes the formation of Z-DNA under physiological salt conditions130, and it is hypothesized that Z-DNA relieves transcription-induced torsional stress131. Z-DNA motifs are tightly associated with transcriptional start sites in eukaryotic genomes132, and these motifs can also cause genome instability, although the type of damage they cause varies from prokaryotes (dinucleotide insertions and deletions) to eukaryotes (double-strand breaks resulting in larger deletions)120,121,133,134.
Negative supercoiling can also cause B-DNA to adopt a four-armed, cruciform secondary structure that resembles a Holliday junction135 (see the figure, part b). These structures require ≥6 nucleotide inverted repeats (cruciform motif) to form, and such motifs are located near replication origins, breakpoint junctions and promoters in diverse organisms136,137. In metazoans, cruciform motifs are enriched near sites of gross chromosomal rearrangements138, and deletions and translocations occur more frequently in vivo at sites of cruciform motifs than in B-DNA139–141. However, cruciforms might also serve positive roles (for example, stabilizing the human Y chromosome (reviewed in REF. 134)).
Three-stranded triplex DNA occurs when single-stranded DNA forms Hoogsteen hydrogen bonds in the major groove of purine-rich double-stranded B-DNA142 (see the figure, part c). Triplexes in which the third strand is antiparallel to the DNA duplex can form at physiological pH, and these structures are stabilized by negative supercoiling142. Sequences capable of forming triplexes are common in eukaryotes but much rarer in prokaryotes143. In mammals, triplex-forming motifs are enriched in the introns of a variety of essential genes, including those involved in development and signalling144. Additionally, triplexes are hypothesized to cause genomic instability by causing double-strand breaks that result in translocations145. However, the formation of a triplex structure in a trinucleotide repeat sequence (for example, (CAG)n) can prevent the expansion of the repeat138,139; repeat expansion is related to human genetic disorders146,147.
Although the high thermal stability of G4 structures (potentially an impediment to DNA transactions) has led to some scepticism concerning their in vivo relevance, interest in G4 structures has increased enormously in recent years owing to their unique physical properties and the presence of G-rich sequences in biologically functional regions of many genomes. For example, G-rich regions with the potential to form G4 structures (hereafter called G4 motifs) are over-represented in telomeres, mitotic and meiotic double-strand break (DSB) sites, and transcriptional start sites (TSSs; often near promoters). These findings suggest multiple roles for G4 structures. Moreover, recent work suggests that failure to resolve non-canonical DNA structures makes the sequence motifs capable of forming these structures hotspots for genomic instability.
We begin this Review with an overview of G4 DNA structures, including their in vitro characterization and chromosomal locations in diverse organisms. Next, we discuss the putative roles of G4 structures at telomeres, during DNA replication, in gene regulation and in various other biological processes. Finally, we conclude by summarizing outstanding questions in the field and suggesting possible ways to address these issues.
G4 structures are stacked nucleic acid structures that can form within specific repetitive G-rich DNA or RNA sequences (reviewed in REF. 3). In 1910, Bang4 was the first to report that guanylic acid forms a gel at high concentrations, which suggested that G-rich sequences in DNA may form higher-order structures. Fifty years later, Gellert and colleagues5 used X-ray diffraction to demonstrate that guanylic acids can assemble into tetrameric structures. In these tetramers, four guanine molecules form a square planar arrangement in which each guanine is hydrogen bonded to the two adjacent guanines (that is, a G-quartet (FIG. 1a)). Stacked G-quartets form a G4 structure, and the intervening sequences are extruded as single-strand loops (although tetramolecular G4 structures may also lack loops). The sequence and size of the loop regions vary. However, loops are usually small (1–7 nucleotides (nt)), and smaller loops result in more stable G4 structures, as do longer G-tracts3. This structure is stabilized by monovalent cations that occupy the central cavities between the stacks, neutralizing the electrostatic repulsion of inwardly pointing guanine oxygens6–8.
G4 structures adopt a variety of topologies and can be classified into various groups depending on the orientation of the DNA strands (FIG. 1b). Thus, G4 structures can be parallel, antiparallel or hybrids thereof. Furthermore, they can form within one strand (intramolecular) or from multiple strands (intermolecular), and various loop structures are also possible9,10. G4 structures can be extremely stable, although the topology and stability of the G4 structure depend on many factors, including the length and sequence composition of the total G4 motif, the size of the loops between the guanines, strand stoichiometry and alignment11–13, and the nature of the binding cations14.
Intramolecular G4 structures are predicted to form at specific G-rich regions in vivo that have in common a sequence motif with at least four runs of guanines (G-tracts), in which each G-tract most often contains at least three guanines (G≥3 Nx G≥3 Nx G≥3 Nx G≥3, in which G≥3 is a run of three or more guanines and Nx is a loop of x nucleotides of any identity). G4 structures with only two stacks of guanines are possible but have low stability; here, when we refer to G4 motifs, we refer to motifs in which each G-tract contains three or more guanines. Computational analyses reveal that there are >375,000 G4 motifs in the human genome, whereas there are >1,400 G4 motifs in the Saccharomyces cerevisiae nuclear genome, including those in ribosomal and telomeric DNA, which are both particularly G4-rich15–18. Thus far, it is unclear how many of these motifs form stable G4 structures in vivo and, if they do, when they form.
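The consensus motif described above lends itself to a simple pattern search. The following is a minimal sketch, not the method used in the cited computational analyses: it assumes loops of 1–7 nt (the typical loop size mentioned earlier), scans only the strand supplied, and reports non-overlapping matches. The function name `find_g4_motifs` and the pattern constant are illustrative choices.

```python
import re

# Assumed consensus: four G-tracts of >=3 guanines separated by three loops
# of 1-7 nucleotides of any identity. The loop bounds are an assumption
# taken from the typical loop size quoted in the text.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_motifs(sequence):
    """Return (start, end, matched_sequence) for each non-overlapping
    putative G4 motif on the given strand (0-based, end-exclusive)."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Four vertebrate-type telomeric repeats (5'-TTAGGG-3').
    telomeric = "TTAGGG" * 4
    for start, end, motif in find_g4_motifs(telomeric):
        print(start, end, motif)
```

Because the search returns non-overlapping matches on one strand only, motif counts obtained this way depend on the exact pattern and counting convention and need not reproduce published genome-wide totals.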
Computational studies in various organisms have revealed that G4 motifs are not randomly located within genomes, but rather they tend to cluster in particular genomic regions (reviewed in REF. 19). In human, yeast and bacterial genomes, G4 motifs are similarly distributed and are over-represented in certain functional regions, such as promoters15–18,20. Furthermore, the locations and nucleotide compositions of G4 motifs are conserved in human populations and among related yeast species15,21. The nonrandom distribution of G4 motifs and the evolutionary conservation of their positions in genomes suggest that G4 motifs have one or more positive functions in the cell. In many organisms, telomeres contain a high concentration of G4 motifs owing to their high GC content and the single-stranded nature of the telomeric overhang. In diverse organisms, G4 DNA motifs are also common in G-rich micro- and minisatellites, up- and downstream of TSSs (often near promoters), within the ribosomal DNA, near transcription factor binding sites, and at preferred mitotic and meiotic DSB sites15,17,18,21,22.
Telomeres are nucleoprotein complexes at the ends of linear chromosomes. They are composed of a double-stranded region and a single-stranded G-rich 3′ overhang. Telomeres are essential to protect chromosomes from degradation, end-to-end fusions, and being recognized as DSBs23. In most telomeric DNAs, guanines and cytosines are distributed asymmetrically between the two DNA strands, with the G-rich strand running 5′ to 3′ from the centromere to the telomere. For example, vertebrate telomeric DNA consists of 5′-T2AG3-3′ repeats, whereas certain ciliated protozoans such as Stylonychia lemnae have 5′-T4G4-3′ repeats. Moreover, the G-rich strand is longer than its complement, resulting in single-strand ‘G-tails’ at the very termini of chromosomes. Regardless of the precise sequence of the telomere, the G-rich strand of various telomeric sequences can usually form stable G4 structures in vitro (FIG. 2); for example, in non-denaturing polyacrylamide gels, oligonucleotides corresponding to the telomeric G-rich strand display unexpected banding patterns that are due to the formation of G4 structures6,24–26.
The possibility that G4 structures might form in vivo is supported by in vitro experiments showing that telomere structural proteins, such as TEBPα and TEBPβ in ciliates and Rap1 in S. cerevisiae, can promote the formation of G4 DNA25,27–29. By contrast, the human telomeric G-strand binding protein protection of telomeres protein 1 (POT1) promotes the unfolding of G4 structures in vitro30,31. Thus far, the most direct evidence that G4 structures exist at telomeres comes from studies in ciliates that exploit antibodies raised by ribosome display against parallel and antiparallel telomeric G4T4 structures. With these antibodies, it is possible to show that G4 structures exist in vivo at Stylonychia lemnae telomeres and to determine proteins that are required for their formation and unfolding28,32,33. Only the antibodies raised against antiparallel G4 structures bind to S. lemnae telomeres, indicating that antiparallel, and not parallel, G4 DNA is present in vivo32. In addition, several in vivo control experiments demonstrated that the anti-G4 antibodies do not induce the formation of G4 structures. The visualization of the regulation of G4 structures is an important observation because unresolved G4 structures are likely to be an obstacle for DNA replication and telomere elongation. Accordingly, telomeric G4 structures, which are present during most of the S. lemnae cell cycle, are resolved during DNA replication32. Further analysis using RNAi to silence gene expression indicates that the formation of telomeric G4 structures is dependent on two telomere binding proteins: TEBPα and TEBPβ. TEBPα binds to the telomeric overhang and recruits TEBPβ, which is able to promote the formation of G4 structures with its highly charged carboxyl terminus, as shown in vitro27,28.
As stated above, G4 structures are not present at S. lemnae telomeres during S phase. In vitro and in vivo studies demonstrate that G4 unfolding is dependent on at least three conditions. First, TEBPβ, which is essential for the formation of G4 structures, must be removed from the telomeres. This removal happens during DNA replication and requires phosphorylation of TEBPβ. Second and third, immunofluorescence and gene knockdown analyses show that two enzymes, the telomerase holoenzyme and a RecQ family helicase, are recruited to telomeric G4 structures at the end of S phase and are essential for the unfolding of telomeric G4 structures28,33–35. Currently, it is not clear how or why telomerase is needed to unwind G4 structures during DNA replication nor whether this regulation is conserved among other organisms. However, RecQ helicases in other organisms, such as Sgs1 in S. cerevisiae and WRN and BLM in humans, also act on telomeres and can unwind G4 structures in vitro (reviewed in REF. 36). To date, no-one has isolated antibodies against the human telomeric G4 structure, but the fact that TEBP homologues exist in vertebrates suggests that similar mechanisms might exist in higher eukaryotes.
There is also evidence for G4 DNA at telomeres in human cultured cells: BMVC (3,6-bis(1-methyl-4-vinylpyridinium) carbazole diiodide) is a fluorescent biomarker that binds and stabilizes G4 structures in vitro, and in vivo staining with BMVC marks the distal ends of metaphase chromosomes in human lung adenocarcinoma cells37,38, suggesting telomeric binding. However, it is not clear whether this ligand detects G4 structures formed in vivo or whether it induces G4 DNA formation. Additional in vivo experiments are required to prove the specificity of such ligands.
Owing to their biochemical properties, DNA polymerases cannot replicate the very ends of linear chromosomes. In most organisms, telomerase, a telomere-dedicated reverse transcriptase, uses its RNA subunit as a template to lengthen the G-strand of the telomere. Human telomerase is inactive in most somatic cells but is upregulated in most cancers, in which it is thought to extend the lifespan of malignant cells39. G4 structures influence telomerase activity: intramolecular antiparallel G4 structures block telomerase activity, whereas intermolecular parallel G4 DNA is permissive for extension by telomerase40–42.
Because telomerase is active in most human cancers and this activity can be influenced by G4 structures, a variety of small molecule ligands with different specificities and target regions that bind and stabilize G4 structures are being tested in various assays43. The hope is that ligands that promote the formation of certain types of telomeric G4 structures might inhibit telomerase by preventing annealing of telomerase RNA to G-strand overhangs. For example, telomestatin has nanomolar affinity for telomeric G4 structures (which is nearly two orders of magnitude higher than its affinity for double-stranded DNA) and stabilizes intramolecular antiparallel G4 structures in vitro44,45. Moreover, telomestatin inhibits telomerase46 and causes gradual telomere shortening and growth arrest or apoptosis in human tissue culture cancer cells47–52. However, telomeric DNA damage also increases in telomestatin-treated cells50,53,54. Thus, telomere shortening in telomestatin-treated cells might also be due to capping defects, especially as telomere repeat binding factor 2 (TRF2) and POT1 telomere binding are lost in these cells. Indeed, in S. cerevisiae, G4 structures are thought to contribute to telomere capping when natural capping is impaired55. Further research is required to determine whether G4 ligands are effective in vivo, whether they are specific for telomeric DNA and whether their presence has deleterious effects on non-telomeric G4 structures.
During DNA replication, the two strands of the DNA double helix are separated by the replicative helicase: one strand serves as the template for leading strand synthesis and the other for lagging strand synthesis. Although leading strand DNA replication can be continuous, the lagging strand is replicated discontinuously, making it transiently single-stranded; this is a conformation that provides opportunities for G4 structure formation. Thus, during DNA replication, G4 structures may form inappropriately, especially on the lagging strand template (FIG. 3), and this formation is more likely to occur when DNA replication is slowed. In addition, some G4 structures could be present during DNA replication because they have roles in transcriptional regulation (see below). Whether G4 structures are pre-existing or form during DNA replication, they must be resolved for completion of DNA replication because the sequence comprising the G4 structure cannot serve as a template until it is unfolded. Thus, helicases are likely to be necessary to unwind G4 structures.
We surveyed the literature and found that 22 different helicases have been tested for their ability to bind and/or unwind G4 structures in vitro, and all but one, the Escherichia coli RecBCD helicase, were positive (K.P., M.L.B. and V.A.Z., unpublished observations; summarized in Supplementary information S1 (table)). These data suggest that G4 unwinding is a non-specific activity of many DNA helicases. However, most of these unwinding studies are qualitative, and it is difficult to ascertain from them whether a given helicase is particularly effective at unwinding G4 structures and/or whether G4 structures are a preferred substrate for that helicase. Most of the human helicases that unwind G4 structures in vitro56–60 are associated with human diseases that cause genomic instability, including the RecQ helicases WRN (associated with premature ageing) and BLM (associated with increased cancer risk) as well as FANCJ (associated with increased cancer risk) and PIF1 (associated with increased cancer risk). The best evidence that human disease is associated with loss of G4 unwinding comes from the finding that cell lines from human patients with Fanconi anaemia carrying FANCJ mutations display deletions that overlap G-rich regions with the potential to form G4 structures56. In addition, telomestatin, a chemical ligand that is able to stabilize G4 structures in vitro53,61,62, causes impaired proliferation and increased apoptosis and DNA damage in FANCJ-deficient cells63. The association of these helicases with inherited genome instability has heightened interest in the possibility that G4 unwinding might suppress both premature ageing and cancer by regulating G4 structures.
Some enzymes are far more active on G4 structures than others. The S. cerevisiae Pif1 helicase acts at G4 motifs64, and members of the Pif1 DNA helicase family are particularly efficient in vitro unwinders of parallel intramolecular G4 substrates59. Pif1 is a multi-functional DNA helicase that binds >1,000 sites in the genome of mitotic cells, of which ~10% overlap G4 motifs, which represents ~25% of the G4 motifs in this organism. Twenty-five per cent is likely to be an underestimate as, for technical reasons, this number excludes the large number of G4 motifs in ribosomal and telomeric DNA, both of which are strong Pif1 binding sites64. Several genetic assays show that in the absence of Pif1, DNA replication slows and DSBs occur at many of the G4 motifs that are normally bound by Pif1. G4 motifs also show a high mutation rate in Pif1-deficient cells, and these mutations eliminate the ability of the motif to form a G4 structure without necessarily reducing the high GC content of the motif. When these mutated motifs are put back in the genome, they no longer bind Pif1, slow DNA replication or cause DSBs. Together, these data make a strong argument that G4 structures form in vivo and that their resolution by Pif1 suppresses genome instability64. Other studies also found instability of G4 motifs in pif1 cells59,65. This instability was particularly pronounced when the G4 motifs were on the template for leading strand synthesis, but this result may reflect the repetitive nature of the G4 substrate used in this analysis. The frequent mutation of G4 motifs in pif1 mutant cells suggests the involvement of error-prone processes when G4 motifs are replicated and repaired in Pif1-deficient cells64. Indeed, in DT40 chicken cells, REV1, a translesion polymerase, is implicated in replication fork progression past G4 motifs on the leading strand66.
There are also suggestions that human PIF1 acts at G4 motifs. One study used chromatin immunoprecipitation followed by sequencing (ChIP–seq) in combination with in vivo labelling with pyridostatin, a G4 binding molecule67. Genome-wide, pyridostatin bound preferentially to G4 motifs, where it caused replication- and transcription-dependent damage that was detected by its high γH2AX content. Many of the γH2AX foci overlap with GFP–PIF1 foci in the pyridostatin-treated human cells. The current hypothesis is that G4 formation or stabilization blocks transcription and/or replication, resulting in DNA damage.
Similar to what is seen in cells from patients with Fanconi anaemia whose disease is due to mutations in the FANCJ helicase, mutations in the Caenorhabditis elegans DOG-1 helicase, which is distantly related to FANCJ, cause genome-wide deletions in G-rich sequences with the potential to form G4 structures68,69. The mutation rate in dog-1 mutants is very high (up to 4% per generation68) and increases with the length of the G-tract69. Finally, the activity of regulator of telomere elongation helicase (RTEL) family helicases is also hypothesized to be directed towards G4 structures. Recent data indicate that the human RTEL helicase helps to resolve G4 DNA at telomeres, perhaps in conjunction with BLM, to ensure telomere stability70. Although biochemical evidence of G4 unwinding is lacking for RTEL homologues from other organisms, current data indicate that they may function similarly to human RTEL. For instance, C. elegans rtel-1 has high sequence similarity to dog-1, although G-rich sequences are not unstable in worms deficient for rtel-1 (REF. 71) as they are in dog-1 mutant animals. However, mutation of rtel-1 and him-6 (a BLM homologue) is synthetically lethal in C. elegans, suggesting that RTEL-1 may function in concert with one or more additional helicases (DOG-1 and/or HIM-6) to resolve G4 structures.
The high concentration of G4 motifs near promoter regions suggests a potential function of G4 structures in gene regulation. Indeed, one or more G4 motifs are found within 1,000 nt upstream of the TSS of 50% of human genes72. Intriguingly, bioinformatic analyses show that the promoters of human oncogenes and regulatory genes (for example, transcription factors) are more likely than the average gene to contain G4 motifs, whereas G4 motifs are under-represented in the promoters of housekeeping and tumour suppressor genes22,72. A similar enrichment of G4 motifs in promoter regions is found in other organisms, including yeast, plants and bacteria15,17,20,73,74. Additionally, in humans, G4 motifs are less often found in the template strand than in the non-template strand. Those that are on the template strand tend to cluster at the 5′ end of the 5′UTR75. In yeast, there is no distinct asymmetry in G4 motif location between the non-template and template strands, but there is a correlation between nucleosome-free regions and G4 motifs in promoters15, a finding that supports the prediction that G4 structures will form more easily in nucleosome-free regions17. Experiments in bacteria using a G4 motif on the non-template strand of a plasmid-borne transcribed gene demonstrate loop formation on the opposite strand of the G4 motif, suggesting the existence of G4 structures that form upon transcription in living cells76. Such structures may help to keep the transcribed template accessible for transcription by preventing it from annealing to its complementary strand. In this way, G4 structures could contribute to high transcription levels of certain genes (FIG. 4).
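The distinction between template and non-template strands can be made concrete with a short sketch. It is an illustration under stated assumptions rather than the procedure used in the cited studies: the motif pattern assumes loops of 1–7 nt, coordinates are 0-based on the forward strand, and the window size defaults to the 1,000 nt mentioned above. A G4 motif on the reverse strand appears as a C-rich pattern on the forward strand, so both patterns are searched. The function name `classify_promoter_g4` and its parameters are hypothetical.

```python
import re

# Assumed consensus motif (loops of 1-7 nt). A G4 motif located on the
# reverse strand reads as the equivalent C-rich pattern on the forward strand.
G_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
C_PATTERN = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def classify_promoter_g4(forward_seq, tss, gene_strand="+", window=1000):
    """List G4 motifs lying entirely within `window` nt upstream of the TSS.

    Returns (start, end, label) tuples in forward-strand coordinates, where
    label is 'non-template' or 'template' relative to the gene.
    """
    seq = forward_seq.upper()
    if gene_strand == "+":
        lo, hi = max(0, tss - window), tss
        # For a forward-strand gene, the forward strand is the non-template strand.
        g_label, c_label = "non-template", "template"
    else:
        lo, hi = tss + 1, min(len(seq), tss + 1 + window)
        # For a reverse-strand gene, the forward strand is the template strand.
        g_label, c_label = "template", "non-template"
    region = seq[lo:hi]
    hits = [(lo + m.start(), lo + m.end(), g_label) for m in G_PATTERN.finditer(region)]
    hits += [(lo + m.start(), lo + m.end(), c_label) for m in C_PATTERN.finditer(region)]
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence: one G-rich and one C-rich motif upstream of a made-up TSS.
    toy = "A" * 50 + "GGGAGGGAGGGAGGG" + "A" * 50 + "CCCTCCCTCCCTCCC" + "A" * 20
    print(classify_promoter_g4(toy, tss=140, gene_strand="+"))
```

Applied gene by gene with real annotations, a classification of this kind is what would allow the strand asymmetry described above to be tallied.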
It is well known that supercoiling has both positive and negative effects on transcription77, and G4 structures are thought to form as a result of supercoiling-induced stress during transcription78. In vitro studies show that the formation of G4 structures can compensate for the negative supercoiling78,79. These findings suggest that G4 structures in or near promoter regions may influence transcription in both positive and negative ways (FIG. 4). First, depending on which DNA strand encodes the G4 motif, the structure could either inhibit transcription (if the motif is on the template strand, blocking the transcription machinery) or enhance transcription (if the motif is on the non-template strand, maintaining the transcribed strand in a single-stranded conformation). Second, proteins bound to the G4 structures (for example, transcriptional enhancers versus repressors) could also affect transcription (reviewed in REF. 80).
One of the best-studied systems for a role of G4 structures in transcription involves the mammalian MYC (also known as c-MYC) locus (reviewed in REFS 3,79), although findings similar to those discussed below have been reported for multiple loci80–84. MYC is a transcription factor whose expression is associated with cell proliferation. Increased levels of MYC expression are observed in 80% of human cancer cells, and this increase promotes tumorigenesis85–90. Nuclease hypersensitive element III1 (NHE III1), which is upstream of the MYC promoter, controls >80% of the MYC transcription. This element contains a G4 motif that forms a G4 structure in vitro91. Footprinting studies and luciferase reporter assays comparing the expression of a gene with a wild-type NHE III1 versus one with a mutated NHE III1 that cannot form a G4 structure demonstrate that the G4 motif in NHE III1 represses transcription92. In another study, TMPyP4, a compound that binds to and stabilizes G4 structures (but also binds duplex DNA)93,94, reduced MYC transcription in lymphoma cell lines and showed antitumour activity in mice92,95. This reduction is speculated to be mediated by TMPyP4 binding to the G4 structure in NHE III1 of MYC. However, given that TMPyP4 binding is not limited to G4 structures and that there are many G4 motifs in the genome, more analysis is required to determine its mechanism of action. GQC-05, an analogue of ellipticine (an antineoplastic drug), is another promising therapeutic ligand. GQC-05 binds the G4 structure in the NHE III1 region of MYC in vitro with high affinity and selectivity, and when added to Burkitt’s lymphoma cell lines, GQC-05 results in reduced levels of transcribed MYC mRNA96. However, a recent publication found that 11 known G4 DNA ligands that affect MYC expression in cell-free assays do not interact directly with the MYC G4 structure in certain Burkitt’s lymphoma cell lines97, clouding the interpretation of the GQC-05 results.
Nucleolin, the most abundant nucleolar phosphoprotein in eukaryotic cells, is also proposed to regulate MYC transcription via its interaction with NHE III1. This hypothesis is based on the in vivo binding of nucleolin to the MYC promoter in HeLa cells and the dose-dependent reduction in MYC transcription that occurs in nucleolin-treated cells98. One hypothesis is that nucleolin-mediated G4 formation in NHE III1 inhibits MYC transcription by masking binding sites for MYC transcriptional activators, such as the transcription factor SP1 and cellular nucleic acid-binding protein (CNBP)99. However, human nucleolin binds many G4 structures and can induce the formation of G4 DNA in vitro98,100–103. Thus, more work is needed to establish that nucleolin-associated changes in MYC transcription are a direct result of its effects on G4 structure formation within the NHE III1 element.
Transcription may also be altered by G4 binding proteins that affect the formation and unfolding of G4 structures. For example, myoblast determination protein (MyoD) family proteins are transcription factors that bind to E-boxes in the promoters of several muscle-specific genes to regulate muscle development104. In vitro, MyoD homodimers bind preferentially to G4 structures that are derived from the promoter sequences of muscle-specific genes105. One hypothesis is that when G4 structures form in the promoters of E-box driven genes, MyoD homodimers preferentially bind to the G4 structure and not the E-box. Consequently, MyoD–E47 heterodimers, which cannot bind G4 structures, bind to the E-box instead and enhance gene transcription106. However, as with the MYC experiments, additional work is needed to prove this hypothesis.
In addition to gene-specific approaches, results from genome-wide studies analysing the effects of drugs that stabilize and/or induce G4 formation have been used to argue that G4 structures affect transcription79,107. Indeed, expression levels of many genes are influenced by treating cells with G4 ligands. Similar studies have investigated the genome-wide effects on transcription of mutations in helicases known to unwind G4 DNA17,108. For instance, in human fibroblasts deficient for the WRN or BLM RecQ helicases, the transcription of genes that are predicted to form intramolecular G4 structures is significantly upregulated (P < 0.0001), and this upregulation correlates with the G4 motifs, not simple G-richness108. The genes associated with G4 motifs account for 20–30% of all transcripts that are upregulated in WRN and BLM mutant cells.
Although such studies support a role for G4 structures in transcription, when interpreting genome-wide studies the possibility must be considered that many of the observed changes in gene expression may be indirect. However, in diverse organisms, genes whose expression is affected by G4 ligands are statistically associated with the presence of nearby G4 motifs, which provides some of the best evidence for widespread effects of G4 structures on transcription.
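A statistical association of this kind is commonly assessed with an over-representation test. The sketch below uses a one-sided hypergeometric test as one possible choice; it is not necessarily the test used in the cited studies, and the gene counts in the usage example are invented purely for illustration. The function name `enrichment_p_value` is hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, genes_with_motif, affected_genes, affected_with_motif):
    """One-sided hypergeometric P value: the probability of observing at least
    `affected_with_motif` motif-containing genes among `affected_genes` genes
    drawn at random from `total_genes`, of which `genes_with_motif` carry a
    nearby G4 motif."""
    upper = min(genes_with_motif, affected_genes)
    without_motif = total_genes - genes_with_motif
    favourable = sum(
        comb(genes_with_motif, k) * comb(without_motif, affected_genes - k)
        for k in range(affected_with_motif, upper + 1)
    )
    return favourable / comb(total_genes, affected_genes)


if __name__ == "__main__":
    # Invented counts: 10 genes, 5 with a motif, 4 affected, all 4 with a motif.
    print(enrichment_p_value(10, 5, 4, 4))
```

A small P value indicates only that affected genes carry G4 motifs more often than expected by chance; as noted above, it does not by itself distinguish direct from indirect effects on expression.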
A general criticism of models in which G4 structures affect transcription is that G4 formation is too slow and the stability of G4 structures is too high for them to be used as regulatory elements. This criticism can also be raised against hypotheses suggesting that G4 structures affect telomeres or DNA replication. Indeed, it is well documented that intermolecular G4 DNA structures form and resolve slowly under physiological conditions109,110. However, the existence of chaperones (for example, TEBPβ and Rap1) that promote the formation of G4 DNA27–29 suggests that nature has evolved mechanisms to overcome this slow formation. A recent thermodynamic and kinetic measurement of G4 structure formation indicates that G4 structures can form cooperatively111. Rates of formation for intramolecular G4 structures have also been reported for human telomeric G4 DNA (millisecond timescale)112, and it is possible that other intramolecular G4 structures form as readily. This possibility is simple to test and should be demonstrated directly for other G4 motifs that are proposed to form intramolecular G4 structures that function in vivo. Unwinding of G4 structures in a timely manner can also no longer be considered a problem given the discovery of helicases that bind and unwind G4 motifs with high efficiency (see above).
A new hypothesis suggests that G4 structures might influence epigenetic regulation of gene expression. Maintaining epigenetic marks, such as histone methylation, is essential for stable gene expression and cell identity, and these marks must therefore be preserved after DNA replication and repair. As reported above, G4 structures are thought to cause replication fork stalling. These stalled forks might be restarted with the aid of translesion polymerases, as suggested by data from DT40 chicken cells66, in which REV1, a Y family translesion polymerase113, is implicated in G4 lesion bypass. In the absence of REV1, DNA synthesis is uncoupled from histone recycling mechanisms, and transcriptional activation is blocked66. The authors postulate that REV1 functions in replication at G4 motifs in order to preserve histone modifications66. A recent publication extends this work by showing by microarray analysis that lack of REV1 causes genome-wide dysregulation of G4-dependent transcription in DT40 cells (P value = 0.005), and this dysregulation is worsened by mutation of the WRN, BLM and FANCJ helicases114.
It is well documented that chromatin can influence the timing of origin activation during DNA replication115. Recently, genome-wide analysis of replication origins116 using a short nascent strand sequencing approach together with deep sequencing techniques identified a large number of new origins in different human cell types. Most of the identified peaks overlap with previously identified origins; however, many of the newly identified origins are significantly associated with G4 motifs. The authors propose that G4 structures near origins promote origin of replication complex binding and thereby influence origin activation116, although direct proof for this model is not yet available.
G4 structures are also suggested to be involved in the alignment of sister chromatids during meiosis. One hypothesis is that G4 structures assist in the formation of the telomere-dependent bouquet structure during meiosis (FIG. 5a), but there is no direct evidence for this appealing possibility26. Various G4-promoting proteins (FIG. 5a, pink) might be involved in formation of the G4 and tethering of the bouquet. G4 structures are also proposed to have a more general role in meiosis: for example, by promoting meiotic homologous recombination76,117 (FIG. 5b). This idea is supported by genome-wide computational studies in yeast that demonstrate overlap between G4 motifs and preferred meiotic DSB sites15, but Spo11, the enzyme that makes the DSBs, does not cleave at G4 motifs118. However, a role for G4 DNA in meiosis is supported by the finding that the S. cerevisiae Hop1 protein, which is a major component of the chromosome axial element–synaptonemal complex during meiosis, promotes G4 formation in vitro119,120. The multifunctional protein Kem1 also binds G4 structures in vitro and cleaves in the single-stranded region 5′ of the G4 structures. Together with the fact that kem1Δ cells arrest during meiotic prophase, these results led to speculation that Kem1 acts on G4 structures in vivo121. In addition, the MRX complex, which is composed of Mre11, Rad50 and Xrs2 and acts during meiotic DSB formation, has a high affinity for G4 structures in vitro122,123. However, there is not yet in vivo evidence that Hop1, Kem1 or the MRX complex carry out their meiotic functions by acting at G4 structures.
In several pathogenic microorganisms, recombination provides the basis for antigenic variation in which the pathogen escapes its host’s immune surveillance by changing the identity of a surface antigen. There is good evidence that Neisseria gonorrhoeae, the bacterium that causes human gonorrhoea, uses a G4-based system to regulate expression of the genes that allow it to avoid the human immune system124. N. gonorrhoeae encodes many pilin genes, the products of which make up the hair-like projections, called pili, on the bacterium’s surface. However, only the gene in the pilE locus is expressed, and the identity of the gene at this site switches among the different pilin genes by a recombinational mechanism. The region upstream of the pilE locus contains a 12 bp G-rich segment that is required for antigenic variation and that can form a parallel intramolecular G4 structure in vitro. Mutations that eliminate antigenic variation in vivo also eliminate the ability of the segment to form a G4 structure, whereas mutations in the loop region of the structure affect neither antigenic variation nor G4 structure formation. Moreover, treating cells with the G4 ligand N-methyl mesoporphyrin IX affects pilE gene conversion events. The N. gonorrhoeae RecQ helicase is one of several proteins required in trans for efficient antigenic variation, providing additional evidence for a role for RecQ helicases at G4 structures in vivo. G4-based N. gonorrhoeae pilE recombination is perhaps the best evidence for a functional role of G4 DNA.
Although different in their three-dimensional conformation, G4 structures and the other non-B-form DNA secondary structures included in BOX 1 display some similarities. First, they can all form readily under the proper in vitro conditions. Second, formation of all of these secondary structures can help to relax negative DNA supercoiling, and Hoogsteen base pairing is often involved in stabilizing the structures. Third, the evolutionary conservation of the motifs capable of forming these secondary structures and the cellular machinery available to resolve them (for example, helicases and mismatch repair) argues for their existence in vivo. However, although they are of considerable interest from a chemical standpoint, some chromosome biologists remain sceptical that these secondary structures are physiologically relevant. G4 DNA provides an excellent example of the gulf between the wealth of in vitro data and the relative scarcity of results demonstrating formation and function of these structures in vivo. The findings that G4 motifs are evolutionarily conserved, over-represented in certain regions and associated with a specific subset of genomic features provide good, albeit indirect, evidence for G4 structures in vivo.
Direct evidence for G4 structures in vivo has been slow in coming. G4-specific antibodies and ligands provide support for G4 DNA in vivo, especially at telomeres, but it is difficult to demonstrate convincingly that the specificity of these reagents is high enough to rule out the possibility that their effects are due to association with B-DNA. Genetic experiments provide the most persuasive evidence to date for the in vivo existence of G4 structures during replication64,68,69 and transcription99. Regardless of the process or function in question, one must test directly for positive roles of G4 structures, for instance by mutating G4 motifs in promoter regions or meiotic DSB sites and determining whether loss of the ability to form a G4 structure affects downstream processes. However, in the end, the most convincing evidence for the existence of G4 structures in vivo will be a direct demonstration of these structures in vivo. Doing so will require a creative approach to isolate the structures with sufficient purity that they can be characterized by the kinds of approaches used to analyse in vitro-generated G4 structures.
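Before a G4 motif is mutated experimentally, a candidate mutation can be checked in silico to confirm that it removes the motif at the sequence level. The Python sketch below is illustrative only. It assumes the commonly used consensus of four runs of at least three guanines separated by loops of 1 to 7 nucleotides, a pattern that is not defined in this section, and the function names and the G-to-A substitution scheme are hypothetical choices made for the example. A match indicates only that a sequence has the potential to form a G4 structure; it says nothing about whether the structure forms in vivo.

```python
import re

# Assumed consensus (not taken from this section): four runs of >=3 guanines
# separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping matches.

    Coordinates are 0-based, end-exclusive, and always refer to the input
    sequence, including for matches found on the complementary strand.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), m.end(), "+") for m in G4_PATTERN.finditer(seq)]
    for m in G4_PATTERN.finditer(reverse_complement(seq)):
        hits.append((n - m.end(), n - m.start(), "-"))
    return sorted(hits)


def disrupt_g_runs(seq):
    """Hypothetical mutation scheme: change every third G of each run of
    >=3 guanines to A, so that no run of three guanines remains."""
    def _break_run(match):
        run = match.group(0)
        return "".join("A" if i % 3 == 2 else "G" for i in range(len(run)))

    return re.sub(r"G{3,}", _break_run, seq.upper())


# Example input: four copies of the human telomeric repeat TTAGGG.
telomeric = "TTAGGG" * 4
wild_type_hits = find_g4_motifs(telomeric)
mutant_hits = find_g4_motifs(disrupt_g_runs(telomeric))
```

In this sketch, `wild_type_hits` contains a single match on the input strand, whereas `mutant_hits` is empty, which is the sequence-level outcome one would want to confirm before testing whether loss of the motif affects a downstream process. Loop mutations that leave the guanine runs intact would still match the pattern and could serve as controls.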
To summarize, G4 motifs are ubiquitous in prokaryotic and eukaryotic genomes, and their location is often conserved in closely related species. These motifs may form G4 structures in vivo, and the G4 structures may have functional roles, such as regulating recombination, meiotic DSB formation and/or transcription or providing a template for an RNA that forms a G4 structure that affects its post-transcriptional behaviour (see below). Alternatively (or in addition), G4 DNA formation may be pathological, occurring only occasionally owing to a problem in DNA mechanics, such as slowed DNA replication (as in the presence of hydroxyurea), which would provide more time for G4 DNA formation, especially during lagging strand replication. Pathological G4 structures could form at sites where G4 DNA has a direct or indirect function (for example, meiotic DSB sites in mitotic cells) or at sites that are complementary to an RNA containing a G4 structure that has a function in the RNA (in this case, the G4 RNA has a function but its complement in the DNA does not). Although this Review concerns DNA secondary structures, we would be remiss without noting that similar structures, especially G4 structures, can form in RNA. One possibility is that G4 motifs are encoded in the DNA but mainly function at the RNA level. G4 RNA structures are reported to affect mRNA splicing, translation and degradation (reviewed in REFS 8,125,126). It seems clear that the study of non-canonical RNA and DNA secondary structures will provide fertile ground for research for the foreseeable future.
We thank the US National Institutes of Health, the American Cancer Society and the German Research Organization (DFG) for support.
Competing interests statement
The authors declare no competing financial interests.