Certain guanine-rich sequences are capable of forming higher order structures known as G-quadruplexes. Moreover, particular genomic regions in a number of highly divergent organisms are enriched for such sequences, raising the possibility that G-quadruplexes form in vivo and affect cellular processes. While G-quadruplexes have been rigorously studied in vitro, whether these structures actually form in vivo and what their roles might be in the context of the cell have remained largely unanswered questions. Recent studies suggest that G-quadruplexes participate in the regulation of such varied processes as telomere maintenance, transcriptional regulation and ribosome biogenesis. Here we review studies aimed at elucidating the in vivo functions of quadruplex structures, with a particular focus on findings in yeast. In addition, we discuss the utility of yeast model systems in the study of the cellular roles of G-quadruplexes.
The genomes and transcriptomes of many organisms, including those as diverse as E. coli and humans, contain a number of G-rich sequences that, at least in vitro and perhaps in vivo, are capable of forming structures known as G-quadruplexes (G4-DNA and G4-RNA, respectively). These structures are composed of stacked associations of G-quartets, which are planar assemblies of four Hoogsteen-bonded guanines (Figs. 1A and B) [1, 2]. G4 structures can arise through the interactions of guanines present on a single nucleic acid strand (intra-molecular) or multiple strands (inter-molecular). Beyond hydrogen bonding among guanines, the stability of quadruplexes derives from π-orbital interactions among stacked quartets as well as coordination by quartets of centrally located cations (e.g. Na+ or K+). Thus a minimum of two adjacent quartets, but ideally three or more, is required for stable quadruplex formation. G4 structures are stable under physiologic salt and pH conditions in vitro, and some have higher melting temperatures than the duplex DNA that would be formed by providing the complementary strand. There is a high degree of polymorphism among different G4 structures. In principle, 16 different quartet structures can form, which are distinguished by the patterns of glycosidic bond angle of the guanines. Further, the number of stacked quartets, the number and polarity of the phosphodiester backbone strands from which the guanines derive, the type of coordinated cations, and the length, sequence and connectivity of intervening loops may vary.
Although the structures of G-quadruplexes have been well studied in vitro, whether, when and where they form in vivo and how they might affect cell biology have remained key questions. The structural heterogeneity of quadruplexes makes it difficult to obtain universal rules to predict their formation or probes to test for their presence. Nonetheless, a good deal of information demonstrating or strongly suggesting their functions in vivo has emerged in recent years. For example, telomeric G4-DNA has been shown to exist in Stylonychia lemnae [4, 5], and sequences with intramolecular quadruplex-forming potential (QFP) have been shown to be highly overrepresented in the promoter regions of diverse organisms and to be connected with control of gene expression [6–12]. In addition, a number of small molecule ligands have been identified that bind to and stabilize quadruplexes (Fig. 1C) [13–21], and some of these have been found to affect expression from QFP-containing loci, indicating that QFP sequences can adopt G4 conformations.
Here we review findings pertaining to the in vivo functions of G-quadruplexes, with an emphasis on findings in yeast. We begin by highlighting the ways in which yeast model systems can help identify and dissect the cellular roles of quadruplex structures. For readers particularly interested in findings outside of yeast, we recommend these outstanding reviews [22, 23]. Yeast genetic tools have significant potential for revealing the full extent to which G-quadruplexes regulate biological processes, as well as for revealing underlying mechanisms.
S. cerevisiae offers several genetic systems that could facilitate exploration of the in vivo functions of G-quadruplexes. Although none is unique to yeast, the ease with which they can be carried out in this single-celled eukaryote makes it an ideal choice for these studies. We first describe these systems, and in the second half of the review describe findings obtained from their use.
A powerful method for exploring the function of a gene is to ask how phenotypes caused by mutations in the gene are influenced by mutations at other loci. Synthetic enhancement refers to two mutations that together yield a phenotype more severe than the sum of the phenotypes caused by the individual mutations (Fig. 2A). In the case of null alleles, the corresponding genes typically function in distinct biological pathways that operate in parallel to achieve similar outcomes. In the case of partial loss-of-function alleles, synthetic enhancement may instead identify factors that work together in a protein complex or facilitate different steps in a pathway. In contrast to enhancer mutations, suppressor mutations lessen the severity of a phenotype caused by the first mutation (Fig. 2B), and can reflect activation of a step in a biological pathway downstream of impairment by the first mutation, activation of a process that directly counteracts the effects of the first mutation, or activation of a parallel pathway that compensates for the effects of the first mutation. While the interpretation of synthetic enhancement and suppressor screens is thus sometimes not straightforward, such screens can provide important clues to the biological functions of particular genes and illuminate the network of pathways in which they function. Instead of asking how two mutations interact with one another, it is also possible to ask how the effects of a drug interact with different mutations. For example, mutations that enhance the effect of the drug might identify factors that metabolize the drug, destabilize the drug target, or participate in a biochemical pathway that compensates for a toxic effect of the drug. The development in recent years of advanced strategies for combining mutations and for drug screening, together with microarray-based readouts, has dramatically improved the ease, accuracy and precision with which these screens can be accomplished in yeast [24–27].
For example, the epistatic miniarray profile (E-MAP) technique identifies pairs of mutations that have quantitatively similar enhancing or suppressing genetic interactions and can thus provide information on which proteins cooperate in particular biochemical pathways. An E-MAP with all pairwise combinations of 743 mutant alleles of genes involved in chromosome biology yielded numerous insights into the mechanisms of chromosome maintenance. In another example, a wealth of chemical-genetic interactions was obtained by exposing a pool of all viable yeast deletion mutants, each marked with a unique sequence tag, to 82 separate compounds; the relative growth of each mutant in the presence of each compound was assessed by quantitative PCR amplification of the tags and microarray hybridization.
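The definitions of synthetic enhancement and suppression given in this section can be made concrete with a small numerical sketch. It assumes, purely for illustration, that each phenotype is summarized as a growth defect between 0 (wild-type growth) and 1 (no growth), that the expected double-mutant defect is the sum of the single-mutant defects, and that a tolerance `tol` absorbs measurement noise. The function names, the tolerance and the suppression criterion are hypothetical and do not correspond to the scoring used by E-MAP or any published screen.

```python
def interaction_score(defect_a, defect_b, defect_ab):
    """Observed double-mutant defect minus the additive expectation.

    Defects are assumed to lie between 0 (wild type) and 1 (no growth).
    """
    return defect_ab - (defect_a + defect_b)


def classify_interaction(defect_a, defect_b, defect_ab, tol=0.05):
    """Label a pair of mutations under the simple additive model above.

    tol is an arbitrary noise allowance (an assumption of this sketch).
    """
    if defect_ab > defect_a + defect_b + tol:
        # More severe than the sum of the single-mutant phenotypes.
        return "synthetic enhancement"
    if defect_ab < max(defect_a, defect_b) - tol:
        # Double mutant is healthier than the sicker single mutant.
        return "suppression"
    return "no clear interaction"


if __name__ == "__main__":
    # Hypothetical measurements, not data from any study.
    print(classify_interaction(0.1, 0.1, 0.6))
    print(classify_interaction(0.5, 0.0, 0.1))
```

The same bookkeeping applies to chemical-genetic screens if one of the two "mutations" is replaced by drug treatment at a fixed dose.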
One can envision several ways in which enhancer and suppressor screens might be used to investigate the biology of G-quadruplexes. First, mutations that enhance or suppress the toxic effects of small molecule G4 binding ligands could point to processes that are strongly influenced by G-quadruplexes. If the toxicity is mediated by the effect of the ligand on G-quadruplexes, factors related to quadruplex function could be identified. However, the possibility that "off-target" effects, unrelated to G4 structures, might cause toxicity needs to be considered. This problem could be addressed by conducting parallel screens using G-quadruplex ligands from different chemical families to discern pathways consistently perturbed by quadruplex ligands. Second, similar screens that rely not on toxicity of quadruplex ligands, but rather on their ability to perturb a particular quadruplex-related function could be designed. A straightforward example is testing for mutations that influence the activation or repression by a quadruplex ligand of the expression of a particular reporter gene that contains a regulatory element with G4-forming potential (Fig. 2C). Third, by screening for mutations that influence a process in a fashion dependent on the presence of a G-quadruplex, factors that naturally (i.e. in the absence of exogenous ligand) influence quadruplex activities could be identified. For example, again using the transcriptional regulation paradigm to illustrate this idea, two versions of a reporter gene could be engineered: one under the control of a regulatory element with quadruplex-forming potential and the second lacking this element but otherwise identical (Fig. 2D). Mutations that influenced expression of the first reporter but not the second would do so through the QFP element and would therefore identify factors that might affect quadruplex formation or function. In addition, overexpression of proteins that reduce quadruplex formation (e.g. helicases) could be used to further test the hypothesis that the QFP element functions as a G-quadruplex. A fourth type of screen could involve a search for mutations influencing phenotypes caused by selective loss of G4-related functions of particular proteins. No such selective mutations are yet known, but the mapping of the quadruplex binding domain of the RecQ helicases to the RQC domain opens the possibility that point mutations within this domain, and by analogy similar domains in other proteins that interact with G-quadruplexes, might have the selective defects desired. In addition, as small molecules and proteins that bind with selectivity to subclasses of quadruplexes are identified, it should be possible to use such agents in combination with the four approaches just described in order to learn about the functions of these individual quadruplex subclasses.
Given the enrichment of sequences with QFP in promoter regions in several organisms, as well as other lines of evidence demonstrating that these sequences can modulate gene expression (see below), it is of great interest to determine the extent to which this regulation takes place on a genome-wide scale. Microarray-based expression analyses provide a relatively straightforward way to accomplish this, and such analyses can be carried out in all of the major model organisms as well as in human cells. The effects of quadruplex ligands, or mutations in the genes encoding quadruplex-interacting factors, on global gene expression could be determined. Genes with altered expression could then be compared with the set of genes possessing quadruplex-forming potential to determine if there is a statistically significant association between these sets. Such an association would imply that the ligands or mutations act via a quadruplex-based mechanism to modulate gene expression. Further, it may be possible to find associations between particular kinds of G4-forming sequences and responses to different classes of ligands; while it is not clear at this point that it will be possible to use ligands that are selective for particular G-quadruplex subclasses to tailor gene expression, such findings would provide a starting point for the investigation of this possibility. Because any particular QFP sequence might or might not form a G4 configuration in vivo, an important advantage of the genome-wide approach is that it allows one to ask whether QFP sequences, in general, are associated with gene expression changes related to perturbations in factors that affect G-quadruplex formation or function. The yeast genome has a QFP density that is approximately an order of magnitude lower than that of the human genome. For example, approximately 40% of human and 1.5% of yeast upstream promoter sequences have QFP, using the same definition of QFP in both organisms [6, 9].
The smaller number of sequences with QFP in yeast might provide advantages for certain analyses. In particular, it might be more straightforward to discern associations between QFP and responsiveness to factors that affect quadruplexes than it would be in humans, because changes in the expression of fewer genes would be expected, facilitating analysis of the resulting gene expression patterns. A first examination of global gene expression changes caused by perturbation of factors that bind or unwind G-quadruplexes in yeast was published recently, and the results are reviewed below.
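One common way to test whether genes with altered expression are associated with the set of genes possessing QFP is a one-sided hypergeometric (Fisher-type) enrichment test. The sketch below is an illustration of that general idea, not the analysis used in any of the cited studies; the function name and the example counts are hypothetical.

```python
from math import comb


def qfp_enrichment_p(n_genes, n_qfp, n_changed, n_overlap):
    """One-sided hypergeometric p-value for enrichment.

    Returns the probability of seeing at least n_overlap QFP genes among
    n_changed differentially expressed genes, if those genes were drawn
    at random from n_genes total, of which n_qfp carry QFP.
    """
    total = comb(n_genes, n_changed)
    upper = min(n_qfp, n_changed)
    tail = sum(
        comb(n_qfp, k) * comb(n_genes - n_qfp, n_changed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts chosen only to show the call: 6000 genes,
    # 90 with promoter QFP, 200 changed by a ligand, 12 in both sets.
    p = qfp_enrichment_p(6000, 90, 200, 12)
    print(f"enrichment p-value: {p:.3g}")
```

A small p-value indicates only a statistical association between the two gene sets; as discussed in the text, it does not by itself show that the QFP sequences fold into quadruplexes in vivo.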
Chromatin immunoprecipitation has provided a powerful tool for understanding the distribution of transcription factors and chromatin modifications throughout the genome. Similar approaches might be taken to survey the distribution of quadruplexes on a genome-wide scale. Chromatin could be prepared, fragmented, fractionated based on affinity of the fragments for a quadruplex-binding ligand, and then fragments retained by the ligand could be identified. Selected genomic regions could be analyzed by PCR-based techniques, but a genome-wide view could be provided by using available tiling arrays that cover the yeast genome at 5 bp resolution. Before such approaches can succeed, several obstacles must first be overcome. One difficulty is that the best method for isolating quadruplexes from other genomic sequences is not yet certain. Although antibodies specific for certain G-quadruplexes have been generated, the diversity of G4 folds makes antibody-based capture a non-ideal way to purify quadruplexes in a universal fashion. The interaction of small molecule ligands, while lower-affinity than typical antibody-antigen interactions, might be more universal. Many of these ligands interact with the outer quartet surfaces or intercalate between quartets of quadruplex stacks, making them perhaps relatively insensitive to particular loop structures of intra-molecular quadruplexes and thus relatively universal quadruplex ligands. Nonetheless, there is evidence that they can discriminate to some extent among different quadruplex folds [21, 32, 33]. Affinity chromatography using the selective quadruplex ligand N-methyl mesoporphyrin (NMM) coupled to acrylic beads has been used successfully to select quadruplexes from complex mixtures of nucleic acids [19, 34], and so use of this ligand and probably others should be possible.
Another approach would be to use well-defined quadruplex binding domains of proteins, but again, knowledge of the specificity of these domains is at an early stage. Ultimately, use of a variety of ligands will likely be required to probe the full range of possible quadruplexes. A second obstacle is that during the isolation of chromatin and selection with ligands it is possible that quadruplexes might be artifactually formed or lost, and would thus fail to reflect in vivo quadruplexes. Crosslinking of chromatin might minimize this problem, but it will be essential to demonstrate that the level of quadruplexes that are detected depends on the functional status of factors known to affect quadruplexes in the cells from which the chromatin is derived. For example, showing that the apparent quadruplex levels increase in chromatin isolated from sgs1 mutants, which lack the Sgs1p quadruplex-unwinding helicase and would thus be expected to have higher steady-state quadruplex levels (Table 1; and see below), would support the interpretation that these quadruplexes actually exist in vivo.
The approaches described above may reveal associations between particular nucleic acid sequences with quadruplex-forming potential, or proteins with quadruplex-binding activity, and biological events hypothesized to be impacted by G-quadruplexes (e.g. transcriptional stimulation or repression of a QFP-rich promoter by a small-molecule quadruplex ligand). To determine whether any such association reflects a bona fide G-quadruplex function would require additional testing. A logical next step would be to disrupt quadruplex formation or binding via site-directed mutagenesis, followed by a test for loss of the hypothesized quadruplex-dependent events. This approach could be combined with biophysical studies of the native and mutated QFP sequences, or the native and mutated quadruplex-binding proteins, to compare their intrinsic ability to form or bind quadruplexes, respectively. Ideally, a good correspondence between the in vitro and in vivo analyses would be obtained. We note, however, that the quadruplex-forming ability of nucleic acids in vitro might differ from in vivo settings, where other factors (e.g. chromatin proteins) might facilitate or impede quadruplex formation. Given such a scenario, the in vivo tests might prove more definitive. Because G-quadruplexes might interact with chromatin factors (see below), it might be particularly important that point mutations that perturb quadruplex formation be introduced within a native genomic context, and the relative ease with which this can be accomplished in yeast is a benefit of this system.
Many observations suggest, and in some cases demonstrate, roles for G-quadruplexes in different aspects of cell biology. Each of the sections below describes one such aspect, beginning with general examples from several organisms and then focusing on findings from yeast. Table 1 provides a summary of various yeast proteins implicated in G-quadruplex metabolism.
The 3' ends of most eukaryotic telomeres terminate in a guanine-rich single-stranded overhang. Consequently, telomere ends can have high G-quadruplex forming potential. Indeed, telomere sequences from several organisms ranging from yeast to humans readily adopt G4-DNA conformations in vitro. Recently, it was demonstrated that G4-DNA can be detected at telomeres in Stylonychia lemnae cells using highly specific G4-DNA antibodies and immunofluorescence microscopy. Importantly, the possibility that the observed staining was an artifact of G4-DNA formation catalyzed by the antibodies was ruled out by the subsequent demonstration that the staining depends on expression of TEBPβ, a telomere-binding protein with a high degree of similarity to the beta subunit of the telomere-binding protein in Oxytricha, which itself has been shown to promote G4-DNA formation in vitro. Other observations suggest that G4-DNA might form at mammalian telomeres. These include demonstrations that loss of the Werner, Bloom or RTEL helicases results in defects in telomere maintenance in vivo [36–38]. The Werner and Bloom helicases are particularly adept at unwinding G-quadruplex substrates in vitro [39–42], while RTEL is homologous to the C. elegans DOG-1 protein that prevents deletions in G-rich sequences [37, 43]. Further, the human telomere binding protein POT1, which binds the single-stranded telomere overhang, inhibits G4-DNA formation in vitro, while treatment of cells with the quadruplex-selective ligand telomestatin displaces POT1 and uncaps telomeres (i.e. causes them to be recognized as DNA breaks) [45, 46]. Therefore, POT1 binding and G4-DNA formation at the telomere are likely mutually exclusive states. However, the distribution of telomeres between these states in untreated cells is currently unknown.
In yeast, several observations suggest roles for G-quadruplexes in telomere metabolism. In S. cerevisiae, the telomere repeat DNA is 300–350 bp in length and follows the consensus 5′-[(TG)0–6TGGGTGTG(G)]n-3′. An average of 8 nt exists between each GGG run, and the intervening sequence may contain GG dinucleotides; GG runs may contribute to quadruplex formation, albeit more weakly than GGG runs. Yeast telomere repeats form G4-DNA in vitro [48, 49], although the types of G4-DNA structures formed by natural yeast telomere repeats have not been well characterized. The telomere 3' single-strand overhang is short (~10–15 nt in G1, and up to 22 nt in S-phase) and thus might have limited intra-molecular quadruplex forming potential under most conditions. But G4-DNA might form, perhaps transiently, when duplex telomere repeats become single-stranded, for example during telomere replication or recombination. Yeast telomere repeats at the ends of a linear plasmid were found to mediate physical interaction of the ends. This report mentioned unpublished work indicating that the methylation of guanine N7 does not block the association, suggesting that the associations are mediated by a non-G4-DNA structure. These results do not, however, address telomere G4-DNA formation in other settings. We note also that the irregular nature of the telomere repeat does not preclude quadruplex formation; this is particularly true of intra-molecular quadruplexes, where guanines not involved in quartets can reside in the loops.
A potential in vivo link between telomere G4-DNA and telomere maintenance in yeast was uncovered in studies using a temperature-sensitive point mutant, cdc13-1, of the telomere capping protein Cdc13p. Cdc13p inhibits exonucleolytic degradation of the C-rich telomere strand and also regulates telomerase access to telomeres [52–55]. At non-permissive temperatures, this mutant accumulates long stretches of G-rich ssDNA at telomeres, an environment conducive to forming G4-DNA. Overexpression of the quadruplex-binding protein Stm1p rescues cdc13-1 temperature sensitivity. Stm1p has weak homology with the ciliate TEBPβ telomere binding proteins which, as discussed above, bind to and promote formation of G4-DNA in vivo [4, 5, 57]. While Stm1p itself has not been shown to bind G4-DNA in vivo, it shows such activity in vitro, and additionally, is known to bind guanine-rich telomeric and subtelomeric DNA in vitro [58, 59]. Furthermore, overexpression of the RecQ helicase Sgs1p, which efficiently unwinds G4-DNA substrates [40, 60], abolishes rescue by Stm1p overexpression. In addition, G4-forming sequences at the ends of DNA molecules can inhibit recognition of the ends by checkpoint proteins in vitro. Together, these findings suggest a role for G4-DNA in telomere capping. This capping might occur only in the absence of functional Cdc13p, or alternatively might play a role in normal yeast cells under conditions where Cdc13p is not bound. It would be interesting to determine if telomeric quadruplexes can be detected physically in the cdc13-1 mutants rescued by Stm1p overexpression. Antibodies specific for yeast telomeric G4-DNA structures have been generated recently, which might provide a valuable reagent for future studies.
We also point out that Cdc13p itself (in addition to its mammalian homolog POT1) is known to destabilize G4-DNA in vitro. This might be a biologically relevant function, given that G4-DNA structures are known to inhibit telomerase action and would need to be disrupted in order for yeast to maintain their telomeres through a telomerase-based mechanism [44, 63, 64].
A screen was recently performed to identify yeast single-locus deletion mutants that are either resistant or sensitive to growth inhibition by the highly selective quadruplex-interacting compound NMM [9, 19]. Remarkably, stm1 mutants were among the NMM-resistant class, consistent with the notion that Stm1p stabilizes G4-DNA and thus provides a target for NMM action. A related mutation, cgi121, yielded NMM-sensitivity. Cgi121p is a member of the KEOPS telomere protein complex, and cgi121 mutation also rescues cdc13-1 temperature sensitivity. A possible explanation for these observations is that loss of Cgi121p somehow stabilizes G4-DNA, similar to Stm1p overexpression, and that the observed NMM-sensitivity is a result of increased G4-DNA targets for the drug. In addition to the stm1 and cgi121 mutations being connected with telomere metabolism, gene ontology (GO) categorization of the mutants from the screen revealed that mutations in genes encoding telomere maintenance proteins were significantly overrepresented in both the NMM-sensitive and NMM-resistant categories. Although further work is required to test this idea, these findings suggest that NMM toxicity might be caused, in part, by stabilization of telomeric G4-DNA or interference with its function.
In addition to telomeres, numerous regions in bacterial and eukaryotic genomes have the potential to form intramolecular G-quadruplexes. These regions include the promoter regions of single copy genes, the ribosomal DNA (rDNA), certain minisatellites and the immunoglobulin (Ig) heavy chain switch regions [4, 66–69]. Multiple search algorithms have been employed to identify these regions, which generally require four runs of three or more Gs separated by loops of any sequence, with the entire pattern falling within a window of defined size [7, 9–12]. Depending on the algorithm employed, different names have been given to the identified sequences, e.g. QFS (potential quadruplex forming sequence), G4P (sequence with G4-DNA forming potential) and QFP (sequence with quadruplex-forming potential); here, we have used only QFP for simplicity. Although, in general, QFP sequences with longer G-runs and shorter loops between the G-runs are more likely to form G-quadruplexes spontaneously in vitro, it is not yet possible to predict with certainty the true potential of an arbitrary sequence to form G-quadruplexes. Moreover, rules for quadruplex formation are likely different in vivo, where their formation might be modulated by both inhibitory and stimulatory factors. Therefore, it is largely unknown which QFP sequences actually form G-quadruplexes in vivo. Recent studies have been aimed at determining whether sequences that have QFP form G4-DNA in vitro and in vivo, or if these sequences instead serve another unknown function (e.g. binding by transcription factors that recognize duplex sequences with QFP).
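The general search rule described above (four runs of three or more Gs separated by loops of any sequence) can be sketched as a regular-expression scan. This is a minimal illustration and not the algorithm of any of the cited studies: the maximum loop length of 7 nt is an assumption that stands in for the "window of defined size", and the function name `find_qfp` is hypothetical. Runs of C are also scanned so that QFP on the complementary strand is reported.

```python
import re


def find_qfp(seq, min_run=3, n_runs=4, max_loop=7):
    """Return (start, end, strand, match) tuples for candidate QFP motifs.

    A motif is n_runs runs of at least min_run Gs, separated by loops of
    1..max_loop nucleotides. max_loop is an assumed parameter; published
    definitions differ in loop and window limits.
    """
    seq = seq.upper()
    hits = []
    # G-runs mark QFP on the given strand; C-runs mark QFP on its complement.
    for base, strand in (("G", "+"), ("C", "-")):
        run = f"{base}{{{min_run},}}"
        loop = f"[ACGT]{{1,{max_loop}}}?"
        pattern = run + (loop + run) * (n_runs - 1)
        # finditer reports non-overlapping matches, scanning left to right.
        for m in re.finditer(pattern, seq):
            hits.append((m.start(), m.end(), strand, m.group()))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    for hit in find_qfp("TTGGGAGGGTGGGTAGGGTT"):
        print(hit)
```

As the text notes, a match identifies only a sequence with quadruplex-forming potential; it says nothing about whether the quadruplex forms in vitro or in vivo.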
QFP sequences have been identified in human, chicken, yeast, and bacterial genomes at greater than random frequency near transcriptional promoters, raising the possibility that they play a role in transcriptional regulation (Fig. 3A) [6–9, 11, 12]. There is a particular enrichment of these sequences in the promoters of mammalian oncogenes [6, 10], and for some of these, there is good evidence that the QFP sequences can affect transcription. A prominent example is c-MYC, which has an upstream QFP sequence that represses transcription. This repression is augmented by addition of the porphyrin G4-DNA ligand TMPyP4 to cells [70–72]. Although TMPyP4 has little selectivity for binding G4-DNA in comparison to duplex DNA, repression of c-MYC by the ligand is abolished by point mutations that ought to abrogate quadruplex formation by the QFP sequence. Given this result, one of two scenarios is possible: either these mutations affect binding of a sequence-specific transcription factor to duplex DNA, or TMPyP4 exerts its effects on c-MYC expression through a bona fide quadruplex target. Further experiments should discriminate between these possibilities. Similar inhibition of KRAS expression by TMPyP4 and of c-KIT by several selective trisubstituted isoalloxazines has been reported [21, 73], although these effects have not been shown to be dependent on the QFP sequences.
Genome-wide approaches examining the relationship between QFP and transcription in S. cerevisiae have yielded findings that support those obtained in other organisms, and have also yielded novel insights. QFP sequences are present at above random frequencies in open reading frames (ORFs) and, to a greater extent, in promoter regions. Further, there are strong correlations between promoter QFP and reduced occupancy of histones H2A, H3 and H2A.Z. H2A and H3 are histone components of canonical nucleosomes, while H2A.Z is an H2A variant that is present at promoters that are poised for transcriptional activation. These observations suggest that G4-DNA formation might exclude nucleosomes, thereby facilitating transcriptional initiation at such sites. This is consistent with another recent report that QFP is enriched more than 200-fold at nuclease-hypersensitive sites near human promoters, which typically have reduced histone occupancy. The correlation between QFP and reduced nucleosome occupancy was not observed in yeast ORFs, implying a possible connection to the initiation of transcription. While binding of the Rap1p transcription factor is correlated with low histone occupancy in yeast, and Rap1p binding sites resemble QFP sequences, the possibility that Rap1p explains the nucleosome exclusion was ruled out by removing known Rap1p target genes from the analysis [9, 74]. Still, it remains possible that the binding of factors other than Rap1p to duplex DNA at these regions causes the low histone occupancy, and it also remains to be determined whether QFP sequences themselves cause nucleosome exclusion or if they are only proximal to such regions.
A possible connection between G-quadruplexes and chromatin was also revealed by the screen for enhancers of NMM toxicity described above. Among null mutations conferring enhanced sensitivity to NMM, there was highly significant enrichment for loci encoding factors involved in nucleosome remodeling or modification. These included several members of the RSC and SWI/SNF nucleosome remodeling complexes, the ADA histone acetyltransferase complex, and histone H2A.Z (encoded by HTZ1). Perhaps G4-DNA helps remove nucleosomes, consistent with the reduced histone occupancy of promoters with QFP, but binding of G4-DNA by NMM interferes with this activity. Interestingly, htz1 mutants, like NMM treated cells, have a heightened requirement for SWI/SNF complex activity [75, 76], raising the possibility that H2A.Z and G4-DNA might have overlapping roles in chromatin modification. It will be important to test this idea further, using well-defined quadruplex-forming sequences in in vitro and in vivo settings.
Gene expression analyses have demonstrated that manipulations in yeast that are predicted to affect G-quadruplex homeostasis preferentially affect loci with QFP, thus supporting the model that QFP sequences can form bona fide quadruplexes that, in turn, modify gene expression. The first of these manipulations, treatment of cells with the quadruplex-selective ligand NMM, causes upregulation of loci with QFP upstream or downstream of promoters. Binding of G-quadruplexes by NMM might upregulate gene expression in a number of ways. Stabilization of G4-DNA by NMM might upregulate gene expression by occluding repressive factors, such as nucleosomes, or by enabling the binding of transcriptional activators. Alternatively, NMM binding might displace repressors that bind to G4-DNA that forms even in the absence of ligand. In the case of loci with QFP located downstream of the promoter and on the sense strand, the binding of NMM to G4-RNA structures in mRNA transcripts might allow for their stabilization. A second manipulation, deletion of the SGS1 gene, which encodes a G4-DNA-unwinding helicase, caused preferential downregulation of loci with QFP in their ORFs [77, 78]. Remarkably, this association was particularly true for the template strand, raising the possibility that quadruplexes unresolved by Sgs1p provide an impediment to progression of RNA polymerase. Future studies should help to determine the precise molecular mechanisms underlying the observed NMM-induced and Sgs1p-mediated upregulation of gene expression in loci with QFP.
An argument against the possibility that G4-DNA forms and affects transcription in vivo is that genes exist within duplex DNA, and the equilibrium between the duplex form and the G4-DNA form (which requires melting of the duplex before it can form) might lie too far in the direction of the duplex. While the energetics that govern this equilibrium will differ among various QFP sequences, and will doubtless differ for naked DNA compared with chromatin, several arguments can be made supporting the notion that such sequences do, indeed, form G-quadruplexes. First, the c-KIT oncogene contains a 20 nucleotide QFP sequence within a transcriptional activation element, and this sequence has been shown to form G4-DNA in vitro. Remarkably, G-quadruplexes were determined to account for roughly 30% of the species formed under physiologic conditions by annealing the strand containing this QFP sequence, and flanked by a total of 76 nt of additional DNA, to its complementary strand. Although a conclusive demonstration that the system was at equilibrium was not provided, these findings argue that the G-quadruplex conformation has stability similar to the duplex form. Second, DNA is typically negatively supercoiled in vivo, which favors duplex unwinding and thus should facilitate G4-DNA formation. Third, QFP sequences located downstream from promoters might give rise to G4-DNA after passage of the first RNA polymerase, affecting the subsequent activity of the promoter. Fourth, G4-DNA might form only transiently, e.g. during replication, but affect the binding of factors (e.g. histones) that could have a persistent effect on transcription. Finally, simple thermodynamics of DNA alone might not dictate the equilibrium between duplex and quadruplex forms in vivo. There are numerous examples in biology of processes that require the input of energy to accomplish beneficial tasks (e.g. proofreading by DNA polymerases); perhaps the regulatory benefits of G4-DNA outweigh any costs of its formation and therefore proteins have evolved to favor its formation under specific conditions and in particular genomic regions.
Another mechanism by which quadruplexes might affect gene expression is through G4-RNA structures in mRNA (Fig 3B). Compared with DNA, a substantial portion of total cellular RNA is single stranded and is therefore energetically more likely to form non-canonical secondary and tertiary structures, including quadruplexes. A pivotal study by Darnell et al. demonstrated that the FMRP protein, which is absent in the fragile X mental retardation syndrome, binds to a class of G-rich mRNAs that form intramolecular G4-RNA. A significant proportion of these mRNAs were also found to have reduced or increased polysome occupancy in fragile X patient cells, which strongly suggests that binding of FMRP to these targets regulates their translation. Further, Khateb et al. demonstrated that the 5’ untranslated region (UTR) of the FMRP transcript contains QFP-rich CGG repeats that, when amplified to the extent observed in fragile X premutation carriers, inhibit translation of FMRP in both cell-free extracts and cultured human cells. Moreover, expression of the G-quadruplex destabilizing proteins hnRNP A2 or CBF-A relieves this inhibition. While there is debate as to whether CGG repeats actually form G-quadruplex or only hairpin structures, the effects of hnRNP A2 and CBF-A on FMRP translation are consistent with a quadruplex-based regulatory mechanism. In addition, Kumari et al. identified a G4-RNA-forming element in the 5’ UTR of the human NRAS proto-oncogene that is capable of inhibiting translation in a cell-free in vitro translation system. To extend their findings, the authors also used sequence analyses to predict the existence of 2,922 other sequences with QFP in the 5’ UTRs of human gene transcripts, raising the possibility that regulation by G4-RNA has broad effects on gene expression. An important caveat of all of the above studies is that it has not yet been proven rigorously that G-quadruplex formation per se, and not some other function of the quadruplex-forming sequences (e.g. binding to protein factors), is responsible for modulation of translation.
In yeast, there is a peak of QFP in ORFs just downstream from the start of translation; these QFP sequences are found preferentially in the template (i.e. anti-sense) strand and so would be underrepresented in mRNAs. However, this strand asymmetry is limited to QFP sequences with short loops and is actually reversed for QFP sequences with longer loops. Because G-quadruplexes with shorter loop regions are generally more stable than those with long loops [84, 85], short-loop G4-RNA might be avoided because it could prove strongly inhibitory to translation. Conversely, G4-RNA with long loop regions might be in dynamic equilibrium with other RNA structures that are permissive for translation, and thus represent a mechanism by which translation could be regulated.
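The strand and loop-length comparisons described above depend on scanning sequences for QFP on both strands. The following is a minimal sketch of such a scan in Python, not the method used in the studies cited here. It assumes a commonly used heuristic motif of four runs of at least three guanines separated by loops of one to seven nucleotides; the function name `find_qfp` and the parameters `min_run` and `max_loop` are hypothetical and chosen only for illustration.

```python
import re


def find_qfp(seq, min_run=3, max_loop=7):
    """Scan a DNA sequence for candidate quadruplex-forming motifs.

    Assumed heuristic (not taken from the text above): four runs of at
    least `min_run` guanines separated by three loops of 1..max_loop
    nucleotides. Returns a list of (start, end, strand, loop_lengths).
    Strand "+" means the G-rich motif is on the given strand; "-" means
    it is on the complementary strand (seen here as runs of C).
    """
    seq = seq.upper()
    hits = []
    for base, strand in (("G", "+"), ("C", "-")):
        run = "%s{%d,}" % (base, min_run)
        # Lazy loops make the regex engine prefer shorter loops first.
        loop = "([ACGT]{1,%d}?)" % max_loop
        pattern = re.compile(run + loop + run + loop + run + loop + run)
        # finditer reports non-overlapping matches only.
        for m in pattern.finditer(seq):
            loop_lengths = tuple(len(g) for g in m.groups())
            hits.append((m.start(), m.end(), strand, loop_lengths))
    return hits


def has_long_loop(hit, threshold=4):
    """Illustrative classifier: True if any loop is >= threshold nt."""
    return max(hit[3]) >= threshold
```

Applied to each ORF with the sense strand as input, hits on the "-" strand would correspond to template-strand QFP, and the loop lengths allow short-loop and long-loop motifs to be tallied separately. The `threshold` separating short from long loops is likewise an arbitrary illustrative choice.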
As mentioned above, QFP sequences have been identified within the repeated rDNA loci present in the nucleoli of both yeast and humans. Interestingly, this QFP is highly concentrated in the sense strand, and thus G-quadruplexes might form both in the rDNA genes and their rRNA transcripts. Consistent with a role for quadruplexes in nucleolar function, the protein nucleolin, which is involved in both the synthesis and maturation of ribosomes, binds to quadruplexes with very high affinity. Interestingly, treatment of cultured human cells with a synthetic quadruplex aptamer (AS1411), which binds to nucleolin, results in the mislocalization of nucleolin to the cytoplasm, as well as the altered activity of nucleolin-containing protein complexes. Further evidence for a role of quadruplexes in ribosome metabolism has been provided by studies in S. cerevisiae, using microarray-based techniques to assay the effect of NMM treatment on gene expression. In addition to regulation of QFP loci by NMM (see above), the quadruplex ligand also caused a highly significant downregulation of genes connected with nucleolar function, including those involved in rRNA processing and ribosome biogenesis. This finding is intriguing given the high density of QFP sequences within the rDNA and the 25S and 18S rRNAs, and the observation that these rRNAs were downregulated in response to NMM treatment. Thus, NMM might inhibit rRNA transcription, stability, or processing by binding rDNA or rRNA quadruplexes, and the downregulation of other loci having nucleolar function would be a secondary consequence of these quadruplex-related effects of the ligand. While the NMM studies do not address whether G-quadruplexes form naturally in the nucleolus or if they are formed only in the presence of ligand, the binding specificity of nucleolin and the evolutionary conservation of QFP in the rDNA loci suggest that G-quadruplexes play some role in ribosome biogenesis.
Another consequence of the strand bias of QFP in the rDNA repeats is that the strand likely to give rise to G4-DNA will most often be replicated by lagging strand DNA synthesis. Interestingly, the same is true of the QFP present at telomeres. It is possible that the increased levels of single stranded DNA constituting the lagging strand might facilitate G4-DNA formation, thereby generating potential blocks to the completion of replication. Consistent with this idea, studies in human cells have shown that the absence of the WRN DNA helicase, which, like its yeast homologue Sgs1p, is particularly adept at unwinding G4-DNA [39, 40, 42, 87], results in loss of the telomere strand that is replicated by lagging strand synthesis. Telomere defects related to replication in sgs1 mutants might have a similar basis [88, 89]. Further, the unwinding of telomere quadruplexes during Stylonychia telomere replication [4, 5], and the CGG repeat-induced pausing of DNA polymerases in vitro [90, 91], are consistent with G4-DNA providing an impediment to replication. However, telomere repeat DNA from the fission yeast S. pombe causes replication pausing regardless of its orientation to the replication fork, suggesting that the G4-DNA at telomeres might represent a difficult substrate for both leading and lagging strand synthesis. Therefore, it is possible that the natural orientation of telomere and rDNA QFP sequences helps ensure their full replication, because G4-DNA formed after the passage of the replication fork on the lagging strand template might be less deleterious than G4-DNA formed ahead of the fork on the leading strand template. In the latter case, the replication fork might collapse, while in the former case, the ability to prime DNA synthesis from a downstream Okazaki fragment could allow for continued replication fork progression, and subsequent removal of quadruplexes and completion of lagging strand synthesis by gap repair. It would be interesting to test if QFP sequences at other genomic loci are preferentially replicated by lagging strand synthesis, or if deleterious consequences (e.g. DNA breaks) occur more frequently when QFP sequences are copied by leading strand synthesis.
G-quadruplexes have long been hypothesized to play roles in DNA recombination, particularly given the transient single-stranded intermediates that occur in many recombination transactions. G4-DNA might play a role in class switch recombination, a highly specialized form of recombination that takes place during antibody isotype switching in mammalian cells, and involves recombination between switch (S) regions, which contain G-rich sequences confined largely to one strand [93, 94]. Transcription of these regions is required for this process, and produces a stable R-loop that facilitates G4-DNA formation in the G-rich non-template strand, yielding a structure called a G-loop. Studies of these switch regions on plasmids transcribed in E. coli have shown that G-loop formation is elevated in recQ mutants, which lack the quadruplex unwinding helicase RecQ, therefore providing strong evidence that the observed G-loops are formed in vivo. Furthermore, the MutSα protein, which facilitates recombination in S regions, binds with high affinity to G4-DNA, supporting the biological relevance of quadruplexes in switch recombination. In addition, there is some evidence that certain chromosomal translocations leading to oncogene activation might occur preferentially in QFP-containing regions that give rise to G-loops. While the authors suggest that these translocations are not targeted to G4-DNA itself, but rather to flanking single stranded DNA, it is possible that quadruplex formation facilitates these translocation events by stabilizing the adjacent single stranded target sites.
Studies in yeast suggest possible roles for G4-DNA in homologous recombination during meiosis. Two proteins involved in meiotic recombination, Hop1p and Kem1p, have been shown to have G4-DNA specific activities. Hop1p is a member of the synaptonemal complex that mediates association of homologous chromosomes during meiosis. In vitro, Hop1p binds to and catalyzes the formation of G4-DNA, and promotes the pairing of double stranded DNA molecules via quadruplex structures. Kem1p is a nuclease that can cleave single stranded DNA 5' from G4-DNA structures in vitro, and kem1 mutants are unable to complete meiosis [98, 99]. However, it is not yet clear how the in vitro activities of these proteins on G4-DNA relate to their in vivo functions.
A key factor in the initiation of meiotic recombination, as well as recombination in vegetative cells, is the MRX complex, which is composed of Mre11p, Rad50p, and Xrs2p [100, 101]. The complex recognizes DNA double strand breaks and then Mre11p-dependent nuclease activity processes DNA ends to generate 3’ single stranded overhangs. Rad50p and Xrs2p function, respectively, to tether the DNA ends and to enhance the nucleolytic activity of Mre11p [103–105]. The 3’ single stranded protrusions generated by Mre11p then invade homologous sequences. Interestingly, Mre11p has been shown to bind G4-DNA with greater affinity than single stranded or double stranded DNA substrates, and to cleave G4-DNA upstream of the guanine runs. Caveats that complicate interpretation of these results include the weak affinity of Mre11p for G4-DNA (~100 nM) and the inefficient cleavage activity observed in these studies. Nevertheless, consistent with these findings, loss of Mre11p also resulted in increased sensitivity to the quadruplex ligand, NMM, indicating that elevated levels of G4-DNA in mre11 mutants might enhance the toxicity of NMM. rad50 mutants were found in the same NMM screen to have the opposite phenotype, suppressing the toxicity of NMM, which was surprising, given that the MRX proteins generally work together as a complex. However, the Rad50p component of the MRX complex was shown recently to attenuate the endonuclease activity of Mre11p on G-quadruplex and other substrates in vitro. Therefore, it is conceivable that rad50 mutants have lower G4-DNA levels due to increased endonucleolytic cleavage of quadruplexes by Mre11p, and are therefore resistant to NMM. More studies will be required, however, to confirm the findings of the NMM screen and to test whether Mre11p and Rad50p indeed have opposing effects on G4-DNA substrates in vivo.
A combination of biophysical, bioinformatics, genetic and cell biological approaches has yielded a remarkable series of findings that argue for the relevance of G-quadruplexes to natural biology. However, more work is required to firmly establish the roles of G4-DNA and G4-RNA in nucleic acid functions and to decipher the mechanisms by which they operate. This knowledge might provide new approaches for selectively targeting processes ranging from transcription and translation to DNA replication, recombination and telomere maintenance. Studies in multiple systems will likely be required to unlock the secrets of G-quadruplexes, and we anticipate a key role for yeast in these exciting explorations.
We thank Li-San Wang, Steve Hershman and Qijun Chen for discussions, and Alex Chavez for insightful discussions and comments on the manuscript. This work was supported by the National Institute on Aging (5R01AG021521 to F.B.J.).