Telomeres, nucleoprotein complexes located at the ends of eukaryotic chromosomes, are composed of tandem DNA repeats of guanine-rich sequences (77). Telomeres are essential for chromosomal stability and genomic integrity, provide sites for recombination events and transcriptional silencing, and appear to play a critical role in cellular aging and cancer (43, 78–81). Telomeric DNA ends are composed of both duplex and guanine-rich 3′-overhang segments, with the former progressively decreasing in length after each round of cell division in somatic cells (82). By contrast, telomeric overhangs can be elongated by the enzyme telomerase, a ribonucleoprotein complex with reverse transcriptase activity (83), which is expressed in the majority of cancer cells, thereby helping to maintain telomere length (84).
The pairing of homologous chromatids at their telomere ends can be mediated through bimolecular quadruplex formation (11). Such quadruplex structures may also play a role in chromosome synapsis and recombination during meiosis (85).
The guanine-rich 3′-overhangs of telomeres, such as TTAGGG repeats in humans, can equilibrate between single-stranded and monovalent cation-mediated G-quadruplex folds, with the latter inhibiting the activity of telomerase. The telomeric ends are maintained in a single-stranded form by hPOT1 (86), while disruption of this interaction leads to quadruplex formation. Thus, ligand-induced stabilization of telomeric G-quadruplex scaffolds in humans constitutes a promising strategy for anti-cancer drug development (87–91). Accordingly, much effort has been devoted to the structural characterization of G-quadruplex topologies formed by one, two, three and four human telomeric TTAGGG repeats as a function of monovalent cation, so as to define the scaffolds for anti-cancer drug discovery.
Though extensive studies have been undertaken on both ciliate (Tetrahymena and Oxytricha) and other eukaryotic (yeast and human) telomeres, the emphasis in this review will be primarily on human telomeres. Single-molecule fluorescence resonance energy transfer (FRET) studies of the structure and unfolding kinetics of the intramolecular human telomere G-quadruplex revealed two stable folded conformations in both K+ and Na+ buffers (92). Both folded conformations can be opened by addition of a complementary oligonucleotide, with temperature-dependent studies indicating that unfolding is entropically driven in K+ buffers (ΔH = 6.4 kcal mol−1 and ΔS = −52.3 cal mol−1 K−1), while unfolding in Na+ buffers exhibits a more significant enthalpic barrier (ΔH = 14.9 kcal mol−1 and ΔS = −23.0 cal mol−1 K−1). Single-molecule FRET spectroscopy has also been used to probe the dynamics of human telomeric DNA containing four guanine-tracts in K+ solution. Interconversion was detected between three FRET values, interpreted in terms of an unfolded and two folded G-quadruplex states, each of which was further subdivided into long- and short-lived species (93). The short-lived species were shown to determine the overall dynamics, apparently because they bridge transitions between the long-lived G-quadruplex states.
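To make the comparison between the two buffers concrete, the quoted parameters can be combined into a free-energy barrier via ΔG = ΔH − TΔS. The sketch below is illustrative only: it assumes the quoted values are activation parameters for unfolding, that both entropies are in cal mol−1 K−1, and that T = 298.15 K, none of which is specified in this section. The function name is hypothetical.

```python
def barrier_terms(dh_kcal, ds_cal, temp_k=298.15):
    """Split a free-energy barrier into enthalpic and entropic parts.

    dh_kcal : enthalpy in kcal/mol
    ds_cal  : entropy in cal/(mol K)  (assumed unit, see lead-in)
    temp_k  : temperature in K        (assumed value, not from the text)
    Returns (enthalpic, entropic, total) in kcal/mol, where
    entropic = -T * dS and total = dH - T * dS.
    """
    entropic = -temp_k * ds_cal / 1000.0  # convert cal to kcal
    return dh_kcal, entropic, dh_kcal + entropic


# Values quoted in the text for unfolding in K+ and Na+ buffers.
k_plus = barrier_terms(6.4, -52.3)
na_plus = barrier_terms(14.9, -23.0)

for label, (enthalpic, entropic, total) in (("K+", k_plus), ("Na+", na_plus)):
    share = entropic / total
    print(f"{label}: dH = {enthalpic:.1f}, -T*dS = {entropic:.1f}, "
          f"dG = {total:.1f} kcal/mol, entropic share = {share:.0%}")
```

Under these assumptions the two totals come out similar, but the −TΔS term makes up most of the barrier in K+ and the smaller part in Na+, which is one way to read the contrast drawn in the text.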
Single-repeat sequences
The earliest structural information on the human telomere came from NMR studies of the single-repeat d(TTAGGGT) human telomere sequence in K+ cation solution (4). The NMR data established that the single-repeat human telomere sequence tetramerizes to form an all-parallel-stranded G-quadruplex composed of three stacked G-tetrads with all anti guanine glycosidic torsion angles.
Two-repeat sequences
The X-ray structure of d(TAGGGTTAGGGT) crystals grown from K+-containing solution defined the architecture of the G-quadruplex formed by the two-repeat human telomere sequence (17). The structure contained an unanticipated all-parallel-stranded G-quadruplex following bimolecular association of the two-repeat human telomere sequences, with the TTA segments forming double-chain-reversal (or propeller) loops (a). In addition, the end segments also participate in formation of a T–A–T–A tetrad, through pairing of the major groove edges of Watson–Crick A–T pairs (17).
NMR studies on the two-repeat human telomere sequence d(TAGGGTTAGGGT) demonstrate interconversion between two dimeric G-quadruplex conformers consisting of three stacked G-tetrads in K+ solution (94). One of these conformers adopts a symmetric all-parallel-stranded G-quadruplex with double-chain-reversal loops and all anti guanines (b), similar to that observed in the crystal structure (17). This conformer predominates for an analog containing a specific dU (in bold) for T substitution (designated U6):
5′-T1AGGG5(dU)TAGG10GT-3′
The other conformer adopts an asymmetric anti-parallel G-quadruplex with edge-wise loops, in which six guanines are syn and six are anti (c). This conformer predominates for an analog (designated U1,brU7) containing specific dU and dbrU (in bold) for T substitutions:
5′-(dU)1AGGG5T(dbrU)AGG10GT-3′
NMR-based complementary-strand trap, concentration-jump and temperature-jump methods have been used to monitor the kinetics of interconversion and activation barriers between the parallel and anti-parallel G-quadruplex conformers (94). The equilibrium shifts towards the anti-parallel G-quadruplex (c) at low temperature and towards the parallel G-quadruplex (b) at high temperature for the U1,brU7 sequence, with the corresponding enthalpy being 18.5 kcal mol−1. Furthermore, the anti-parallel G-quadruplex folds faster, but unfolds slower than the parallel quadruplex at temperatures below 40°C.
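The temperature dependence described above can be illustrated with the integrated van 't Hoff relation, ln(K2/K1) = −(ΔH/R)(1/T2 − 1/T1). The sketch below is a rough estimate under stated assumptions, not a reanalysis of the NMR data: it takes K as the [parallel]/[anti-parallel] ratio, treats the quoted 18.5 kcal mol−1 as a temperature-independent enthalpy for the anti-parallel to parallel conversion (the sign that matches the parallel form being favored at high temperature), and uses 20°C and 40°C purely as example temperatures. The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def equilibrium_ratio(dh_kcal, t1_celsius, t2_celsius):
    """Return K(T2) / K(T1) from the integrated van 't Hoff equation.

    Assumes dH is constant between the two temperatures.
    """
    t1 = t1_celsius + 273.15
    t2 = t2_celsius + 273.15
    return math.exp(-dh_kcal / R_KCAL * (1.0 / t2 - 1.0 / t1))


# Example temperatures are illustrative choices, not values from the text.
ratio = equilibrium_ratio(18.5, 20.0, 40.0)
print(f"K(40 C) / K(20 C) = {ratio:.1f}")
```

With these inputs the parallel-to-anti-parallel population ratio rises several-fold over a 20°C interval, consistent in direction with the shift reported for the U1,brU7 sequence.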
A related conformational equilibrium has also been observed between a pair of bimolecular G-quadruplexes formed by the d(TGGGGTTGGGGT) two-repeat Tetrahymena sequence in Na+-containing solution (95).
Three-repeat sequences
NMR-based studies have defined the folding topology (a) and solution structure (b) of the three-repeat human telomere sequence d[G3(T2AG3)2T]
5′-G1GGTT5AGGGT10TAGGG15T-3′
in Na+ solution (96). This sequence forms a unique asymmetric bimolecular quadruplex, in which the core, composed of three stacked G-tetrads, involves all three G-tracts from one strand and only the last G-tract of the second strand. In this (3+1) G-quadruplex assembly, there are one syn–syn–syn–anti and two anti–anti–anti–syn G-tetrads, two edge-wise loops, and three G-tracts oriented in one direction with the fourth oriented in the opposite direction (a).
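The condensed notation used for these oligonucleotides, such as d[G3(T2AG3)2T], can be expanded mechanically, which is handy for checking the residue numbers quoted in the text. The following is a minimal sketch with hypothetical helper names (`expand`, `g_tracts`); it assumes a letter followed by a number means that many copies of the letter, and a parenthesized group followed by a number means that many copies of the group.

```python
import re

_GROUP = re.compile(r"\(([^()]*)\)(\d*)")   # e.g. "(T2AG3)3"
_REPEAT = re.compile(r"([ACGTU])(\d+)")     # e.g. "G3"


def expand(notation):
    """Expand condensed notation such as 'G3(T2AG3)2T' into a full sequence."""
    previous = None
    # Expand innermost parenthesized groups until nothing changes.
    while previous != notation:
        previous = notation
        notation = _GROUP.sub(
            lambda m: m.group(1) * int(m.group(2) or 1), notation
        )
    # Then expand single-letter repeats.
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), notation)


def g_tracts(sequence, min_len=3):
    """Return 1-based (start, end) positions of runs of at least min_len G's."""
    pattern = "G{%d,}" % min_len
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, sequence)]


three_repeat = expand("G3(T2AG3)2T")
four_repeat = expand("AG3(T2AG3)3")
print(three_repeat, g_tracts(three_repeat))
print(four_repeat, g_tracts(four_repeat))
```

The same helpers can be applied to any of the other sequences in this section, for example to confirm which numbered guanines fall in which G-tract.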
The (3+1) G-quadruplex topology adopted by the three-repeat human telomere sequence establishes how a segment containing three G-tracts can bind to the 3′-end G-tract of another segment. Such quadruplex formation could occur within the 3′-end overhang of human telomeres or when the 3′-end invades the adjacent double-stranded segment of the telomere to form the so-called t-loop (see schematic in c) (97).
Earlier studies on four-repeat sequences
In 1993, the NMR-based folding topology (a) and solution structure (b) of the four-repeat human telomeric sequence d[AG3(T2AG3)3]
5′-A1GGGT5TAGGG10TTAGG15GTTAG20GG-3′
were solved in Na+ cation solution (9). The intramolecular fold contained three stacked G-tetrads connected by successive edge-wise, diagonal and edge-wise TTA loops. Each guanine-tract had both parallel and anti-parallel aligned neighboring strands around the G-quadruplex, with guanines adopting syn–syn–anti–anti glycosidic torsion alignments around each G-tetrad. The grooves were accessible for further recognition within this topology, while the connecting loops restricted access to the outward-directed faces of the terminal G-tetrads at both ends. Finally, the 5′- and 3′-termini project toward the same end of the G-quadruplex (a).
The X-ray structure of d[AG3(T2AG3)3] crystals grown from K+ cation solution exhibited a completely different and unanticipated fold (c) and structure (d) for the intramolecular G-quadruplex (17). The G-quadruplex was composed of three stacked G-tetrads, such that all strands are parallel, all guanines adopt anti conformations and all three loops are of the double-chain-reversal (or propeller) type. The double-chain-reversal loops restrict access to three of the grooves, while access is available to the outward-directed faces of the terminal G-tetrads at both ends. Finally, the 5′- and 3′-termini project toward opposite ends of the G-quadruplex (c), thereby facilitating potential end-to-end alignments of successive G-quadruplexes.
These very different conformers reported for the four-repeat human telomeric sequence in Na+-containing aqueous solution (9) and in K+-containing crystals (17) appear to highlight the polymorphic character of G-quadruplex scaffolds (93) as a function of medium and/or monovalent cation type. Nevertheless, accumulating evidence, including biophysical measurements (98), implies that the intramolecular parallel-stranded G-quadruplex structure of the human telomere observed in K+-containing crystals is unlikely to be the major form in K+-containing aqueous solution. To this end, three groups have recently systematically investigated the solution structure(s) of four guanine-repeat human telomeric sequences in K+ cation solution, while keeping in mind that the more crowded environment of the crystal may more closely reflect the crowded situation in the cell nucleus.
More recent studies on four-repeat sequences
The imino proton NMR spectrum of d[AG3(T2AG3)3] in K+ cation solution is indicative of multiple conformations in equilibrium and hence this sequence context is not readily amenable to structural characterization. Three research groups (those of Hiroshi Sugiyama, Danzhou Yang and our group) have taken somewhat different approaches to overcome this limitation and recently contributed to determination of the solution structure(s) of four-repeat human telomeres in K+ solution. Our group's approach is outlined in detail below and these results are placed in the context of independent contributions from the other two groups.
Imino proton NMR spectra corresponding to distinct predominant conformers, together with one or more minor conformers, were observed for the d[TAG3(T2AG3)3] sequence, where a T was added at the 5′-end (99), and for the d[TAG3(T2AG3)3TT] sequence, where a T was added at the 5′-end and a TT was added at the 3′-end (100), both in K+ cation solution, with both cases maintaining the sequence context of the TTAGGG human telomere repeat.
5′-T1AGGG5TTAGG10GTTAG15GGTTA20GGG(TT)-3′
The NMR-based folding topology was determined for the predominant conformer of the d[TAG3(T2AG3)3] sequence in K+ cation solution (a), and the solution structure determined for an analog containing terminal modifications (underlined) of this sequence, namely d[TTG3(T2AG3)3A], with the latter yielding exceptional NMR spectra reflecting a single conformer, together with the same 2D spectral characteristics of the unmodified sequence (99). Similarly, insertion of a single 8-bromoguanine at position G16 in the d[TAG3(T2AG3)3] sequence to enforce a syn glycosidic bond at this position also resulted in NMR spectra corresponding to a single conformer with all the spectral characteristics of the unmodified sequence (101). The solution structure has been determined for the d[TAG3(T2AG3)3] G-quadruplex (designated human telomere G-quadruplex form-1) (b) (101), whose (3+1) topology differs from folds reported previously in Na+ solution (a) (9) and K+-containing crystal (c) (17). Instead, this G-quadruplex contains three G-tracts oriented in one direction and the fourth in the opposite direction, one anti–syn–syn–syn and two syn–anti–anti–anti G-tetrads, and a double-chain-reversal loop followed by two edge-wise loops (99).
The same G-quadruplex folding topology (a) has been independently reported for the four-repeat human telomere sequences in K+-containing solution by two other laboratories, one of which used NMR (102, 103), while the other used both CD (104) and NMR (105). The NMR investigation by the former group focused on the sequence d[AAAG3(T2AG3)3AA], with the resulting (3+1) topology (102) stabilized by a stacked A–A–A triple (103), associated with introduction of terminal adenine modifications (underlined) at either end of the sequence. The latter group's research avoided terminal modifications and was based on judicious positioning of between four and five 8-bromoguanine substitutions, which enforce a syn guanine alignment at the corresponding guanines in the sequence (104, 105).
The NMR-based folding topology has also been determined for the predominant conformer of the d[TAG3(T2AG3)3TT] sequence in K+ cation solution (100). This sequence adopts the same (3+1) G-quadruplex core topology adopted by the predominant conformer of the d[TAG3(T2AG3)3] sequence in K+ cation solution (99) outlined in the previous paragraph, except that the first two linkers are of the edge-wise type and the last linker adopts a double-chain-reversal loop (designated human telomere G-quadruplex form-2) (c). Insertion of a single 8-bromoguanine at position G15 in the sequence to enforce a syn glycosidic bond at this position resulted in NMR spectra corresponding to a single conformer with all the spectral characteristics of the unmodified sequence (101). The solution structure of the d[TAG3(T2AG3)3TT] G-quadruplex form-2 is shown in d (101). An independent NMR-based study (106) has reached the same conclusions reported above regarding the folding topology (100) and solution structure (101) of form-2.
The demonstration of G-quadruplex forms 1 (a) and 2 (c) for the four-repeat human telomere in K+, together with the all-parallel-stranded, propeller-groove-linked G-quadruplex observed in crystals grown from K+ solution (c) (17), supports the view that multiple human telomeric G-quadruplex conformers can coexist in K+-containing solution, a conclusion also reached from single-molecule FRET studies of the four-repeat human telomere sequence (92). Furthermore, these studies establish that even small changes to flanking sequences perturb the equilibrium between different coexisting (3+1) G-quadruplex forms. More recent research has attempted to monitor G-quadruplex formation by the four-repeat human telomere in K+ solution under polyethylene glycol-induced crowding conditions (107) that perhaps mimic crystallization conditions.
(3 + 1) G-quadruplex fold
The (3 + 1) G-quadruplex scaffold is unique in that three strands are oriented in one direction and the fourth is oriented in the opposite direction. Furthermore, two of the three G-tetrads adopt anti–anti–anti–syn alignments while the remaining G-tetrad adopts a syn–syn–syn–anti alignment. This topology was first reported in 1994 for the four-repeat Tetrahymena telomere sequence, d(T2G4)4, in Na+ solution (7) and observed a decade later for a four guanine-repeat variant of the bcl-2 promoter in K+ solution in which two guanines were replaced by thymines (108) (see bcl-2 sequence section).
The adoption of the (3 + 1) core G-quadruplex by the three-repeat human telomere dimeric G-quadruplex in Na+ solution (a) (96), as well as by the four-repeat human telomere G-quadruplexes form-1 (a) and form-2 (c) in K+ solution, establishes it as a robust folding topology, thereby highlighting its candidacy as an important platform for structure-based drug design.
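The relationship between loop types and strand orientations in the intramolecular four-repeat folds described in this section can be summarized in a few lines of code. The sketch below relies on one general rule: a double-chain-reversal (propeller) loop joins two strands running in the same direction, whereas an edge-wise or diagonal loop joins strands running in opposite directions. The loop orders are taken from the text; the dictionary keys and function names are hypothetical labels chosen for illustration.

```python
# Loop types that reverse strand direction between consecutive G-tracts.
REVERSING = {"edgewise", "diagonal"}

# Loop order (5' to 3') for the four-repeat folds described in the text.
FOLDS = {
    "Na+ solution": ["edgewise", "diagonal", "edgewise"],
    "K+ crystal": ["propeller", "propeller", "propeller"],
    "K+ solution form-1": ["propeller", "edgewise", "edgewise"],
    "K+ solution form-2": ["edgewise", "edgewise", "propeller"],
}


def strand_directions(loops):
    """Return +1/-1 for each G-tract, taking the first tract as +1."""
    directions = [1]
    for loop in loops:
        step = -1 if loop in REVERSING else 1
        directions.append(directions[-1] * step)
    return directions


def strand_split(loops):
    """Return (majority, minority) strand counts, e.g. (3, 1)."""
    directions = strand_directions(loops)
    up = directions.count(1)
    down = len(directions) - up
    return max(up, down), min(up, down)


for name, loops in FOLDS.items():
    print(f"{name}: directions {strand_directions(loops)}, "
          f"split {strand_split(loops)}")
```

Read this way, form-1 and form-2 differ only in where the single propeller loop sits, yet both give a three-to-one strand split, while the all-propeller crystal fold is fully parallel and the Na+ fold has two strands in each direction.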