Previous studies have demonstrated that nuclease hypersensitivity regions of several proto-oncogenic DNA promoters, situated upstream of transcription start sites, contain guanine-rich tracts that form intramolecular G-quadruplexes stabilized by stacked G•G•G•G tetrads in monovalent cation-containing solution. The human c-kit oncogenic promoter, an important target in the treatment of gastrointestinal tumors, contains two such guanine-rich stretches, designated c-kit1 and c-kit2. Our previous nuclear magnetic resonance (NMR)-based studies reported on the novel G-quadruplex scaffold of the c-kit1 promoter in K+-containing solution, where we showed for the first time that even an isolated guanine can be involved in G-tetrad formation. These NMR-based studies are now extended to the c-kit2 promoter, which adopts two distinct all-parallel-stranded conformations in slow exchange: a monomeric G-quadruplex (form-I) in 20 mM K+-containing solution and a novel dimeric G-quadruplex (form-II) in 100 mM K+-containing solution. The dimeric form-II G-quadruplex adopts an unprecedented all-parallel-stranded topology in which each c-kit2 strand spans a pair of all-parallel-stranded G-quadruplexes, each containing three G-tetrad layers, aligned in a 3′-to-5′-end orientation, with stacking continuity between the two G-quadruplexes mediated by a sandwiched A•A non-canonical pair. We propose that strand exchange during recombination events within guanine-rich segments could be mediated by a synapsis intermediate involving an intergenic parallel-stranded dimeric G-quadruplex.
Guanine-rich DNA and RNA sequences can form G-quadruplexes stabilized by stacked G•G•G•G tetrads in monovalent cation-containing solution (1–4). The length and number of individual G-tracts and the length and sequence context of linker residues define the diverse topologies adopted by G-quadruplexes. These G-quadruplexes can serve as therapeutic targets (5) and considerable effort has gone toward identifying evidence in support of an in vivo functional role for G-quadruplexes (6–8).
Systematic bioinformatic analysis of the human genome has identified a prevalence of guanine-rich tracts capable of G-quadruplex formation (9–13). Such putative G-quadruplex-forming motifs are especially enriched in promoter regions spanning 1 kb upstream of the transcription start sites of genes (14). Since these sequences also correlate with nuclease hypersensitivity sites, current efforts have been directed toward understanding the role of promoter-mediated G-quadruplex formation in transcriptional regulation (15), as well as in gene regulation (16).
The c-kit oncogene is an important target in the treatment of gastrointestinal tumors (17). Since the c-kit proto-oncogene encodes a receptor tyrosine kinase, much effort has been directed towards identifying inhibitors of c-kit kinase activity. Even initially successful drugs such as Gleevec (imatinib) (18) have shown limited efficacy, because new patterns of resistance mutations within the binding site have reduced their clinical effectiveness (19).
Two guanine-rich sequences, each containing four guanine tracts and designated c-kit1 and c-kit2, have been identified within the nuclease hypersensitivity region of the human c-kit promoter, upstream of the transcription start site. Both c-kit1 and c-kit2 guanine-rich sequences exhibit nuclear magnetic resonance (NMR) imino proton chemical shifts characteristic of G-quadruplex formation in K+-containing solution (20,21). A potential approach to inhibiting expression of this gene involves selective stabilization of G-quadruplex structures that may be induced to form in the c-kit promoter region (22,23).
The four guanine-rich repeat c-kit1 22-mer sequence (20) is shown below:
Our group investigated the NMR-based solution structure of this c-kit1 22-mer sequence in K+-containing solution and determined that it formed an unanticipated all-parallel-stranded G-quadruplex scaffold stabilized by three stacked G-tetrads (24). The guanine residues involved in G-tetrad formation are shown in bold in the above sequence, and we demonstrated for the first time that an isolated guanine (G10) can be involved in G-tetrad core formation, despite the presence of four G-G-G tracts in the sequence. The G-quadruplex scaffold (Figure S1a and b, Supplementary Data) is composed of two single-residue double-chain-reversal loops, a two-residue loop and a five-residue stem–loop containing base-pairing alignments. This novel G-quadruplex scaffold could serve as a specific platform for ligand-based drug design targeted to the c-kit1 promoter (25).
Our NMR-based structural studies have now been extended to the four guanine-rich repeat c-kit2 21-mer promoter sequence (21) shown below:
with guanines involved in G-tetrad formation (based on structural analysis outlined below) highlighted in bold. We show that minimal variants of the c-kit2 sequence adopt two distinct all-parallel-stranded G-quadruplex conformations in slow exchange, one of which forms a monomeric (form-I) and the other a dimeric (form-II) G-quadruplex that forms a new fold.
The c-kit2 promoter sequences for NMR studies were prepared using two protocols, one of which involved gel-filtration through Sephadex G-25 in a centrifuge (spin-down column) and the other involved equilibrium dialysis.
Spin-down columns were prepared by packing Sephadex G-25, previously swollen in water, into a 3-ml syringe whose bottom was plugged with glass wool. The suspended gel was packed by centrifuging the syringe, inserted into a centrifuge tube, in a swinging-bucket rotor for 4 min at 1600 × g. After packing, the column was washed five times with 200 µl of buffer. Subsequently, 200 µl of the oligonucleotide solution was applied to the top of the column and recovered by centrifugation into an Eppendorf vial placed under the syringe in the centrifuge tube (26,27).
Equilibrium dialysis was typically performed with 2 ml of oligonucleotide solution placed inside Spectra/Por membrane tubing (molecular weight cutoff 1000 Da). Dialysis against 3.5 l of buffer involved the following sequence of buffer changes: water (1 h); 100 mM (or 20 mM) KCl (1 h); water (1 h); 100 mM (or 20 mM) KCl (1 h); water (1 h); 100 mM (or 20 mM) KCl + 5 mM potassium phosphate (overnight). Finally, the concentration of the dialysis buffer was lowered to account for the subsequent concentration of the sample for NMR experiments, and the sample was dialyzed against this diluted buffer for 1 h. After the final dialysis, the sample was lyophilized.
Samples shown in Figures 1b and 5a–c were prepared by gel filtration through Sephadex G-25 packed in spin-down columns. The remaining samples were prepared by equilibrium dialysis (24,28,29). The strand concentration of the samples varied from 0.2 to 4.0 mM (measured using millimolar extinction coefficients of 202 and 204 mM−1 cm−1 for forms I and II, respectively), and the solution was either 20 mM KCl, 5 mM K-phosphate buffer, pH 6.8 (form-I) or 100 mM KCl, 5 mM K-phosphate buffer, pH 6.8 (form-II).
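As a rough illustration of how strand concentrations in the 0.2–4.0 mM range follow from the extinction coefficients above, the Beer–Lambert relation c = A/(εl) can be applied to a diluted aliquot. This is a minimal sketch: the helper name, the 1:1000 dilution and the absorbance reading are hypothetical, and the wavelength at which the quoted coefficients apply is not stated in the text.

```python
def strand_concentration_mM(absorbance, epsilon_per_mM_cm, path_length_cm=1.0, dilution_factor=1.0):
    """Strand concentration (mM) from Beer-Lambert: c = A * dilution / (epsilon * l)."""
    if epsilon_per_mM_cm <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance * dilution_factor / (epsilon_per_mM_cm * path_length_cm)


# Millimolar extinction coefficients quoted in the text (mM^-1 cm^-1).
EPSILON_FORM_I = 202.0   # c-kit2 T12/T21 (form-I samples)
EPSILON_FORM_II = 204.0  # c-kit2 T21 (form-II samples)

# Hypothetical reading: A = 0.505 for an aliquot diluted 1:1000, 1 cm path length.
conc = strand_concentration_mM(0.505, EPSILON_FORM_I, 1.0, 1000.0)
print(f"{conc:.2f} mM in strands")  # 2.50 mM in strands
```

A dilution step is needed because an undiluted 2.5 mM sample would have an absorbance of about 500 in a 1 cm cell, far outside any usable photometric range.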
Annealing experiments were performed in the NMR spectrometer. The sample in the NMR tube was heated to 85°C and then gradually cooled in 5°C decrements. NMR spectra were recorded at each temperature after 5 min of equilibration.
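The stepwise annealing protocol can be written out as an explicit temperature schedule. In this minimal sketch the 25°C end point is taken from the annealing experiments described in the Results, the function name is hypothetical, and the time needed to ramp between set points is not included.

```python
def annealing_schedule(start_c=85, end_c=25, step_c=5, hold_min=5):
    """Return (set point in °C, hold time in min) pairs for a stepwise cool-down.

    If step_c does not divide the span evenly, the last set point stays above end_c.
    """
    if step_c <= 0 or start_c < end_c:
        raise ValueError("need a positive step and start_c >= end_c")
    # range() excludes its stop value, so stop one degree below end_c.
    return [(t, hold_min) for t in range(start_c, end_c - 1, -step_c)]


schedule = annealing_schedule()
print(len(schedule), "set points; total hold", sum(h for _, h in schedule), "min")
# 13 set points; total hold 65 min
```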
Oligonucleotides containing H8-deuterated guanines at specific positions were synthesized on an ABI-392 synthesizer using DMF-protected 8D-dG phosphoramidite (Glen Research). Protecting groups were removed by incubating the oligonucleotide in concentrated NH4OH solution at room temperature for 17 h.
Electrophoresis experiments were performed with 10 × 7 cm native gels containing 20% (Figure 1e) or 25% (Supplementary Figure S16c) acrylamide (acrylamide:bis-acrylamide = 37.5:1) in TBE buffer, pH 8.3, supplemented with 10 mM (Figure 1e) or 25 mM (Figure S16c) KCl. Each sample contained 5 µg of DNA at a concentration of 0.3–0.8 mM. Gels were visualized by ultraviolet (UV) shadowing (Figure 1e) or after staining with 0.1% toluidine blue (Figure S16c).
Circular dichroism (CD) spectra were recorded on a Jasco J-815 CD spectrophotometer using a 1 cm quartz cuvette with a sample volume of 600 µl at 20°C. Scans between 220 and 320 nm were recorded at a rate of 200 nm/min, with 1 nm pitch and 1 nm bandwidth. The DNA concentration was 5 µM, and the sample contained 70 mM KCl, 20 mM K-phosphate, pH 7.
NMR experiments were performed on 600 MHz Varian NMR spectrometers, with data recorded at 25°C. Guanine base resonances were assigned unambiguously using site-specific low-enrichment labeling and through-bond correlations at natural abundance (28,29). Assignments for some residues were verified on independently synthesized samples with specific substitutions. Spectral assignments were also assisted and supported by COSY, TOCSY, 13C-HSQC and NOESY spectra. Interproton distances involving exchangeable protons were categorized as strong (1.2–3.8 Å), medium (2.0–6.0 Å) or weak (3.5–6.5 Å) based on cross-peak intensities in NOESY spectra (50 and 300 ms mixing times) recorded in H2O solution. Interproton distances involving nonexchangeable protons were measured from nuclear Overhauser enhancement (NOE) build-ups in NOESY experiments recorded at mixing times of 50, 100, 200 and 300 ms for form-I and 100, 150, 200 and 300 ms for form-II c-kit2 G-quadruplexes in 2H2O solution.
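To make the build-up analysis concrete, the sketch below estimates an initial build-up rate by a least-squares fit through the origin over the short mixing times and converts it to a distance with the isolated spin-pair approximation, r = r_ref (σ_ref/σ)^(1/6). This is an illustrative sketch rather than the authors' procedure: the function names, the 100 ms cut-off, the NOE volumes and the 2.45 Å cytosine H5–H6 reference distance are all assumptions.

```python
def initial_buildup_rate(mixing_times_ms, volumes, max_time_ms=100.0):
    """Least-squares slope through the origin over the short-mixing-time points."""
    pts = [(t, v) for t, v in zip(mixing_times_ms, volumes) if t <= max_time_ms]
    if not pts:
        raise ValueError("no mixing times at or below max_time_ms")
    return sum(t * v for t, v in pts) / sum(t * t for t, _ in pts)


def ispa_distance(rate, ref_rate, ref_distance_A=2.45):
    """Isolated spin-pair approximation: r = r_ref * (rate_ref / rate) ** (1/6)."""
    if rate <= 0 or ref_rate <= 0:
        raise ValueError("build-up rates must be positive")
    return ref_distance_A * (ref_rate / rate) ** (1.0 / 6.0)


times = [50, 100, 200, 300]  # form-I mixing times (ms)
ref = initial_buildup_rate(times, [64.0, 128.0, 230.0, 300.0])  # hypothetical reference pair
unk = initial_buildup_rate(times, [1.0, 2.0, 3.5, 4.5])         # hypothetical cross-peak
print(round(ispa_distance(unk, ref), 2))  # 4.9
```

Restricting the fit to short mixing times limits the bias from spin diffusion, which the later relaxation matrix refinement treats explicitly.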
The structures of the c-kit2 G-quadruplexes were calculated using the X-PLOR (30) and XPLOR-NIH (v.2.11-2) (31) programs as described previously (32), with the protocols modified to account for the non-crystallographic symmetry of the dimeric form-II G-quadruplex. Initial folds guided by the NMR restraints listed in Tables 1 and 2 were obtained using torsion-angle dynamics with r−6 distance averaging for the monomeric form-I G-quadruplex and sum averaging (with ambiguous restraints) for the dimeric form-II G-quadruplex. The distance restraints for the dimeric form-II G-quadruplex obtained from build-up measurements were augmented by single-mixing-time (300 ms) distances from the NOESY spectrum of the I4 analog of the c-kit2 promoter sequence, which showed better spectral resolution of resonances associated with the G3 and I4 residues. The structures were further refined by Cartesian dynamics and, finally, by relaxation matrix refinement.
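For the dimer, an NOE that could arise from either intrastrand or interstrand proton pairs cannot be attributed to one pair, so the restraint is applied to an effective distance obtained by r−6 summation, d_eff = (Σ r_i^−6)^(−1/6), which is never longer than the shortest contributing distance. The sketch below shows only this generic summation; program-specific normalizations (for example, division by the number of symmetric monomers in some implementations of sum averaging) are deliberately omitted, and the function name and distances are hypothetical.

```python
def effective_distance(distances_A):
    """Combine alternative proton-pair distances by r^-6 summation: (sum r_i^-6)^(-1/6)."""
    if not distances_A or any(d <= 0 for d in distances_A):
        raise ValueError("need at least one positive distance")
    return sum(d ** -6 for d in distances_A) ** (-1.0 / 6.0)


# Hypothetical ambiguous NOE in the dimer: intrastrand 3.0 A versus interstrand 5.5 A.
print(round(effective_distance([3.0, 5.5]), 3))
```

Because of the steep r−6 weighting, the effective distance is dominated by the closest pair, so an ambiguous restraint is satisfied as long as at least one candidate pair is close enough.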
The initial fold started from an extended DNA strand (two strands in the case of form-II) with randomized torsion angles of the constituent nucleotides, whose bond lengths and angles were set according to the most recent measurements (33,34). Folding the dimeric form-II G-quadruplex from two extended strands resulted in substantial overpopulation of high-energy left-handed forms, with the stacking order inverted relative to the chemical order of the bases. The initial folding of the dimeric form-II G-quadruplex was therefore achieved in two steps. In the first step, only restraints for the 5′-end G-quadruplex were activated. Three independently obtained right-handed 5′-end G-quadruplex molecules were then used in the second step, in which restraints for both the 5′-end and 3′-end G-quadruplexes were activated. At initial stages of the dimeric form-II G-quadruplex computations, when fewer restraints were used, some 3′-to-3′-end oriented G-quadruplex associations were obtained (with the five-residue linker in an extended conformation). We therefore examined the NMR spectra for cross-peaks that would indicate a dimeric 3′-to-3′-end G-quadruplex association. We do not observe NOE cross-peaks between the methyl group of T21 and the imino protons of G4 and G8; this absence, together with the other observed and assigned cross-peaks, rules out a dimeric 3′-to-3′-end (tail-to-tail) G-quadruplex in favor of a dimeric 3′-to-5′-end (tail-to-head) G-quadruplex.
In the heating stage, the regularized extended DNA chain was subjected to 60 ps of torsion-angle molecular dynamics at 40 000 K using a hybrid energy function composed of geometric and NOE terms. The van der Waals (vdW) component of the geometric term was scaled to 0.1, thus facilitating torsional bond rotations, while the NOE term included NOE-derived distances with a scaling factor of 150. The structures were then slowly cooled from 40 000 K to 1000 K over a period of 60 ps, during which the vdW scale factor was linearly increased from 0.1 to 1. In the third stage, the molecules were slowly cooled from 1000 K to 300 K during 6 ps of Cartesian molecular dynamics (35). The 26 lowest-energy structures of form-I with no distance violations greater than 0.5 Å, and the 20 lowest-energy structures of form-II with at most five distance violations greater than 0.5 Å, were selected for further refinement.
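The torsion-angle cooling stage couples two ramps. A minimal sketch, assuming a linear temperature decrease (the text states only that the vdW scale factor rises linearly), returns the bath temperature and vdW scale factor at a given time into the 60 ps stage; the function name is hypothetical, and actual XPLOR-NIH protocols typically advance such ramps in discrete steps.

```python
def cooling_stage(t_ps, duration_ps=60.0, t_hot_K=40000.0, t_cold_K=1000.0,
                  vdw_start=0.1, vdw_end=1.0):
    """Bath temperature (K) and vdW scale factor at time t_ps into the cooling stage.

    The vdW ramp is linear as described; a linear temperature ramp is an assumption.
    """
    if not 0.0 <= t_ps <= duration_ps:
        raise ValueError("t_ps outside the cooling stage")
    f = t_ps / duration_ps  # fraction of the stage completed
    temperature = t_hot_K + f * (t_cold_K - t_hot_K)
    vdw_scale = vdw_start + f * (vdw_end - vdw_start)
    return temperature, vdw_scale


for t in (0.0, 30.0, 60.0):
    temp, vdw = cooling_stage(t)
    print(f"t = {t:4.1f} ps: T = {temp:7.0f} K, vdW scale = {vdw:.2f}")
```

Keeping the vdW term soft while the system is hot lets atoms pass through one another, so the fold is driven mainly by the NOE restraints before steric repulsion is fully restored.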
Cartesian molecular dynamics was initiated at 300 K, and the temperature was gradually increased to 1000 K over 7 ps. The system was equilibrated for 0.5 ps, with the force constants for the distance restraints kept at 1 kcal mol−1 Å−2. Subsequently, the force constants were linearly scaled up to 150 kcal mol−1 Å−2 over 17.5 ps. The system was then slowly cooled to 300 K over 14 ps and equilibrated at 300 K for 12 ps. The coordinates saved every 0.5 ps during the last 4.0 ps were averaged. The resulting average structure was subjected to minimization until the energy gradient was <0.1 kcal mol−1 Å−1. The soft planarity restraints imposed on the G-tetrads (weight 10 kcal mol−1 Å−2) and base pairs (weight 2 kcal mol−1 Å−2) before the heating process were removed at the beginning of the equilibration stage. The electrostatic term was excluded from the energy function to increase the weight of the covalent geometry terms during minimization. The dihedral and hydrogen-bonding restraints for G-tetrad formation were maintained throughout the computations.
To account for spin diffusion effects, the 12 best (form-I) and 10 best (form-II) distance-refined structures were next subjected to energy minimization with back-calculation of the NOESY spectra in X-PLOR (36). The relaxation matrix was set up for the nonexchangeable protons, with the exchangeable imino and amino protons replaced by deuterons. NOE intensity volumes from 223 (form-I) and 145 (form-II) nonexchangeable cross-peaks at each of four mixing times (50, 100, 200 and 300 ms for form-I and 100, 150, 200 and 300 ms for form-II) were used as restraints, with uniform upper and lower bounds of ±30%. Dynamics was started at 5 K, and the system was heated to 300 K over 0.6 ps. During the subsequent relaxation, the force constant for NOE intensities was gradually increased from 0 to 300 kcal mol−1 Å−2, with a simultaneous decrease of the distance force constant for nonexchangeable protons from 50 to 30 kcal mol−1 Å−2. The force constant for exchangeable protons and hydrogen bonds was kept at 100 kcal mol−1 Å−2. After equilibration at 300 K for 3.0 ps, the resulting structure was subjected to minimization until the energy gradient was <0.1 kcal mol−1 Å−1. The NMR R-factor (R1/6) improved from an initial value of 6.0% to 3.0%, with a simultaneous improvement in structural convergence. Two different rotational conformers obtained for form-II at the distance refinement step converged to a single form after intensity refinement. One additional round of distance refinement followed by intensity refinement was undertaken to generate the 10 best conformers of the dimeric form-II G-quadruplex.
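The sixth-root R-factor compares observed and back-calculated NOE intensities after raising each to the power 1/6, which brings weak, long-range cross-peaks onto a footing closer to strong ones: R1/6 = Σ|I_obs^1/6 − I_calc^1/6| / Σ I_obs^1/6. The sketch below uses this commonly quoted definition; whether the X-PLOR implementation used here applies additional weighting or per-mixing-time normalization is not covered, and the function name and intensities are hypothetical.

```python
def r_factor_sixth_root(observed, calculated):
    """R^(1/6) = sum |I_obs^(1/6) - I_calc^(1/6)| / sum I_obs^(1/6)."""
    if len(observed) != len(calculated) or not observed:
        raise ValueError("need paired, non-empty intensity lists")
    if any(i < 0 for i in list(observed) + list(calculated)):
        raise ValueError("intensities must be non-negative")
    num = sum(abs(o ** (1 / 6) - c ** (1 / 6)) for o, c in zip(observed, calculated))
    den = sum(o ** (1 / 6) for o in observed)
    if den == 0:
        raise ValueError("observed intensities are all zero")
    return num / den


# Hypothetical NOE volumes (arbitrary units) for four cross-peaks at one mixing time.
obs = [1.00, 0.50, 0.20, 0.05]
calc = [1.05, 0.45, 0.22, 0.06]
print(f"R1/6 = {100 * r_factor_sixth_root(obs, calc):.1f}%")
```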
Our studies have focused on the c-kit2 sequence analog in which T21 replaces G21 (designated the c-kit2 T21 promoter, Figure 1a), because this analog, unlike the parent sequence, gives better-resolved imino proton NMR spectra in K+-containing solution, as reported previously (21).
We used either spin-down columns or equilibrium dialysis for purification and buffer exchange of the c-kit2 samples used for NMR studies (see ‘Materials and methods’ section for protocol details). An NMR sample (0.21 mM in strands) prepared using the spin-down approach was dissolved in 20 mM KCl, 5 mM phosphate, H2O, then heated to 90°C and subsequently cooled over a 2-h period. The imino proton NMR spectrum, recorded at 43°C after the sample stood overnight at this temperature, is shown in Figure 1b and exhibits a doubling of the imino proton resonances, indicative of roughly equimolar amounts of two conformations in slow exchange on the NMR timescale. One of the conformers corresponds to form-I, while the other is designated form-II.
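Because the two conformers are in slow exchange, each gives its own set of imino resonances, and their relative amounts can be estimated from peak integrals. A minimal sketch follows, assuming comparable relaxation and saturation behavior for both sets of peaks; the function name and integrals are hypothetical. For a symmetric dimer, each strand contributes one proton to each resonance, so the result is a fraction of strands rather than of molecules.

```python
def conformer_fractions(imino_integrals):
    """Fraction of strands in each slowly exchanging conformer.

    imino_integrals maps a conformer label to integrals of its resolved
    single-proton imino peaks; the mean integral per proton is taken as
    proportional to the strand concentration in that conformer.
    """
    means = {}
    for label, areas in imino_integrals.items():
        if not areas:
            raise ValueError(f"no resolved peaks for {label}")
        means[label] = sum(areas) / len(areas)
    total = sum(means.values())
    return {label: m / total for label, m in means.items()}


# Hypothetical integrals of resolved imino peaks in a doubled spectrum.
print(conformer_fractions({"form-I": [1.02, 0.97, 1.05], "form-II": [0.98, 1.01]}))
```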
We subsequently found that a G12-to-T12 substitution in the c-kit2 T21 promoter (designated the c-kit2 T12/T21 promoter) stabilized form-I in 20 mM KCl solution. An NMR sample of the c-kit2 T12/T21 promoter (0.54 mM in strands), prepared using the equilibrium dialysis approach, was dissolved in 20 mM KCl, 5 mM K-phosphate, H2O, pH 6.8. Its imino proton NMR spectrum recorded at 25°C (Figure 1c) corresponds to a single conformer exhibiting 12 partially resolved imino protons between 11.0 and 12.0 ppm, which allowed us to collect two-dimensional NMR spectra of form-I in the absence of form-II.
An NMR sample of the c-kit2 T21 promoter (0.24 mM in strands), prepared using the equilibrium dialysis approach, was dissolved in 100 mM KCl, 5 mM K-phosphate, H2O, pH 6.8. Its imino proton NMR spectrum recorded at 25°C (Figure 1d) corresponds to a single conformer, form-II, exhibiting 12 partially resolved imino protons between 10.8 and 12.0 ppm. These higher-KCl (100 mM) conditions therefore allowed us to collect two-dimensional NMR spectra of form-II in the absence of form-I.
In the following sections, we report on our NMR-based structure determination of the form-I G-quadruplex formed by the c-kit2 T12/T21 promoter in 20 mM KCl-containing solution and the form-II G-quadruplex formed by the c-kit2 T21 promoter in 100 mM KCl-containing solution. Our structural studies establish that form-I is an all-parallel-stranded monomeric G-quadruplex, while form-II is a novel all-parallel-stranded dimeric G-quadruplex. The conclusions related to oligomerization state (monomer versus dimer) are verified by the gel shift data in Figure 1e, which are presented and analyzed at the beginning of the ‘Discussion’ section.
Imino proton NMR spectra of the c-kit2 T12/T21 promoter (prepared using equilibrium dialysis, 1.12 mM in strands) in 20 mM KCl-containing solution have been recorded at 25°C (Figure S2a), as well as immediately following heating to 85°C and gradual annealing to 25°C (Figure S2b), and at this temperature after 7 days (Figure S2c) and 14 days (Figure S2d). The imino proton spectra are characteristic of the c-kit2 form-I conformation both prior to (Figure S2a) and after annealing (Figure S2b), as well as over time at 25°C (Figures S2c and d).
Imino proton NMR spectra of the c-kit2 T21 promoter (prepared using equilibrium dialysis, 0.065 mM in strands) in 20 mM KCl-containing solution have been recorded at 25°C (Figure S3a), as well as immediately following heating to 85°C and gradual annealing to 25°C (Figure S3b), and at this temperature after 1 day (Figure S3c), 4 days (Figure S3d) and 8 days (Figure S3e). The imino proton NMR spectrum of the freshly prepared sample corresponds to a mixture of c-kit2 forms I and II (Figure S3a); the sample converts to predominantly form-I after annealing (Figure S3b) and then gradually converts to form-II over time (Figure S3c–e; characteristic form-I resonances indicated by a pair of arrows).
Imino proton NMR spectra of the c-kit2 T21 promoter (prepared using equilibrium dialysis, 0.065 mM in strands) in 100 mM KCl-containing solution have been recorded at 25°C (Figure S4a), as well as immediately following heating to 85°C and gradual annealing to 25°C (Figure S4b), and at this temperature after 16 h (Figure S4c), 46 h (Figure S4d) and 97 h (Figure S4e). The imino proton NMR spectrum of the freshly prepared sample is characteristic of c-kit2 form-II (Figure S4a); the sample converts to a mixture of forms I and II after annealing (Figure S4b) and then gradually converts back to form-II over time (Figure S4c–e; characteristic form-I resonances indicated by a pair of arrows).
It should be noted that the imino proton NMR spectrum of the c-kit2 T21 promoter (prepared using equilibrium dialysis) in 100 mM KCl-containing solution at 25°C, which is characteristic of c-kit2 form-II, remains unchanged as the strand concentration is varied from 0.065 mM (Figure S5a) to 0.13 mM (Figure S5b), 0.26 mM (Figure S5c) and 0.52 mM (Figure S5d). This implies that the c-kit2 form-II conformer is unaffected by strand concentration over an 8-fold concentration range.
All NMR experiments on form-I were performed on samples purified and buffer-exchanged by equilibrium dialysis. The exchangeable proton spectrum of the c-kit2 T12/T21 promoter (0.54 mM in strands) corresponding to form-I in 20 mM KCl, 5 mM K-phosphate buffer, pH 6.8 at 25°C is plotted in Figure 2a. We observe 12 exchangeable imino protons between 11.0 and 12.0 ppm, a spectral range characteristic of guanine imino protons involved in N–H•••O hydrogen bonds, a feature of G-tetrad formation (3). We assigned the guanine imino protons by incorporating 2% 15N-labeled guanines one at a time into the c-kit2 T12/T21 promoter sequence (Figure 2b) (29), and then correlated these assignments with guanine H8 protons (Figure 2c) through long-range through-bond J-coupling experiments (data collected on a sample that was 1.7 mM in strands, Figure 2d) (28). The imino proton assignments of the c-kit2 T12/T21 promoter are listed above the control NMR spectrum in Figure 2a and are analyzed in the context of the four guanine tracts G2-G3-G4, G6-G7-G8, G14-G15-G16 and G18-G19-G20 in the 21-mer sequence (Figure 1a). We observe four slowly exchanging guanine imino protons, assigned to G3, G7, G15 and G19, in the spectrum of the c-kit2 T12/T21 promoter recorded 1 h after transfer from 20 mM KCl, 5 mM phosphate, H2O solution (after lyophilization) to its 2H2O counterpart (Figure 2e).
The expanded NOESY contour plot (250 ms mixing time) correlating base and sugar H1′ protons of the c-kit2 T12/T21 promoter (1.7 mM in strands) in 20 mM KCl, 5 mM K-phosphate buffer, H2O, pH 6.8 at 25°C is shown in Figure 2f. The corresponding NOESY contour plot (250 ms mixing time) recorded on a separately prepared sample of the c-kit2 T12/T21 promoter (1.1 mM in strands) in 2H2O at 25°C is plotted in Figure S6 and exhibits better resolution. We can trace the NOEs between the base protons and their own and 5′-flanking sugar H1′ protons; these connectivities are traced in Figure 2f (sample in H2O) and Figure S6 (separate sample in 2H2O), yielding the base and sugar proton assignments listed in Table S1 (Supplementary Data). The remaining sugar protons were assigned by through-bond COSY and TOCSY experiments and are also listed in Table S1. We do not observe strong NOEs between base and sugar H1′ protons at short (50 ms) mixing times, indicative of anti glycosidic torsion angles (37) for all guanines in form-I of the c-kit2 T12/T21 promoter.
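The sequential walk rests on a simple expectation: in a right-handed strand, each base H8 or H6 proton shows NOEs to its own H1′ and to the H1′ of the 5′-flanking residue. The sketch below generates that list of expected cross-peaks for comparison with observed peaks. The sequence string is reconstructed from the residue numbering used in the text (C1, G2–G4, C5, G6–G8, C9, G10, C11, T12, A13, G14–G16, A17, G18–G20, T21), the function name is hypothetical, and breaks in observed connectivities, for example across double-chain-reversal loops, are not modeled.

```python
BASE_PROTON = {"G": "H8", "A": "H8", "C": "H6", "T": "H6"}


def expected_walk(seq):
    """Expected base(H8/H6)-H1' NOEs for a sequential walk along a right-handed strand."""
    pairs = []
    for i, nt in enumerate(seq, start=1):
        base = f"{nt}{i}({BASE_PROTON[nt]})"
        pairs.append((base, f"{nt}{i}(H1')"))                # intraresidue NOE
        if i > 1:
            pairs.append((base, f"{seq[i - 2]}{i - 1}(H1')"))  # NOE to 5'-flanking H1'
    return pairs


# c-kit2 T12/T21, reconstructed from the residue numbering used in the text.
CKIT2_T12_T21 = "CGGGCGGGCGCTAGGGAGGGT"
walk = expected_walk(CKIT2_T12_T21)
print(len(walk), walk[:3])
```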
We assigned guanines to the three G-tetrads by monitoring NOEs between guanine imino and H8 protons around individual G-tetrad planes (Figure 3a) in the NOESY spectrum (mixing time 250 ms) of the c-kit2 T12/T21 promoter (1.7 mM in strands) in 20 mM KCl, 5 mM K-phosphate, H2O, pH 6.8, at 25°C (Figure 3b). Thus, the imino proton of G2 shows an NOE to the H8 proton of G6 (labeled 2/6 in red), the imino proton of G6 shows an NOE to the H8 proton of G14, the imino proton of G14 shows an NOE to the H8 proton of G18, and the imino proton of G18 shows an NOE to the H8 proton of G2, thereby establishing the guanines and their order around the G2•G6•G14•G18 G-tetrad plane (assignments in red) (Figure 3b–d). Related NOE tracings identify the G3•G7•G15•G19 (assignments in orange) and G4•G8•G16•G20 (assignments in blue) G-tetrads that make up the form-I G-quadruplex (Figure 3b–d). Since the imino protons of the G3•G7•G15•G19 G-tetrad exhibit the slowest exchange rates (Figure 2e), this central G-tetrad must be flanked on either side by the stacked G2•G6•G14•G18 and G4•G8•G16•G20 G-tetrads, resulting in the backbone tracing from C1 to T21 for the c-kit2 monomeric form-I G-quadruplex shown in Figure 3d. All guanines adopt anti glycosidic torsion angles, and all three loops are of the double-chain-reversal type (38). Loop 1 consists of a single nucleotide (C5), loop 2 of five nucleotides (C9-G10-C11-T12-A13), and loop 3 of a single nucleotide (A17).
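The tetrad assignment logic amounts to following a chain of imino(i) → H8(j) NOEs until it closes on itself; a closed four-membered cycle defines one G-tetrad and the order of guanines around it. In the sketch below the G2 → G6 → G14 → G18 → G2 chain is taken from the text, while the connectivities for the other two tetrads assume the same direction around the plane; the function name is hypothetical.

```python
def trace_tetrads(imino_to_h8):
    """Group guanines into G-tetrads by following imino(i) -> H8(j) NOEs around each plane.

    imino_to_h8 maps residue i to the residue j whose H8 shows an NOE to the imino of i.
    Each chain must close into a four-membered cycle.
    """
    seen, tetrads = set(), []
    for start in sorted(imino_to_h8):
        if start in seen:
            continue
        cycle, current = [start], imino_to_h8[start]
        while current != start:
            if current in cycle or current not in imino_to_h8:
                raise ValueError(f"NOE chain from G{start} does not close")
            cycle.append(current)
            current = imino_to_h8[current]
        if len(cycle) != 4:
            raise ValueError(f"cycle {cycle} is not a tetrad")
        seen.update(cycle)
        tetrads.append(tuple(cycle))
    return tetrads


# G2->G6->G14->G18->G2 as reported; the other two tetrads assume the same direction.
form_i_noes = {2: 6, 6: 14, 14: 18, 18: 2,
               3: 7, 7: 15, 15: 19, 19: 3,
               4: 8, 8: 16, 16: 20, 20: 4}
for t in trace_tetrads(form_i_noes):
    print("•".join(f"G{r}" for r in t))
# G2•G6•G14•G18
# G3•G7•G15•G19
# G4•G8•G16•G20
```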
Initial distance-restrained and subsequent intensity-restrained molecular dynamics calculations (see ‘Materials and methods’ section) of the solution structure of the c-kit2 T12/T21 form-I monomeric G-quadruplex were guided by exchangeable and nonexchangeable proton restraints, whose numbers are listed by category in Table 1. The ensemble of 12 refined, superimposed structures is shown in stereo in Figure 4a, with a representative refined structure in the same orientation shown in ribbon representation in Figure 4b and in surface representation in Figure 4c. The ensemble of refined structures is well converged, with pairwise rmsd values of ~0.43 Å for the stacked G-tetrad core (Table 1).
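Convergence of an NMR ensemble is usually summarized as the mean pairwise rmsd after optimal superposition. The sketch below uses the Kabsch (SVD-based) superposition with NumPy; the atom selection (for example, G-tetrad core heavy atoms) is an assumption, the function names are hypothetical, and the toy coordinates are not taken from the refined structures.

```python
import itertools

import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal superposition (Kabsch)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two coordinate arrays of identical shape (N, 3)")
    p = p - p.mean(axis=0)                    # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)         # covariance matrix H = P^T Q
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # avoid an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def mean_pairwise_rmsd(ensemble):
    """Average Kabsch RMSD over all pairs of structures in an ensemble."""
    values = [kabsch_rmsd(a, b) for a, b in itertools.combinations(ensemble, 2)]
    return float(np.mean(values))


# Toy check: a rotated and translated copy should superimpose with ~zero RMSD.
a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
b = a @ rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(a, b), mean_pairwise_rmsd([a, b, a]))
```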
The first loop is of the double-chain-reversal type and is bridged by a single C5 nucleotide, which unexpectedly stacks with C9 from the second double-chain-reversal loop (Figure 4d; stereo view of refined structures in Figure S7a), indicative of an unanticipated interaction between residues in adjacent double-chain-reversal loops. The observation of NOEs between C5 sugar protons and the C9 base H5 proton, namely C5(H2′,2′′)–C9(H5), supports this alignment. The second loop is also of the double-chain-reversal type and is composed of the five-nucleotide segment C9-G10-C11-T12-A13 (Figure 4e); despite its length, it is reasonably well ordered (stereo view of refined structures in Figure S7b). The partial overlap of the bases of the non-adjacent G10 and T12 in this 5-nt double-chain-reversal loop of the c-kit2 monomeric form-I G-quadruplex (Figure S8a) can be compared with the partial overlap of the bases of the non-adjacent T5 and A7 in the four-repeat human telomere G-quadruplex, both for the all-parallel-stranded form in the crystal (Figure S8b) (39) and for the related (3+1) form in solution (Figure S8c) (40–42). The third loop is of the double-chain-reversal type and is bridged by a single well-defined A17 nucleotide (Figure S9a; stereo view of refined structures in Figure S7c). Finally, a reversed C1•A13 non-canonical pair stacks over the G14•G18 segment of the top G2•G6•G14•G18 tetrad (Figure S9b), while the terminal residue T21 stacks on G20 of the bottom G4•G8•G16•G20 tetrad (Figure S9c).
The majority of NMR experiments on form-II were performed on samples purified and buffer-exchanged using spin-down columns. The only exceptions were samples with H8-deuterated guanines, inosine-for-guanine substitutions and 5-methylcytosine-for-cytosine substitutions, which were prepared by equilibrium dialysis. The exchangeable proton spectrum of the c-kit2 T21 promoter (0.3 mM in strands) corresponding to form-II in 100 mM KCl, 5 mM K-phosphate buffer, pH 6.8 at 25°C is plotted in Figure 5a. We observe 12 partially resolved exchangeable imino protons between 10.7 and 12.0 ppm, a spectral range characteristic of guanine imino protons involved in G-tetrad formation. The imino proton NMR spectrum of the c-kit2 T21 promoter in 100 mM KCl prepared by the spin-down approach (Figure 5a) differs somewhat from its counterpart prepared by equilibrium dialysis (Figure 1d): both exhibit four resolved imino protons between 11.5 and 12.0 ppm, but show small chemical shift differences among the imino proton resonances between 10.9 and 11.3 ppm.
We assigned the guanine imino protons by incorporating 2% 15N-labeled guanines, either in pairs or one at a time, into the c-kit2 T21 promoter sequence (Figure 5b), and then correlated these assignments with guanine H8 protons (data collected on a sample that was 3.9 mM in strands, Figure 5c) through long-range through-bond J-coupling experiments (Figure 2c). The imino proton assignments of the c-kit2 T21 promoter are listed above the control NMR spectrum in Figure 5a and are analyzed in the context of the four guanine tracts G2-G3-G4, G6-G7-G8, G14-G15-G16 and G18-G19-G20 in the 21-mer sequence (Figure 1a). We observe four slowly exchanging guanine imino protons, assigned to G3, G7, G15 and G19, in the spectrum of the c-kit2 T21 promoter recorded 1 h after transfer from 100 mM KCl-containing H2O solution (after lyophilization) to its 2H2O counterpart (Figure 5d).
The expanded NOESY contour plot (300 ms mixing time) correlating base and sugar H1′ protons of the c-kit2 T21 promoter (0.5 mM in strands) in 100 mM KCl, 5 mM K-phosphate, 2H2O, pH 6.8 at 25°C is shown in Figure 6a. Because of spectral overlap, we also prepared samples containing single guanines deuterated at the H8 position, to independently confirm specific guanine H8 assignments in crowded regions of the NOESY contour plot. Such assignments can be made by comparing the expanded NOESY contour plot of the undeuterated c-kit2 T21 promoter (Figure 6b) with its counterparts following deuteration of the H8 of G12 (Figure 6c) and the H8 of G14 (Figure 6d). Cross-peaks missing as a consequence of H8 deuteration are labeled with an ‘x’ in Figure 6c and d. We can trace the NOEs between the base protons and their own and 5′-flanking sugar H1′ protons, and these connectivities are traced in Figure 6a.
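Assignment by selective H8 deuteration reduces to a peak-list difference: cross-peaks present in the undeuterated spectrum but absent after deuteration involve the deuterated H8. A minimal sketch with hypothetical peak lists, a hypothetical function name and an assumed 0.02 ppm matching tolerance follows; real spectra would also require checking that a missing peak has not simply fallen below the noise level.

```python
def missing_peaks(reference, deuterated, tol_ppm=0.02):
    """Cross-peaks (w1, w2 in ppm) in the reference with no counterpart in the deuterated spectrum."""
    def matched(p):
        return any(abs(p[0] - q[0]) <= tol_ppm and abs(p[1] - q[1]) <= tol_ppm
                   for q in deuterated)
    return [p for p in reference if not matched(p)]


# Hypothetical H8/H6 versus H1' peak lists before and after deuterating one guanine H8.
undeuterated = [(7.80, 6.05), (7.62, 5.90), (7.45, 5.70)]
h8_deuterated = [(7.80, 6.05), (7.451, 5.699)]
print(missing_peaks(undeuterated, h8_deuterated))  # [(7.62, 5.9)]
```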
Inosine-substituted oligonucleotides (I for G at positions 4 and 16) were also investigated (expanded regions of their NOESY contour plots are shown in Figure 6e and f). For example, the I4-for-G4 substitution resolved certain spectral overlaps in the NOESY contour plot (Figure S10), and this analog was used for assignment purposes. Thus, we observe NOEs between H2 of I4 and H8 of G8 (peak a, Figure 6e) and of G7 (peak b, Figure 6e), and between H2 of I16 and H8 of G20 (peak a, Figure 6f) and of G19 (peak b, Figure 6f), as well as between the imino proton of I16 and H8 of G20 (peak c, Figure 6f) and of G19 (peak d, Figure 6f).
Cytosines were assigned by incorporating 5-methylcytosine at positions C11 (Figure 6g) and C5 (Figure 6h), which resulted in upfield shifts of the respective H6 protons (cross-peaks labeled in green for C11 and in red for C5 in Figure 6g and h).
The remaining sugar protons were assigned by through-bond COSY and TOCSY experiments, with proton chemical shifts listed in Table S2. We do not observe strong NOEs between base and sugar H1′ protons at short (50 ms) mixing times, indicative of anti torsion angles (37) for all guanines for form-II of the c-kit2 T21 promoter.
We assigned guanines to specific G-tetrads by monitoring NOEs between guanine imino and H8 protons around individual G-tetrad planes in the NOESY spectrum (mixing time 250 ms) of the c-kit2 T21 promoter (1.1 mM in strands) in 100 mM KCl, 5 mM K-phosphate, H2O, pH 6.8, 25°C (Figure 7a). To overcome assignment problems associated with near-degenerate, overlapping cross-peaks, we also used samples selectively deuterated at the H8 positions of individual guanines in the c-kit2 T21 promoter. Expanded NOESY contour plots are compared for the undeuterated sample (Figure 7b) and for samples deuterated at H8 of G3 (Figure 7c), H8 of G14 (Figure 7d) and H8 of G19 (Figure 7e). Cross-peaks missing as a consequence of H8 deuteration are labeled with an ‘x’ in Figure 7c and d.
We observe an unusual pattern of reciprocal NOEs, as exemplified by cross-peaks between the imino proton of G2 and the H8 proton of G6, as well as between the imino proton of G6 and the H8 proton of G2 (boxed cross-peaks in red in Figure 7a), consistent with formation of a G2•G6•G2•G6 tetrad (red, Figure 7f). Other pairs of reciprocal NOEs (Figure 7a) identify the G3•G7•G3•G7 (green, Figure 7f), G4•G8•G4•G8 (blue, Figure 7f), G14•G18•G14•G18 (cyan, Figure 7f), G15•G19•G15•G19 (magenta, Figure 7f) and G16•G20•G16•G20 (orange, Figure 7f) tetrads. Because each such tetrad contains two copies of each of two guanines, it must be assembled from two strands, and this pattern of six G-tetrads is consistent only with the all-parallel-stranded dimeric G-quadruplex shown schematically in Figure 7g for the c-kit2 form-II G-quadruplex. Within each monomeric G-quadruplex component of the c-kit2 dimeric form-II G-quadruplex, G3•G7•G3•G7 and G15•G19•G15•G19 constitute the central G-tetrads (Figure 7g), consistent with the observed slow exchange of the imino protons at positions G3, G7, G15 and G19 (Figure 5d).
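The reciprocal-NOE argument can be stated compactly: for each observed imino(a) → H8(b) NOE, check whether imino(b) → H8(a) is also present. The sketch below takes the reciprocal pairs reported for form-II plus one hypothetical one-way NOE and returns the symmetric tetrads they imply; the function name is hypothetical, and the check detects reciprocity only, so on its own it does not prove dimerization (which the text also supports with gel shift data).

```python
def symmetric_tetrads(imino_h8_noes):
    """Return residue pairs (a, b) whose imino->H8 NOEs are reciprocal.

    G(a) imino -> G(b) H8 together with G(b) imino -> G(a) H8 is read here as
    a G(a)•G(b)•G(a)•G(b) tetrad assembled from two strands.
    """
    noes = set(imino_h8_noes)
    return sorted({tuple(sorted((a, b))) for a, b in noes if a != b and (b, a) in noes})


# Reciprocal pairs reported for form-II, plus one hypothetical one-way NOE (2, 14).
observed = [(2, 6), (6, 2), (3, 7), (7, 3), (4, 8), (8, 4),
            (14, 18), (18, 14), (15, 19), (19, 15), (16, 20), (20, 16),
            (2, 14)]
for a, b in symmetric_tetrads(observed):
    print(f"G{a}•G{b}•G{a}•G{b}")
```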
Further, in NOESY spectra of inosine-substituted oligonucleotides (I4 for G4 and I16 for G16), we observe cross-peaks I4(H2)/G8(H8) and I4(H2)/G7(H8) (peaks a and b, respectively, Figure 6e), as well as I16(H2)/G20(H8) and I16(H2)/G19(H8) (peaks a and b, respectively, Figure 6f), which independently corroborate features of the dimeric G-quadruplex core shown schematically in Figure 7g. All guanines adopt anti glycosidic torsion angles, and the single-nucleotide first (C5) and third (A17) loops are of the double-chain-reversal type, with the central 5-nt C9-G10-C11-G12-A13 segment bridging the individual 5′ (or upper) and 3′ (or lower) G-quadruplex components of the dimeric G-quadruplex.
Initial distance-restrained and subsequent intensity-restrained molecular dynamics calculations (see ‘Materials and methods’ section) of the solution structure of the c-kit2 T21 dimeric form-II G-quadruplex were guided by exchangeable and nonexchangeable proton restraints, whose numbers are listed by category in Table 2. The ensemble of 10 superimposed refined structures is shown in stereo in Figure 8a, with a representative refined structure in the same orientation shown in ribbon representation in Figure 8b and in surface representation in Figure 8c. The ensemble of refined structures is well converged, with pairwise rmsd values of about 0.57 Å for the pair of stacked G-tetrad cores (Table 2).
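The pairwise rmsd used as a convergence statistic is the coordinate rmsd averaged over all pairs of structures in the ensemble. The sketch below is a minimal illustration that assumes the structures have already been superimposed on the selected atoms (hypothetically, the G-tetrad core atoms); it omits the optimal-superposition step (for example, a Kabsch fit) that structure-analysis programs normally perform first, and the toy coordinates are invented.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Coordinate rmsd between two equally sized atom sets that are assumed
    to be already superimposed (no fitting is done here)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("atom sets must be non-empty and of equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))

def mean_pairwise_rmsd(models: List[Sequence[Coord]]) -> float:
    """Average rmsd over all unordered pairs of models in an ensemble."""
    pairs = list(combinations(models, 2))
    if not pairs:
        raise ValueError("need at least two models")
    return sum(rmsd(m1, m2) for m1, m2 in pairs) / len(pairs)

# Toy two-atom "models" (coordinates in Å), invented for illustration
m1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
m2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # every atom shifted by 1 Å
m3 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # identical to m1
print(round(mean_pairwise_rmsd([m1, m2, m3]), 3))  # 0.667
```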
The backbone of the C9-G10-C11-G12-A13 linker sequence, which connects the 5′ (at position G8) and 3′ (at position G14) G-quadruplexes, adopts an inverted S-shaped fold (Figure 8d). A13 residues from the partner strands form a trans A13•A13 non-canonical pair through pairing of their Watson–Crick edges, and this pair is sandwiched between the 5′ and 3′ G-quadruplexes (Figure 8e and f).
The residues within the C9-G10-C11-G12-A13 linker sequence are well defined, including the stacked C11 and G12 bases, indicative of an ordered five-nucleotide linker segment (Figure 8d and stereo view in Figure S11; see also stereo views of the refined structures in Figure S12a and b). The inter-residue NOE cross-peaks that define the fold of the C9-G10-C11-G12-A13 linker sequence are listed in Table S3, with an interpretation of the key NOEs given as footnotes to Table S3.
The relative alignment of the A13•A13 non-canonical pair sandwiched between the G4•G8•G4•G8 and G14•G18•G14•G18 tetrads (Figure 7g) is defined by a set of inter-residue NOE cross-peaks listed in Table S4, with an interpretation of the key NOEs given as footnotes to Table S4.
The single-residue loops C5 and A17 are rotated by approximately 90° relative to each other around the helical axis (colored in purple in stereo views of the refined structures in Figure S12a and b), such that residue A17 becomes part of the cluster of C9-G10-C11-G12-A13 linker residues. The individual grooves of the 5′ and 3′ G-quadruplexes, which are separated by the A13•A13 non-canonical pair, align to form one continuous groove in the structure of the dimeric G-quadruplex (represented by a pair of dashed lines in Figure 8e).
We observe small chemical shift changes for a subset of imino protons in the spectrum of the c-kit2 promoter dimeric form-II G-quadruplex after the sample has stood for many months in the refrigerator (compare Supplementary Figures S13a and S13b). The signature of the c-kit2 promoter dimeric form-II G-quadruplex is retained over time, since we still observe four well-dispersed imino protons (G2, G14, G6 and G7) between 11.5 and 12.0ppm (Figure S13b). The small time-dependent chemical shift changes are observed for several guanine imino protons, including those of G4, G8, G14 and G18, the very guanines located at the interface between the 5′ and 3′ G-quadruplexes (Figures S13a and S13b). These results imply an as-yet undefined time-dependent conformational stabilization at the interface between the two halves of the dimeric form-II G-quadruplex.
Our studies establish that the c-kit2 promoter T12/T21 sequence folds into an intramolecular monomeric all-parallel-stranded form-I G-quadruplex in 20mM KCl-containing buffer solution (Figures 3d and 4b), while the c-kit2 promoter T21 sequence folds into a dimeric all-parallel-stranded form-II G-quadruplex in 100mM KCl-containing buffer solution (Figures 7g and 8b). Further, we observe that the c-kit2 promoter T21 and T12/T21 sequences convert over time from the form-I monomeric G-quadruplex to the form-II dimeric G-quadruplex in KCl-containing solution, establishing that the form-II dimeric G-quadruplex is the more thermodynamically stable conformer in solution (Figures S3 and S4).
The oligomerization state of the c-kit2 promoter as a function of K+ concentration has been confirmed by gel shift studies. The 18-nt J192T1 sequence, which forms a monomeric G-quadruplex (Phan,A.T. unpublished data), was used as a monomer control, while the 16-nt 93del (43) and 16-nt J19 (Phan,A.T. unpublished data) sequences, which form dimeric G-quadruplexes, were used as dimer controls (Figure 1e).
A sample of the c-kit2 T12/T21 promoter prepared in 20mM KCl-containing solution exhibited an imino proton NMR spectrum characteristic of the monomeric form-I G-quadruplex. This was also reflected in the gel shift pattern, which consists predominantly of the monomeric G-quadruplex band (lane labeled form-I, Figure 1e). By contrast, a sample of the c-kit2 T21 promoter prepared in 100mM KCl-containing solution exhibited an imino proton NMR spectrum consistent with formation of the dimeric form-II G-quadruplex, and its gel shift pattern consists predominantly of the dimeric G-quadruplex band (lane labeled form-II, Figure 1e).
All-parallel-stranded intramolecular G-quadruplexes were first reported in solution from NMR studies of the d(GGA)2G sequence, which formed a dimeric G-quadruplex containing a stacked G•G•G•G tetrad and A•(G•G•G•G)•A hexad (44). NMR studies have also investigated the d(GGA)4 sequence, which forms an all-parallel-stranded G-quadruplex containing stacked tetrads and heptads (45–47). Soon after, an all-parallel-stranded intramolecular G-quadruplex was reported in the crystal structure of the four-repeat TTAGGG human telomere, from crystals grown in K+-containing solution (39).
The next demonstration of all-parallel-stranded intramolecular G-quadruplex formation emerged from studies of the folding topology of nuclease hypersensitivity element III of the human c-myc promoter, which contains six guanine tracts (48,49). NMR-based studies of sequences containing combinations of four of the six guanine tracts established that the c-myc-2345 (containing guanine tracts 2, 3, 4 and 5) (50,51) and c-myc-1245 (50) sequences both form intramolecular all-parallel-stranded G-quadruplexes in K+-containing solution. Notably, two of the three double-chain-reversal loops in both G-quadruplexes consist of a single nucleotide bridging three G-tetrad planes.
Elements of an all-parallel-stranded G-quadruplex topology containing three double-chain-reversal loops were also observed for the NMR-based structures of the five-guanine tract c-myc-23456 sequence (52) and the four-guanine tract c-kit1 sequence (24), both in K+-containing solution. Both structures adopt unique G-quadruplex folding topologies where one of the columns is restricted to a G-G rather than a G-G-G step, and both contain snap-back (3′-G for c-myc-23456 and 3′-G-G for c-kit1) elements completing G-tetrad formation (24,52).
CD studies confirm the all-parallel-stranded topologies of the form-I monomeric and form-II dimeric G-quadruplexes adopted by the c-kit2 promoter as a function of K+ concentration. The CD spectra of the form-I monomer (c-kit2 T12/T21 promoter in 20mM KCl, green curve, Supplementary Figure S14) and the form-II dimer (c-kit2 T21 promoter in 100mM KCl, red curve, Supplementary Figure S14) both exhibit a negative peak at 240nm and a positive peak at 260nm, characteristic of parallel-stranded G-quadruplex formation (53).
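The CD criterion can be written as a simple check on band positions. The sketch below is a hypothetical heuristic, not the analysis used in this study: it flags a spectrum as parallel-like when its most positive band lies near 260nm and its most negative band near 240nm. The window bounds and the toy spectra are assumptions chosen for illustration.

```python
from typing import Dict, Tuple

def looks_parallel(spectrum: Dict[float, float],
                   pos_window: Tuple[float, float] = (255.0, 270.0),
                   neg_window: Tuple[float, float] = (235.0, 245.0)) -> bool:
    """Return True if the strongest positive CD band (wavelength in nm ->
    ellipticity) falls in pos_window and the strongest negative band falls
    in neg_window. The window bounds are illustrative assumptions."""
    wl_max = max(spectrum, key=lambda wl: spectrum[wl])
    wl_min = min(spectrum, key=lambda wl: spectrum[wl])
    return (spectrum[wl_max] > 0 and pos_window[0] <= wl_max <= pos_window[1]
            and spectrum[wl_min] < 0 and neg_window[0] <= wl_min <= neg_window[1])

# Toy spectra (wavelength in nm -> ellipticity, arbitrary units); not measured data
parallel_like = {230.0: -1.0, 240.0: -5.0, 250.0: 2.0, 260.0: 10.0, 280.0: 1.0, 295.0: 0.5}
other = {240.0: 1.0, 260.0: -4.0, 295.0: 8.0}
print(looks_parallel(parallel_like), looks_parallel(other))  # True False
```

A real analysis would work on baseline-corrected spectra and compare them against reference G-quadruplexes of known topology rather than rely on fixed windows.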
The c-kit2 T12/T21 promoter monomeric form-I G-quadruplex adopts a standard all-parallel-stranded G-quadruplex topology involving three stacked G-tetrads, four G-G-G columns and three double-chain-reversal loops (Figure 3d). The single bases in both the first (C5) and third (A17) double-chain-reversal loops are well defined (stereo pairs of refined structures in Figure S7a and c), as is the five-nucleotide C9-G10-C11-T12-A13 linker associated with the second double-chain-reversal loop (Figure 4e and stereo pair of refined structures in Figure S7b). This ordering could reflect in part the unprecedented stacking of C5 from the first double-chain-reversal loop on C9 from the second double-chain-reversal loop (Figure 4d and stereo pair of refined structures in Figure S7a), a unique feature of cross-talk between adjacent loop residues reported here for the first time.
There is no precedent for the topology of the c-kit2 promoter dimeric form-II G-quadruplex presented in this paper, which is distinctly different from all other dimeric G-quadruplexes reported in the literature (43–46,54–56). The two monomeric units in the c-kit2 form-II G-quadruplex are aligned in a parallel (tail-to-head) orientation (Figure 7g), in sharp contrast to all previously reported dimeric parallel-stranded G-quadruplexes, in which the two monomeric units are aligned in an antiparallel (head-to-head) orientation (Figure S15a–e).
The novelty of this structure resides in its in-register self-assembly of two 5′-guanine tracts (G2-G4 and G6-G8) and two 3′-guanine tracts (G14-G16 and G18-G20) to form an all-parallel-stranded dimeric G-quadruplex scaffold (Figures 7g and 8b). The 5′ and 3′ G-quadruplexes sandwich a trans A13•A13 non-canonical pair (Figure 8e and f), with the five-residue C9-G10-C11-G12-A13 linker segment adopting an inverted S-shaped fold, stabilized by stacking interactions (Figure 8d and stereo pair in Figure S12a and b).
The inverted S-shaped fold of the C9-G10-C11-G12-A13 linker segment results in base-sugar stacking interactions as reflected in upfield shifts of the sugar H1′ protons of C9 (5.20ppm), G10 (5.47ppm), C11 (5.19ppm), G12 (5.46ppm) and A13 (5.73ppm) within the c-kit2 promoter dimeric form-II G-quadruplex (Table S2 and Figure 6a). Several of these sugar H1′ protons are stacked over guanine rings of the linker segment and/or terminal G-tetrads (sugar H1′ protons are colored as blue balls in the stereo pair in Figure S11), thereby accounting for their upfield shifts. By contrast, such upfield shifts were not observed for the corresponding sugar H1′ protons of the c-kit2 promoter monomeric form-I G-quadruplex (Table S1 and Figure 2f).
The phosphates of C9 and C11 are positioned in close proximity within the 5-nt linker segment (Figure 8d) connecting the 5′ and 3′ G-quadruplexes of the c-kit2 promoter dimeric form-II G-quadruplex. It is conceivable that monovalent cations bridge these phosphates at the higher (100mM) KCl concentration and thereby stabilize the inverted S-shaped fold of the C9-G10-C11-G12-A13 linker segment.
The 5-nt linker segment and the A13•A13 non-canonical pair between the 5′ and 3′ G-quadruplexes are likely to constitute a dynamic interface in the dimeric form-II G-quadruplex. This is reflected in the broadening of aromatic proton-sugar H1′ proton cross-peaks for residues C9-G10-C11-G12 (Figures 6a and S10), as well as in the lack of protection of the imino protons of the G4•G8•G4•G8 and G14•G18•G14•G18 junctional G-tetrads from proton-deuterium exchange (Figure 5d).
We have monitored the impact of adding flanking sequences to either end of the c-kit2 promoter sequence on dimeric form-II G-quadruplex formation. The imino proton spectra of the CC(c-kit2 T21)AG and TT(c-kit2 T21)TT sequences in 100mM KCl-containing solution, in which two nucleotides were added to each of the 5′ and 3′ ends, are plotted in Figure S16a and S16b, respectively. Each sequence forms a dimer under non-denaturing polyacrylamide gel electrophoresis conditions (panel 2 for CC(c-kit2 T21)AG and panel 3 for TT(c-kit2 T21)TT, Figure S16c). This implies that the addition of flanking sequences does not impair formation of the c-kit2 dimeric form-II G-quadruplex.
Genome-wide analysis of recombination-prone regions (57) and a fine-scale map of recombination rates and hot spots across the human genome (58) have shown that recombination hot spots can occur within guanine-rich segments of DNA sequence space. We postulate that all-parallel-stranded monomeric G-quadruplexes embedded within duplex segments (Figure S17a) can undergo synapsis, following strand exchange, to form the all-parallel-stranded dimeric G-quadruplex shown in Figure S17b. The folding topologies of the monomeric and dimeric G-quadruplexes shown in Figure S17a and b are those reported in this paper for the monomeric form-I (Figure 3d) and dimeric form-II (Figure 7g) G-quadruplex folds of the c-kit2 guanine-rich sequence. Following further strand exchange, the monomeric G-quadruplexes could be regenerated in a recombined manner (Figure S17c). Our postulate implies that two strands running in parallel could undergo recombination through cleavage (designated by crosses at the A•A non-canonical pair, Figure S17b), rotation by 180° and subsequent rejoining, mediated by an as-yet undefined nuclease-topoisomerase enzyme complex.
Such strand exchange processes as shown in Figure S17 build on an earlier postulated model of intergenic recombination involving parallel-stranded DNA alignments (59) and self-recognition of guanine-rich motifs in meiotic prophase involving parallel-G-quadruplexes (60).
During the writing of this paper, a related study of the solution structure of the c-kit2 promoter G-quadruplex was reported in the literature (61). That study also undertook NMR studies of the c-kit2 T21 promoter sequence in K+-containing buffer (70mM KCl, 20mM potassium phosphate, pH 7), on a sample whose imino proton NMR spectrum was characteristic of a mixture of at least two conformers. The major conformer exhibited an imino proton spectral pattern (namely, imino protons of G2, G14, G6 and G7 between 11.5 and 12.0ppm) similar to that of our c-kit2 T21 promoter form-II spectrum in 100mM K+-containing buffer (Figures 1d and 5a). The authors interpreted their NMR data to conclude that the major conformer in their mixture adopts a monomeric all-parallel-stranded G-quadruplex (61).
This interpretation (61) is at variance with our conclusion that the c-kit2 T21 promoter form-II adopts a dimeric all-parallel-stranded G-quadruplex (Figure 7g). Our study of the c-kit2 T21 promoter form-II G-quadruplex has the advantage that the NMR spectrum in Figure 5a corresponds to a single species, so the analysis is not complicated by cross-peaks from other species. Further, we have identified a monomeric all-parallel-stranded G-quadruplex (Figure 3d), but this c-kit2 T12/T21 promoter form-I G-quadruplex exhibits a very different imino proton NMR spectrum (recorded in 20mM KCl-containing buffer, Figure 2a) from that of the c-kit2 T21 promoter form-II G-quadruplex (recorded in 100mM KCl-containing buffer, Figure 5a). Our study of the c-kit2 T12/T21 promoter form-I G-quadruplex also has the advantage that the NMR spectrum in Figure 2a corresponds to a single species, thus facilitating assignment and structure determination.
We remain confident about our conclusions given that the NMR studies reported in our paper were intentionally undertaken under single conformer conditions for both c-kit2 promoter monomeric form-I and dimeric form-II G-quadruplexes, thereby avoiding potential complications over data interpretation from minor conformations.
The coordinates of the c-kit2 T12/T21 promoter form-I (accession code: 2KYP) and c-kit2 T21 promoter form-II (accession code: 2KYO) G-quadruplexes have been deposited in the Protein Data Bank.
Supplementary Data are available at NAR Online.
National Institutes of Health Grant GM34504 (to D.J.P.); the NY Structural Biology Center supported by NIH grant GM66354 (to D.J.P.); and the Singapore Ministry of Education (grants ARC30/07 and RG62/07 to A.T.P.). Funding for open access charge: National Institutes of Health Grant GM34504 (to D.J.P.).
Conflict of interest statement. None declared.