We report here a direct and nondestructive method that can be used to unambiguously determine the folding structure of an I-motif DNA secondary structure formed from a native, nonmutated DNA sequence. The I-motif is a four-stranded DNA secondary structure consisting of two parallel-stranded DNA duplexes zipped together in an antiparallel orientation by intercalated, hemiprotonated cytosine+–cytosine base pairs1 (Figure 1A). Since the first report in 1993, the biological roles of I-motif structures and their potential in nanotechnological applications have been extensively explored.2,3 For nanotechnological applications, research has revealed that non-DNA-based C-rich sequences can form I-motifs as well. Various modifications, including RNA residues, locked nucleic acids, 2′-fluoro substitutions, and peptide nucleic acids, can be incorporated into I-motif structures.4–7 The biological significance of I-motif structures is still under active investigation. Oligonucleotide fragments with sequences from naturally occurring C-rich strands in the genomes of humans and other species have been shown to form intramolecular and intermolecular I-motifs in vitro; examples include the Tetrahymena thermophila and human telomeric repeats, centromeric sequences, the human insulin minisatellite, the fragile X repeat, and oncogene promoters.8–12 In particular, polyguanine/polycytosine (polyG/polyC) tracts have recently been demonstrated to be highly prevalent in human proximal promoter regions, especially those of oncogenes associated with growth and proliferation.13–17 These G/C-rich promoters are structurally highly dynamic and are found to be associated with nuclease hypersensitive elements (NHEs).
Under superhelicity conditions, these G/C-rich regions can form alternative conformations different from the typical B-DNA structure.18 While the G-rich strands can form DNA G-quadruplex structures, which have been demonstrated in a number of oncogene promoters and are suggested to function as regulatory elements in gene transcription, the complementary C-rich strands have the potential to form I-motif structures which may also be associated with transcriptional regulation.13,14,19
Nuclear magnetic resonance (NMR) is a major tool for structural studies of I-motifs.1,10–12,20 Compared with the number of G-quadruplex structures in the public domain, the molecular structures of I-motifs are more limited. Like DNA G-quadruplexes, I-motif structures can be formed by one, two, or four strand(s). A tetrameric I-motif formed by short oligonucleotide strands appears to be more straightforward to characterize by NMR, because the symmetry of the system reduces the number of resonances observed for the equivalent strands and thus simplifies the spectra. For the more biologically relevant unimolecular (intramolecular) I-motif structure formed by single-stranded DNA, the lack of symmetry, resonance overlap, and possible conformational exchange make the NMR spectral assignment much more challenging.1,10–12,20 The same is true of G-quadruplexes; however, the NMR spectral analysis of unimolecular I-motifs can be even more perplexing. The DNA G-quadruplex is a four-stranded structure of stacked G-tetrads; each G-tetrad is formed by four cyclically H-bonded guanine residues in the same plane, and therefore the four guanines of one tetrad are connected and exhibit specific NOE connections with each other. In contrast, an I-motif structure consists of intercalated hemiprotonated cytosine+–cytosine base pairs (Figure 1A). The intercalated cytosine+–cytosine base pairs come from two parallel duplexes and are not connected with each other and, thus, are more difficult to assign spectrally by NMR. In addition, not all cytosines of a C-run can be used simultaneously in C+–C hemiprotonated base-pair formation, and the base-pairing partnership between cytosines of the two parallel strands can be more variable.
Thus the degree of sequence redundancy is higher in the C-rich sequence for the formation of an I-motif than in the G-rich sequence for the formation of G-quadruplex, which makes it more difficult to determine even the folding structure of an I-motif without full NMR spectral assignment and NOE connections.
To date, all structural studies of I-motifs have followed the full proton NMR assignment process for nucleic acids,1,10–12,20 which requires a complete set of COSY, TOCSY, and NOESY spectra, and sometimes further help from 1H–31P correlation spectroscopy and HMBC at natural abundance.21 To determine the folding topology, the characteristic proton connectivities for I-motif structures observed in NOESY spectra are used. The possible C+–C pair combinations can be deduced from the specific short inter-residue 1H–1H distances corresponding to the characteristic intercalation topology, such as those for imino–imino, H1′–H1′, H1′–H2′/H2″, amino–H2′/H2″, H1′–H4′, and H4′–H4′. Because of the more severe NMR spectral overlap and the larger number of C+–C base pairs in I-motif structures, chemical substitutions are always needed to deconvolute the NMR spectral assignment, on the assumption that the substitutions do not alter the structure. These chemical substitutions are used to break the quasi-symmetry and provide a marker readily distinguishable from other overlapping peaks. 5-Methyl-C (5mC) (Figure 1A) is the most commonly used substitution for cytosines in the cytosine+–cytosine base pairs. However, the incorporation of 5mC for cytosine can potentially affect the I-motif structures (see below). Here we report a direct method to unambiguously determine the folding structure of an intramolecular I-motif. This affordable method makes use of site-specific low enrichment (6%) with uniformly 15N-labeled nucleotides, which is nondestructive and can be used in a native, nonmutated DNA sequence that forms the I-motif structure. Furthermore, this method can also be applied to unambiguously determine multiple equilibrating I-motif structures coexisting in a sequence.
We use the promoter sequence of the c-myc oncogene to demonstrate the method. This sequence was chosen because DNA secondary structures formed in the c-myc NHE III1 have been extensively studied.13,15,22 c-myc is the most commonly overexpressed gene in human cancers; its proximal promoter region contains an NHE III1 element, which can form DNA secondary structures that regulate 75–85% of the total transcription activity. The NHE III1 element comprises five consecutive runs of guanines on one strand and cytosines on the other strand. The major G-quadruplex is formed within the four 3′ G-runs, and its molecular structure has been determined by NMR.22 The C-rich strand in the c-myc NHE III1 element has been shown to form I-motif structures at near-neutral pH.13 We have identified a sequence of the C-rich strand containing the modified four 5′ C-runs, complementary to the G-rich strand that forms the major c-myc G-quadruplex (Py23, Figure 1B). The one-dimensional 1H NMR spectrum of Py23 is shown in Figure 1C left. The well-resolved imino proton resonances located at 15–16 ppm indicate the formation of stable I-motif structure(s).1,20 The sharp NMR spectral line widths indicate that the I-motif structure(s) is of an intramolecular, monomeric nature. The monomeric nature of the I-motif structure(s) was confirmed by variable-temperature NMR and CD studies, in which Py23 shows a concentration-independent melting temperature (data not shown).
An essential feature of the I-motif structure is that one cytosine of every C+–C base pair is protonated at N3, and this proton is shared by the two base-paired cytosines. The imino proton resonances of hemiprotonated C+–C base pairs located at 15–16 ppm are characteristic of I-motif structures. The amino protons of the base-paired cytosines are found at approximately 9.5 and 8.4 ppm. The distinct chemical shifts at 15–16 ppm for the I-motif cytosine imino protons result from downfield shifting by a combination of hydrogen bonding and a positively charged protonation site. With site-specific low-enrichment (6%) incorporation of a 15N-labeled cytosine nucleoside, the imino protons of cytosine residues can be unambiguously assigned. This is similar to the incorporation of 15N-labeled guanines used for G-quadruplex structures;23–26 however, for the I-motif structure, the hemiprotonated C+–C base pairs can be directly determined by this method. For example, each cytosine of the sequence 5′-CTTTCCTACCCTCCCTACCCTAA (Py23) (Figure 1B) is labeled at 6% with 15N-cytosine, one position at a time. As the two cytosines forming a hemiprotonated C+–C base pair share one imino H3 hydrogen, the imino proton involved in the hydrogen bonding is distributed equally between the two base-paired cytosines (Figure 1A left). The imino proton in a C+–C base pair has one-bond coupling to the N3 atoms of both cytosines and can be readily detected by 1D 15N-filtered HMQC experiments.27 Through the one-bond 1H–15N coupling, this proton is detected as the same 1H resonance in the 1D HMQC experiments for the two DNA samples that are site-specifically 15N-labeled at each of the base-paired cytosines. The assignment of a C+–C base pair in Py23 is shown as an example in Figure 2A.
The site-specific substitution of 15N-labeled cytosine at the C21 and C12 positions, respectively, gives rise to a peak at the same cytosine imino proton resonance of 15.65 ppm, indicating that C21 and C12 form a hemiprotonated C+–C base pair and share the same imino proton. This leads to the direct identification of the C+–C base pair between C21 and C12 (Figure 2C). In comparison, chemical substitution with 5mC destabilizes the I-motif structure (Figure 1C right and Figure S1), whereas low-enrichment site-specific labeling is direct, nondestructive, and affordable. The assignment of each cytosine imino proton involved in the hemiprotonated C+–C base pairs enables the identification of each pair of C+–C partners in an I-motif structure and thus the determination of the folding topology. For an I-motif structure, once the folding topology is determined, the molecular structure of the I-motif core can be reasonably calculated by computer modeling because of the regular nature of such a structure. For example, the neighboring strands are always antiparallel and connected by lateral loops, the cytosines of the #n C-run always base pair with the cytosines of the #(n ± 2) C-run, and the widths of adjacent grooves always alternate between wide and narrow.
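The C-run pairing constraint stated above can be written as a simple consistency check on candidate base pairs. The Python sketch below is illustrative only: the function names are hypothetical, the C-runs are assumed to be numbered 1 to 4 from the 5′ end, and the code encodes nothing beyond the #n / #(n ± 2) rule given in the text.

```python
def partner_runs(run_index, n_runs=4):
    """Return the C-runs that run `run_index` may pair with under the n +/- 2 rule.

    Runs are numbered 1..n_runs from the 5' end (an assumption for illustration).
    """
    candidates = (run_index - 2, run_index + 2)
    return [r for r in candidates if 1 <= r <= n_runs]


def pair_is_allowed(run_a, run_b):
    """True if cytosines from runs `run_a` and `run_b` may form a C+-C pair."""
    return abs(run_a - run_b) == 2


if __name__ == "__main__":
    for run in range(1, 5):
        print(f"C-run {run} pairs with C-run(s) {partner_runs(run)}")
```

For a unimolecular I-motif built from four C-runs, this leaves only two allowed run combinations (1 with 3, and 2 with 4), which is why the folding topology follows once the individual C+–C partners are known.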
More significantly, this method can directly detect the equilibrating multiconformations of I-motif structures coexisting in a DNA sequence (Figure 2B). The coexisting multiple conformations are very difficult to directly determine by conventional assignment strategies using homonuclear 2D spectra only. Furthermore, the chemical substitution with 5mC often associated with conventional assignment strategies may destabilize a specific I-motif structure or shift the equilibrium between multiple conformations. As shown in Figure 2B, the C22-labeled DNA sequence Py23 (Figure 1B) shows two imino peaks at 15.5 ppm and 15.45 ppm in the 15N-filtered experiment, indicating that C22 is involved in two different conformations. Each of the C22 imino peaks corresponds to the single imino peak arising from the C13- (15.5 ppm) and C14- (15.45 ppm) labeled Py23, indicating that C22 is base pairing with C13 in one conformation and with C14 in another, respectively. This unexpected result directly indicates that the Py23 sequence forms two stable I-motif conformations in slow equilibrium on the NMR time scale (ms), as they have sharp and well-resolved NMR peaks (Figure 2B).
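The pairing logic described above can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not the analysis procedure used in this work: the function names and the 0.02 ppm matching tolerance are hypothetical, and the peak list contains only the imino 1H shifts quoted in the text for the C12-, C13-, C14-, C21-, and C22-labeled samples. Two labeled positions are proposed as a C+–C pair when their 15N-filtered spectra share an imino 1H resonance, and a position that matches more than one partner is flagged as taking part in more than one conformation.

```python
from collections import defaultdict
from itertools import combinations

# Imino 1H shifts (ppm) seen in the 15N-filtered spectrum of each singly
# labeled sample, as quoted in the text. Keys are the labeled positions.
IMINO_PEAKS_PPM = {
    "C12": [15.65],
    "C21": [15.65],
    "C13": [15.50],
    "C14": [15.45],
    "C22": [15.50, 15.45],
}


def find_shared_imino_pairs(peaks_ppm, tol_ppm=0.02):
    """Propose C+-C pairs from imino peaks shared by two labeled samples.

    tol_ppm is a hypothetical matching tolerance chosen for illustration.
    Returns a list of (residue_a, residue_b, shared_shift_ppm) tuples.
    """
    pairs = []
    for (res_a, shifts_a), (res_b, shifts_b) in combinations(
        sorted(peaks_ppm.items()), 2
    ):
        for shift_a in shifts_a:
            for shift_b in shifts_b:
                if abs(shift_a - shift_b) <= tol_ppm:
                    shared = round((shift_a + shift_b) / 2, 2)
                    pairs.append((res_a, res_b, shared))
    return pairs


def multi_partner_residues(pairs):
    """Return residues with more than one proposed partner, mapped to partners."""
    partners = defaultdict(set)
    for res_a, res_b, _shift in pairs:
        partners[res_a].add(res_b)
        partners[res_b].add(res_a)
    return {res: sorted(p) for res, p in partners.items() if len(p) > 1}


if __name__ == "__main__":
    proposed = find_shared_imino_pairs(IMINO_PEAKS_PPM)
    for res_a, res_b, shift in proposed:
        print(f"{res_a} pairs with {res_b} (shared imino at {shift} ppm)")
    print("Multiple partners:", multi_partner_residues(proposed))
```

With the shifts quoted in the text, the sketch pairs C12 with C21 and reports two partners for C22 (C13 and C14), matching the two coexisting conformations described above. The tolerance must be smaller than the 0.05 ppm separation between the two C22 peaks for the conformations to be resolved.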
This method can also be applied to the direct assignment of imino protons of thymine residues that are involved in H-bonding interactions in an I-motif structure. The conformations of thymines in the loop and flanking regions are essential to the structure and stability of an I-motif; in particular, for a given I-motif structure, specific thymines are found to be involved in different capping structures when they are hydrogen bonded with other residues. Only the imino protons of hydrogen-bonded thymines are clearly observable by NMR at temperatures above 0 °C. In a unimolecular I-motif structure, it is very challenging to assign the multiple thymine residues using the conventional assignment strategy. Low-enrichment (6%) site-specific labeling of thymines can be used to solve this problem and identify the thymine residues involved in the H-bonded capping structures. As shown in Figure 2D, using the DNA sequence 5′-CTTTCCTACCCTCCCTACCCTAA-3′, Py23(C14T), the imino proton resonance of T23 was unambiguously assigned based on the one-bond 15N–1H coupling in the 15N-filtered experiments.
In summary, the reported approach using site-specific low-enrichment 15N-labeled cytosine provides a direct, unambiguous, and affordable determination of the hemiprotonated C+–C base pairs in an I-motif structure by NMR. This direct detection of the C+–C base pairs can unambiguously determine the folding topology of a unimolecular I-motif structure. Because the C-rich strand possesses an inherent sequence redundancy in the formation of unimolecular I-motif structures, the unambiguous determination of the folding topology of an I-motif has been a challenging and arduous task using conventional NMR spectral assignment strategies. More significantly, the reported method can directly and unambiguously determine the multiple equilibrating I-motif conformations coexisting in a single DNA sequence, which would be a very difficult task using the conventional assignment strategy. This method can also be applied to the direct detection of the H-bonded thymines that are involved in the capping structures. The reported method is easy to use and directly provides folding topology and specific capping structure information. In addition, this method can aid the full spectral assignment for complete NMR structure determination; e.g., the assignment of the base H5 and H6 protons can be obtained from long-range connections with the imino protons. The direct assignment of the cytosine and thymine imino protons can also provide important internucleotide NOEs of the stacking cytosines and capping thymines for NMR structure determination. Furthermore, the approach may be applicable to I-motif structures involving non-DNA residues and provides a direct and affordable method to tackle related structure problems.
This research was supported by the National Institutes of Health (1S10 RR16659, CA122952, and CA94166) and the Arizona Biomedical Research Commission (0014). We acknowledge Tiffanie Bialis, who assisted with the NMR experiments. We thank Megan Carver for proofreading the paper.