The nucleic-acid bases carry structural and energetic signatures that contribute to the unique features of genetic sequences. Here we review the connection between the chemical structure of the constituent nucleotides and the polymeric properties of DNA. The sequence-dependent accumulation of charge on the major- and minor-groove edges of the Watson-Crick base pairs, obtained from ab initio calculations, presents unique motifs for direct sequence recognition. The optimization of base interactions generates a propellering of base-pair planes of the same handedness as that found in high-resolution double-helical structures. The optimized base pairs also deform along conformational pathways, i.e., normal modes, of the same type induced by the binding of proteins. Empirical energy computations that incorporate the properties of the base pairs account satisfactorily for general features of the next level of double-helical structure, but miss key sequence-dependent differences in dimeric structure and deformability. The latter discrepancies appear to reflect factors other than intrinsic base-pair structure.
The energetic and structural information encoded in the DNA base-pair sequence has direct bearing on the overall shape and stability of the double-helical molecule. The basis of this code lies in the electronic characteristics of the heterocyclic bases, adenine (A), thymine (T), guanine (G), and cytosine (C), and their hydrogen-bonded Watson-Crick complexes (Figure 1). Although much of the net negative charge per nucleotide is concentrated on the phosphate group, there is a build-up of appreciable fractional charge on certain atoms of the heterocyclic bases. This article highlights a number of factors (partial atomic charges, intrinsic structures, binding energies, dipole moments, electrostatic potential surfaces, and rigid-body motions) that distinguish the two Watson-Crick base pairs, and it draws connections between these local chemical features and the higher-order structural information that is used to recognize and process the genetic information stored in long base-pair sequences. The capability to link basic chemical information with the polymeric properties of DNA rests upon technical advances that make it possible (i) to determine, with high accuracy, the electronic structures of hydrogen-bonded base complexes (Frisch et al. 2001; Frisch et al. 2003), (ii) to represent the electrostatic potential of the base pairs at high resolution (Boschitsch et al. 2002; Boschitsch and Fenley 2004), and (iii) to compare these chemical features with relevant spatial information (Lu and Olson 2003; Lu and Olson 2008) in the growing database of high-resolution nucleic-acid structures (Berman et al. 1992) and with classic physical measurements.
Importantly, the distributions of electronic charge on the Watson-Crick base pairs differ from those on the free bases (see the Supplementary Materials for a complete list of the atomic charges of base and base-pair atoms obtained within the Gaussian 98 and Gaussian 03 suites of programs (Frisch et al. 2001; Frisch et al. 2003)). The characteristic accumulation of electronic charge on the exposed major- and minor-groove edges of the base pairs presents unique motifs for direct sequence recognition (Seeman et al. 1976). For example, the much larger negative charge on the C5 atom of cytosine compared to the corresponding atom of thymine, e.g., −0.69 vs. −0.22 esu in the Watson-Crick pairs, influences the interactions of charged ligands with the major-groove edges of the two pyrimidines (see below).
These and other subtleties in local electronic structure surface at the base-pair level in terms of distinctly different A·T and G·C dipoles, with the computed magnitude of the G·C dipole more than double that of the A·T pair (6.6 vs. 2.4 Debye). Confidence in these predictions is strengthened by the agreement between the computed dipole moment of free adenine (3.0 Debye) and the measured dipole moment (3.0±0.2 Debye) of 9-n-butyl-adenine in CCl4 (DeVoe and Tinoco Jr. 1962). On the other hand, the predicted dipole moment of guanine (7.2 Debye) exceeds the reported value (5.5 Debye) (Párkányi et al. 2002). The latter measurement, performed in dioxane, is not directly relevant to the computations in that the polar solvent may form hydrogen bonds with the base and thereby perturb its electronic structure. No such interactions occur in the non-polar CCl4 solvent. The orientations and directions of the computed dipole moments, depicted in Figures 2 and 3 by black arrows, point to the unique electronic character of the free bases as well as the base pairs.
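As a rough illustration of how a dipole moment follows from a set of partial atomic charges, the sketch below evaluates the point-charge expression μ = Σ qᵢ rᵢ and converts the result to Debye. This is not the ab initio procedure behind the values quoted above; the three-site geometry and its charges are hypothetical and chosen only for illustration.

```python
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 electron-charge * Angstrom expressed in Debye

def dipole_debye(charges, coords):
    """Dipole vector (Debye) of point charges (in e) located at coords (in Angstrom)."""
    q = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    # Sum of charge-weighted position vectors, converted from e*Angstrom to Debye.
    return E_ANGSTROM_TO_DEBYE * (q[:, None] * r).sum(axis=0)

# Hypothetical neutral three-site fragment (NOT charges from the article).
charges = [-0.50, 0.25, 0.25]
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

mu = dipole_debye(charges, coords)
print("dipole vector (D):", mu)
print("magnitude (D):", np.linalg.norm(mu))
```

Because the fragment carries no net charge, the computed dipole does not depend on the choice of origin, which is why dipoles of the neutral bases and base pairs can be compared directly.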
Conventional wisdom attributes the (negative) propeller twisting of the Watson-Crick base pairs seen in high-resolution DNA structures to a base-stacking effect, i.e., the out-of-plane rotation of complementary bases about the long base-pair axis seemingly enhances stacking overlaps with bases in adjacent residues (Levitt 1978); see illustrative image of propeller-twisted base-pair planes in Figure 4. Calculations, however, show that negative propeller twist is intrinsic to isolated, unstacked A·T and G·C pairs and is a direct consequence of the pyramidal geometry of the proton-donating, exocyclic amino groups — an idea anticipated in earlier research (Komarov and Polozov 1990).
The displacement of the amino hydrogens out of the starting plane introduces non-zero values of propeller twist while concomitantly lowering the binding energy. For example, the A·T pair takes on negative propeller values and the energy drops by 1.2 kcal/mole if the hydrogens attached to N6 of adenine fall below the starting base-pair plane. The amino hydrogens of a negatively propellered G·C pair, by contrast, lie above the starting plane owing to the different placement of the amino group on the G·C pair. The energy decrease in the propeller-twisted G·C pair, 1.0 kcal/mole, is not quite as pronounced as that of the similarly deformed A·T pair.
The degree of nonplanarity in the derived base pairs, reported in Table 1 in terms of the exocyclic torsion angles described by relevant heavy atoms, agrees fairly well with the mean values found in ultra-high resolution structures of double-helical B-DNA, although the signs of rotation in the lowest-energy base-paired structures differ from the X-ray observations. The computed out-of-plane displacement of the exocyclic nitrogens in free, unpaired bases typically exceeds that in the paired bases, but there are no crystal data to test this prediction. The predicted displacement of nitrogen, however, is intermediate between that detected in neutron-diffraction studies of related compounds (McMullan et al. 1980; Weber et al. 1980; Klooster et al. 1991) and that deduced from measurements of the direction of the infrared transition moments of adenine and cytosine (Dong and Miller 2002; Choi et al. 2005) (see Supplementary Materials for numerical values).
The predicted propeller twisting of the Watson-Crick base pairs, however, is less pronounced than that found in well-resolved X-ray structures, particularly for A·T pairs (Table 1). In addition to the standard rationalization of DNA melting properties in terms of the number of hydrogen bonds in A·T vs. G·C base pairs, the calculations suggest that there may be a built-in, sequence-dependent pressure for AT-rich DNA to disassemble more easily than GC-rich DNA. That is, the intrinsic structure of the free A·T base pair appears to be more significantly compromised than that of the free G·C pair in duplex DNA. On the other hand, the conformational stiffness of so-called DNA A-tracts, i.e., short stretches of 4–6 consecutive A·T base pairs, is often attributed to the bifurcated hydrogen bonding made possible by stacks of highly propeller-twisted A·T pairs (Nelson et al. 1987; Chan et al. 1993). Such hydrogen-bond stabilization occurs as well in overtwisted CA·TG dimer steps (Timsit 1999), which are known to be highly deformable (Olson et al. 1998).
The energy-optimized Watson-Crick base pairs show many of the same subtle, sequence-dependent conformational features found in high-resolution DNA structures (Table 1). For example, the G·C pair opens in the opposite sense from the A·T pair, concomitantly shortening, straightening, and presumably strengthening the G(O6)···C(H41)–C(N4) interaction compared to the A(N6)–A(H61)···T(O4) hydrogen bond formed at the corresponding major-groove location on A·T. The former interaction also appears to be stronger, in terms of its more nearly ideal geometry, than the G(N2)–G(H21)···C(O2) hydrogen bond located on the minor-groove side of the G·C pair. The stronger hydrogen bond contributes, in turn, to the strong G·C dipole noted above.
The predicted base-pair structures account as well for gas-phase measurements. For example, the angle between the planes of paired adenine and thymine, which can be extracted from the buckle and propeller angles, is remarkably similar to the dihedral angle estimated from the measured moments of inertia of 2-aminopyridine·2-pyridone, an analog of the A·T base pair, in the gas phase, i.e., 6.7° predicted vs. 6.1° found experimentally (Roscioli and Pratt 2003). The predicted length of the A(N1)···T(H3)–T(N3) hydrogen bond also closely matches the value found in the same experiment, i.e., 2.87 Å computed vs. 2.898 Å observed. The measured length of the A(N6)–A(H61)···T(O4) hydrogen bond, however, is shorter than that determined here, i.e., 2.81 Å observed vs. 2.99 Å computed (Roscioli and Pratt 2003).
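The interplanar angle discussed above can also be obtained directly from atomic coordinates rather than from buckle and propeller values. The following is a minimal sketch, assuming each base is represented by the least-squares plane through its ring atoms; the function names and the square test geometry are hypothetical, and the 6.7° tilt is built in only to check that the routine recovers a known angle.

```python
import numpy as np

def plane_normal(coords):
    """Unit normal of the least-squares plane through a set of atoms (N x 3 array)."""
    xyz = np.asarray(coords, dtype=float)
    centered = xyz - xyz.mean(axis=0)
    # The right-singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered)
    return vt[-1]

def interplanar_angle(coords_a, coords_b):
    """Acute angle (degrees) between the least-squares planes of two atom sets."""
    cos_angle = abs(np.dot(plane_normal(coords_a), plane_normal(coords_b)))
    return np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))

# Hypothetical test geometry: a planar square and a copy rotated 6.7 degrees about x.
square = np.array([[1.0, 1.0, 0.0], [-1.0, 1.0, 0.0],
                   [-1.0, -1.0, 0.0], [1.0, -1.0, 0.0]])
theta = np.radians(6.7)
rot_x = np.array([[1.0, 0.0, 0.0],
                  [0.0, np.cos(theta), -np.sin(theta)],
                  [0.0, np.sin(theta), np.cos(theta)]])
tilted = square @ rot_x.T + np.array([0.0, 5.0, 0.0])

print(interplanar_angle(square, tilted))
```

Taking the absolute value of the dot product makes the result independent of the arbitrary sign of each fitted normal, so the routine always reports the acute angle between the planes.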
The global motions of the base pairs, deduced from the low-frequency normal modes of the energy-minimized structures, also mimic the pattern of rigid-body variation observed in well-resolved structures (see Supplementary Materials). That is, the buckling and propeller-twisting motions, seen to dominate the deformations of Watson-Crick base pairs in high-resolution protein-DNA structures, are of lower frequency (energy) than all other normal modes. The predicted movements of base pairs are described in Figure 4 by a color-coded spectrum of normal modes dominated by fluctuations of one of the six complementary base-pair parameters. The least costly (lowest-frequency) modes are softer than the torsional modes of the C1′ methyl groups (with frequencies of ~50 cm−1). The stiffest rigid-body modes are comparable in frequency to the out-of-plane bending modes of individual bases (data not shown).
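To put the ~50 cm−1 scale in energetic terms, the sketch below converts a vibrational wavenumber to the energy of one quantum (E = hcν̃, per mole) and compares it with the thermal energy kT. The choice of 298.15 K as the reference temperature is an assumption made here, not a value taken from the calculations.

```python
# Physical constants (exact SI values)
PLANCK = 6.62607015e-34        # J s
LIGHT_SPEED = 2.99792458e10    # cm / s
AVOGADRO = 6.02214076e23       # 1 / mol
BOLTZMANN = 1.380649e-23       # J / K
JOULE_PER_KCAL = 4184.0

def wavenumber_to_kcal_per_mol(wavenumber_cm):
    """Energy of one vibrational quantum, h*c*wavenumber, per mole of oscillators."""
    return PLANCK * LIGHT_SPEED * wavenumber_cm * AVOGADRO / JOULE_PER_KCAL

def thermal_energy_kcal_per_mol(temperature_k):
    """Thermal energy kT expressed per mole (equivalently RT)."""
    return BOLTZMANN * temperature_k * AVOGADRO / JOULE_PER_KCAL

quantum = wavenumber_to_kcal_per_mol(50.0)   # reference torsional mode at ~50 cm^-1
kT = thermal_energy_kcal_per_mol(298.15)     # assumed room temperature
print(f"{quantum:.3f} kcal/mol per quantum, {quantum / kT:.2f} kT")
```

A quantum at 50 cm−1 amounts to roughly a quarter of kT at room temperature, so modes softer than this reference are readily excited thermally.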
The buckling and propeller-twisting motions do not appreciably distort the hydrogen bonding of complementary bases and are relatively insensitive to base-pair composition. By contrast, the rigid-body deformations via more costly modes (opening, shear, or stretch), which have a greater effect on hydrogen-bond geometry, differ in A·T vs. G·C pairs. The A·T pair deforms via stretch or opening much more easily than the G·C pair in both solid-state examples and the computed model, presumably reflecting the cost of distorting three G·C vs. two A·T hydrogen bonds. Changes in shear are easier, however, for G·C than A·T, possibly taking advantage of shared/bifurcated hydrogen-bonding that is possible in G·C but not A·T.
Images generated by color-coding biomolecular surfaces according to the electrostatic potential have become extremely valuable in obtaining qualitative information about preferred ligand (metal, drug, protein) binding sites and the degree of electrostatic complementarity between molecules involved in recognition and association processes (Honig and Nicholls 1995). Such is the case here for the isolated DNA bases and the Watson-Crick base pairs, which, although electronically neutral, have distinct, sequence-dependent electrostatic fingerprints (Figures 2, 3). The distribution of hydrogen-bonding donor and acceptor atoms on the edges of the heterocyclic species gives rise to well-defined patterns of positive and negative electrostatic potential and confirms many of the general trends suggested by the very first calculations of the electrostatic potential from approximate atomic charges (Pullman et al. 1979; Lavery and Pullman 1981; Weiner et al. 1982).
The absolute potentials of selected donor and acceptor atoms on the isolated bases and base pairs immersed in a simulated aqueous medium (see Supplementary Materials) are fairly small because of the null net charges on these chemical species. Literature values of the electrostatic potentials (Pullman and Pullman 1981; Weiner et al. 1982; Fogolari et al. 2002; Hud and Plavec 2003), based on various distributions of molecular charge, are typically computed with a single low dielectric constant and thus of much greater magnitude. Published potential surfaces, based on two-dielectric Poisson-Boltzmann modeling of the DNA duplex and surrounding aqueous solvent, are of greater magnitude but of lower resolution than the current images. The earlier, more approximate surfaces, however, do not lend themselves to the quantitative descriptions of atomic surface potentials reported here.
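The dependence of the potential magnitude on the dielectric treatment can be seen in the simplest possible model, a Coulomb sum over point charges in a uniform dielectric. The sketch below is far cruder than the Poisson-Boltzmann modeling discussed above and is meant only to show the scaling; the two-site charge set, the probe position, and the dielectric constants (2 and 78.5) are assumptions introduced for illustration.

```python
import numpy as np

COULOMB_KCAL = 332.06371  # e^2 / (4*pi*eps0) in kcal * Angstrom / mol

def coulomb_potential(point, charges, coords, dielectric):
    """Potential energy of a unit test charge (kcal/mol) at `point`, uniform dielectric."""
    q = np.asarray(charges, dtype=float)
    # Distances (Angstrom) from every charge site to the probe point.
    r = np.linalg.norm(np.asarray(coords, dtype=float) - np.asarray(point, dtype=float), axis=1)
    return COULOMB_KCAL * np.sum(q / r) / dielectric

# Hypothetical neutral pair of sites (NOT charges from the article).
charges = [-0.5, 0.5]
coords = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.2]]
probe = [0.0, 0.0, -3.0]

low = coulomb_potential(probe, charges, coords, dielectric=2.0)
high = coulomb_potential(probe, charges, coords, dielectric=78.5)
print(low, high, low / high)
```

In this uniform-dielectric picture the potential scales as 1/ε, so the same charges give a potential about 39 times larger at ε = 2 than at ε = 78.5, even though the species is neutral overall.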
The images and numerical data clearly show that (i) the minor-groove edge of the A·T pair is more electronegative than the major-groove edge, (ii) the major-groove side of guanine is more electronegative than the minor-groove side, and (iii) the major-groove edge of cytosine is highly electropositive. The negative electrostatic potential of the A·T minor groove presumably underlies the preferential binding of ions (Hud and Polak 2001) and small cationic molecules, such as netropsin and distamycin (Kopka et al. 1985), and the specificity of cationic amino-acid side groups, such as those in the bacteriophage 434 repressor protein (Mauro and Koudelka 2004), for such sites. The deeper major-groove potential of the G·C pair seemingly accounts for the well-known binding of small cations with the N7 and O6 atoms on the major-groove edge of guanine (McFail-Isom et al. 1999; Auffinger and Westhof 2000; Hud and Polak 2001; Egli 2002; Subirana and Soler-López 2003). The positive potential on cytosine forms the basis for the many close contacts of small inorganic anions with the C(N4) atom in nucleic-acid structures (Auffinger et al. 2004) and the frequent participation of the C(C5) atom in ‘weak’ C–H···O hydrogen bonds with bound proteins (Mandel-Gutfreund et al. 1998). The anionic amino acids, aspartic acid and glutamic acid, have long been expected to bind to the amino groups on A and C (Suzuki 1994).
The conventional assignment of the same sets of partial charges to free and paired nucleic-acid bases ignores the neutralization of the electrostatic potential surface upon base-pair formation. The strength of the potentials on the Watson-Crick edges of the free bases ostensibly contributes to both base-pair formation and ligand-binding specificity. For example, the highly electronegative N3 of the unpaired, catalytically-essential cytosine 75 of the hepatitis delta virus ribozyme acts as a general base, accepting a proton from the substrate 2′-OH during phosphate cleavage (Ke et al. 2004), and the exposed electropositive N1 (NH) and N2 (NH2) atoms of unpaired guanines often donate hydrogens to the anionic side-chain carboxyl of glutamic acid in protein-RNA structures (Kim et al. 2003), including the trp RNA-binding attenuation protein (Antson et al. 1999) and various tRNA synthetases (Yaremchuk et al. 2001), and in the structures of proteins complexed with small guanine-containing ligands (Nobeli et al. 2001).
In addition to reproducing high-resolution structural data, the calculations reviewed here account satisfactorily for many of the traditional physical benchmarks used to assess computational reliability. For example, the Watson-Crick binding energies match gas-phase observations, −13.0 kcal/mole for A·T and −21.0 kcal/mole for G·C (Yanson et al. 1979). The calculated values, −14.3 kcal/mole for A·T and −25.0 kcal/mole for G·C, however, are smaller in magnitude than the recent findings of Jurecka and Hobza (−15.4 and −28.3 kcal/mole, respectively) (Jurecka and Hobza 2003), who assert that their results represent the lower boundaries of the true stabilization energies.
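The comparison among the three sets of binding energies is easier to follow as signed differences. The short sketch below uses only the values quoted in the paragraph above; the dictionary layout and key names are illustrative choices.

```python
# Binding energies in kcal/mole, as quoted in the text above.
energies = {
    "A·T": {"calculated": -14.3, "gas_phase": -13.0, "jurecka_hobza": -15.4},
    "G·C": {"calculated": -25.0, "gas_phase": -21.0, "jurecka_hobza": -28.3},
}

def deviations(entry):
    """Calculated minus reference; a negative value means the calculation is more stabilizing."""
    return {ref: round(entry["calculated"] - value, 1)
            for ref, value in entry.items() if ref != "calculated"}

for pair, entry in energies.items():
    print(pair, deviations(entry))
```

For both base pairs the calculated energy falls between the gas-phase measurement and the Jurecka and Hobza estimate, with the larger spread for G·C.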
The future promise of atomic-level simulations in deciphering the sequence-dependent properties of DNA rests on continuing improvements of the force fields that underlie the calculations. The configurations and charges of the Watson-Crick base pairs presented here account for general features in the next level of double-helical structure, i.e., the preferred arrangements of neighboring base pairs, but miss key sequence-dependent differences in intrinsic structure and deformability. Correct predictions include (i) the anisotropy of DNA bending, i.e., preferable deformation via roll as opposed to tilt, (ii) the relative ease of bending compared to twisting, and (iii) the preferential displacement of base pairs via shift and slide. The predicted sense of AA·TT vs. GG·CC deformation, however, is incorrect (see Supplementary Materials).
The A·T base pair adopts a different, more highly propellered arrangement within double-helical structures that appears to contribute to the observed stiffness of AA·TT compared to GG·CC dimers. High-resolution DNA structures also reveal a sequence-dependent build-up of water molecules and amino acid residues around the nucleic-acid bases (unpublished data). Approximation of these features through reduction of the base-pair charges contributes to the ‘A-philicity’ of GG·CC compared to AA·TT steps (Ivanov and Minchenkova 1995), i.e., the tendency of the GG·CC dimer to assume positive roll angles that close the major groove and negative values of slide that displace the G·C pairs with respect to the double-helical axis and concomitantly deepen the major groove. Interestingly, previous calculations that successfully mimicked the tendencies of GG·CC dimers to adopt A-like conformational states treated the partial charges of base atoms with a less polar, albeit highly approximate, Poltev charge set (Zhurkin et al. 1980) and approximated the intervening solvent with a sigmoidal, distance-dependent dielectric constant (Mazur et al. 1989). The appropriate balance of charge on the DNA bases, backbone, and surrounding chemical environment is key to the correct prediction of DNA fine structure and interactions.
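The idea of a sigmoidal, distance-dependent dielectric can be sketched as follows. The functional form, the bulk value of 78, and the 2.5 Å length scale used here are assumptions chosen for illustration; the expression used in the cited calculations may differ.

```python
import math

def sigmoidal_dielectric(r, eps_bulk=78.0, scale=2.5):
    """Illustrative sigmoidal dielectric: close to 1 at contact, approaching eps_bulk at long range.

    r is the charge separation in Angstrom (must be > 0); eps_bulk and scale are assumed values.
    """
    x = r / scale
    # x^2 * e^x / (e^x - 1)^2 falls smoothly from 1 (x -> 0) to 0 (x -> infinity).
    damping = x * x * math.exp(x) / math.expm1(x) ** 2
    return eps_bulk - (eps_bulk - 1.0) * damping

def screened_coulomb(q1, q2, r):
    """Pair energy (kcal/mol) of charges q1, q2 (in e) separated by r Angstrom."""
    return 332.06371 * q1 * q2 / (sigmoidal_dielectric(r) * r)

for r in (1.0, 3.0, 5.0, 10.0, 20.0):
    print(r, round(sigmoidal_dielectric(r), 1), round(screened_coulomb(0.5, -0.5, r), 3))
```

With a function of this kind, interactions between atoms in close contact are nearly unscreened, whereas those between distant charges are damped almost as strongly as in bulk water, which mimics the intervening solvent without representing it explicitly.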
The U.S. Public Health Service (research grant GM20861 to WKO) and the National Science Foundation (Advance Fellows Award 0137961 to MOF) have generously supported this work. We thank Dr. Suse Broyde for helpful discussions and Mr. Mauricio Esguerra for the scripts used to extract files from structural databases.