|Home | About | Journals | Submit | Contact Us | Français|
Oxidatively-generated interstrand cross-links rank among the most deleterious DNA lesions. They originate from abasic sites, whose aldehyde group can form a covalent adduct after condensation with the exocyclic amino group of purines, sometimes with remarkably high yields. We use explicit solvent molecular dynamics simulations to unravel the structures and mechanical properties of two DNA sequences containing an interstrand cross-link. Our simulations palliate the absence of experimental structural and stiffness information for such DNA lesions and provide an unprecedented insight into the DNA embedding of lesions that represent a major challenge for DNA replication, transcription and gene regulation by preventing strand separation. Our results based on quantum chemical calculations also suggest that the embedding of the ICL within the duplex can tune the reaction profile, and hence can be responsible for the high difference in yields of formation.
The large variety of DNA defects that can be formed upon initial attack of reactive oxygen and nitrogen species is indicative of a rich and complex underlying chemistry (1). Amongst the ~80 oxidatively-generated lesions that have been characterized so far, abasic (Ap) sites and 8-oxoguanine are the prominent single-nucleotide defects, but more complex lesions involving two nucleobases that become covalently tethered have also been identified (2) and are widely investigated (3–11). One possible degradation pathway implies a subsequent reactivity of Ap, which can condensate with the exocyclic amino group of purines. This generates interstrand cross-link (ICL) lesions, which have been firmly identified over the last decade (7,8,12,13): this proves in passing that the B-DNA helix is flexible enough to host the formation of a carbon–nitrogen linkage anchoring the two strands. A single ICL defect can trigger unyielding obstruction to DNA replication, because the strand separation is blocked, and can cause cellular apoptosis (14). Price et al. have recently reported (7,8) remarkably high yields (15–70%) for the formation of such defects upon attack of the N6-amino group of adenine (Ap-dA or CL(A)). They also evidenced a more rare Ap-dG (or CL(G)) defect (ca. 2–3%), involving the exocyclic N2-amino group of guanine (see Figure Figure1).1). It is particularly intriguing that such lesions were found to proceed selectively with a purine opposite a 3΄-adjacent nucleotide, rather than directly with the orphan adenine (dA32). It was suggested by the authors that the formation of ICLs reflects the ease with which the two partners can approach under the mechanical constraint of a B-helical environment (7). But very recently Greenberg et al. have challenged the idea that thermal destabilization can determine the various yields of formation of these highly mutagenetic lesions (15). Covalently tethering the two strands inevitably implies a loss of B-helicity via an alteration of Watson–Crick hydrogen bond (HB) network, possibly ranging over several base pairs, and a redefinition of stacking interactions. The field suffers from the absence of structural data that are crucial to interpret the lack of repair.
Indeed, approaches using synthesized oligonucleotides and nucleotides (16) are not compatible with the production of milligram quantities or high-quality crystals. Only the chemical structure for the isolated lesion, obtained after enzymatic digestion, can be proposed after careful inspection based on tandem mass spectrometry coupled to high pressure (or high performance) liquid chromatography (13) and/or nuclear magnetic resonance (8). One irremediably looses the structural information concerning the way the ICL redefines the oligonucleotide. A crystal structure of a nucleoside model for the Ap-dG cross-link was recently reported (17), but experimental structures of an ICL within B-DNA are not known.
Moreover, any experimental information concerning the mechanical flexibility or stiffness of these lesions is missing. The peculiar lesion mechanics may be decisive in recognizing the damage for repair. The complex repair mechanisms of ICLs are a subject of intense research (18–20). To palliate the lack of experimental information, molecular modeling can prove useful by analogy to what has been proposed for DNA-drug adducts (21,22) but also for covalently-tethered intrastrand lesions (23–25). In this work, relying on a computational approach, we resolve the structure of two 21-bp oligonucleotides featuring an ICL. We then derive the mechanical properties induced by the cross-link and we determine the stabilization or destabilization of the ICL within the B-DNA local environment. The stabilization energies and entropies can be related to the very different yields of formation of Ap-dA and Ap-dG as the ICL induction obeys a thermodynamic control.
We focused on two 21 base pair (bp) duplex DNA sequences, A and B (Figure (Figure1),1), investigated experimentally by Gates et al. (7). All classical molecular dynamics (MD) simulations were performed using the Amber 12 suite of programs (26). We started with reference DNA duplexes featuring an abasic site at the 11th nucleobase position (Ap11). Since no experimental structure has been reported for the ICL-containing duplex DNA, the two duplexes were built using the nab module of Amber, with conformational parameters characteristic of B-DNA. The parmbsc0 force field was used for DNA, the force field parameters for the Ap site were generated using Antechamber and atomic point charges were assigned using the RESP procedure. They are comparable to parameters similarly derived in the literature (27–29), and we verified that a similar behavior is obtained with the parmbsc1(30) force field, which is the current reference and most tested set of parameters for undamaged DNA (31).
Each duplex was neutralized with 40 potassium ions using Dang parameters, and solvated with TIP3P water molecules in an octahedric simulation box with a 10 Å buffer. After 10 000 steps of energy minimization including 5000 steps of steepest descent and 5000 steps of conjugate gradient, the temperature was increased to 300 K in a 30 ps thermalization run. During the rest of the simulation, the temperature was kept constant using the Langevin thermostat with a collision frequency γln of 1 ps−1. Then, each system was equilibrated during 1 ns in NPT conditions. Structures obtained after a 200 ns production run provide starting geometries bringing close together the Ap site and the offset purine forming the ICL identified by Gates et al. MD simulations were also performed with the open form of the abasic site over 200 ns, which albeit more rare is responsible for the ICL formation alongside a multi-step pathway (32).
As the C1’…N distance gets close to 4–5 Å, it becomes possible to enforce a covalent linkage between Ap11 and G33 or A31 by introducing a harmonic force constant between these two atoms (atom types were changed accordingly still using the parmbsc0 force field, see Supplementary Data). Hybrid quantum mechanics/molecular dynamics (QM/MD), where the ICL is described at the semi-empirical AM1 level, were performed over 1 ns to further validate the classical parameters that were assigned. No strong deviation was observed between the structures obtained by hybrid and unrestrained 500 ns MM-MD simulations (see Supplementary Data). Also the distribution of interatomic distances is comparable at the MM and semi-empirical level. From the final MM-MD structures, a fragment encompassing the ICL lesion and four vicinal nucleobases (as depicted in round dashed box in Figure Figure1)1) was extracted, and energies were evaluated within the framework of density functional theory (DFT) at the BLYP-D3BJ/6-31G(d) level of theory. This level of theory has been calibrated on DNA systems featuring non-covalent interactions (33) and in particular the use of a hybrid functional such as B3LYP leads to the formation of spurious hydrogen bonds for the isolated ICL-containing cluster (34). These DFT calculations were performed using the Gaussian09 suite of programs (35). As a control, two B-DNA duplexes analogous to sequences A and B but with no lesion were built, equilibrated and simulated for 200 ns as described above. Representative snapshots were found using the cpptraj module of AMBER, the rigid base and basepair coordinates were computed using the 3DNA program (36).
The simulations of the two Ap-containing sequences allow to monitor the structural changes that take place when a covalent linkage is about to form between the Ap site and the base (G33 or A31, respectively, Figure Figure1),1), providing insight into the reactant structures. The structures are flexible enough to undergo structural adaptation, with formation of new hydrogen bond(s) replacing the disrupted Watson–Crick pairing and deviation of the backbone. Figure Figure22 shows cartoon representations of the damaged sites and time evolutions of atomic distances, indicating that the Ap site seeks to fill up the empty space left by the loss of thymine (27–29). For the oligonucleotide A (left), it can be seen that the distance between the carbon C1’ of the AP site and the nitrogen of the amino group of G33 is 4.5 ± 0.3 Å that contrasts to the distance of 8.4 ± 0.5 Å to A32 (blue and green lines, respectively). Our simulations thus corroborate a higher propensity for the ICL to form with a nucleobase adjacent to the orphan base, as reported experimentally (7). For the oligonucleotide B (right), one can see that the Ap site approaches A32 and A31 up to 4–5 Å, yet more transiently, as the distance fluctuates on average around 7.4 and 8.0 Å (green and blue lines, respectively). This behavior holds when the transient open form of the Ap site is considered (see Supplementary Data). In what follows we carefully inspect the structural reorganization and flexibility of the ICL-containing oligomers.
Figure Figure33 shows atomic structures of parts of Sequences A and B surrounding the lesion, with the covalent linkage already present. The structures were obtained as representative snapshots, in the sense of minimal sum of RMSD, from the MD ensembles of the ICL-containing duplexes (the pdb files are in Supplementary Data). Schematic representations and time evolution of selected hydrogen bonds are also shown.
It is seen that for Sequence A (Figure (Figure33 left), the cross-linked site CL(G) forms three Watson–Crick HBs to the C10 base in the opposite strand for most of the time. However, the system occassionally undergoes a conformational switch to an alternative substate in which the C10-CL(G) hydrogen bonding is lost and C10 forms one HB to the A12 base in the same strand instead (around 100 ns and 300 ns). Besides that, there is one short time interval in which C10 loses all hydrogen bonding and tends to flip out of the helix (around 400 ns). We exclude this major, but isolated and short lived perturbation from all subsequent analysis. As for Sequence B (Figure (Figure33 right), there is only one, somewhat unstable Watson–Crick HB triplet, namely that between G10 and C33. Thus, the HB patterns in the two lesions are quite different. The same applies to the stacking patterns: while in Sequence A there is just one unpaired base (A32) sandwiched between the C10-CL(G) and A12-T31 pairs, the CL(A) in Sequence B forms a stack together with the flanking unpaired bases T12 and A32.
These local differences are reflected in different overall structures and flexibilities of the two damaged sites. In order to quantify them, we define a virtual basepair step consisting of base pairs A9-T34 and A13-T30 for Sequence A, and the analogous pairs T9-A34 and G13-C30 for Sequence B (Figure (Figure3).3). They are intact WC pairs flanking the lesion. The structure of the lesion as a whole is then defined by the six standard local basepair step coordinates (tilt, roll, twist, shift, slide and rise) associated with the virtual step. By analyzing each snapshot of the MD trajectory using the 3DNA software (36), we obtain time series of the step coordinates. The histograms of the coordinates (Figures S7 and S8 in Supplementary Data) indicate nearly Gaussian distributions. This may be surprising given the non-trivial local conformational dynamics involving two substates in Sequence A and an unstable base pair in Sequence B (Figure (Figure3).3). A possible explanation is that the global structure results from multiple contributions, both harmonic and anharmonic that, thanks to the Central Limit Theorem, add up to an effective harmonic behavior.
The nearly harmonic probability distribution of the coordinates enables us to apply the model of equilibrium shape and harmonic stiffness (37) to the virtual step. The step coordinates, assembled in a six-dimensional vector , are associated with the effective harmonic deformation energy U of the form
The model contains two sets of parameters, namely the equilibrium coordinate values assembled in vector and the symmetric, positive definite stiffness matrix . These parameters are related to the moments of the coordinate distribution as
where is the coordinate mean (first moment) and is the coordinate covariance matrix (second moment). In our analysis, we estimate the moments using the coordinate time series obtained from the unrestrained MD simulations.
Table Table11 shows the equilibrium coordinates of the virtual step in the two damaged sequences together with the same virtual step taken from the control, undamaged DNA duplexes. The structure of ICL A exhibits increased local bending (a tilt of ca. −21°) but is otherwise quite similar to the control. In contrast, ICL B affects bending only slightly but shows other important structural deviations from the control. In particular, it is undertwisted by 20°, its lateral displacement of the base pairs (shift and especially slide) change and the vertical displacement (rise) increases by >4 Å, reflecting the peculiar stacking pattern (Figure (Figure33).
The effective mechanical flexibility as defined by the stiffness matrix is also different for the two lesions. Each stiffness matrix contains 21 independent entries. A convenient way to capture the overall stiffness by one single number is to compute the entropy per one basepair step coordinate sc (38), given by
where N = 6 is the number of coordinates, e = 2.718… is the base of the natural logarithm and det denotes the matrix determinant. The values of sc are shown in Figure Figure4.4. While ICL A is only slightly more flexible (has higher entropy) than the control undamaged duplex, ICL B exhibits dramatically higher flexibility than the control. A more detailed insight can be obtained by inspecting the stiffness matrix diagonal. The diagonal entries are stiffness constants for deforming just that one coordinate, while the other coordinates remain unchanged. The values (not shown) reveal that ICL B is more flexible than the control in all the coordinates.
The virtual step model captures the overall flexibility of the ICL site. To obtain more insight, we also consider a more detailed model in which each base within the ICL site (Figure (Figure3)3) is treated as a rigid body. The configuration is defined by a set of intra-basepair, basepair step and single-strand inter-base coordinates as introduced in the 3DNA program (36). The MD time series of the coordinate vector are then treated exactly as above to obtain the stiffness matrix and the entropy. Here, however, the coordinate distributions are significantly anharmonic in some cases (Supplementary Figure S9), meaning that the use of the harmonic model involves a certain coarse-graining of the probability distribution, namely replacing the actual (possibly complicated) distribution by an effective Gaussian with the same mean and covariance. Finally, we computed the entropy of the ICL sites (Figure (Figure3)3) using the atomic-resolution quasi-harmonic scheme of Andricioaei and Karplus (39). Notice that this model is also based on the harmonic approximation.
We complement the structural description by measuring the overall bending of the oligomers, using the method described in (40). Briefly, three groups of bases are defined within the oligomer, one at each end and one in the middle. Base-fixed coordinate frames in each group are averaged to yield one mean frame per group. Bending coordinates in two perpendicular directions (global roll and global tilt) are averaged over the MD trajectory to define the mean bending. It is then expressed in terms of bending magnitude (angle between the z-axes of the end frames) and bending direction angle, defined as the angle between the bending direction and the x-axis of the middle frame pointing into the major groove. Thus, a direction of 0° indicates bending toward the major groove, 180° means bending toward the minor groove in the middle of the oligomer.
Here, we chose bases in pairs 4-5 and 17-18 as the end groups, and bases in the the two pairs defining the virtual step (see above) as the middle group. The results are summarized in Table Table2.2. Both control sequences are slightly bent toward the minor groove. Introducing the lesion in Sequence A changes the bend direction toward the backbones, which is consistent with the negative tilt introduced by the lesion (Table (Table1).1). The lesion in Sequence B straightens the oligomer. It can be understood as the effect of increased roll (i.e. local bending towards the major groove, Table Table1)1) at the lesion site that compensates for the minor groove bending in the rest of the oligomer.
Finally, we compute global bending and twist stiffness of the damaged and undamaged oligomers. The calculations are done exactly as in (38). Briefly, a quadratic model analogous to Equations (1) and (2) is used, with coordinates involving bending angles toward the grooves and toward the backbone together with total twist. The first two diagonal entries of the 3 × 3 stiffness matrix are stiffness constants for groove and backbone bending, respectively, from which the equivalent isotropic bending stiffness is computed as their harmonic average. The results in Table Table22 again indicate exceptionally small stiffness of the oligomer containing ICL B.
Overall, our data indicate that the two lesions exhibit quite different hydrogen bonding and stacking patterns and affect the DNA properties in a different way. In particular, ICL A introduces moderate local bending toward the backbones and is only slightly more flexible than a piece of regular duplex DNA, while ICL B introduces undertwisting as well as a lateral displacement and creates a hotspot of high flexibility.
Our simulation of Ap-contanining 21-bp duplexes provide a dynamic view of the approach between Ap11 and the vicinal purines A31 and G33. It differs significantly from the distances inferred from X-ray structures of undamaged DNA (e.g. PDB ID code 3BSE (7)) or on ‘frozen’ molecular models, which both neglect the rearrangement the duplex undergoes upon formation of an Ap site. Our simulations go beyond these inspections, as we can infer approaching distances between the Ap site and proximal purines, as we also reported for other interstrand cross-links such as dCyd341 (13,41). The two reactants exhibit different eases of approach: Ap11 gets closer to G33 spontaneously within ~50 ns, prefiguring the cross-link Ap-dG. Interestingly, Ap and A31 in sequence B do not approach whereas the corresponding lesion Ap-dA was found experimentally to present a much higher yield of formation.
The selective formation of the ICL Ap-dA31 (15–70%) versus Ap-dA32 hence cannot be explained by the reactant structure. Furthermore, Ap approaches dG33 easily, but its yield of formation is reported to be only 2–3% (7). Our simulations demonstrate that not only the positioning of the two reactive partners within the oligonucleotide but also, and perhaps more importantly, the final structure of the product and its thermal stability determine the ease of formation of the ICL adduct: this also can be proposed since the reaction has been reported to be a reversible process (6).
The two interstrand cross-links we characterize by computational means also single out compared to other intrastrand lesions implying a covalent linkage. For instance, we do not observe the exclusion of proximal pyrimidines, in spite of the marked narrowing of the minor groove. The value of the bend angle remains within the thermal fluctuations, again in contrast to what is known for photolesions such as (6-4)PP (42) and intrastrand cross-links (23–25). Other single-nucleotide lesions do not affect the B-helix so markedly. The most distinctive feature of this class of oxidatively-induced lesions is to prevent DNA strand separation: our simulations reveal more finely contrasted structural behavior. We evaluated notably the bending, which can be compared to interstrand lesions induced by cross-linking agents (43): diamminedichloroplatinum induces a bend by 35° to 45° and an unwinding by 79° for the cis conformer, but a less marked distortion for the trans (unwound by 12° and bent by 26°). Nitrogen mustard only weakly perturbates B-helicity, with a static bend of ca. 14° in DNA, whereas other agents induce no bending such as mitomycin and psoralens. It appears that each interstrand cross-link lesion can be associated to a structural signature, and our simulations for oxidatively-induced ICLs provide such data that are crucially missing.
It is legitimate to ask whether or not the contrasting structure and flexibility of Sequences A and B may be related to the different yields of ICL formation measured experimentally. Indeed, within the hypothesis of a thermodynamic control, and the ICL formation being a reversible process (6), the ICLs thermal stability can be a decisive factor. We can provide an estimate of the reaction energies based on density functional theory (DFT, see Materials and Methods). To this end we compute the ICL formation (de)stabilization energies ΔEICL = Eprod − Ereact between the reactants and the products for a fragment featuring the reactive system and the two vicinal base pairs. They account for +5.4 and −14.5 kcal/mol for Sequences A and B, whereas the values for the isolated Ap...A or Ap...G systems are ca. +6–7 kcal/mol (see Supplementary Figure S1). Hence the proximal nucleobases tune the thermal stability of the products, which in turn affects the energy profile, giving rise to contrasted yields. Indeed, from our unconstrained simulations of the Ap-containing oligonucleotides, the Ap sites spontaneously approach purines on the opposite strand at 4–5 Å (see Figure Figure2).2). Our DFT analysis suggests that the thermal stability of products is decisive for the selective formation of ICLs, whereas the reactant valley can be more shallow. Our results reinforce the postulate of a correlation between structural distortion and thermal stabilities of DNA interstrand cross-links (15). These overall energies reflect a subtle interplay between different non-covalent interactions. Ap-dA is characterized as a flexible hotspot with an intrinsic chemical stability, which suggests that this lesion in duplex DNA may have the power to block DNA-processing enzymes involved in transcription and replication (9). The ICL induction is more favored for Ap-dA (Sequence B) than for Ap-dG (Sequence A), as the former is associated with a more favorable reaction energy. Moreover, the Ap-dG ICL is further stabilized by the higher entropy of the lesion site (Table (Table3).3). To reach a more quantitive assessment, free energies and sequence effects should be tackled, which constitutes a perspective of this work.
The structural and mechanical properties we derived agree fairly well with available experimental data, but an in-depth understanding and validation of our modellization findings can only arise from additional experiments. Our simulations hence call for an experimental proof: whereas the obtention of nuclear magnetic resonance or X-ray structures would be the most definitive probe and validate the calibration of the timescale of our simulations, it is a considerably difficult task. Another key information accessible experimentally is the yield of induction of oxidatively-induced ICLs, and also sometimes kinetic measurements (44). Hence, free energies should be determined with a high accuracy to reach a synergetic view with classical MD simulations: this would require accurate free energy calculations, since the scarce data available indicate differences that can be as low as 0.3–1.0 kcal/mol (44). Our DFT cluster analysis is probably only capable to characterize ICLs that differ importantly in energy. An equally important information that is more easily accessible consists of experimental measurements of oligonucleotides flexibility. The method of DNA cyclization can provide the magnitude and direction of bending as well as torsional and bending stiffness of a short, intact DNA sequence motif (45). Thus, if the lesion is incorporated in the middle of a short DNA sequence, then the same method should be able to measure the effect of the lesion on these properties. The measurements can then be directly compared to our computed values in Table Table2.2. Only several copies of the lesion would be needed: for instance, no more than three copies of the sequence motif were used in (45). Measures of AP sites and interstrand cross-link half lifetime is also a valuable experimental information (46) characterizing the constrasted mechanical properties DNA lesions experience once in B-DNA or nucleosomal DNA.
Interstrand cross-links are particularly deleterious lesions, yet a deep understanding of their formation remains to be laid on a firm basis. In this work, explicit-solvent MD simulations combined with quantum chemical DFT calculations were performed to unravel the structure, dynamics and mechanical properties of two experimentally characterized ICL-containing oligonucleotides. The results refine the initial rigid model provided by experimentalists, who invoke a higher reactivity for the purine presenting the shortest proximal approaching distance to the electrophilic Ap site. This criterion implies the selectivity onto a purine adjacent to the orphan nucleobase rather than onto the orphan nucleobase itself, yet is not sufficient to explain the lower yield of Ap-dG versus Ap-dA.
Our simulations allow to build up oligonucleotides containing these two defects, and we evidence constrasted structural and mechanical properties that probably induce a different free energy profile. A natural perspective would be to determine a free energy profile for the multi-step reaction, or alternatively free energies of destabilization. This would corroborate a nascent correlation between structure and thermal destabilization for interstrand cross-links (15).
Furthermore, the contrasting mechanical rigidity of the two lesion sites may affect the supercoiling properties of the damaged DNA and play a role in recognizing the ICLs during the initial stages of their repair. The replication-dependent repair requires one or possibly two replication forks to meet at the ICL and thus assumes a large-scale deformation well beyond the harmonic approximation adopted here. The same is true for the transcription-dependent repair that assumes a transcription bubble. Besides that, however, ICL repair in quiescent DNA also takes place, possibly to relieve the negative impact of the tethered strands on DNA topology and supercoiling (18). In this case, the peculiar stiffness of the lesion site may play a role in the ICL recognition, just as the sequence-dependent stiffness of undamaged DNA does for many DNA-binding proteins. The specific mechanisms of the ICL repair have only just started to be unravelled. A clear and systematic assessment of sequence effects, coupled to free energy calculations, would be helpful and will be a topic for future work.
This work was performed within the framework of the LABEX PRIMES (ANR-11-LABX-0063) of Université de Lyon, within the program ‘Investissements d'Avenir’ (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR). Calculations were performed using the local HPC resources of PSMN at ENS-Lyon. E.B. is grateful for a PhD fellowship from the French Ministry of Higher Education and Research. T.D. and F.L. were supported by the Grant Agency of the Czech Republic (14-21893S).
Supplementary Data are available at NAR Online.
Université de Lyon [LABEX PRIMES (ANR-11-LABX-0063)]; French National Research Agency (ANR) [‘Investissements d'Avenir’ (ANR-11-IDEX-0007)]; Grant Agency of the Czech Republic [14-21893S]. Funding for open access charge: LABEX PRIMES [ANR-11-LABX0063 of Université de Lyon].
Conflict of interest statement. None declared.