|Home | About | Journals | Submit | Contact Us | Français|
Molecular dynamics simulation studies of protein-nucleic acid complexes are more complicated than studies of either component alone the force field has to be properly balanced, the systems tend to become very large, and a careful treatment of solvent and of electrostatic interactions is necessary. Recent investigations into several protein-DNA and protein-RNA systems have shown the feasibility of the simulation approach, yielding results of biological interest not readily accessible to experimental methods.
In the post-genomic era interest shifts from the properties of individual components of living cells (e.g. proteins) to their many interacting complexes, of various size, composition and duration, that are at the heart of what makes the cell a living entity. Cellular components that do not communicate or interact, simply would neither benefit from, nor contribute to, the sustained organization characteristic of life. The intricate interplay between proteins and nucleic acids exhibited by all forms of life that we know of today thus forms the biological backdrop of the present review.
For complex systems molecular dynamics (MD) simulations are often the only way to obtain detailed atomic-level information, and the several program packages (some of the most commonly used are AMBER, CHARMM, GROMOS, GROMACS, and NAMD) that have been developed specifically for atomistic simulations of biomolecules have over the last two decades been used in well over 10000 published studies. However, MD simulations of nucleic acids (NA) initially lagged behind protein simulations for several reasons: Nucleic acids are highly charged, usually quite non-globular, and there were not very many structures available by which to develop and validate NA simulation methodologies. Not surprising for nucleic acids it has been shown to be necessary to treat the solvent and electrostatics more carefully than what had been required for simulations of small globular proteins. Inclusion of an explicit representation of the solvent leads to a significant increase in the number of particles that have to be treated, and it is not uncommon that 80 – 90% of the computational cost comes from solvent-solvent interactions. More recently implicit solvent methods have also been used with some success for systems containing nucleic acids, but for large systems these methods are also computationally expensive. In contrast to proteins, which in many cases are quite well-behaved even in gas-phase simulations using relatively short cutoffs with group-based spherical truncation of the Coulomb interactions, nucleic acids when treated in the same way lose their structural integrity. This was never a problem in CHARMM, which has sophisticated atom-based smoothing of electrostatic forces[6–8], but for some other programs it was not until efficient lattice sum methods became available that nucleic acids could be reliably simulated.
A further complication for simulations of protein-NA complexes lies in the force field, which has to be balanced in terms of pair interactions involving all combinations of solvent and the solute atoms. Much effort has gone into the development and validation of force fields, as outlined in the first section below, contributing to the success of simulation studies of protein-NA complexes.
One of the earliest studies of a protein-nucleic acid complex was of a mononucleotide bound to ribonuclease T1 . Later, well-behaved simulations of duplex DNA  were soon accompanied by studies of transcription factors bound to DNA . In the decade that followed vast increases in available computer power together with a rapidly growing number of experimental structure determinations of protein-DNA and, in particular, protein-RNA complexes stimulated an increased interest in MD simulations of these systems. In the two main sections of this review we show that it is now possible to tackle issues of relevance for molecular biology by using examples taken from some recent studies of protein-DNA and protein-RNA complexes, with the focus on all-atom simulations in explicit solvent
Two force fields commonly used for simulations of nucleic acid-protein complexes are AMBER and CHARMM.[14,15] Both force fields treat DNA and RNA as well as modified nucleic acids. Several modifications of the AMBER nucleic acid [16–18] and protein[19–21] force fields have been published, potentially leading to complications in comparison of results from different studies. A recent extension of the CHARMM protein force field to improve treatment of the peptide backbone, termed CMAP, has been published  and now represents the standard for CHARMM protein and protein-NA simulations. Other force fields potentially may be used for protein-NA simulations including GROMOS,  OPLS,  MMFF  and CFF, though the number of studies using these models is limited. Finally, an interesting development is the potential use of GB based methods for protein nucleic acid complexes. While some success in the method was noted it should be emphasized that the lack of explicit solvent in the simulations, which play important roles in NA structure as well as protein-nucleic acids interactions, may lead to limitations in this approach.
MD simulations of DNA-protein complexes have recently yielded insights into the balance between the inherent structural properties of nucleic acids and the impact of the protein partner on those properties. In many of these studies distortion of the DNA when bound to the protein was investigated, including simulations of DNA in the presence of the transcription factor p53 , important because of its implication in a large number of cancers. Studies of p53 have also investigated the role of Zn+2 on both recognition and stability  and various models of the tetrameric p53 core domain bound to DNA duplex structures as well as to Holiday junctions have been assessed based on conforomational samples generated from MD simulations.  An interesting study combined MD simulations with the Molecular mechanics/Generalized Born/Solvent Accessible Area (MM/GBSA) approach to investigate the cooperativity seen in the binding of the RUNT transcription factor with its DNA target sequence and with a second protein, CBFβ. Results indicate that the presence of the third partner in the trimer enhances the intermolecular interactions between the partners in both the Runt-DNA and Runt-CBFβ dimers. Using a similar approach MD simulations were combined with MM/Poisson-Boltzmann-SA (MM/PBSA) free energy analysis to understand the impact of carcinogens on binding of the TATA sequence to the TATA-Box Binding Protein. In general, approaches where MD simulations are used to sample the conformational space of structures with the free energies of the systems evaluated via either the MM/GBSA and MM/PBSA methods represents a powerful approach to obtain estimates of the free energies of binding for large complexes. However, it should be emphasized that adequate sampling needs to be performed to assure convergence of the free energies and care must be taken to obtain accurate estimates for the configurational entropies of the systems.
In the area of specific recognition of DNA sequences by proteins a number of interesting studies were reported. It was shown that energy calculations alone on mutant sequences modeled based on the crystal structures of DNA-protein complexes could identify consensus binding sequences for a number of complexes, including examples where indirect readout via water molecules contributes to specificity. A study on Hoxc8 using a similar approach supplemented with MD simulations was also successful in identifying the cognate sequence. Emphasizing the role of water in DNA-protein recognition was a simulation study of the restriction endonuclease BamHI where it was proposed that a “hydration fingerprint” associated with interfacial waters can mediate specificity. Notable was the use of Grand Canonical Monte Carlo simulations to allow for proper sampling of water in regions of the DNA-protein interface not exposed to the bulk solvent. Concerning DNA binding proteins that lead to flipping of a base or bases out of the duplex and into the enzyme catalytic site, an energetic or dynamic recognition mechanism was shown to impart the specificity for GCGC sequences by the cytosine methyltransferase from HhaI (M.HhaI, Figure ). Specificity is based on the ability of the enzyme to only be able to successfully flip out and stabilize the target cytosine base in the presence of the correct sequence. However, in the case of the 8-oxoguanine-glycosylase, free energy calculations in combination with crystallographic analysis indicates specificity to be related to specific recognition of 8-oxo-G versus G in the enzyme active site. While yet to be experimentally determined, for the general class of “flipping proteins” it is likely that the specificity of DNA repair enzymes will rely more on unique properties associated with the chemical alterations in the DNA whereas enzymes that target specific sequences in unmodified DNA, as with the methyltransferase, may rely on an energetic or dynamic recognition mechanism.
A powerful method to overcome the inherent time limitation of MD simulations (~100 ns) is using a potential of mean force (PMF) calculation. In this approach an external or biasing potential, often referred to as an umbrella potential, is applied to overcome large energy barriers along a reaction coordinate between conformational states. The energetic contribution of the umbrella potential is then corrected for allowing the intrinsic free energy along the reaction coordinate (i.e. the PMF) to be obtained. This approach was used in base flipping studies of DNA in the presence of M.HhaI to show that the protein actively facilitates flipping of the base out of the DNA helix as well as to describe the mechanism of sequence specificity discussed above. PMF calculations have also been used to investigate the stereospecificity of base excision in DNA glycosylase and elegant studies on conformational changes in DNA polymerase based on transition path sampling have been performed. Examples of the application of biased sampling methods to large structural perturbation are the rotation of DNA around topoisomerase I and the movement of PcrA Helicase along ssDNA. While PMFs were not extracted in those studies, they are examples of how external potentials may be used to drive large scale structural changes in DNA-protein complexes. Again, checks must be performed to make sure that adequate sampling is performed in the MD simulations to ensure that the calculated free energy profile is adequately converged.
Simulation studies have been performed on a number of RNA-protein complexes in the cellular machinery involved in processing genetic information, from transcription and splicing to translation of the mature mRNA. MD simulation of a model of the RNA Polymerase II (PolII) elongation complex with both RNA and DNA, including the unwound DNA bubble, in the pre-translocation state indicated that several loops in the protein are involved in keeping the DNA bubble open, and that fork loop 1 of PolII is implicated in the unidirectionality of PolII motion along the DNA. Special hardware was used for simulation of the large 0.5 million particle system, though only 2 ns of simulation time was reported, so that robust conclusions from the simulation are limited. A 50 ns simulation of a complete virus and a shorter targeted MD simulation of the entire ribosome are additional noteworthy efforts that establish the stability of large scale simulations on heterogeneous systems.
The spliceosome, which in eukaryotes carries out removal of introns and splicing of the coding regions to form mature mRNA, is a dynamic ribonucleoprotein complex consisting of five small nuclear RNAs and associated proteins. It has attracted considerable attention from the simulation community, and several interesting studies of the U1A protein-RNA complex have been reported. The U1A protein is one of the best characterized proteins containing an RNA binding domain (RBD, also known as an RNA-recognition motif (RRM)). The first studies[46,47] focused on elucidating the molecular basis for the high affinity of the protein for the RNA hairpin and internal loop (Figure 2f), and on possible mechanistic effects of salt concentration on the stability of the complex ; their main importance was in establishing the feasibility of simulating RNA-protein complexes using either proper spherical cutoff or lattice sum[46,48] methods. A recent extension of the study by Tang and Nilsson, showed the key role of a set of positively charged amino acid residues for proper recognition of the RNA, and a simulation analysis of the structural consequences of experimental point mutations involving aromatic acids in U1A provides a foundation for further attempts to modify the stability of RBD-RNA complexes. Other studies of U1A-RNA have shown the importance of correlated motions for the identification of cooperative networks involved in the binding[51–53], and that such collective motions are absent in a weakly binding mutant. A simulation of another RBD-RNA complex highlighted the flexibility of residues, as well as the presence of several quite rapidly exchanging water molecules at the protein-RNA interface.
Most studies of translation deal with the ribosome, but in at least one study the MM/PBSA approach was used to show that the affinity of tRNAGln for its synthetase depends on entropic contributions from the internal mobility in the tRNA core. Several studies have used part of the ribosome, typically around the decoding center in the small subunit. Replica exchange MD simulations indicate that the energy barrier for the conformational change of the key nucleotides (A1492 and A1493 in the small ribosomal subunit) upon tRNA binding is low enough to allow binding of ligands to influence the equilibrium. Specificity of codon-anticodon interactions in the ribosomal decoding center has been analyzed in structural terms using standard MD simulations  and also in energetic terms using the linear interaction energy method. Free energy perturbation calculations, together with an empirical valence bond approach indicate an entropic origin for the rate enhancement of the peptide bond synthesis reaction at the other end of the tRNA. Interactions between ribosomal RNA and individual ribosomal proteins have also been investigated in terms of conformational and hydration properties.
From the above it is clear that at this time, December 2007, MD simulations have advanced to a point that allows for a wide range of DNA-protein and RNA-protein studies to be performed with confidence that the obtained results are representative of the experimental regimen. Accordingly, it is anticipated that future studies will more closely link MD studies with experimental results, with such collaborative efforts taking advantage of free-energy methods beyond free energy perturbations calculations, including studies based on PMF calculations and using MM/GBSA and MM/PBSA approaches. Explicit solvent MD simulations will also allow for a more detailed understanding of the role of hydration, including interfacial waters, in protein-NA interactions. Finally, the ever increasing availability of computational power will allow for increased sampling, in the form of longer simulations as well as multiple simulations as required to obtain more precise free energy estimates and perform statistical analysis of simulation results, not to mention studies on extremely large macromolecular complexes.
Financial support is acknowledged from the NIH (GM051501, ADM Jr.) and the Swedish Research Council (LN).
Alexander D. MacKerell, Jr., Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore MD 21201, USA.
Lennart Nilsson, Department of Biosciences and Nutrition, Karolinska Institutet, SE-141 57 HUDDINGE, Sweden.