|Home | About | Journals | Submit | Contact Us | Français|
RNA molecules are now known to be involved in the processing of genetic information at all levels, taking on a wide variety of central roles in the cell. Understanding how RNA molecules carry out their biological functions will require an understanding of structure and dynamics at the atomistic level, which can be significantly improved by combining computational simulation with experiment. This review provides a critical survey of the state of molecular dynamics (MD) simulations of RNA, including a discussion of important current limitations of the technique and examples of its successful application. Several types of simulations are discussed in detail, including those of structured RNA molecules and their interactions with the surrounding solvent and ions, catalytic RNAs, and RNA–small molecule and RNA–protein complexes. Increased cooperation between theorists and experimentalists will allow expanded judicious use of MD simulations to complement conceptually related single molecule experiments. Such cooperation will open the door to a fundamental understanding of the structure–function relationships in diverse and complex RNA molecules.
The fundamental importance of RNA in a wide variety of biological contexts is becoming increasingly apparent. Since the discovery that RNA can catalyze chemical reactions in addition to its ability to carry genetic information,1,2 it has become clear that RNA takes on a wide variety of roles in the cell and is involved in all cellular processing of genetic information, from splicing to translation to modification, and particularly regulation, via RNA interference in eukaryotes and riboswitches in bacteria (for review please see, e.g., Ref. 3 - Ref. 21). Understanding how this central molecule of life is able to accomplish such a variety of functions will necessarily involve understanding RNA structure and dynamics at the atomistic level, a challenge that can be addressed only using a combination of experimental studies and molecular dynamics (MD) simulations.22
When planning and conducting MD simulations, the challenges that must be taken into account are conceptually very similar to those associated with single molecule experiments. One of the primary limitations of MD simulations is the insufficient sampling, stemming from the 1 to 100+ ns time scale of the technique. In single molecule experiments, likewise, there is concern about collecting large enough data sets to ensure that the conclusions drawn are generally valid, rather than true only for a particular molecule. The solutions to the sampling problem are also similar for in silico and in vitro approache: increase the time window of interrogation of a single molecule or perform short observations on many molecules in parallel. In addition, by combining theoretical and experimental approaches, each can help to synergistically explain and verify results from the other. Thus, MD simulation and single molecule techniques are inherently complementary, and direct cooperation between theorists and experimentalists working on related issues will benefit both groups and promote valuable, widely applicable solutions.
Opinions about MD simulations range from a flat rejection of everything that is calculated to the indiscriminate application of computational methods to problems that are far beyond the applicability of the technique. Both extreme views are equally unjustified. Modeling, as any other research method, has its scope, limitations, and error margins. Limitations are significant and need to be carefully evaluated when planning simulations and interpreting the results. Computers readily provide numbers and structures, but are not responsible for the reliability of the results and their interpretation. Unjustified application of simulations therefore provides data of limited value and can be misleading. Wisely applied simulations, by contrast, can considerably complement and aid in understanding of experiments and provide qualified predictions. In fact, experiments often can only be understood with substantial molecular modeling. In general, qualitative applications of computational methods are more appropriate than quantitative studies. This review provides an overview of MD simulations of RNA, with examples of successful applications and suggestions for future use.
The first simulations on RNA were limited by the resources available at the time. The force field parameters describing RNA had not been sufficiently optimized, and the simulation protocols before 1995 (particularly the treatment of long-range electrostatic forces) were unable to provide stable trajectories even on the 500 ps time scale due to a drastic accumulation of errors. Another major problem was the scarcity of crystal structures of RNA, and thus a lack of reliable high-resolution structures from which to start MD simulations. In the early 1990s, only a few tRNA crystal structures were available. By the mid-1990s, several more RNA structures had been solved, but some turned out to have limited relevance.23 This limitation left early simulations at a distinct disadvantage, since the typically limited sampling of MD means that a simulated structure may be unable to diverge substantially from its starting structure. (Assuming a Boltzmann distribution and standard Arrhenius kinetics, 10-ns scale simulations are only able to overcome free energy barriers that are on the order of ~5–6 kcal/mol.) Thus, a realistic assessment of dynamics from a simulation depends on a realistic and accurate starting structure.
Some of the controversial results from the earliest RNA simulations were likely due to inadequate or less relevant starting structures.24,25 For example, several MD simulations were based on the crystal structure of the minimal hammerhead ribozyme,26 which was recently shown to be quite different from the active structure.27 Over the last 10 years RNA crystallography has improved substantially. For example, Doudna28 discusses the recent successful use of hexammines to aid in phasing and the importance of chemical modifications such as 2′-O-methyl, 2′H, and vanadate in “trapping” various states of ribozyme cleavage reactions. There are now almost 500 RNA crystal structures in the PDB,29 vastly increasing the possibilities for both the variety and the size of simulations. We have reached the point where even all-atom simulations of the ribosome are becoming feasible,30,31 but we must keep in mind that simulations are still limited by the crystal structures they are based on, and that low resolution in loop regions or uncertainty regarding metal binding can still contribute to error in simulations.
The force fields for simulating RNA have been refined and stable simulation protocols for polyanionic RNA molecules have been introduced, primarily the Particle-Mesh Ewald (PME) treatment of the strong, long-range electrostatic forces.32 A broad set of RNA applications has been tested using the AMBER code and associated force field. This method matches quantum chemical data for stacking33 and base pairing34 (including the complex non-Watson Crick interaction patterns) and, perhaps more importantly, has provided long stable simulations of numerous complex RNA molecules. We have tested the AMBER force field in a total of ~4 μs of RNA simulations (each 10–200 ns long) and so far have observed rather satisfactory performance. To the best of our knowledge, no other force field has been validated in long stable simulations of complex folded RNA molecules. By comparison, the CHARMM force field, when applied to simple duplexes, showed accelerated opening (or breathing) events of A-helical RNA base pairs that were not observed in AMBER simulations or in imino proton exchange experiments by 1H-NMR.35,36 This strong foundation, however, does not imply that AMBER-based MD simulations are entirely accurate, and it is always important that MD studies are executed with careful consideration and discussion of possible simulation artifacts as detailed here.
The computing power available for MD calculations has increased dramatically in recent years, and strategies for distributed computing, such as Pande's Folding@Home,37,38 allow computations that use tens of thousands of processors in parallel to reach hundreds of microseconds of total simulation time.38 Yet, even if future computers are fast enough to allow in silico folding of RNAs of unknown structure, there is no guarantee that the force field will find the correct global minimum. Thus, MD applications will continue to primarily provide analytical insights about known RNA structures and their modest sequence and structural variants, with specific in-depth insights into the complex roles of dynamic molecular interactions in nucleic acids. The combination of more reliable starting structures, improved force fields, and increased computational power make MD a very feasible and promising technique to study the structure and dynamics of RNA, especially when MD is used in conjunction with (single molecule) experiments.
The goal of MD simulations is to mimic the real-time dynamics of single solvated biomolecules in order to elucidate atomic-resolution details of their structural dynamics and eventually estimate the free energies of specific conformations. Simulated RNA molecules are immersed in a sufficiently large (extending at least 10 Å from the RNA) box of water molecules and ions that is periodically extended in all directions (periodic boundary condition). The molecules are described by simple pair additive atomistic potentials (force fields) that treat atoms as van der Waals spheres with partial, constant, point charges localized at the individual atomic centers, linked by harmonic springs mimicking covalent structure and supplemented by simple torsion profiles.
The main limitations of MD simulations stem from: (i) the time-scale of simulations (trajectories are typically 1−100+ ns) and (ii) force field inaccuracies. The calculation of long-range electrostatics has previously been described as a major limitation,39 but introduction of the PME treatment of electrostatics appears to remove all major drawbacks in these calculations.32 It has been suggested that PME can overstabilize simulated systems,40,41 but this has not yet been clearly shown, and the problem should be rather marginal compared to sampling and force field limitations.
The short timescale and resulting limited conformational sampling are often discussed, frequently with the conclusion that faster computers would lead to better results. However, longer simulations may also expose force field deficiencies that have cumulative effects over time, which will become a significant problem once computers can routinely produce microsecond-long trajectories. Additionally, the timescale of real events is of course still much longer than affordable simulation timescales. However, this limitation can sometimes be overcome by running multiple simulations with distinct starting structures or by using enhanced sampling MD methods (see Enhanced Sampling section). Furthermore, a simulated RNA conformation is typically not fully equilibrated as the simulation begins so that the probability to capture structural changes is initially higher than for real systems, facilitating sampling.
Force field deficiencies are often not discussed, although they are potentially more serious than sampling limitations. The force field is (remain in the future for a variety of reasons) so simple that it cannot capture accurately all force contributions simultaneously, especially when using multipurpose biomolecular force fields. One can tune the force field to reproduce experimental data for one aspect of the simulated system, but this tends to increase errors elsewhere. For example, polar hydrogens have small van der Waals radii when interacting with polar groups but quite large ones when they are in contact with nonpolar groups, and so the fixed radii in a simulation must be a compromise.42
Force field deviations from reality are not random. For the AMBER force field, rigorous comparison with quantum mechanical electronic structure calculations has revealed that base stacking and hydrogen bonding parameters are described surprisingly well, including those of base pairs that utilize the 2′-OH group.33,34 AMBER uses atomic charges that were derived to reproduce the electrostatic field around the monomers, which turns out to describe the molecular interactions of nucleobases quite well. The van der Waals term derived from Lennard–Jones potentials is also well balanced. The sugar–phosphate backbone, however, is considerably more difficult to deal with for two reasons: (i) it is flexible and thus the constant point charges do not reproduce the electrostatic potential equally well for distinct backbone conformational classes43,44 and; (ii) the phosphodiester represents a highly polarizable anion, with a complicated electronic structure that changes with solvation. These contributions are neglected by the force field. Despite these problems, the behavior of the backbone in RNA simulations has been quite realistic (see, for example, Refs. 45 and 46).
A significant deficiency of the force field is the description of divalent cations. The Mg2+…N7(guanine) interaction energy term is ~210 kcal/mol, while the force field accounts for only ~150 kcal/mol. This discrepancy is due to an interaction energy of ~70 kcal/mol contributed by nonadditive effects (neglected by the purely additive force field) in the first ligand shell of Mg2+, mainly inter-ligand polarization repulsion.47 In reality, the first-shell water molecules are heavily polarized by the ion, and their properties are very different from those of bulk water molecules (for example, they have the capacity to form strong, low-barrier hydrogen bonds with large bathochromic infrared absorption shifts, where the hydrogen can easily jump between heteroatoms or may even be delocalized48). Additionally, simulations are unable to provide a well-equilibrated distribution for multivalent ions. Present force fields are biased towards direct (inner-shell) binding of Mg2+ to solute and require careful initial equilibration, since divalent cations consequently sample very poorly.49 Although the overall artifacts in simulations that include divalents are reduced through partial error compensation, divalent ions should be included in simulations only when absolutely necessary. It is not necessary to include divalents in simulations even when complex tertiary structured RNAs require them for folding in experiments, since the timescale of MD is typically too short to result in unfolding due to the absence of divalent ions.50
The description of anions is also quite challenging, since they have electrons localized far from their atomic centers and are typically highly polarizable. A comprehensive description would require the use of a polarizable force field.51 In addition, high-salt simulations that include anions can cause new artifacts, such as the formation of KCl clusters when using standard AMBER force field parameters.52,53
Force field errors are considerably smaller for monovalent cations, which also sample quite well in 25+ ns simulations.45,54-56 Analysis of the residence times and locations of monovalent cations can confidently predict binding sites with occupancies of >50%, though studies of weaker binding sites, such as those in B-DNA minor grooves, are less straightforward. Taking all these observations into consideration, the common approach of running simulations containing only neutralizing monovalent cations (~0.2M) is well justified.
Currently, MD simulations are not capable of predicting three-dimensional structures of RNA molecules on their own but rather require meaningful starting geometries, preferably high-resolution X-ray crystal structures. If major structural rearrangements occur during the equilibration or the initial stages of a simulation (which is commonly observed when starting from modeled and some NMR-derived structures) they almost always indicate inaccuracies of the starting conformations and not of the force field.57
When assessing the outcome of simulations, it is important to consider all of the aforementioned limitations. The force field may be sufficient for some applications and fail for others. It is prudent to study scientific questions where stacking and base pairing are important and complementary experimental data are available for reference, as such problems are more likely to be properly addressed by contemporary MD simulations. It may even happen that, in a given simulation, different aspects of the simulated molecule(s) are described at different levels of accuracy. For example, simulations have been found to realistically reflect the overall dynamics of guanine quadruplex (G-DNA) stems and the overall electrostatic role of cations in their stabilization, yet the direct, local interactions of the G-quartets with cations are imperfectly described.58,59 In the latter studies, minor distortions of quartet geometries were observed and the radius of the cations appeared oversized (see Figure 1). The force field also fails to properly describe some properties of the flexible single-stranded connecting loops of the G-DNA (i.e., the predicted global minimum does not conform to experiment), and it falls short of predicting the experimentally observed cation binding sites at the stem–loop junctions.58 These observations should be taken into consideration for RNA simulations as well, where it is likely that simulations of, for example, the structural dynamics of hairpin loops or bulged bases are more susceptible to force field imbalances than are simulations of compact stem structures.
Explicit solvent MD simulations represent the gold standard in contemporary modeling of the structural dynamics of nucleic acids in the context of their aqueous environment, which often plays important structural and functional roles. These studies can be further complemented by enhanced sampling techniques. However, such enhancement necessarily introduces additional, quite substantial, approximations. Locally Enhanced Sampling (LES), replica exchange MD (REMD) and targeted MD techniques aim to address the limited sampling problem.
LES splits a selected small part of the molecule (e.g., a loop) into N (3–5) copies that move independently. Although the method does not use a proper Boltzmann distribution, the height of conformational free energy barriers between distinct substates is reduced roughly by 1/N.60 We have successfully applied LES to studies of the single-stranded loops of G-DNA.58 During our studies of the hepatitis delta virus (HDV) ribozyme, however, LES applied to loop 3 and the catalytically involved trefoil turn motif did not converge.61 LES has also been applied to different regions of RNA Kink-turns, although the results were clearly dependent on the number of simulated copies.62
In REMD, several noninteracting copies (replicas) are simulated independently at different temperatures (e.g., 300–500 K). In contrast to LES, each replica is a copy of the whole system. Conformations of the individual simulated systems are exchanged during simulations using Metropolis-like algorithms that consider the probability of sampling each conformation at the alternate temperature.38,63
In targeted MD, force restraints drive the simulated molecule swiftly from a starting to a final conformation. This approach accelerates the simulation timescale, but also introduces a major sampling bias along the route.31 Finally, elevating the temperature (to, e.g., 400 K instead of 300 K) is another crude way to destabilize the simulated molecules and overcome conformational free energy barriers. However, raising the temperature will not produce sampling equivalent to that of a prolonged simulation, and since the pair additive potential is tuned for room temperature simulations, adverse effects such as empty cavities in the bulk solvent may occur at higher temperatures.
The costly explicit solvent can be replaced by a continuum solvent using Poisson Boltzmann (PB) or more approximate Generalized Born (GB) approaches, along with a Surface Area (SA) term. A fundamental advantage of continuum solvent methods is the assessment of hydration and free energies that cannot be derived from explicit solvent models. However, these approaches are based on additional substantial approximations, require parameterization, and their results are very sensitive to the atomic radii used to define the solute cavity. The GB/SA approach can be used to run MD simulations, but there is limited experience with this approach on RNA, and the range of its applicability has yet to be established. GB/SA dynamics show faster sampling because of the absence of friction between solute and solvent, but the method may significantly overstabilize folded structures. Continuum solvent methods can also be applied to systematic molecular mechanical conformational searches (as done recently for kink-turns64). It should be noted that the simplest electrostatic models in vacuo or based on a distance dependent permittivity provide an incorrect (reversed) order of stability for different conformations of a given molecule, while GB/SA is thought to perform better.65
A more accurate alternative is to perform standard explicit solvent MD simulations and subsequently apply the continuum solvent method, by stripping the explicit solvent and averaging free energies over a sufficient number of snapshots (referred to as MM-GBSA or MM-PBSA methods). Since free energy is a state function, one can evaluate free energy differences between distinct substates without simulating the transition. We found this MM-PBSA method to be very instructive in comparing different substates of guanine quadruplex DNA,66 while it failed to predict the absolute values of DAPI binding to B-DNA. In the latter case, the computed results were ~0 or −20 kcal/mol, depending on whether the solute entropy was considered, while the experimental values are −10 to −12 kcal.67 This study also demonstrated that careful parameterization with the aid of quantum chemical calculations is required to obtain a meaningful force field for a drug, while ad hoc force fields (based on analogy with existing atom types or obtained via automated parameterization procedures) can be quite inaccurate, especially regarding torsion profiles between distinct ligand segments. Unfortunately, few MD studies of nucleic acid–drug complexes pay sufficiently close attention to the drug force field parameterization.
We have recently applied the MM-GBSA and MM-PBSA methods to estimate binding energies of ribosomal packing interactions,68 RNA kissing complexes (unpublished), and to monitor the free energy changes along Kink-turn simulation trajectories.64 The calculations, however, were not sufficiently accurate to achieve conclusive results. Zacharias applied such calculations quite successfully to investigate the context dependence of the structure of G/A base pairs in internal RNA loops, though large error margins in the calculations were reported.69
Among the other applications of continuum solvent methods, PB theory was used to roughly estimate binding affinities for several antibiotics to the ribosome,70 and Brooks and coworkers used PB methods to qualitatively look at the assembly of the ribosome. As expected, the results are quite sensitive to parameters such as dielectric constant, dielectric boundaries, and partial charges.71 GB/SA dynamics were also applied to assess the dynamic behavior of the 16S rRNA central domain in the presence and absence of the ribosomal protein S15.72 Finally, Barthel and Zacharias found qualitative (but not quantitative) correspondence between potential-of-mean-force explicit solvent free energy calculation and implicit solvent GB model conformational transitions of single nucleotide RNA bulges.73
In summary, continuum solvent approaches can provide useful insights into the molecular interactions but one should keep in mind that the accuracy of such calculations is rather modest.
In the following sections we describe recent representative simulations of single- and double-stranded RNAs, catalytic RNA molecules, small molecules binding to RNA, and increasingly large RNA–protein complexes. Some studies represent a synergistic interaction between simulation and experiment. While we cannot possibly discuss every publication describing RNA MD simulations (of which there are now several hundred), these examples give a sense of the variety in systems and techniques, and the vast scope of information available from MD simulations.
Much work has been done to model the behavior of various configurations of single-74 and double-stranded75,76 RNA molecules, including simulations focusing on the flexibility of the RNA,35,49,77-81 and the effects that noncanonical base pairs54,68,69,82-84 and modifications85-90 have on RNA structure and dynamics. The stability and dynamics of various RNA structures have been studied, including bulges73 and stem-loop structures,87,90-98 which are often used as model systems to test enhanced MD methods,99,100 as well as more complex structures such as entire tRNA molecules.101 Specific RNA motifs observed in the ribosome such as kink turns,50,62,64,102 the sarcin-ricin loop,103 and loop E54,104-106 have been studied separately, assuming that the modular organization of complex RNA molecules makes an isolated module a good model. Additionally, MD has been used to investigate RNA–RNA interactions, such as HIV-1 dimerization,57,107-112 as well as the interaction of RNA with solvent and ions.45,46,54,57,64,84,104,105,113,114 MD simulation is even being used to study the diffusion and electrophoretic mobility of single-stranded RNA115,116 and the folding of a small RNA hairpin.37
tRNA has been studied using MD techniques for more than 20 years. It was the first RNA molecule to have an X-ray structure and consequently was the first RNA molecule to be studied computationally.117 The MD techniques available for this first simulation of tRNAPhe were very limited, with no explicit treatment of solvent, counterions, or even the hydrogen atoms of the RNA molecule, and a poor treatment of electrostatics. Thus, many tertiary interactions seen in the crystal structure were lost during the simulation. Nevertheless, the simulation maintained important features of the secondary structure, and represented a first step toward using MD techniques to study the motions of RNA molecules. More recent studies have demonstrated stable simulations of fully neutralized and solvated tRNA molecules101 (see Figure 2), and have continued to use tRNA as a model system for MD simulations on large structured RNA molecules.118
Loop E of 5S rRNA is a unique RNA motif that consists of seven consecutive non-Watson–Crick base pairs and binds to the L25 protein in the E. coli ribosome (see Figure 3). MD simulations, along with phylogenetic analyses and isostericity considerations, were used to predict the structure of spinach chloroplast loop E, based on the structure of loop E from E. coli.54 The sequences have only 60% identity, but simulations predicted a nearly identical three-dimensional structure, since changes to the non-Watson–Crick base pairs (including four G-to-A substitutions) preserved the overall shape and unique pockets of negative electrostatic potential. This prediction was confirmed by solution NMR experiments.123 MD simulations of the loop E motif also showed long residency water molecules bound to specific hydration sites in the RNA molecule for up to 5 ns,54 much longer than the 50–500 ps water binding times typically observed in nucleic acid simulations. Some static long residency water molecules remain bound even longer (up to 25 ns, the entire simulation) in the loop E–L25 protein complex, suggesting that specific hydration sites are probably important in structural stabilization of RNA–protein interactions in the ribosome.122 MD simulation is one of the only methods able to probe detailed solvent dynamics and identify specific long-residency hydration sites.
An entirely different, dynamical long-residency hydration site was identified in ribosomal Kink-turns,50,64 where it participates in the A-minor interaction mediating the tertiary contact between the C and NC stems of these recurrent structural motifs. This hydration site provides an unprecedented flexibility to Kink-turns, allowing them to act as unique molecular elbows during, for example, the elongation cycle. Such dynamics of Kink-turns have been implicated in large scale motions in the ribosome during protein synthesis (see Figure 4).50,64,102
Complex RNA folds are also associated with major cation binding sites that are not typically found around standard A-form RNA helices. For example, the loop E motif and RNA kissing complexes form unique ion binding pockets that are continuously occupied by 2–3 monovalent cations in simulations.54,57,104,105 Despite this high occupancy, the kissing complex ions are delocalized and smoothly exchange with bulk solvent (see Figure 5). Such dynamic monovalent ion binding pockets are unlikely to be detected in X-ray diffraction experiments due to dynamical disorder.
Finally, with recent advances in distributed computing, MD simulations can be used to examine not only dynamics in a defined fold, but also the folding of RNA molecules into known structures. Pande and coworkers have developed a Folding@Home distributed computing network, which approximates the computational power of a 150,000 CPU cluster, allowing the cumulative simulation of hundreds of microseconds of explicit solvent MD on small RNA molecules. A recent study investigating the role of solvent molecules in the folding of a 12-nucleotide RNA hairpin (see Figure 6) found an extrapolated folding time of 8.8 μs that matches experimental values.37 Additionally, a comparison of this study with a previous study using implicit solvent showed that water-mediated interactions were important in hydrophobic collapse of the RNA and in mediating long-range nucleation events, suggesting that water molecules, more so than counterions, play a crucial role in RNA folding.
MD simulations have been applied to ribozymes—small RNA molecules that catalyze chemical reactions. While an accurate view of the chemistry itself requires quantum mechanical calculations, standard MD techniques can shed light on molecular interactions that affect the structure and dynamics globally as well as locally at the active sites of ribozymes. The hepatitis delta virus (HDV),45,61,126 hairpin,46,127 and hammerhead ribozymes128,129 have all been studied using MD.
The HDV ribozyme catalyzes the self-cleavage of a specific phosphodiester bond by a transesterification reaction. This activity requires the presence of the C75 nucleotide, which is in close proximity to the active site and cannot be mutated without decreasing activity by more than 100-fold, suggesting its direct involvement in chemistry. The exact role of C75 remains unclear, however, with biochemical and structural evidence supporting both general acid and general base models.130-135 MD simulations of pre- and post-cleavage HDV ribozyme constructs show hydrogen bonding patterns consistent with a general base mechanism.126 An unprotonated C75 transiently forms a hydrogen bond necessary for general base activity, while a protonated C75 does not form the hydrogen bond required in the general acid mechanism. Thus, assuming that the precursor X-ray crystal structure is sufficiently close to the transition state, the results from MD simulations suggest that C75 acts as the general base rather than the general acid in catalysis (see Figure 7).
The hairpin ribozyme also catalyzes self-cleavage via a transesterification reaction where the details of the catalytic mechanism remain unclear. Our recent study combined MD and single molecule fluorescence techniques to explore the role of water molecules in the catalytic core of the hairpin ribozyme.46 When comparing the wild-type ribozyme to several mutants, a linear relationship was observed between the number of hydrogen bonds lost in the MD simulations and the loss of docking free energy calculated from single molecule fluorescence data, revealing a quantitative agreement between simulation and experiment. Analysis of the MD trajectories showed that a network of hydrogen bonds in the catalytic core of the ribozyme involves several long-residency, trapped water molecules. These water molecules are in a position where they could plausibly be involved in reaction chemistry (see Figure 8), suggesting that interactions with solvent are important not only for structure, flexibility, and folding of RNA molecules as described earlier, but also potentially for catalytic activity. These observations underscore that an analysis of specific hydration sites should probably be done for all explicit solvent MD simulations of RNA.
Understanding how RNA interacts with small molecules is crucial in drug design, where small molecules are optimized to bind a target RNA, and in characterizing natural and in vitro selected RNA riboswitches (aptamers) that bind specific small molecules. Obtaining a high-quality force field for small molecules can be difficult, since specialized parameterization based on quantum chemical data is necessary unless the ligand is rigid.67 Nevertheless, these types of MD simulations can elucidate details of small molecule binding to RNA, and can provide information, often missing from other techniques, on the role(s) of dynamics and solvent molecules in these interactions. Several studies have examined the binding of antibiotics to the hammerhead ribozyme137 and to various sites in the ribosome,53,138 while others have examined small molecule–RNA interactions such as pyrene binding to RNA duplexes139 and ligands such as FMN140 binding to their RNA aptamers.
The interaction between aminoglycoside antibiotics and the A-site of the ribosome has been well studied as a model system for small molecule binding to RNA, with a wealth of available crystal structures and biochemical data. A recent MD study53 on a paromomycin–A-site complex analyzed the hydrogen bonding network connecting the antibiotic to the RNA and verified that the conserved neamine portion of paromomycin was most stably bound to the RNA, with both more numerous and longer-lived hydrogen bonds than are established by rings III and IV of paromomycin. An analysis of hydration patterns reproduced crystallographic water binding sites and suggested additional long-residency sites that are thought to play a key role in mediating antibiotic–RNA interactions (see Figure 9).
RNA–protein interactions are ubiquitous and play a key role in cellular regulation. MD simulations of RNA–protein interactions have dramatically increased in recent years, especially since high-resolution crystal structures of the ribosome became available.141 The large variety in RNA–protein simulations now includes proteins binding RNA fragments,142-147 HIV RNA–protein interactions,148-151 RNase–RNA interactions,144 various ribosomal protein–ribosomal RNA interactions,72,122,152,153 simulations of peptide bond formation,154 coarse-grained155 and all-atom30,31 simulations of the complete ribosome, and even an all-atom simulation of the entire satellite tobacco mosaic virus.156 One protein–RNA interaction that has been studied in great detail using MD simulations is the U1A–RNA complex,157-164 discussed later.
The U1A protein, part of the U1 small nuclear ribonucleoprotein (snRNP) complex, is a well-studied RNA-binding protein that contains RNA recognition motifs (RRMs) and binds to both a hairpin in U1 RNA and its own 3′-UTR with picomolar affinities. Such U1A–RNA complexes were among the first protein–RNA interactions to be studied using MD techniques, and simulations of U1A–RNA complexes continue to enrich understanding of RNA binding proteins. Initial studies of the U1A protein bound to either RNA binding site158 were consistent with experimental data, showing stable hydrogen bonding between the protein and a loop contained in both RNAs, while the limited flexibility of the RNA–protein interface agreed with experimental thermodynamic data. This agreement established the validity of using MD to study protein–RNA complexes. Since then, this technique has been used alongside experiment to probe the U1A–RNA interaction in much greater detail, with recent studies focusing on the roles of specific residues. One simulation, for example, showed substantial variation in the time three lysine residues spend near the RNA backbone,163 explaining differences in their experimentally observed effects on complex stability. A second study, probing the effects of compensatory mutations in RNA and protein residues,164 used free energy and hydrogen bonding analyses to show that two induced fit processes affect binding. This approach explained experimental affinity data on several mutants (see Figure 10), providing information important for future rational design of RNA–protein interactions. Additionally, recent cross-correlation analysis of a U1A–RNA MD simulation162 was consistent with experimentally observed cooperativity and a statistical covariance analysis, and also made predictions about the roles of individual residues that can be further tested experimentally.
The range of RNA–protein complexes accessible to MD simulations is continually increasing, and now even all-atom simulations of the ribosome are becoming possible. Sanbonmatsu has used targeted MD techniques on the entire ribosome31 to investigate the process of message decoding, where the tRNA molecule moves from the “A/T” to the “A/A” site in response to a match between mRNA codon and tRNA anticodon. While targeted MD simulations represent unrealistic sampling, they do suggest possible pathways between an initial and final structure than can be tested experimentally. The ribosome study suggested that highly conserved residues in the A loop and Helix 89 interact with the tRNA molecule as it moves to the A/A state (see Figure 11). This observation generally agrees with biochemical data, providing evidence that MD simulations on large systems can aid in understanding the biological function of very complex RNA–protein complexes.
MD simulations of RNA have reached a point where they can be used in a wide variety of systems to complement experiment and verify or explain experimental results. There are a number of limitations in the use and interpretation of simulation data, including the short time scale and force field inaccuracies of MD simulations. However, the timescales of experiment and simulation are getting closer, with experiments able to access faster and faster dynamics and with more computational power leading to longer simulations. The future progress of MD simulations will critically depend on further tuning of the force fields (including development of polarization potentials), development of better methods to estimate free energies (in order to directly link molecular structures and free energies), and integration of molecular mechanics with quantum mechanical treatments (in order to deal with enzymatic reactions). All three tasks are formidable, but will allow MD simulations to address an even wider variety of biologically relevant questions in the future. The gap between theory and experiment is closing, and it is becoming clear that computational methods and particularly single molecule experiments are often highly complementary and synergistic. Collaborations between theorists and experimentalists can therefore be very productive and mutually beneficial. Increased cooperation, facilitated by learning each other's scientific language, will greatly aid in understanding the structure–function relationships of diverse and complex RNA molecules in our newly discovered post-genomic RNA World. This review shares a snapshot of the current state-of-the-art and hopes to inspire increased, yet prudent, use of MD tools in the ever-expanding RNA field.
We thank Jana Sefcikova for critical reading of the manuscript and Martin Zacharias for helpful discussion.
Contract grant sponsor: NSF predoctoral fellowship to S.E.M.
Contract grant sponsors: National Institutes of Health
Contract grant number: GM62357
Contract grant sponsor: Wellcome Trust
Contract grant number: GR067507
Contract grant sponsor: Grant Agency of Czech Republic
Contract grant numbers: GA203_05_0009, GA203_05_0388
Contract grant sponsor: Internal Grant Agency, Academy of Sciences
Contract grant number: 1QS500040581
Contract grant sponsor: Ministry of Education, Youth, and Sports
Contract grant numbers: LC06030, LC512, MSM0021622413, AVOZ50040507, AVOZ40550506