A novel method that uses the conformational distribution of Cα atoms in known structures is used to build short missing regions (‘loops’) in protein models. An initial tree of possible loop paths is pruned according to structural and electron-density criteria and the most likely loop conformation(s) are selected and built.
One of the most cumbersome and time-demanding tasks in completing a protein model is building short missing regions or ‘loops’. A method is presented that uses structural and electron-density information to build the most likely conformations of such loops. Using the distribution of angles and dihedral angles in pentapeptides as the driving parameters, a set of possible conformations for the Cα backbone of loops is generated. The most likely candidate is then selected in a hierarchical manner: new and stronger restraints are added while the loop is built. The weight of the electron-density correlation relative to geometrical considerations is gradually increased until the most likely loop is selected on map correlation alone. Finally, the loop is refined against the electron density in real space. The procedure starts by using structural information to trace a set of models for the Cα backbone of the loop. Only in later steps of the algorithm is the electron-density correlation used as a criterion to select the loop(s). Thus, this method is more robust in low-density regions than an approach using density as a primary criterion. The algorithm is implemented in a loop-building program, Loopy, which can be used either alone or as part of an automatic building cycle. Loopy can build loops of up to 14 residues in length within a couple of minutes. The average root-mean-square deviation of the Cα atoms in the loops built during validation was less than 0.4 Å. When implemented in the context of automated model building in ARP/wARP, Loopy can increase the completeness of the built models.
model building; loop modelling; Loopy
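The hierarchical selection described above can be illustrated with a minimal sketch. It is not the Loopy implementation: the `Candidate` fields, the weight schedule and the `keep` fraction are hypothetical, and both scores are assumed to be scaled to the range 0 to 1. The sketch only shows how raising the weight of the density term stage by stage lets geometry filter the candidates before map correlation decides.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    geometry: float  # hypothetical geometric plausibility score in [0, 1]
    density: float   # hypothetical electron-density correlation in [0, 1]


def select_loop(candidates, weights=(0.0, 0.5, 1.0), keep=0.75):
    """Prune candidates in stages, raising the density weight at each stage.

    With the final weight equal to 1.0 the last ranking uses density alone.
    """
    pool = list(candidates)
    for w in weights:
        # Combined score: geometry dominates early, density dominates late.
        pool.sort(key=lambda c: (1.0 - w) * c.geometry + w * c.density,
                  reverse=True)
        # Always keep at least one candidate.
        pool = pool[:max(1, int(len(pool) * keep))]
    return pool[0]


# Made-up scores for illustration only.
example = [
    Candidate("A", geometry=0.9, density=0.20),
    Candidate("B", geometry=0.7, density=0.80),
    Candidate("C", geometry=0.6, density=0.85),
    Candidate("D", geometry=0.1, density=0.95),
]
best = select_loop(example)
```

In this toy example candidate D has the highest density correlation but is discarded in the first, geometry-only stage, which is the behaviour that makes a structure-first approach less sensitive to noisy density.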
The highly automated PHENIX AutoBuild wizard is described. The procedure can be applied equally well to phases derived from isomorphous/anomalous and molecular-replacement methods.
The PHENIX AutoBuild wizard is a highly automated tool for iterative model building, structure refinement and density modification using RESOLVE model building, RESOLVE statistical density modification and phenix.refine structure refinement. Recent advances in the AutoBuild wizard and phenix.refine include automated detection and application of NCS from models as they are built, extensive model-completion algorithms and automated solvent-molecule picking. Model-completion algorithms in the AutoBuild wizard include loop building, crossovers between chains in different models of a structure and side-chain optimization. The AutoBuild wizard has been applied to a set of 48 structures at resolutions ranging from 1.1 to 3.2 Å, resulting in a mean R factor of 0.24 and a mean free R factor of 0.29. The R factor of the final model is dependent on the quality of the starting electron density and is relatively independent of resolution.
model building; model completion; macromolecular models; Protein Data Bank; structure refinement; PHENIX
RCrane is a new tool for the partially automated building of RNA crystallographic models into electron-density maps of low or intermediate resolution. This tool helps crystallographers to place phosphates and bases into electron density and then automatically predicts and builds the detailed all-atom structure of the traced nucleotides.
RNA crystals typically diffract to much lower resolutions than protein crystals. This low-resolution diffraction results in unclear density maps, which cause considerable difficulties during the model-building process. These difficulties are exacerbated by the lack of computational tools for RNA modeling. Here, RCrane, a tool for the partially automated building of RNA into electron-density maps of low or intermediate resolution, is presented. This tool works within Coot, a common program for macromolecular model building. RCrane helps crystallographers to place phosphates and bases into electron density and then automatically predicts and builds the detailed all-atom structure of the traced nucleotides. RCrane then allows the crystallographer to review the newly built structure and select alternative backbone conformations where desired. This tool can also be used to automatically correct the backbone structure of previously built nucleotides. These automated corrections can fix incorrect sugar puckers, steric clashes and other structural problems.
RCrane; RNA model building
A semi-automated computational procedure to assist in the identification of bound ligands from unknown electron density has been developed. The atomic surface surrounding the density blob is compared to a library of three-dimensional ligand binding surfaces extracted from the Protein Data Bank (PDB). Ligands corresponding to surfaces which share physicochemical texture and geometric shape similarities are considered for assignment. The method is benchmarked against a set of well-represented ligands from the PDB, for which we show that the correct ligand can be identified from the corresponding binding surface. Finally, we apply the method during the model-building and refinement stages of structural genomics targets in which unknown density blobs were discovered. A semi-automated computational method is described which aims to assist crystallographers with assigning the identity of a ligand corresponding to unknown electron density. Using shape and physicochemical similarity assessments between the protein surface surrounding the density and a database of known ligand binding surfaces, a plausible list of candidate ligands is identified for consideration. The method is validated against frequently observed ligands from the Protein Data Bank and results are shown from its use in a high-throughput structural genomics pipeline.
Electron density assignment; Function annotation; Ligand identification; Ligand assignment; Protein surfaces
A method for automated macromolecular side-chain model building and for aligning the sequence to the map is described.
An algorithm is described for automated building of side chains in an electron-density map once a main-chain model is built and for alignment of the protein sequence to the map. The procedure is based on a comparison of electron density at the expected side-chain positions with electron-density templates. The templates are constructed from average amino-acid side-chain densities in 574 refined protein structures. For each contiguous segment of main chain, a matrix with entries corresponding to an estimate of the probability that each of the 20 amino acids is located at each position of the main-chain model is obtained. The probability that this segment corresponds to each possible alignment with the sequence of the protein is estimated using a Bayesian approach and high-confidence matches are kept. Once side-chain identities are determined, the most probable rotamer for each side chain is built into the model. The automated procedure has been implemented in the RESOLVE software. Combined with automated main-chain model building, the procedure produces a preliminary model suitable for refinement and extension by an experienced crystallographer.
model building; template matching
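The alignment step described above can be sketched as a small Bayesian calculation. This is a simplified illustration, not the RESOLVE algorithm: it assumes a uniform prior over alignment offsets, treats positions as independent, and uses a hypothetical floor probability for residue types missing from a row. The function name and data layout are also assumptions.

```python
import math


def alignment_posteriors(prob_rows, sequence, floor=1e-6):
    """Posterior probability of each offset of a segment along a sequence.

    prob_rows[i][aa] is the estimated probability that residue type `aa`
    occupies position i of the main-chain segment (a per-position dict).
    A uniform prior over offsets is assumed.
    """
    n = len(prob_rows)
    log_likes = []
    for offset in range(len(sequence) - n + 1):
        # Log-likelihood of the segment matching sequence[offset:offset + n].
        ll = sum(math.log(prob_rows[i].get(sequence[offset + i], floor))
                 for i in range(n))
        log_likes.append(ll)
    # Normalise in a numerically stable way.
    top = max(log_likes)
    weights = [math.exp(ll - top) for ll in log_likes]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up two-residue segment whose density looks like Val followed by Leu.
rows = [
    {"V": 0.80, "A": 0.10, "G": 0.05, "L": 0.05},
    {"L": 0.70, "V": 0.10, "A": 0.10, "G": 0.10},
]
posteriors = alignment_posteriors(rows, "GAVLG")
best_offset = max(range(len(posteriors)), key=posteriors.__getitem__)
```

A confidence threshold on the largest posterior would then decide whether the match is kept, mirroring the statement that only high-confidence matches are retained.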
Recent developments in PHENIX are reported that allow the use of reference-model torsion restraints, secondary-structure hydrogen-bond restraints and Ramachandran restraints for improved macromolecular refinement in phenix.refine at low resolution.
Traditional methods for macromolecular refinement often have limited success at low resolution (3.0–3.5 Å or worse), producing models that score poorly on crystallographic and geometric validation criteria. To improve low-resolution refinement, knowledge from macromolecular chemistry and homology was used to add three new coordinate-restraint functions to the refinement program phenix.refine. Firstly, a ‘reference-model’ method uses an identical or homologous higher resolution model to add restraints on torsion angles to the geometric target function. Secondly, automatic restraints for common secondary-structure elements in proteins and nucleic acids were implemented that can help to preserve the secondary-structure geometry, which is often distorted at low resolution. Lastly, we have implemented Ramachandran-based restraints on the backbone torsion angles. In this method, a ϕ,ψ term is added to the geometric target function to minimize a modified Ramachandran landscape that smoothly combines favorable peaks identified from nonredundant high-quality data with unfavorable peaks calculated using a clash-based pseudo-energy function. All three methods show improved MolProbity validation statistics, typically complemented by a lowered Rfree and a decreased gap between Rwork and Rfree.
macromolecular crystallography; low resolution; refinement; automation
A method is presented for the automatic building of nucleotide chains into electron density which is fast enough to be used in interactive model-building software. Likely nucleotides lying in the vicinity of the current view are located and then grown into connected chains in a fraction of a second. When this development is combined with existing tools, assisted manual model building is as simple as or simpler than for proteins.
The crystallographic structure solution of nucleotides and nucleotide complexes is now commonplace. The resulting electron-density maps are often poorer than for proteins, and as a result interpretation in terms of an atomic model can require significant effort, particularly in the case of large structures. While model building can be performed automatically, as with proteins, the process is time-consuming, taking minutes to days depending on the software and the size of the structure. A method is presented for the automatic building of nucleotide chains into electron density which is fast enough to be used in interactive model-building software, with extended chain fragments built around the current view position in a fraction of a second. The speed of the method arises from the determination of the ‘fingerprint’ of the sugar and phosphate groups in terms of conserved high-density and low-density features, coupled with a highly efficient scoring algorithm. Use cases include the rapid evaluation of an initial electron-density map, addition of nucleotide fragments to prebuilt protein structures, and in favourable cases the completion of the structure while automated model-building software is still running. The method has been incorporated into the Coot software package.
nucleic acid chain tracing; Coot
DEN refinement and automated model building with AutoBuild were used to determine the structure of a putative succinyl-diaminopimelate desuccinylase from C. glutamicum. This difficult case of molecular-replacement phasing shows that the synergism between DEN refinement and AutoBuild outperforms standard refinement protocols.
Phasing by molecular replacement remains difficult for targets that are far from the search model or in situations where the crystal diffracts only weakly or to low resolution. Here, the process of determining and refining the structure of Cgl1109, a putative succinyl-diaminopimelate desuccinylase from Corynebacterium glutamicum, at ∼3 Å resolution is described using a combination of homology modeling with MODELLER, molecular-replacement phasing with Phaser, deformable elastic network (DEN) refinement and automated model building using AutoBuild in a semi-automated fashion, followed by final refinement cycles with phenix.refine and Coot. This difficult molecular-replacement case illustrates the power of including DEN restraints derived from a starting model to guide the movements of the model during refinement. The resulting improved model phases provide better starting points for automated model building and produce more significant difference peaks in anomalous difference Fourier maps to locate anomalous scatterers than does standard refinement. This example also illustrates a current limitation of automated procedures that require manual adjustment of local sequence misalignments between the homology model and the target sequence.
reciprocal-space refinement; DEN refinement; real-space refinement; automated model building; succinyl-diaminopimelate desuccinylase
A method for automated macromolecular main-chain model building is described.
An algorithm for the automated macromolecular model building of polypeptide backbones is described. The procedure is hierarchical. In the initial stages, many overlapping polypeptide fragments are built. In subsequent stages, the fragments are extended and then connected. Identification of the locations of helical and β-strand regions is carried out by FFT-based template matching. Fragment libraries of helices and β-strands from refined protein structures are then positioned at the potential locations of helices and strands and the longest segments that fit the electron-density map are chosen. The helices and strands are then extended using fragment libraries consisting of sequences three amino acids long derived from refined protein structures. The resulting segments of polypeptide chain are then connected by choosing those which overlap at two or more Cα positions. The fully automated procedure has been implemented in RESOLVE and is capable of model building at resolutions as low as 3.5 Å. The algorithm is useful for building a preliminary main-chain model that can serve as a basis for refinement and side-chain addition.
model building; template matching; fragment extension
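The connection rule described above (joining segments that overlap at two or more Cα positions) can be sketched as follows. This is an illustrative simplification rather than the RESOLVE procedure: fragments are plain lists of Cα coordinates, two positions are taken to coincide when they lie within a hypothetical tolerance `tol` (in Å), and only end-to-start overlaps are considered.

```python
import math


def overlap_length(first, second, tol=1.0):
    """Largest k for which the last k CA atoms of `first` lie within `tol`
    of the first k CA atoms of `second` (0 if there is no such k)."""
    for k in range(min(len(first), len(second)), 0, -1):
        if all(math.dist(a, b) <= tol
               for a, b in zip(first[-k:], second[:k])):
            return k
    return 0


def connect(first, second, min_overlap=2, tol=1.0):
    """Join two fragments if they overlap at `min_overlap` or more CA atoms.

    Returns the merged CA trace, or None when the overlap is too short.
    """
    k = overlap_length(first, second, tol)
    if k < min_overlap:
        return None
    # Keep the coordinates of `first` in the overlap, then append the rest.
    return first + second[k:]


# Made-up coordinates spaced roughly 3.8 Å apart along x.
frag_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
frag_b = [(3.9, 0.1, 0.0), (7.5, 0.0, 0.0), (11.3, 0.0, 0.0)]
frag_c = [(7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
merged = connect(frag_a, frag_b)
rejected = connect(frag_a, frag_c)
```

Requiring at least two shared positions, rather than one, constrains the direction of the chain at the junction as well as its position.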
Automatic modeling methods using cryo-electron microscopy (cryoEM) density maps as constraints are promising approaches to building atomic models of individual proteins or protein domains. However, their application to large macromolecular assemblies has not been possible largely due to computational limitations inherent to such unsupervised methods. Here we describe a new method, EM-IMO, for building, modifying and refining local structures of protein models using cryoEM maps as a constraint. As a supervised refinement method, EM-IMO allows users to specify parameters derived from inspections, so as to guide, and as a consequence, significantly speed up the refinement. An EM-IMO-based refinement protocol is first benchmarked on a data set of 50 homology models using simulated density maps. A multi-scale refinement strategy that combines EM-IMO-based and molecular dynamics (MD)-based refinement is then applied to build backbone models for the seven conformers of the five capsid proteins in our near-atomic resolution cryoEM map of the grass carp reovirus (GCRV) virion, a member of the aquareovirus genus of the Reoviridae family. The refined models allow us to reconstruct a backbone model of the entire GCRV capsid and provide valuable functional insights that are described in the accompanying publication. Our study demonstrates that the integrated use of homology modeling and a multi-scale refinement protocol that combines supervised and automated structure refinement offers a practical strategy for building atomic models based on medium- to high-resolution cryoEM density maps.
cryo-electron microscopy; density fitting; homology modeling; structure refinement; protein structure prediction
A description is given of new tools to facilitate model building and refinement into electron cryo-microscopy reconstructions.
The recent rapid development of single-particle electron cryo-microscopy (cryo-EM) now allows structures to be solved by this method at resolutions close to 3 Å. Here, a number of tools to facilitate the interpretation of EM reconstructions with stereochemically reasonable all-atom models are described. The BALBES database has been repurposed as a tool for identifying protein folds from density maps. Modifications to Coot, including new Jiggle Fit and morphing tools and improved handling of nucleic acids, enhance its functionality for interpreting EM maps. REFMAC has been modified for optimal fitting of atomic models into EM maps. As external structural information can enhance the reliability of the derived atomic models, stabilize refinement and reduce overfitting, ProSMART has been extended to generate interatomic distance restraints from nucleic acid reference structures, and a new tool, LIBG, has been developed to generate nucleic acid base-pair and parallel-plane restraints. Furthermore, restraint generation has been integrated with visualization and editing in Coot, and these restraints have been applied to both real-space refinement in Coot and reciprocal-space refinement in REFMAC.
model building; refinement; electron cryo-microscopy reconstructions; LIBG
A multivariate analysis of the backbone and sugar torsion angles of dinucleotide fragments was used to construct a 3D principal conformational subspace (PCS) of DNA duplex crystal structures. The potential energy surface (PES) within the PCS was mapped for a single-strand dinucleotide model using an empirical energy function. The low energy regions of the surface encompass known DNA forms and also identify previously unclassified conformers. The physical determinants of the conformational landscape are found to be predominantly steric interactions within the dinucleotide backbone, with medium-dependent backbone-base electrostatic interactions serving to tune the relative stability of the different local energy minima. The fidelity of the PES to duplex DNA properties is validated through a correspondence to the conformational distribution of duplex DNA crystal structures and the reproduction of observed sequence specific propensities for the formation of A-form DNA. The utility of the PES is demonstrated through its succinct and accurate description of complex conformational processes in simulations of duplex DNA. The study suggests that stereochemical considerations of the nucleic acid backbone play a role in determining conformational preferences of DNA which is analogous to the role of local steric interactions in determining polypeptide secondary structure.
A map-likelihood function is described that can yield phase probabilities with very low model bias.
The recently developed technique of maximum-likelihood density modification [Terwilliger (2000), Acta Cryst. D56, 965–972] allows a calculation of phase probabilities based on the likelihood of the electron-density map to be carried out separately from the calculation of any prior phase probabilities. Here, it is shown that phase-probability distributions calculated from the map-likelihood function alone can be highly accurate and that they show minimal bias towards the phases used to initiate the calculation. Map-likelihood phase probabilities depend upon expected characteristics of the electron-density map, such as a defined solvent region and expected electron-density distributions within the solvent region and the region occupied by a macromolecule. In the simplest case, map-likelihood phase-probability distributions are largely based on the flatness of the solvent region. Though map-likelihood phases can be calculated without prior phase information, they are greatly enhanced by high-quality starting phases. This leads to the technique of prime-and-switch phasing for removing model bias. In prime-and-switch phasing, biased phases such as those from a model are used to prime or initiate map-likelihood phasing, then final phases are obtained from map-likelihood phasing alone. Map-likelihood phasing can be applied in cases with solvent content as low as 30%. Potential applications of map-likelihood phasing include unbiased phase calculation from molecular-replacement models, iterative model building, unbiased electron-density maps for cases where 2Fo − Fc or σA-weighted maps would currently be used, structure validation and ab initio phase determination from solvent masks, non-crystallographic symmetry or other knowledge about expected electron density.
MolProbity structure validation will diagnose most local errors in macromolecular crystal structures and help to guide their correction.
MolProbity is a structure-validation web service that provides broad-spectrum solidly based evaluation of model quality at both the global and local levels for both proteins and nucleic acids. It relies heavily on the power and sensitivity provided by optimized hydrogen placement and all-atom contact analysis, complemented by updated versions of covalent-geometry and torsion-angle criteria. Some of the local corrections can be performed automatically in MolProbity and all of the diagnostics are presented in chart and graphical forms that help guide manual rebuilding. X-ray crystallography provides a wealth of biologically important molecular data in the form of atomic three-dimensional structures of proteins, nucleic acids and increasingly large complexes in multiple forms and states. Advances in automation, in everything from crystallization to data collection to phasing to model building to refinement, have made solving a structure using crystallography easier than ever. However, despite these improvements, local errors that can affect biological interpretation are widespread at low resolution and even high-resolution structures nearly all contain at least a few local errors such as Ramachandran outliers, flipped branched protein side chains and incorrect sugar puckers. It is critical both for the crystallographer and for the end user that there are easy and reliable methods to diagnose and correct these sorts of errors in structures. MolProbity is the authors’ contribution to helping solve this problem and this article reviews its general capabilities, reports on recent enhancements and usage, and presents evidence that the resulting improvements are now beneficially affecting the global database.
all-atom contacts; clashscore; automated correction; KiNG; ribose pucker; Ramachandran plots; side-chain rotamers; model quality; systematic errors; database improvement
The 2′-deoxyguanosine-3′,5′-diphosphate, 2′-deoxyadenosine-3′,5′-diphosphate, 2′-deoxycytidine-3′,5′-diphosphate and 2′-deoxythymidine-3′,5′-diphosphate systems are the smallest units of a DNA single strand. Exploring these comprehensive subunits with reliable density functional methods enables one to approach reasonable predictions of the properties of DNA single strands. With these models, DNA single strands are found to have a strong tendency to capture low-energy electrons. The vertical attachment energies (VEAs) predicted for 3′,5′-dTDP (0.17 eV) and 3′,5′-dGDP (0.14 eV) indicate that both the thymine-rich and the guanine-rich DNA single strands have the ability to capture electrons. The adiabatic electron affinities (AEAs) of the nucleotides considered here range from 0.22 to 0.52 eV and follow the order 3′,5′-dTDP > 3′,5′-dCDP > 3′,5′-dGDP > 3′,5′-dADP. A substantial increase in the AEA is observed compared to that of the corresponding nucleic acid bases and the corresponding nucleosides. Furthermore, aqueous solution simulations dramatically increase the electron attracting properties of the DNA single strands. The present investigation illustrates that in the gas phase, the excess electron is situated both on the nucleobase and on the phosphate moiety for DNA single strands. However, the distribution of the extra negative charge is uneven. The attached electron favors the base moiety for the pyrimidine, while it prefers the 3′-phosphate subunit for the purine DNA single strands. In contrast, the attached electron is tightly bound to the base fragment for the cytidine, thymidine and adenosine nucleotides, while it almost exclusively resides in the vicinity of the 3′-phosphate group for the guanosine nucleotides due to the solvent effects. 
The comparatively low vertical detachment energies (VDEs) predicted for 3′,5′-dADP− (0.26 eV) and 3′,5′-dGDP− (0.32 eV) indicate that electron detachment might compete with reactions having high activation barriers such as glycosidic bond breakage. However, the radical anions of the pyrimidine nucleotides with high VDE are expected to be electronically stable. Thus the base-centered radical anions of the pyrimidine nucleotides might be the possible intermediates for DNA single-strand breakage.
The challenges that arise in nucleic acid model building as a consequence of their simpler and more symmetric super-secondary structures are addressed.
The process of building and refining crystal structures of nucleic acids, although similar to that for proteins, has some peculiarities that give rise to both complications and benefits. Although conventional isomorphous replacement phasing techniques are typically used to generate an experimental electron-density map for the purposes of determining novel nucleic acid structures, it is also possible to couple the phasing and model-building steps to permit the solution of complex and novel RNA three-dimensional structures without the need for conventional heavy-atom phasing approaches.
nucleic acids; model building; refinement
Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
The three-dimensional structure of a protein enables it to perform its specific function, which may be catalysis, DNA binding, cell signaling, maintaining cell shape and structure, or one of many other functions. Predicting the structures of proteins is an important goal of computational biology. One way of doing this is to figure out the rules that determine protein structure from protein sequences by determining how local protein sequence is associated with local protein structure. That is, many (but not all) of the interactions that determine protein structure occur between amino acids that are a short distance away from each other in the sequence. This is particularly true in the irregular parts of protein structure, often called loops. In this work, we have performed a statistical analysis of the structure of the protein backbone in loops as a function of the protein sequence. We have determined how an amino acid bends the local backbone due to its amino acid type and the amino acid types of its neighbors. We used a recently developed statistical method that is particularly suited to this problem. The analysis shows that backbone conformation prediction can be improved using the information in the statistical distributions we have developed.
The gas-phase structures of deprotonated, protonated, and sodium-cationized complexes of diethyl phosphate (DEP) including [DEP − H]−, [DEP + H]+, [DEP + Na]+, and [DEP − H + 2Na]+ are examined via infrared multiple photon dissociation (IRMPD) action spectroscopy using tunable IR radiation generated by a free electron laser, a Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR MS) with an electrospray ionization (ESI) source, and theoretical electronic structure calculations. Measured IRMPD spectra are compared to linear IR spectra calculated at the B3LYP/6-31G(d,p) level of theory to identify the structures accessed in the experimental studies. For comparison, theoretical studies of neutral complexes are also performed. These experiments and calculations suggest that specific geometric changes occur upon the binding of protons and/or sodium cations, including changes correlating to nucleic acid backbone geometry, specifically P–O bond lengths and ∠OPO bond angles. Information from these observations may be used to gain insight into the structures of more complex systems, such as nucleotides and solvated nucleic acids.
Electronic supplementary material
The online version of this article (doi:10.1007/s13361-010-0007-6) contains supplementary material, which is available to authorized users.
Diethyl phosphate; Free electron laser; Infrared multiple photon dissociation; Protons; Sodium cations
X-ray crystallographic studies on 3'-5' oligomers have provided a great deal of information on the stereochemistry and conformational flexibility of nucleic acids and polynucleotides. In contrast, there is very little information available on 2'-5' polynucleotides. We have now obtained the crystal structure of Cytidylyl-2',5'-Adenosine (C2'p5'A) at atomic resolution to establish the conformational differences between these two classes of polymers. The dinucleoside phosphate crystallises in the monoclinic space group C2, with a = 33.912(4) Å, b = 16.824(4) Å, c = 12.898(2) Å and β = 112.35(1)°, with two molecules in the asymmetric unit. Spectacularly, the two independent C2'p5'A molecules in the asymmetric unit form right-handed miniature parallel-stranded double helices with their respective crystallographic twofold (b axis) symmetry mates. Remarkably, the two mini duplexes are almost indistinguishable. The cytosines and adenines form self-pairs with three and two hydrogen bonds, respectively. The conformation of the C and A residues about the glycosyl bond is anti, the same as in the 3'-5' analog, but contrasts with the anti and syn geometry of the C and A residues in A2'p5'C. The furanose ring conformation is mixed C3' endo, C2' endo puckering, as in the C3'p5'A-proflavine complex. A comparison of the backbone torsion angles with other 2'-5' dinucleoside structures reveals that the major deviations occur in the torsion angles about the C3'-C2' and C4'-C3' bonds. A right-handed 2'-5' parallel-stranded double helix having eight base pairs per turn and a 45 degree turn angle between them has been constructed using this dinucleoside phosphate as the repeat unit. A discussion of the 2'-5' parallel-stranded double helix and its relevance to biological systems is presented.
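Two quantities follow directly from the numbers quoted above and can be checked with a short calculation: the volume of a monoclinic cell, V = a·b·c·sin β, and the number of base pairs per helical turn, 360° divided by the turn angle. The sketch below uses only the cell parameters and turn angle stated in the abstract; the function names are illustrative, and the standard uncertainties in parentheses are ignored.

```python
import math


def monoclinic_volume(a, b, c, beta_deg):
    """Volume of a monoclinic unit cell in cubic angstroms: V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def base_pairs_per_turn(turn_angle_deg):
    """Number of base pairs needed to complete one 360 degree helical turn."""
    return 360.0 / turn_angle_deg


# Cell parameters and turn angle as quoted in the abstract.
volume = monoclinic_volume(33.912, 16.824, 12.898, 112.35)
pairs = base_pairs_per_turn(45.0)
```

The second result reproduces the eight base pairs per turn stated for the constructed helix.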
Grouping the 20 residues is a classic strategy to discover ordered patterns and insights about the fundamental nature of proteins, their structure, and how they fold. Usually, this categorization is based on the biophysical and/or structural properties of a residue’s side-chain group. We extend this approach to understand the effects that side-chains have upon backbone conformation and perform a knowledge-based classification of amino acids by comparing their backbone φ,ψ distributions in different types of secondary structure. At this finer, more specific resolution, the torsion angle data is often sparse and discontinuous (especially for the non-helical classes) even though a comprehensive set of protein structures is used. To ensure the precision of the Ramachandran plot comparisons, we applied a rigorous Bayesian density estimation method that produces continuous estimates of the backbone φ,ψ distributions. Based on this statistical modeling, a robust, hierarchical clustering was performed using a divergence score to measure the similarity between plots. There were 7 general groups based on the clusters from the complete Ramachandran data: nonpolar/β-branched (Ile & Val), AsX (Asn & Asp), long (Met, Gln, Arg, Glu, Lys, & Leu), aromatic (Phe, Tyr, His, & Cys), small (Ala & Ser), bulky (Thr & Trp), and lastly the singletons of Gly and Pro. At the level of 4 types of secondary structure (helix, sheet, turn, and coil), these groups remain somewhat consistent, although there are a few significant variations. Besides the expected uniqueness of the Gly and Pro distributions, the nonpolar/β-branched and AsX clusters were very consistent across all types of secondary structure. Effectively, this consistency across the secondary structure classes implies that side-chain steric effects strongly influence a residue’s backbone torsion angle conformation.
These results help to explain the plasticity of amino acid substitutions on protein structure, and should help in protein design and structure evaluation.
Ramachandran Plot; Torsion Angles; Bayesian Density Estimation; Clustering; Residue Backbone Similarity
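The idea of clustering residues by the divergence between their φ,ψ distributions can be illustrated with a minimal sketch. The abstract does not name the divergence score, so the Jensen-Shannon divergence is used here purely as an assumption. The four-bin histograms are invented toy counts, not real Ramachandran data, and plain normalized histograms stand in for the Bayesian density estimates. All function names are hypothetical.

```python
import math
from itertools import combinations

def normalize(counts):
    """Turn raw bin counts into a discrete probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def average_linkage(dists, labels, n_clusters):
    """Agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [frozenset([label]) for label in labels]

    def cluster_distance(a, b):
        total = sum(dists[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Toy, made-up counts over a flattened 4-bin phi,psi grid (illustration only).
toy_counts = {
    "Ile": [70, 20, 5, 5],
    "Val": [68, 22, 5, 5],
    "Gly": [25, 25, 25, 25],
    "Asn": [30, 30, 30, 10],
}
densities = {res: normalize(c) for res, c in toy_counts.items()}
dists = {
    frozenset((r1, r2)): js_divergence(densities[r1], densities[r2])
    for r1, r2 in combinations(densities, 2)
}
clusters = average_linkage(dists, list(densities), n_clusters=3)
print(clusters)
```

With these toy counts the two most similar distributions (labelled Ile and Val) are merged first, which mirrors the kind of grouping the abstract reports but says nothing about the real data.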
The PDB_REDO pipeline aims to improve macromolecular structures by optimizing the crystallographic refinement parameters and performing partial model building. Here, algorithms are presented that allowed a web-server implementation of PDB_REDO, and the first user results are discussed.
The refinement and validation of a crystallographic structure model is the last step before the coordinates and the associated data are submitted to the Protein Data Bank (PDB). The success of the refinement procedure is typically assessed by validating the models against geometrical criteria and the diffraction data, and is an important step in ensuring the quality of the PDB public archive [Read et al. (2011), Structure, 19, 1395–1412]. The PDB_REDO procedure aims for ‘constructive validation’, aspiring to consistent and optimal refinement parameterization and pro-active model rebuilding, not only correcting errors but striving for optimal interpretation of the electron density. A web server for PDB_REDO has been implemented, allowing thorough, consistent and fully automated optimization of the refinement procedure in REFMAC and partial model rebuilding. The goal of the web server is to help practicing crystallographers to improve their model prior to submission to the PDB. For this, additional steps were implemented in the PDB_REDO pipeline, both in the refinement procedure, e.g. testing of resolution limits and k-fold cross-validation for small test sets, and as new validation criteria, e.g. the density-fit metrics implemented in EDSTATS and ligand validation as implemented in YASARA. Innovative ways to present the refinement and validation results to the user are also described, which together with auto-generated Coot scripts can guide users to subsequent model inspection and improvement. It is demonstrated that using the server can lead to substantial improvement of structure models before they are submitted to the PDB.
PDB_REDO; validation; model optimization
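The k-fold cross-validation mentioned above can be sketched in general terms. The following is a minimal illustration of the partitioning idea only, not the PDB_REDO implementation: reflections are split into k disjoint folds, and each fold serves once as the test set while the rest form the working set. The function names, the reflection count and the value of k are hypothetical.

```python
import random

def k_fold_test_sets(n_reflections, k, seed=0):
    """Shuffle reflection indices and deal them into k disjoint test sets."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]

def work_and_test_splits(n_reflections, k, seed=0):
    """Yield (work, test) index lists; every reflection is tested exactly once."""
    everything = set(range(n_reflections))
    for test in k_fold_test_sets(n_reflections, k, seed):
        work = sorted(everything - set(test))
        yield work, test

# Hypothetical small data set: 2000 reflections split into 10 folds.
folds = k_fold_test_sets(2000, 10)
splits = list(work_and_test_splits(2000, 10))
print([len(fold) for fold in folds])
```

In a setting like this, a statistic computed on each test fold could then be averaged over the k refinements, which uses every reflection for testing once and reduces the noise of a single small test set.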
Folded RNA molecules are shaped by an astonishing variety of highly conserved noncanonical molecular interactions and backbone topologies. The dinucleotide platform is a widespread recurrent RNA modular building submotif formed by the side-by-side pairing of bases from two consecutive nucleotides within a single strand, with highly specific sequence preferences. This unique arrangement of bases is cemented by an intricate network of noncanonical hydrogen bonds and facilitated by a distinctive backbone topology. The present study investigates the gas-phase intrinsic stabilities of the three most common RNA dinucleotide platforms — 5′-GpU-3′, ApA, and UpC — via state-of-the-art quantum-chemical (QM) techniques. The mean stability of base-base interactions decreases with sequence in the order GpU > ApA > UpC. Bader’s atoms-in-molecules analysis reveals that the N2(G)…O4(U) hydrogen bond of the GpU platform is stronger than the corresponding hydrogen bonds in the other two platforms. The mixed-pucker sugar-phosphate backbone conformation found in most GpU platforms, in which the 5′-ribose sugar (G) is in the C2′-endo form and the 3′-sugar (U) in the C3′-endo form, is intrinsically more stable than the standard A-RNA backbone arrangement, partially as a result of a favorable O2′…O2P intra-platform interaction. Our results thus validate the hypothesis of Lu et al. (Lu Xiang-Jun, et al. Nucleic Acids Res. 2010, 38, 4868-4876), that the superior stability of GpU platforms is partially mediated by the strong O2′…O2P hydrogen bond. In contrast, ApA and especially UpC platform-compatible backbone conformations are rather diverse and do not display any characteristic structural features. The average stabilities of ApA and UpC derived backbone conformers are also lower than those of GpU platforms. 
Thus, the observed structural and evolutionary patterns of the dinucleotide platforms can be accounted for, to a large extent, by their intrinsic properties as described by modern QM calculations. In contrast, we show that the dinucleotide platform is not properly described in the course of atomistic explicit-solvent simulations. Our work also gives methodological insights into QM calculations of experimental RNA backbone geometries. Such calculations are inherently complicated by rather large data and refinement uncertainties in the available RNA experimental structures, which often preclude reliable energy computations.
RNA structure; QM calculations; dinucleotide platforms; GpU; ApA; UpC; RNA backbone; RNA submotif; O2′…O2P
We describe a rational approach to modulating the sugar-phosphate backbone geometry of nucleic acids. Constraints were generated by connecting one oxygen of the phosphate group to a carbon of the sugar moiety. The so-called dioxaphosphorinane rings were introduced at key positions along the sugar-phosphate backbone, allowing control of the six torsion angles α to ζ that define the polymer structure. The syntheses of all the members of the D-CNA family are described, and we emphasize the effect on secondary structure stabilization of a pair of diastereoisomers of α,β-D-CNA, which exhibit either B-type canonical values or not.
The automated building of a protein model into an electron density map remains a challenging problem. In the ARP/wARP approach, model building is facilitated by initially interpreting a density map with free atoms of unknown chemical identity; all structural information for such chemically unassigned atoms is discarded. Here, this is remedied by applying restraints between free atoms, and between free atoms and a partial protein model. These are based on geometric considerations of protein structure and tentative (conditional) assignments for the free atoms. Restraints are applied in the REFMAC5 refinement program and are generated on an ad hoc basis, allowing them to fluctuate from step to step. A large set of experimentally phased and molecular replacement structures showcases individual structures where automated building is improved drastically by the conditional restraints. The concept and implementation we present can also find application in restraining geometries, such as hydrogen bonds, in low-resolution refinement.
Single-molecule tweezers measurements of double-stranded nucleic acids (dsDNA and dsRNA) provide unprecedented opportunities to dissect how these fundamental molecules respond to forces and torques analogous to those applied by topoisomerases, viral capsids, and other biological partners. However, tweezers data are still most commonly interpreted post facto in the framework of simple analytical models. Testing falsifiable predictions of state-of-the-art nucleic acid models would be more illuminating but has not been performed. Here we describe a blind challenge in which numerical predictions of nucleic acid mechanical properties were compared to experimental data obtained recently for dsRNA under applied force and torque. The predictions were enabled by the HelixMC package, first presented in this paper. HelixMC advances crystallography-derived base-pair level models (BPLMs) to simulate kilobase-length dsDNAs and dsRNAs under external forces and torques, including their global linking numbers. These calculations recovered the experimental bending persistence length of dsRNA within the error of the simulations and accurately predicted that dsRNA's “spring-like” conformation would give a two-fold decrease of stretch modulus relative to dsDNA. Further blind predictions of helix torsional properties, however, exposed inaccuracies in current BPLM theory, including three-fold discrepancies in torsional persistence length at the high force limit and the incorrect sign of dsRNA link-extension (twist-stretch) coupling. Beyond these experiments, HelixMC predicted that ‘nucleosome-excluding’ poly(A)/poly(T) is at least two-fold stiffer than random-sequence dsDNA in bending, stretching, and torsional behaviors; Z-DNA to be at least three-fold stiffer than random-sequence dsDNA, with a near-zero link-extension coupling; and non-negligible effects from base pair step correlations. 
We propose that experimentally testing these predictions should be powerful next steps for understanding the flexibility of dsDNA and dsRNA in sequence contexts and under mechanical stresses relevant to their biology.
DNA and RNA are fundamental molecules in the central dogma of molecular biology. Many biological behaviors of double-stranded DNA and RNA – including transcription/translation by proteins and packaging into compact structures – depend on their ability to flex and twist. Single-molecule tweezers now provide accurate mechanical measurements of DNA and RNA helices under force and torque but have not been used to rigorously falsify and thereby advance computational models. Here we present the first such blind challenge, involving recent dsRNA tweezers data that were kept hidden from modelers and a new HelixMC toolkit that resolves challenges in simulating long double helices from base-pair level models. The predictions gave excellent agreement with bending and stretching measurements of dsRNA but failed to recover twisting properties, pinpointing a critical area of future investigation. HelixMC also predicted that poly(A)/poly(T) and Z-DNA–biologically important variants whose elastic responses have not been studied with tweezers–will have distinct mechanical properties. These results open a route to iteratively falsifying and refining computational models of long nucleic acid helices, as is necessary for attaining a predictive understanding of their biological behaviors.