Related Articles

1.  Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments 
PLoS Computational Biology  2008;4(5):e1000083.
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone similar to the successful rotamer approach in side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was realized through identification of recurrent backbone fragments from a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant number of short sequences display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
Author Summary
Large-scale DNA sequencing efforts produce large amounts of protein sequence data. However, in order to understand the function of a protein, its tertiary three-dimensional structure is required. Despite worldwide efforts in structural biology, experimental protein structures are determined at a significantly slower pace. As a result, computational methods for protein structure prediction receive significant attention. Much of the difficulty of structure prediction lies in the enormous size of the problem: proteins seem to occur in an infinite variety of shapes. Here, we propose that this huge complexity may be overcome by identifying recurrent protein fragments, which are frequently reused as building blocks to construct proteins that were hitherto thought to be unrelated. The BriX database is the outcome of identifying about 2,000 canonical shapes among 1,261 protein structures. We show that any given protein can be reconstructed from this library of building blocks at a very high resolution, suggesting that the modelling of protein backbones may be greatly aided by our database.
doi:10.1371/journal.pcbi.1000083
PMCID: PMC2367438  PMID: 18483555
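The coverage figure above depends on comparing a fragment with library fragments by RMSD after optimal superposition. A minimal sketch, assuming the standard Kabsch superposition and a 1 Angstrom cutoff; the names `kabsch_rmsd` and `is_covered` are hypothetical and not part of BriX:

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation (Kabsch).
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ r.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def is_covered(fragment, library, cutoff=1.0):
    """True if any library fragment lies within `cutoff` RMSD (hypothetical helper)."""
    return any(kabsch_rmsd(fragment, ref) <= cutoff for ref in library)
```

A rigidly rotated and translated copy of a fragment gives an RMSD of essentially zero, while its mirror image does not, because the reflection check forbids improper rotations.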
2.  On the Characterization and Software Implementation of General Protein Lattice Models 
PLoS ONE  2013;8(3):e59504.
Abstract models of proteins have been widely used as a practical means to computationally investigate general properties of the system. In lattice models any sterically feasible conformation is represented as a self-avoiding walk on a lattice, and residue types are limited in number. So far, only two- or three-dimensional lattices have been used. The inspection of the neighborhood of alpha carbons in the core of real proteins reveals that lattices with higher coordination numbers, possibly in higher-dimensional spaces, can also be adopted. In this paper, a new general parametric lattice model for simplified protein conformations is proposed and investigated. It is shown how the supporting software can be consistently designed to let algorithms that operate on protein structures be implemented in a lattice-agnostic way. The necessary theoretical foundations are developed and organically presented, pinpointing the role of the concept of main directions in lattice-agnostic model handling. Subsequently, the model features across dimensions and lattice types are explored in tests performed on benchmark protein sequences, using a Python implementation. Simulations give insights into the use of square and triangular lattices in a range of dimensions. The trend of the potential minimum for sequences of different lengths, varying the lattice dimension, is uncovered. Moreover, an extensive quantitative characterization of the usage of the so-called “move types” is reported for the first time. The proposed general framework for the development of lattice models is simple yet complete, and an object-oriented architecture can be proficiently employed for the supporting software, by designing ad-hoc classes. The proposed framework represents a new general viewpoint that potentially subsumes a number of solutions previously studied. 
The adoption of the described model encourages looking at protein structure issues from a more general and essential perspective, making computational investigations over simplified models more straightforward as well.
doi:10.1371/journal.pone.0059504
PMCID: PMC3612044  PMID: 23555684
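The lattice-agnostic design described above can be illustrated by a class that knows a lattice only through its list of main directions, so that a conformation is just a sequence of direction indices. A minimal sketch; the `Lattice` class and its methods are hypothetical, not the paper's Python implementation, and the face-centred cubic lattice stands in as an assumed example of a higher-coordination lattice:

```python
from __future__ import annotations

from itertools import product
from typing import Sequence

Vector = tuple[int, ...]


class Lattice:
    """A lattice described only by its main directions (hypothetical API)."""

    def __init__(self, directions: Sequence[Vector]):
        self.directions = [tuple(d) for d in directions]
        self.dim = len(self.directions[0])

    @classmethod
    def square(cls, dim: int) -> Lattice:
        # Hypercubic lattice: +/- unit steps along each axis (coordination 2*dim).
        dirs = []
        for axis in range(dim):
            for sign in (1, -1):
                v = [0] * dim
                v[axis] = sign
                dirs.append(tuple(v))
        return cls(dirs)

    @classmethod
    def fcc(cls) -> Lattice:
        # Face-centred cubic: the 12 vectors with exactly two entries of +/-1.
        dirs = {v for v in product((-1, 0, 1), repeat=3) if sum(map(abs, v)) == 2}
        return cls(sorted(dirs))

    def coordinates(self, moves: Sequence[int]) -> list[Vector]:
        """Lattice positions visited by a walk starting at the origin."""
        pos: Vector = (0,) * self.dim
        coords = [pos]
        for m in moves:
            pos = tuple(a + b for a, b in zip(pos, self.directions[m]))
            coords.append(pos)
        return coords

    def is_self_avoiding(self, moves: Sequence[int]) -> bool:
        coords = self.coordinates(moves)
        return len(set(coords)) == len(coords)
```

Because algorithms only touch `directions`, the same self-avoidance check works unchanged on the square lattice in any dimension and on the 12-neighbour lattice.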
3.  Backrub-like backbone simulation recapitulates natural protein conformational variability and improves mutant side-chain prediction 
Journal of molecular biology  2008;380(4):742-756.
Summary
Incorporation of effective backbone sampling into protein simulation and design is an important step in increasing the accuracy of computational protein modeling. Recent analysis of high-resolution crystal structures has suggested a new model, termed backrub, to describe localized, hinge-like alternative backbone and side chain conformations observed in the crystal lattice. The model involves internal backbone rotations about axes between Cα atoms. Based on this observation, we have implemented a backrub-inspired sampling method in the Rosetta structure prediction and design program. We evaluate this model of backbone flexibility using three different tests. First, we show that Rosetta backrub simulations recapitulate the correlation between backbone and side-chain conformations in the high-resolution crystal structures upon which the model was based. As a second test of backrub sampling, we show that backbone flexibility improves the accuracy of predicting point-mutant side chain conformations over fixed backbone rotameric sampling alone. Finally, we show that backrub sampling of triosephosphate isomerase loop 6 can capture the ms/µs oscillation between the open and closed states observed in solution. Our results suggest that backrub sampling captures a sizable fraction of localized conformational changes that occur in natural proteins. Application of this simple model of backbone motions may significantly improve both protein design and atomistic simulations of localized protein flexibility.
doi:10.1016/j.jmb.2008.05.023
PMCID: PMC2603262  PMID: 18547585
flexible backbone sampling; backrub motion; point mutation; Monte Carlo; triosephosphate isomerase loop 6
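The core geometric move described above, a rotation about the axis joining two Cα atoms, can be sketched with Rodrigues' rotation formula. This is a simplified assumption that acts on a Cα trace only; the Rosetta implementation also moves the intervening backbone and side-chain atoms, and the names `rotate_about_axis` and `backrub_ca` are hypothetical:

```python
import numpy as np


def rotate_about_axis(points, a, b, angle):
    """Rotate (N, 3) points by `angle` radians about the axis through a and b."""
    points = np.asarray(points, float)
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    k = (b - a) / np.linalg.norm(b - a)   # unit axis vector
    v = points - a                        # coordinates relative to a point on the axis
    c, s = np.cos(angle), np.sin(angle)
    # Rodrigues: v cos + (k x v) sin + k (k . v)(1 - cos)
    rotated = v * c + np.cross(k, v) * s + np.outer(v @ k, k) * (1.0 - c)
    return rotated + a


def backrub_ca(ca, i, angle):
    """Rotate Cα i about the Cα(i-1)-Cα(i+1) axis; all other atoms stay fixed."""
    ca = np.asarray(ca, float).copy()
    if not 0 < i < len(ca) - 1:
        raise ValueError("i must have a neighbour on each side")
    ca[i] = rotate_about_axis(ca[i:i + 1], ca[i - 1], ca[i + 1], angle)[0]
    return ca
```

Because the axis passes through both flanking Cα atoms, the move is local: the virtual bonds to residue i keep their lengths and the rest of the chain is untouched.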
4.  The VSGB 2.0 Model: A Next Generation Energy Model for High Resolution Protein Structure Modeling 
Proteins  2011;79(10):2794-2812.
A novel energy model (VSGB 2.0) for high resolution protein structure modeling is described, which features an optimized implicit solvent model as well as physics-based corrections for hydrogen bonding, π-π interactions, self-contact interactions and hydrophobic interactions. Parameters of the VSGB 2.0 model were fit to a crystallographic database of 2239 single side chain and 100 11–13 residue loop predictions. Combined with an advanced method of sampling and a robust algorithm for protonation state assignment, the VSGB 2.0 model was validated by predicting 115 super long loops up to 20 residues. Despite the dramatically increasing difficulty in reconstructing longer loops, a high accuracy was achieved: all of the lowest energy conformations have global backbone RMSDs better than 2.0 Å from the native conformations. Average global backbone RMSDs of the predictions are 0.51, 0.63, 0.70, 0.62, 0.80, 1.41, and 1.59 Å for 14, 15, 16, 17, 18, 19, and 20 residue loop predictions, respectively. When these results are corrected for possible statistical bias as explained in the text, the average global backbone RMSDs are 0.61, 0.71, 0.86, 0.62, 1.06, 1.67, and 1.59 Å. Given the precision and robustness of the calculations, we believe that the VSGB 2.0 model is suitable to tackle “real” problems, such as biological function modeling and structure-based drug discovery.
doi:10.1002/prot.23106
PMCID: PMC3206729  PMID: 21905107
energy model; all-atom force field; protonation state assignment; side chain prediction; loop prediction
5.  Autoindexing with outlier rejection and identification of superimposed lattices 
Journal of Applied Crystallography  2010;43(Pt 3):611-616.
After autoindexing, Bragg spot candidates that do not fit on the model lattice can be identified, providing a potentially useful measure of sample quality and giving an avenue for indexing a second lattice, if one is present.
Constructing a model lattice to fit the observed Bragg diffraction pattern is straightforward for perfect samples, but indexing can be challenging when artifacts are present, such as poorly shaped spots, split crystals giving multiple closely aligned lattices and outright superposition of patterns from aggregated microcrystals. To optimize the lattice model against marginal data, refinement can be performed using a subset of the observations from which the poorly fitting spots have been discarded. Outliers are identified by assuming a Gaussian error distribution for the best-fitting spots and points diverging from this distribution are culled. The set of remaining observations produces a superior lattice model, while the rejected observations can be used to identify a second crystal lattice, if one is present. The prevalence of outliers provides a potentially useful measure of sample quality. The described procedures are implemented for macromolecular crystallography within the autoindexing program labelit.index (http://cci.lbl.gov/labelit).
doi:10.1107/S0021889810010782
PMCID: PMC2875182  PMID: 20502598
autoindexing; outlier rejection; superimposed lattices; sample quality
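The outlier-culling idea above (model the best-fitting spots as Gaussian, discard those that diverge) can be approximated by iterative sigma clipping of spot residuals. A minimal sketch under that assumption; it is a generic stand-in, not the procedure implemented in labelit.index, and `reject_outliers` and the 3-sigma threshold are hypothetical choices:

```python
import numpy as np


def reject_outliers(residuals, n_sigma=3.0, max_iter=10):
    """Boolean inlier mask from iterative sigma clipping of spot residuals."""
    r = np.abs(np.asarray(residuals, float))
    keep = np.ones(r.shape, dtype=bool)
    for _ in range(max_iter):
        # Width of a zero-mean Gaussian estimated from the currently accepted spots.
        sigma = np.sqrt(np.mean(r[keep] ** 2))
        new_keep = r <= n_sigma * sigma
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return keep
```

The rejected spots could then be passed to a second indexing attempt, and the rejected fraction `1 - mask.mean()` serves as a rough sample-quality indicator in the spirit of the abstract.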
6.  Orientation-dependent backbone-only residue pair scoring functions for fixed backbone protein design 
BMC Bioinformatics  2010;11:192.
Background
Empirical scoring functions have proven useful in protein structure modeling. Most such scoring functions depend on protein side chain conformations. However, backbone-only scoring functions do not require computationally intensive structure optimization and so are well suited to protein design, which requires fast score evaluation. Furthermore, scoring functions that account for the distinctive relative position and orientation preferences of residue pairs are expected to be more accurate than those that depend only on the separation distance.
Results
Residue pair scoring functions for fixed backbone protein design were derived using only backbone geometry. Unlike previous studies that used spherical harmonics to fit 2D angular distributions, Gaussian Mixture Models were used to fit the full 3D (position only) and 6D (position and orientation) distributions of residue pairs. The performance of the 1D (residue separation only), 3D, and 6D scoring functions was compared by their ability to identify correct threading solutions for a non-redundant benchmark set of protein backbone structures. The threading accuracy was found to steadily increase with increasing dimension, with the 6D scoring function achieving the highest accuracy. Furthermore, the 3D and 6D scoring functions were shown to outperform side chain-dependent empirical potentials from three other studies. Next, two computational methods that take advantage of the speed and pairwise form of these new backbone-only scoring functions were investigated. The first is a procedure that exploits available sequence data by averaging scores over threading solutions for homologs. This was evaluated by applying it to the challenging problem of identifying interacting transmembrane alpha-helices and found to further improve prediction accuracy. The second is a protein design method for determining the optimal sequence for a backbone structure by applying Belief Propagation optimization using the 6D scoring functions. The sensitivity of this method to backbone structure perturbations was compared with that of fixed-backbone all-atom modeling by determining the similarities between optimal sequences for two different backbone structures within the same protein family. The results showed that the design method using 6D scoring functions was more robust to small variations in backbone structure than the all-atom design method.
Conclusions
Backbone-only residue pair scoring functions that account for all six relative degrees of freedom are the most accurate, and including the scores of homologs further improves the accuracy in threading applications. The 6D scoring function outperformed several side chain-dependent potentials while avoiding time-consuming and error-prone side chain structure prediction. These scoring functions are particularly useful as an initial filter in protein design problems before applying all-atom modeling.
doi:10.1186/1471-2105-11-192
PMCID: PMC2874805  PMID: 20398384
7.  Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking 
Proteins  2009;74(2):497-514.
High-resolution homology models are useful in structure-based protein engineering applications, especially when a crystallographic structure is unavailable. Here, we report the development and implementation of RosettaAntibody, a protocol for homology modeling of antibody variable regions. The protocol combines comparative modeling of canonical complementarity determining region (CDR) loop conformations and de novo loop modeling of CDR H3 conformation with simultaneous optimization of VL-VH rigid-body orientation and CDR backbone and side-chain conformations. The protocol was tested on a benchmark of 54 antibody crystal structures. The median root-mean-square-deviation (rmsd) of the antigen binding pocket comprising all the CDR residues was 1.5 Å with 80% of the targets having an rmsd lower than 2.0 Å. The median backbone heavy atom global rmsd of the CDR H3 loop prediction was 1.6 Å, 1.9 Å, 2.4 Å, 3.1 Å and 6.0 Å for very short (4–6 residues), short (7–9), medium (10–11), long (12–14) and very long (17–22) loops respectively. When the set of ten top-scoring antibody homology models is used in local ensemble docking to antigen, a moderate to high accuracy docking prediction was achieved in seven of fifteen targets. This success in computational docking with high-resolution homology models is encouraging, but challenges still remain in modeling antibody structures for sequences with long H3 loops. This first large-scale antibody-antigen docking study using homology models reveals the level of “functional accuracy” of these structural models towards protein engineering applications.
doi:10.1002/prot.22309
PMCID: PMC2909601  PMID: 19062174
antibody structure; homology modeling; CDR H3 loop modeling; therapeutic antibodies; ensemble docking
8.  LocalMove: computing on-lattice fits for biopolymers 
Nucleic Acids Research  2008;36(Web Server issue):W216-W222.
Given an input Protein Data Bank file (PDB) for a protein or RNA molecule, LocalMove is a web server that determines an on-lattice representation for the input biomolecule. The web server implements a Markov Chain Monte-Carlo algorithm with simulated annealing to compute an approximate fit for either the coarse-grain model or backbone model on either the cubic or face-centered cubic lattice. LocalMove returns a PDB file as output, as well as a dynamic movie of 3D images of intermediate conformations during the computation. The LocalMove server is publicly available at http://bioinformatics.bc.edu/clotelab/localmove/.
doi:10.1093/nar/gkn367
PMCID: PMC2447748  PMID: 18556754
9.  Models of collective cell behaviour with crowding effects: comparing lattice-based and lattice-free approaches 
Individual-based models describing the migration and proliferation of a population of cells frequently restrict the cells to a predefined lattice. An implicit assumption of this type of lattice-based model is that a proliferative population will always eventually fill the lattice. Here, we develop a new lattice-free individual-based model that incorporates cell-to-cell crowding effects. We also derive approximate mean-field descriptions for the lattice-free model in two special cases motivated by commonly used experimental set-ups. Lattice-free simulation results are compared with these mean-field descriptions and with a corresponding lattice-based model. Data from a proliferation experiment are used to estimate the parameters for the new model, including the cell proliferation rate, showing that the model fits the data well. An important aspect of the lattice-free model is that the confluent cell density is not predefined, as with lattice-based models, but an emergent model property. As a consequence of the more realistic, irregular configuration of cells in the lattice-free model, the population growth rate is much slower at high cell densities and the population cannot reach the same confluent density as an equivalent lattice-based model.
doi:10.1098/rsif.2012.0319
PMCID: PMC3479911  PMID: 22696488
cell migration; cell proliferation; lattice-based; lattice-free; random walk
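The lattice-based baseline described above, in which crowding is enforced by allowing at most one agent per site, can be sketched as a simple exclusion process. A minimal sketch under stated assumptions (periodic square lattice, one attempted event per step, hypothetical name `simulate_lattice_cells` and parameter `p_prolif`); it illustrates the lattice-based model only, not the authors' lattice-free model or their parameter estimates:

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def simulate_lattice_cells(size, initial, steps, p_prolif=0.05, seed=0):
    """Exclusion-process sketch on a size x size periodic square lattice.

    Each step picks one agent at random and a random neighbouring site.
    If that site is occupied the event is aborted (crowding); otherwise the
    agent places a daughter there with probability p_prolif, else it moves.
    """
    rng = random.Random(seed)
    agents = list(dict.fromkeys(initial))  # drop duplicate starting sites
    occupied = set(agents)
    for _ in range(steps):
        if not agents:
            break
        idx = rng.randrange(len(agents))
        x, y = agents[idx]
        dx, dy = rng.choice(NEIGHBOURS)
        target = ((x + dx) % size, (y + dy) % size)
        if target in occupied:
            continue  # crowding: the attempted event is aborted
        if rng.random() < p_prolif:
            agents.append(target)       # proliferation into the empty site
        else:
            occupied.discard((x, y))    # motility: vacate the old site
            agents[idx] = target
        occupied.add(target)
    return occupied
```

In this setting the confluent density is fixed in advance at one agent per site, so the population can never exceed `size * size`; that predefined ceiling is exactly what the lattice-free model replaces with an emergent property.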
10.  A generalized approach to sampling backbone conformations with RosettaDock for CAPRI rounds 13–19 
Proteins  2010;78(15):3115-3123.
In CAPRI rounds 13–19, the most native-like structure predicted by RosettaDock resulted in two high, one medium and one acceptable accuracy model out of 13 targets. The current rounds of CAPRI were especially challenging with many unbound and homology modeled starting structures. Novel docking methods, including EnsembleDock and SnugDock, allowed backbone conformational sampling during docking and enabled the creation of more accurate models. For Target 32, α-amylase/subtilisin inhibitor-subtilisin savinase, we sampled different backbone conformations at an interfacial loop to produce five high-quality models including the most accurate structure submitted in the challenge (2.1 Å ligand rmsd, 0.52 Å interface rmsd). For Target 41, colicin-immunity protein, we used EnsembleDock to sample the ensemble of nuclear magnetic resonance (NMR) models of the immunity protein to generate a medium accuracy structure. Experimental data identifying the catalytic residues at the binding interface for Target 40 (trypsin-inhibitor) were used to filter RosettaDock global rigid body docking decoys to determine high accuracy predictions for the two distinct binding sites in which the inhibitor interacts with trypsin. We discuss our generalized approach to selecting appropriate methods for different types of docking problems. The current toolset provides some robustness to errors in homology models, but significant challenges remain in accommodating larger backbone uncertainties and in sampling adequately for global searches.
doi:10.1002/prot.22765
PMCID: PMC2952725  PMID: 20535822
SnugDock; EnsembleDock; Flexible Backbone; Protein-Protein Docking; Flexible Loop Docking; Docking NMR Models
11.  Dynamics of the streptavidin-biotin complex in solution and in its crystal lattice: Distinct behavior revealed by molecular simulations 
The journal of physical chemistry. B  2009;113(19):6971-6985.
We present a 250 ns simulation of the wild-type, biotin-liganded streptavidin tetramer in the solution phase and compare the trajectory to two previously published simulations of the protein in its crystal lattice. By performing both types of simulations, we are able to interpret the protein’s behavior in solution in the context of its X-ray structure. We find that the rate of conformational sampling is increased in solution over the lattice environment, although the relevant conformational space in solution is also much larger, as indicated by overall fluctuations in the positions of backbone atoms. We also compare the distributions of χ1 angles sampled by side chains exposed to solvent in the lattice and in the solution phase, obtaining overall good agreement between the distributions obtained in our most rigorous lattice simulation and the crystallographic χ1 angles. We observe changes in the χ1 distributions in the solution phase, and note an apparent progression of the distributions as the environment changes from a tightly packed lattice filled with crystallization media to a bath of pure water. Finally, we examine the interaction of biotin and streptavidin in each simulation, uncovering a possible alternate conformation of the biotin carboxylate tail. We also note that a hydrogen bond observed to break transiently in previous solution phase simulations is predominantly broken in this much longer solution phase trajectory; in the lattice simulations, the lattice environment appears to help maintain the hydrogen bond, but more sampling will be needed to confirm whether the simulation model truly gives good agreement with the X-ray data in the lattice simulations. We expect that pairing solution phase biomolecular simulations with crystal lattice simulations will help to validate simulation models and improve the interpretation of experimentally determined structures.
doi:10.1021/jp9010372
PMCID: PMC2791092  PMID: 19374419
protein simulation; molecular dynamics; atomic fluctuations; rotamer; surface residue; lattice
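Comparing χ1 distributions, as above, starts from computing the dihedral angle defined by four atoms (for χ1, N–Cα–Cβ–γ atom) in every frame. A minimal sketch of the standard atan2-based dihedral formula, plus a hypothetical helper `chi1_well` that assigns an angle to the nearest of the three staggered wells at 60°, 180° and −60°; rotamer naming conventions vary, so the wells are labelled only by their centres:

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def chi1_well(angle):
    """Nearest staggered well centre (60, 180 or -60 degrees) for an angle in degrees."""
    def circular_distance(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min((60.0, 180.0, -60.0), key=lambda c: circular_distance(angle, c))
```

Histogramming `chi1_well` assignments over a trajectory gives a coarse per-residue χ1 distribution that can be compared between the lattice and solution simulations and against the crystallographic value.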
12.  High resolution protein structure prediction and the crystallographic phase problem 
Nature  2007;450(7167):259-264.
Summary
We describe a new approach to refining protein structure models that focuses sampling in regions most likely to contain errors while allowing the whole structure to relax in a physically realistic all-atom force field. In applications to models produced using NMR data and to comparative models based on distant structural homologues, the method can significantly improve the accuracy of the structures in terms of both the backbone conformations and the placement of core side chains. Further, the resulting models satisfy a particularly stringent test: they provide significantly better solutions to the X-ray crystallographic phase problem in molecular replacement trials. Finally, we show that all-atom refinement can produce de novo protein structure predictions that reach the high accuracy required for molecular replacement. Phases for diffraction data for a 112-residue protein have been determined without any experimental phase information and in the absence of any templates suitable for molecular replacement from the Protein Data Bank. These results suggest that the combination of high resolution structure prediction with state-of-the-art phasing tools may be unexpectedly powerful in phasing crystallographic data for which molecular replacement is hindered by the absence of sufficiently accurate prior models.
doi:10.1038/nature06249
PMCID: PMC2504711  PMID: 17934447
13.  Classifying proteinlike sequences in arbitrary lattice protein models using LatPack 
HFSP Journal  2008;2(6):396-404.
Knowledge of a protein’s three-dimensional native structure is vital in determining its chemical properties and functionality. However, experimental methods to determine structure are very costly and time-consuming. Computational approaches such as folding simulations and structure prediction algorithms are quicker and cheaper but lack consistent accuracy. This currently restricts extensive computational studies to abstract protein models. It is thus essential that simplifications induced by the models do not negate scientific value. Key to this is the use of thoroughly defined proteinlike sequences. In such cases abstract models can allow for the investigation of important biological questions. Here, we present a procedure to generate and classify proteinlike sequence data sets. Our LatPack tools and the approach in general are applicable to arbitrary lattice protein models. Identification is based on thermodynamic kinetic features and incorporates the sequential assembly of proteins by addressing cotranslational folding. We demonstrate the approach in the widely used unrestricted 3D-cubic HP-model. The resulting sequence set is the first large data set for this model exhibiting the proteinlike properties required. Our data tools are freely available and can be used to investigate protein-related problems.
doi:10.2976/1.3027681
PMCID: PMC2645588  PMID: 19436498
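The HP model used above (and in the neutral-evolution study that follows) scores a conformation by counting contacts between hydrophobic residues that are lattice neighbours but not chain neighbours. A minimal sketch of that energy on a cubic lattice of any dimension, assuming the usual convention of −1 per H–H contact; `hp_energy` is a hypothetical name and is not part of LatPack:

```python
def hp_energy(sequence, coords):
    """HP-model energy: -1 for each non-consecutive H-H pair on adjacent cubic-lattice sites."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and conformation lengths differ")
    coords = [tuple(c) for c in coords]
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, c in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Visit the 2*dim nearest neighbours of residue i on the cubic lattice.
        for axis in range(len(c)):
            for step in (1, -1):
                n = list(c)
                n[axis] += step
                j = index.get(tuple(n))
                # j > i + 1 skips chain neighbours and counts each pair once.
                if j is not None and j > i + 1 and sequence[j] == "H":
                    energy -= 1
    return energy
```

For example, folding `HPPH` into a unit square brings the two terminal H residues into contact, while the extended conformation has no contacts.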
14.  Neutral evolution of Protein-protein interactions: a computational study using simple models 
Background
Protein-protein interactions are central to cellular organization, and must have appeared at an early stage of evolution. To better understand their role, we consider a simple model of protein evolution and determine the effect of an explicit selection for protein-protein interactions.
Results
In the model, viable sequences all have the same fitness, following the neutral evolution theory. A very simple, two-dimensional lattice representation of the protein structures is used, and the model only considers two kinds of amino acids: hydrophobic and polar. With these approximations, exact calculations are performed. The results do not depend too strongly on these assumptions, since a model using a 3D, off-lattice representation of the proteins gives results in qualitative agreement with the 2D one. With both models, the evolutionary dynamics lead to a steady state population that is enriched in sequences that dimerize with a high affinity, well beyond the minimal level needed to survive. Correspondingly, sequences close to the viability threshold are less abundant in the steady state, being subject to a larger proportion of lethal mutations. The set of viable sequences has a "funnel" shape, consistent with earlier studies: sequences that are highly populated in the steady state are "close" to each other (with proximity being measured by the number of amino acids that differ).
Conclusion
This bias in the steady state sequences should lead to an increased resistance of the population to environmental change and an increased ability to evolve.
doi:10.1186/1472-6807-7-79
PMCID: PMC2248192  PMID: 18021454
15.  Position and Orientation Distributions for Locally Self-Avoiding Walks in the Presence of Obstacles 
Polymer  2008;49(6):1701-1715.
This paper presents a new approach to study the statistics of lattice random walks in the presence of obstacles and local self-avoidance constraints (excluded volume). By excluding sequentially local interactions within a window that slides along the chain, we obtain an upper bound on the number of self-avoiding walks (SAWs) that terminate at each possible position and orientation. Furthermore, we develop a technique to include the effects of obstacles. Thus our model is a more realistic approximation of a polymer chain than that of a simple lattice random walk, and it is more computationally tractable than enumeration of obstacle-avoiding SAWs. Our approach is based on the method of the lattice-motion-group convolution. We develop these techniques theoretically and present numerical results for 2-D and 3-D lattices (square, hexagonal, cubic and tetrahedral/diamond). We present numerical results that show how the connectivity constant μ changes with the length of each self-avoiding window and the total length of the chain. Quantities such as 〈R〉 and others such as the probability of ring closure are calculated and compared with results obtained in the literature for the simple random walk case.
doi:10.1016/j.polymer.2008.01.056
PMCID: PMC2390830  PMID: 18496591
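Why excluding interactions only within a sliding window gives an upper bound can be checked by brute-force enumeration on the square lattice: relaxing the constraint can only admit more walks. A minimal sketch, assuming the window rule "a new site may not coincide with any of the last `window` visited sites"; this exhaustive enumeration is the baseline the paper's lattice-motion-group convolution avoids, not the paper's method, and `endpoint_counts` is a hypothetical name:

```python
from collections import Counter

SQUARE = ((1, 0), (-1, 0), (0, 1), (0, -1))


def endpoint_counts(n, window=None, directions=SQUARE):
    """Count n-step walks from the origin, grouped by end position.

    window=None enforces full self-avoidance (exact SAWs); an integer window
    only forbids stepping onto one of the last `window` sites of the walk.
    window=1 recovers the simple random walk, window=2 the non-reversal walk.
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    counts = Counter()
    path = [(0,) * len(directions[0])]

    def extend(remaining):
        if remaining == 0:
            counts[path[-1]] += 1
            return
        recent = path if window is None else path[-window:]
        pos = path[-1]
        for d in directions:
            nxt = tuple(p + q for p, q in zip(pos, d))
            if nxt in recent:
                continue
            path.append(nxt)
            extend(remaining - 1)
            path.pop()

    extend(n)
    return counts
```

Summing the counts reproduces the known square-lattice SAW numbers 4, 12, 36, 100, 284 for 1 to 5 steps, while a window of 2 gives 4·3^(n−1) walks, which first exceeds the exact count at n = 4 (108 versus 100).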
16.  Predicting the Tolerated Sequences for Proteins and Protein Interfaces Using RosettaBackrub Flexible Backbone Design 
PLoS ONE  2011;6(7):e20451.
Predicting the set of sequences that are tolerated by a protein or protein interface, while maintaining a desired function, is useful for characterizing protein interaction specificity and for computationally designing sequence libraries to engineer proteins with new functions. Here we provide a general method, a detailed set of protocols, and several benchmarks and analyses for estimating tolerated sequences using flexible backbone protein design implemented in the Rosetta molecular modeling software suite. The input to the method is at least one experimentally determined three-dimensional protein structure or high-quality model. The starting structure(s) are expanded or refined into a conformational ensemble using Monte Carlo simulations consisting of backrub backbone and side chain moves in Rosetta. The method then uses a combination of simulated annealing and genetic algorithm optimization methods to enrich for low-energy sequences for the individual members of the ensemble. To emphasize certain functional requirements (e.g. forming a binding interface), interactions between and within parts of the structure (e.g. domains) can be reweighted in the scoring function. Results from each backbone structure are merged together to create a single estimate for the tolerated sequence space. We provide an extensive description of the protocol and its parameters, all source code, example analysis scripts and three tests applying this method to finding sequences predicted to stabilize proteins or protein interfaces. The generality of this method makes many other applications possible, for example stabilizing interactions with small molecules, DNA, or RNA. Through the use of within-domain reweighting and/or multistate design, it may also be possible to use this method to find sequences that stabilize particular protein conformations or binding interactions over others.
doi:10.1371/journal.pone.0020451
PMCID: PMC3138746  PMID: 21789164
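The final merging step above, combining designed sequences from every backbone in the ensemble into one estimate of tolerated sequence space, can be pictured as a per-position frequency profile. A minimal sketch, assuming equal weight for every designed sequence; the published protocol's exact weighting and analysis scripts may differ, and `pooled_profile` is a hypothetical name:

```python
from collections import Counter


def pooled_profile(sequences_per_backbone):
    """Per-position residue frequencies from sequences pooled over an ensemble."""
    pooled = [seq for seqs in sequences_per_backbone for seq in seqs]
    if not pooled:
        raise ValueError("no sequences supplied")
    length = len(pooled[0])
    if any(len(s) != length for s in pooled):
        raise ValueError("all designed sequences must have the same length")
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in pooled)
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in sorted(counts.items())})
    return profile
```

Pooling across backbones lets positions that are constrained on one backbone but tolerant on another show that tolerance in the merged profile.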
17.  FiberDock: a web server for flexible induced-fit backbone refinement in molecular docking 
Nucleic Acids Research  2010;38(Web Server issue):W457-W461.
Protein–protein docking algorithms aim to predict the structure of a complex given the atomic structures of the proteins that assemble it. The docking procedure usually consists of two main steps: docking candidate generation and their refinement. The refinement stage aims to improve the accuracy of the candidate solutions and to identify near-native solutions among them. During protein–protein interaction, both side chains and backbone change their conformation. Refinement methods should model these conformational changes in order to obtain a more accurate model of the complex. Handling protein backbone flexibility is a major challenge for docking methodologies, since backbone flexibility adds a huge number of degrees of freedom to the search space. FiberDock is the first docking refinement web server that accounts for both backbone and side-chain flexibility. Given a set of up to 100 potential docking candidates, FiberDock models the backbone and side-chain movements that occur during the interaction, refines the structures and scores them according to an energy function. The FiberDock web server is free and available with no login requirement at http://bioinfo3d.cs.tau.ac.il/FiberDock/.
doi:10.1093/nar/gkq373
PMCID: PMC2896170  PMID: 20460459
18.  Improved side-chain torsion potentials for the Amber ff99SB protein force field 
Proteins  2010;78(8):1950-1958.
Recent advances in hardware and software have enabled increasingly long molecular dynamics (MD) simulations of biomolecules, exposing certain limitations in the accuracy of the force fields used for such simulations and spurring efforts to refine these force fields. Recent modifications to the Amber and CHARMM protein force fields, for example, have improved the backbone torsion potentials, remedying deficiencies in earlier versions. Here, we further advance simulation accuracy by improving the amino acid side-chain torsion potentials of the Amber ff99SB force field. First, we used simulations of model alpha-helical systems to identify the four residue types whose rotamer distribution differed the most from expectations based on Protein Data Bank statistics. Second, we optimized the side-chain torsion potentials of these residues to match new, high-level quantum-mechanical calculations. Finally, we used microsecond-timescale MD simulations in explicit solvent to validate the resulting force field against a large set of experimental NMR measurements that directly probe side-chain conformations. The new force field, which we have termed Amber ff99SB-ILDN, exhibits considerably better agreement with the NMR data.
doi:10.1002/prot.22711
PMCID: PMC2970904  PMID: 20408171
molecular dynamics simulation; molecular mechanics; NMR; rotamer; side chain; protein dynamics; quantum mechanics; dihedral
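Entry 18 refines side-chain torsion potentials. In Amber-family force fields a dihedral term is generally written as a short Fourier series, E(φ) = Σ_n (V_n/2)[1 + cos(nφ − γ_n)]. The Python sketch below evaluates such a series for one dihedral. The parameter values are hypothetical placeholders chosen for illustration; they are not the published ff99SB-ILDN parameters, which the abstract does not list.

```python
import math


def amber_torsion_energy(phi_deg, terms):
    """Sum of Fourier terms (V_n / 2) * (1 + cos(n * phi - gamma)).

    phi_deg: dihedral angle in degrees.
    terms:   iterable of (V_n, n, gamma_deg) tuples; V_n in kcal/mol.
    """
    phi = math.radians(phi_deg)
    energy = 0.0
    for v_n, n, gamma_deg in terms:
        energy += 0.5 * v_n * (1.0 + math.cos(n * phi - math.radians(gamma_deg)))
    return energy


# Hypothetical chi1 parameters for illustration only (NOT ff99SB-ILDN values).
EXAMPLE_CHI1_TERMS = [(1.0, 1, 0.0), (0.5, 3, 0.0)]

if __name__ == "__main__":
    # Scan the three staggered rotamer positions of chi1.
    for chi in (-60, 60, 180):
        print(chi, round(amber_torsion_energy(chi, EXAMPLE_CHI1_TERMS), 3))
```

Reparametrizing a side chain in this functional form amounts to changing the (V_n, n, γ_n) entries so that the relative energies of the rotamer minima match the quantum-mechanical reference.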
19.  Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ1 and χ2 dihedral angles 
While the quality of the current CHARMM22/CMAP additive force field for proteins has been demonstrated in a large number of applications, limitations in the model with respect to the equilibrium between the sampling of helical and extended conformations in folding simulations have been noted. To address this, and to make other improvements to the model, we present a combination of refinements that should result in enhanced accuracy in simulations of proteins. The common (non-Gly, non-Pro) backbone CMAP potential has been refined against experimental solution NMR data for weakly structured peptides, resulting in a rebalancing of the energies of the α-helix and extended regions of the Ramachandran map, correcting the α-helical bias of CHARMM22/CMAP. The Gly and Pro CMAPs have been refitted to more accurate quantum-mechanical energy surfaces. Side-chain torsion parameters have been optimized by fitting to backbone-dependent quantum-mechanical energy surfaces, followed by additional empirical optimization targeting NMR scalar couplings for unfolded proteins. A comprehensive validation of the revised force field was then performed against data not used to guide parametrization: (i) comparison of simulations of eight proteins in their crystal environments with crystal structures; (ii) comparison with backbone scalar couplings for weakly structured peptides; (iii) comparison with NMR residual dipolar couplings and scalar couplings for both backbone and side chains in folded proteins; (iv) equilibrium folding of mini-proteins. The results indicate that the revised CHARMM 36 parameters represent an improved model for the modeling and simulation studies of proteins, including studies of protein folding, assembly and functionally relevant conformational changes.
doi:10.1021/ct300400x
PMCID: PMC3549273  PMID: 23341755
Molecular dynamics simulation; NMR spectroscopy; empirical energy function; protein folding
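Entry 19 uses backbone NMR scalar couplings to rebalance helical and extended sampling. Such couplings are commonly related to the backbone angle φ through a Karplus equation, 3J(HN,Hα) = A cos²θ + B cosθ + C with θ = φ − 60°. The Python sketch below assumes one published coefficient set (Vuister and Bax); other parametrizations exist, and the abstract does not state which one was used.

```python
import math

# Assumed Karplus coefficients in Hz for 3J(HN,Halpha) (one published set;
# other parametrizations give somewhat different values).
A, B, C = 6.51, -1.76, 1.60


def j_hn_ha(phi_deg):
    """Karplus estimate of 3J(HN,Halpha) in Hz from backbone phi in degrees."""
    c = math.cos(math.radians(phi_deg - 60.0))
    return A * c * c + B * c + C


def mean_j(phi_samples):
    """Average the coupling over an ensemble of phi values, e.g. MD frames."""
    if not phi_samples:
        raise ValueError("need at least one phi value")
    return sum(j_hn_ha(p) for p in phi_samples) / len(phi_samples)


if __name__ == "__main__":
    print(round(j_hn_ha(-60.0), 2))   # helical phi: small coupling
    print(round(j_hn_ha(-120.0), 2))  # extended phi: large coupling
```

Because helical φ gives a small coupling (about 4 Hz with these coefficients) and extended φ a large one (about 10 Hz), the ensemble-averaged value is sensitive to exactly the helix/extended balance the refinement targets.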
20.  Structure of Chlorosomes from the Green Filamentous Bacterium Chloroflexus aurantiacus
Journal of Bacteriology  2009;191(21):6701-6708.
The green filamentous bacterium Chloroflexus aurantiacus employs chlorosomes as photosynthetic antennae. Chlorosomes contain bacteriochlorophyll aggregates and are attached to the inner side of a plasma membrane via a protein baseplate. The structure of chlorosomes from C. aurantiacus was investigated by using a combination of cryo-electron microscopy and X-ray diffraction and compared with that of Chlorobi species. Cryo-electron tomography revealed thin chlorosomes for which a distinct crystalline baseplate lattice was visualized in high-resolution projections. The baseplate is present only on one side of the chlorosome, and the lattice dimensions suggest that a dimer of the CsmA protein is the building block. The bacteriochlorophyll aggregates inside the chlorosome are arranged in lamellae, but the spacing is much greater than that in Chlorobi species. A comparison of chlorosomes from different species suggested that the lamellar spacing is proportional to the chain length of the esterifying alcohols. C. aurantiacus chlorosomes accumulate larger quantities of carotenoids under high-light conditions, presumably to provide photoprotection. The wider lamellae allow accommodation of the additional carotenoids and lead to increased disorder within the lamellae.
doi:10.1128/JB.00690-09
PMCID: PMC2795307  PMID: 19717605
21.  Improved prediction of protein side-chain conformations with SCWRL4 
Proteins  2009;77(4):778-795.
Determination of side-chain conformations is an important step in protein structure prediction and protein design. Many such methods have been presented, although only a small number are in widespread use. SCWRL is one such method, and the SCWRL3 program (2003) has remained popular due to its speed, accuracy, and ease of use for the purpose of homology modeling. However, higher accuracy at comparable speed is desirable. This has been achieved through: 1) a new backbone-dependent rotamer library based on kernel density estimates; 2) averaging over samples of conformations about the positions in the rotamer library; 3) a fast anisotropic hydrogen bonding function; 4) a short-range, soft van der Waals atom-atom interaction potential; 5) fast collision detection using k-discrete oriented polytopes; 6) a tree decomposition algorithm to solve the combinatorial problem; and 7) optimization of all parameters by determining the interaction graph within the crystal environment using symmetry operators of the crystallographic space group. Accuracies as a function of electron density of the side chains demonstrate that side chains with higher electron density are easier to predict than those with low electron density and presumed conformational disorder. For a testing set of 379 proteins, 86% of χ1 angles and 75% of χ1+2 angles are predicted correctly within 40° of the X-ray positions. Among side chains with higher electron density (25th–100th percentile), these numbers rise to 89% and 80%. The new program maintains its simple command-line interface, designed for homology modeling, and is now available as a dynamic-link library for incorporation into other software programs.
doi:10.1002/prot.22488
PMCID: PMC2885146  PMID: 19603484
homology modeling; side-chain prediction; protein structure; rotamer library; graph decomposition; SCWRL
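The accuracy figures in entry 21 count a side chain as correct when its χ angles lie within 40° of the X-ray values. The Python sketch below computes that kind of fraction with a wrap-around angular difference. Treating the cutoff as inclusive (≤ 40°) is an assumption, and the sketch ignores the 180° symmetry of some side chains (for example Phe or Tyr χ2), which a full evaluation would need to handle.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def fraction_correct(predicted, reference, tolerance=40.0):
    """Fraction of residues whose listed chi angles are all within tolerance.

    predicted, reference: lists of tuples of chi angles in degrees, one tuple
    per residue. Use 1-tuples for chi1 accuracy and 2-tuples for chi1+2.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        all(angular_difference(p, r) <= tolerance for p, r in zip(pred, ref))
        for pred, ref in zip(predicted, reference)
    )
    return hits / len(predicted)


if __name__ == "__main__":
    # 170 and -170 degrees are only 20 degrees apart on the circle.
    print(angular_difference(170.0, -170.0))
    print(fraction_correct([(60.0, 170.0)], [(70.0, -175.0)]))
```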
22.  Assessing the Accuracy of Ancestral Protein Reconstruction Methods 
PLoS Computational Biology  2006;2(6):e69.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.
Synopsis
It is now possible to apply computational methods to known present-day protein sequences to recreate the sequences of ancestral proteins. By synthesising these proteins and measuring their properties in the laboratory, we can learn much about the nature of evolution, better understand how proteins change and adapt over time, and gain insights into the environments of ancient organisms. Unfortunately, the accuracy of these reconstructions is difficult to evaluate. We simulate protein evolution using a simplified computational model and apply the various reconstruction methods to the sequences that arise from our simulations. Because we have the complete record of the evolutionary history, we can evaluate the reconstruction accuracy directly. We demonstrate that the reconstruction procedures in common use may be biased toward overestimating the properties of these ancestral proteins, contrary to what has previously been assumed. We present an alternative method for generating these sequences, Bayesian sampling, which can eliminate this bias and provide more robust conclusions.
doi:10.1371/journal.pcbi.0020069
PMCID: PMC1480538  PMID: 16789817
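Entry 22 contrasts "best guess" reconstruction, which keeps the most probable residue at each site, with Bayesian sampling from the per-site posterior. The toy Python sketch below shows only the mechanism behind the reported bias: the posteriors are made-up numbers, and there is no phylogenetic inference or stability model. The function names are hypothetical.

```python
import random


def map_sequence(site_posteriors):
    """'Best guess' reconstruction: the most probable residue at every site."""
    return "".join(max(post, key=post.get) for post in site_posteriors)


def sample_sequence(site_posteriors, rng):
    """Bayesian-style reconstruction: draw each residue from its posterior."""
    seq = []
    for post in site_posteriors:
        residues = list(post)
        weights = [post[r] for r in residues]
        seq.append(rng.choices(residues, weights=weights, k=1)[0])
    return "".join(seq)


# Hypothetical per-site posterior probabilities for a 3-residue toy protein.
POSTERIORS = [
    {"L": 0.7, "V": 0.3},
    {"A": 0.6, "G": 0.4},
    {"K": 0.9, "R": 0.1},
]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    print(map_sequence(POSTERIORS))
    samples = [sample_sequence(POSTERIORS, rng) for _ in range(1000)]
    frac_v = sum(s[0] == "V" for s in samples) / len(samples)
    print(frac_v)  # fraction of samples carrying the minority residue at site 1
```

The best-guess sequence never contains V, G or R, even though the posterior gives them 30%, 40% and 10%; sampling retains them in roughly those proportions. If the less frequent variants tend to be slightly destabilizing, systematically dropping them inflates the inferred stability, which is the bias the study reports.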
24.  CPSP-tools – Exact and complete algorithms for high-throughput 3D lattice protein studies 
BMC Bioinformatics  2008;9:230.
Background
The principles of protein folding and evolution pose problems of very high inherent complexity. Often these problems are tackled using simplified protein models, e.g. lattice proteins. The CPSP-tools package provides programs that solve, exactly and completely, the problems typical of studies using 3D lattice protein models. Among the tasks addressed are the prediction of (all) globally optimal and/or suboptimal structures as well as sequence design and neutral network exploration.
Results
In contrast to stochastic approaches, which are not capable of answering many fundamental questions, our methods are based on fast, non-heuristic techniques. The resulting tools are designed for high-throughput studies of 3D-lattice proteins utilising the Hydrophobic-Polar (HP) model. The source bundle is freely available [1].
Conclusion
The CPSP-tools package is the first set of exact and complete methods for extensive, high-throughput studies of non-restricted 3D-lattice protein models. In particular, our package deals with cubic and face-centered cubic (FCC) lattices.
doi:10.1186/1471-2105-9-230
PMCID: PMC2396640  PMID: 18462492
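Entry 24 works with the Hydrophobic-Polar (HP) model on 3D lattices. In that model a conformation is a self-avoiding walk on the lattice, and its energy is −1 for every pair of H residues that are lattice neighbours but not chain neighbours. The Python sketch below only evaluates this energy on the cubic lattice; it is not the CPSP exact search, and it does not cover the FCC lattice. The function names are hypothetical.

```python
from itertools import combinations


def is_valid_walk(coords):
    """True if coords is a self-avoiding walk with unit steps on the cubic lattice."""
    if len(set(coords)) != len(coords):
        return False  # two residues on the same lattice site
    for (x1, y1, z1), (x2, y2, z2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) + abs(z1 - z2) != 1:
            return False  # consecutive residues must be lattice neighbours
    return True


def hp_energy(sequence, coords):
    """HP energy: -1 per H-H pair adjacent on the lattice but not in the chain."""
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if not is_valid_walk(coords):
        raise ValueError("coords are not a self-avoiding cubic-lattice walk")
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # chain neighbours are always adjacent and do not count
        if sequence[i] == "H" and sequence[j] == "H":
            dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
            if dist == 1:
                energy -= 1
    return energy


if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(hp_energy("HPPH", square))  # the two H ends touch: one contact
```

An exact method such as CPSP must find the walks minimizing this energy over an exponentially large conformation space; the evaluation above is only the objective it optimizes.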
25.  Orientational distributions of contact clusters in proteins closely resemble those of an icosahedron 
Proteins  2008;73(3):730-741.
The orientational geometry of residue packing in proteins was studied in the past by superimposing clusters of neighboring residues with several simple lattices [1,2]. In this work, instead of a lattice we use a regular polyhedron, the icosahedron, as the model to describe the orientational distribution of contacts in clusters derived from a high-resolution protein dataset (522 protein structures solved at a resolution better than 1.5 Å). We find that the order parameter (orientation function) measuring the angular overlap of directions in coordination clusters with directions of the icosahedron is 0.91, a significant improvement over the value of 0.82 obtained for the order parameter with the face-centered cubic (fcc) lattice. Close-packing tendencies and patterns of residue packing in proteins are considered in detail, and a theoretical description of these packing regularities is proposed.
doi:10.1002/prot.22092
PMCID: PMC3018876  PMID: 18498111
residue packing in proteins; icosahedron; packing pattern
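Entry 25 compares contact directions in residue clusters with the 12 vertex directions of an icosahedron. The Python sketch below builds those directions from the standard vertex coordinates (0, ±1, ±φ), (±1, ±φ, 0), (±φ, 0, ±1), with φ the golden ratio, and scores a set of contact vectors by the mean cosine to the closest icosahedral direction. That score is a hypothetical, simplified overlap measure for illustration; it is not the order parameter defined in the paper, whose formula the abstract does not give.

```python
import math
from itertools import product

GOLDEN = (1.0 + math.sqrt(5.0)) / 2.0


def icosahedron_directions():
    """Unit vectors to the 12 vertices of a regular icosahedron at the origin."""
    verts = []
    for s1, s2 in product((-1.0, 1.0), repeat=2):
        verts.append((0.0, s1, s2 * GOLDEN))
        verts.append((s1, s2 * GOLDEN, 0.0))
        verts.append((s2 * GOLDEN, 0.0, s1))
    norm = math.sqrt(1.0 + GOLDEN ** 2)  # every vertex has this length
    return [tuple(c / norm for c in v) for v in verts]


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return tuple(c / n for c in v)


def mean_best_alignment(contact_vectors, reference_dirs):
    """Hypothetical overlap score: mean cosine to the closest reference direction."""
    if not contact_vectors:
        raise ValueError("need at least one contact vector")
    total = 0.0
    for v in contact_vectors:
        u = normalize(v)
        total += max(sum(a * b for a, b in zip(u, d)) for d in reference_dirs)
    return total / len(contact_vectors)


if __name__ == "__main__":
    dirs = icosahedron_directions()
    # Angle between neighbouring vertices: arccos(1/sqrt(5)), about 63.4 degrees.
    print(round(math.degrees(math.acos(1.0 / math.sqrt(5.0))), 2))
    print(mean_best_alignment(dirs, dirs))  # perfect icosahedral cluster
```

Each vertex has five nearest neighbours at the same angle, which is why the icosahedron is a natural reference for a central residue surrounded by about 12 contacts.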