One of the major challenges with protein template-free modeling is an efficient sampling algorithm that can explore a huge conformation space quickly. The popular fragment assembly method constructs a conformation by stringing together short fragments extracted from the Protein Data Base (PDB). The discrete nature of this method may limit generated conformations to a subspace in which the native fold does not belong. Another worry is that a protein with really new fold may contain some fragments not in the PDB. This article presents a probabilistic model of protein conformational space to overcome the above two limitations. This probabilistic model employs directional statistics to model the distribution of backbone angles and 2nd-order Conditional Random Fields (CRFs) to describe sequence-angle relationship. Using this probabilistic model, we can sample protein conformations in a continuous space, as opposed to the widely used fragment assembly and lattice model methods that work in a discrete space. We show that when coupled with a simple energy function, this probabilistic method compares favorably with the fragment assembly method in the blind CASP8 evaluation, especially on alpha or small beta proteins. To our knowledge, this is the first probabilistic method that can search conformations in a continuous space and achieves favorable performance. Our method also generated three-dimensional (3D) models better than template-based methods for a couple of CASP8 hard targets. The method described in this article can also be applied to protein loop modeling, model refinement, and even RNA tertiary structure prediction.
conditional random fields (CRFs); directional statistics; fragment assembly; lattice model; protein structure prediction; template-free modeling
Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold’s decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from http://www.riken.jp/zhangiru/software/.
Despite significant progress in recent years, ab initio folding is still one of the most challenging problems in structural biology. This paper presents a probabilistic graphical model for ab initio folding, which employs Conditional Random Fields (CRFs) and directional statistics to model the relationship between the primary sequence of a protein and its three-dimensional structure. Different from the widely-used fragment assembly method and the lattice model for protein folding, our graphical model can explore protein conformations in a continuous space according to their probability. The probability of a protein conformation reflects its stability and is estimated from PSI-BLAST sequence profile and predicted secondary structure. Experimental results indicate that this new method compares favorably with the fragment assembly method and the lattice model.
protein structure prediction; ab initio folding; conditional random fields (CRFs); directional statistics; fragment assembly; lattice model
Protein structure prediction without using templates (i.e., ab initio folding) is one of the most challenging problems in structural biology. In particular, conformation sampling poses as a major bottleneck of ab initio folding. This article presents CRFSampler, an extensible protein conformation sampler, built on a probabilistic graphical model Conditional Random Fields (CRFs). Using a discriminative learning method, CRFSampler can automatically learn more than ten thousand parameters quantifying the relationship among primary sequence, secondary structure, and (pseudo) backbone angles. Using only compactness and self-avoiding constraints, CRFSampler can efficiently generate protein-like conformations from primary sequence and predicted secondary structure. CRFSampler is also very flexible in that a variety of model topologies and feature sets can be defined to model the sequence-structure relationship without worrying about parameter estimation. Our experimental results demonstrate that using a simple set of features, CRFSampler can generate decoys with much higher quality than the most recent HMM model.
protein conformation sampling; conditional random fields (CRFs); discriminative learning
Accurate tertiary structures are very important for the functional study of non-coding RNA molecules. However, predicting RNA tertiary structures is extremely challenging, because of a large conformation space to be explored and lack of an accurate scoring function differentiating the native structure from decoys. The fragment-based conformation sampling method (e.g. FARNA) bears shortcomings that the limited size of a fragment library makes it infeasible to represent all possible conformations well. A recent dynamic Bayesian network method, BARNACLE, overcomes the issue of fragment assembly. In addition, neither of these methods makes use of sequence information in sampling conformations. Here, we present a new probabilistic graphical model, conditional random fields (CRFs), to model RNA sequence–structure relationship, which enables us to accurately estimate the probability of an RNA conformation from sequence. Coupled with a novel tree-guided sampling scheme, our CRF model is then applied to RNA conformation sampling. Experimental results show that our CRF method can model RNA sequence–structure relationship well and sequence information is important for conformation sampling. Our method, named as TreeFolder, generates a much higher percentage of native-like decoys than FARNA and BARNACLE, although we use the same simple energy function as BARNACLE.
Contact: email@example.com; firstname.lastname@example.org
Supplementary Information: Supplementary data are available at Bioinformatics online.
We perform all-atom computer simulations on nearly one hundred 6-, 8-, 10-, and 12-mer peptide fragments of protein G, and look for stable states. We simulated by replica-exchange molecular dynamics using Amber7 with the parm96 force-field and a GB/SA (generalized-Born/solvent accessible) implicit solvent model. We find that useful diagnostics for identifying stable converged structures are the conformational entropy and free energy of each state. A large gap in the ground-state free-energy, and a low entropy indicate convergence to a single preferred peptide conformation. We find that a non-negligible fraction of such structures have some native-like character. Such physics-based modeling may be useful for identifying early nuclei in folding kinetics and for assisting in protein-structure prediction methods that utilize the assembly of peptide fragments.
Ab initio prediction is the challenging attempt to predict protein structures based only on sequence information and without using templates. It is often divided into
two distinct sub-problems: (a) the scoring function that can distinguish native, or
native-like structures, from non-native ones; and (b) the method of searching the
conformational space. Currently, there is no reliable scoring function that can
always drive a search to the native fold, and there is no general search method
that can guarantee a significant sampling of near-natives. Pathway models combine
the scoring function and the search. In this short review, we explore some of the
ways pathway models are used in folding, in published works since 2001, and
present a new pathway model, HMMSTR-CM, that uses a fragment library and
a set of nucleation/propagation-based rules. The new method was used for ab initio
predictions as part of CASP5. This work was presented at the Winter School in
Bioinformatics, Bologna, Italy, 10–14 February 2003.
Here, we present the algorithm and validation for OMEGA, a systematic, knowledge-based conformer generator. The algorithm consists of three phases: assembly of an initial 3D structure from a library of fragments; exhaustive enumeration of all rotatable torsions using values drawn from a knowledge-based list of angles, thereby generating a large set of conformations; and sampling of this set by geometric and energy criteria. Validation of conformer generators like OMEGA has often been undertaken by comparing computed conformer sets to experimental molecular conformations from crystallography, usually from the Protein Databank (PDB). Such an approach is fraught with difficulty due to the systematic problems with small molecule structures in the PDB. Methods are presented to identify a diverse set of small molecule structures from cocomplexes in the PDB that has maximal reliability. A challenging set of 197 high quality, carefully selected ligand structures from well-solved models was obtained using these methods. This set will provide a sound basis for comparison and validation of conformer generators in the future. Validation results from this set are compared to the results using structures of a set of druglike molecules extracted from the Cambridge Structural Database (CSD). OMEGA is found to perform very well in reproducing the crystallographic conformations from both these data sets using two complementary metrics of success.
is one of the prime tools for high resolution protein structure
refinement. While its scoring function can distinguish native-like
from non-native-like conformations in many cases, the method is limited
by conformational sampling for larger proteins, that is, leaving a
local energy minimum in which the search algorithm may get stuck.
Here, we test the hypothesis that iteration of Rosetta with an orthogonal
sampling and scoring strategy might facilitate exploration of conformational
space. Specifically, we run short molecular dynamics (MD) simulations
on models created by de novo folding of large proteins
into cryoEM density maps to enable sampling of conformational space
not directly accessible to Rosetta and thus provide an escape route
from the conformational traps. We present a combined MD–Rosetta
protein structure refinement protocol that can overcome some
of these sampling limitations. Two of four benchmark proteins showed
incremental improvement through all three rounds of the iterative
refinement protocol. Molecular dynamics is most efficient in applying
subtle but important rearrangements within secondary structure elements
and is thus highly complementary to the Rosetta refinement, which
focuses on side chains and loop regions.
It has long been proposed that much of the information encoding how a protein folds is contained locally in the peptide chain. Here we present a large-scale simulation study designed to examine the extent to which conformations of peptide fragments in water predict native conformations in proteins. We perform replica exchange molecular dynamics (REMD) simulations of 872 8-mer, 12-mer, and 16-mer peptide fragments from 13 proteins using the AMBER 96 force field and the OBC implicit solvent model. To analyze the simulations, we compute various contact-based metrics, such as contact probability, and then apply Bayesian classifier methods to infer which metastable contacts are likely to be native vs. non-native. We find that a simple measure, the observed contact probability, is largely more predictive of a peptide's native structure in the protein than combinations of metrics or multi-body components. Our best classification model is a logistic regression model that can achieve up to 63% correct classifications for 8-mers, 71% for 12-mers, and 76% for 16-mers. We validate these results on fragments of a protein outside our training set. We conclude that local structure provides information to solve some but not all of the conformational search problem. These results help improve our understanding of folding mechanisms, and have implications for improving physics-based conformational sampling and structure prediction using all-atom molecular simulations.
Proteins must fold to unique native structures in order to perform their functions. To do this, proteins must solve a complicated conformational search problem, the details of which remain difficult to study experimentally. Predicting folding pathways and the mechanisms by which proteins fold is thus central to understanding how proteins work. One longstanding question is the extent to which proteins solve the search problem locally, by folding into sub-structures that are dictated primarily by local sequence. Here, we address this question by conducting a large-scale molecular dynamics simulation study of protein fragments in water. The simulation data was then used to optimize a statistical model that predicted native and non-native contacts. The performance of the resulting model suggests that local structuring provides some but not all of the information to solve the folding problem, and that molecular dynamics simulation of fragments can be useful for protein structure prediction and design.
Ab initio protein folding is one of the major unsolved problems in computational biology due to the difficulties in force field design and conformational search. We developed a novel program, QUARK, for template-free protein structure prediction. Query sequences are first broken into fragments of 1–20 residues where multiple fragment structures are retrieved at each position from unrelated experimental structures. Full-length structure models are then assembled from fragments using replica-exchange Monte Carlo simulations, which are guided by a composite knowledge-based force field. A number of novel energy terms and Monte Carlo movements are introduced and the particular contributions to enhancing the efficiency of both force field and search engine are analyzed in detail. QUARK prediction procedure is depicted and tested on the structure modeling of 145 non-homologous proteins. Although no global templates are used and all fragments from experimental structures with template modeling score (TM-score) >0.5 are excluded, QUARK can successfully construct 3D models of correct folds in 1/3 cases of short proteins up to 100 residues. In the ninth community-wide Critical Assessment of protein Structure Prediction (CASP9) experiment, QUARK server outperformed the second and third best servers by 18% and 47% based on the cumulative Z-score of global distance test-total (GDT-TS) scores in the free modeling (FM) category. Although ab initio protein folding remains a significant challenge, these data demonstrate new progress towards the solution of the most important problem in the field.
hydrogen bonding; Monte Carlo simulation; protein folding; protein structure prediction; solvent accessibility; statistical potential
Depending on whether similar structures are found in the PDB library, the protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool to detect the structural analogs, the advancements in methodology development have come to a steady state. Encouraging progress is observed in structure refinement which aims at drawing template structures closer to the native; this has been mainly driven by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been witnessed in folding small proteins to atomic resolutions. However, predicting structures for proteins larger than 150 residues still remains a challenge, with bottlenecks from both force field and conformational search.
The increasing importance of non-coding RNA in biology and medicine has led to a growing interest in the problem of RNA 3-D structure prediction. As is the case for proteins, RNA 3-D structure prediction methods require two key ingredients: an accurate energy function and a conformational sampling procedure. Both are only partly solved problems. Here, we focus on the problem of conformational sampling. The current state of the art solution is based on fragment assembly methods, which construct plausible conformations by stringing together short fragments obtained from experimental structures. However, the discrete nature of the fragments necessitates the use of carefully tuned, unphysical energy functions, and their non-probabilistic nature impairs unbiased sampling. We offer a solution to the sampling problem that removes these important limitations: a probabilistic model of RNA structure that allows efficient sampling of RNA conformations in continuous space, and with associated probabilities. We show that the model captures several key features of RNA structure, such as its rotameric nature and the distribution of the helix lengths. Furthermore, the model readily generates native-like 3-D conformations for 9 out of 10 test structures, solely using coarse-grained base-pairing information. In conclusion, the method provides a theoretical and practical solution for a major bottleneck on the way to routine prediction and simulation of RNA structure and dynamics in atomic detail.
The importance of RNA in biology and medicine has increased immensely over the last several years, due to the discovery of a wide range of important biological processes that are under the guidance of non-coding RNA. As is the case with proteins, the function of an RNA molecule is encoded in its three-dimensional (3-D) structure, which in turn is determined by the molecule's sequence. Therefore, interest in the computational prediction of the 3-D structure of RNA from sequence is great. One of the main bottlenecks in routine prediction and simulation of RNA structure and dynamics is sampling, the efficient generation of RNA-like conformations, ideally in a mathematically and physically sound way. Current methods require the use of unphysical energy functions to amend the shortcomings of the sampling procedure. We have developed a mathematical model that describes RNA's conformational space in atomic detail, without the shortcomings of other sampling methods. As an illustration of its potential, we describe a simple yet efficient method to sample conformations that are compatible with a given secondary structure. An implementation of the sampling method, called BARNACLE, is freely available.
We present a method for the computer-based iterative assembly of native-like tertiary structures of helical proteins from alpha-helical fragments. For any pair of helices, our method, called MATCHSTIX, first generates an ensemble of possible relative orientations of the helices with various ways to form hydrophobic contacts between them. Those conformations having steric clashes, or a large radius of gyration of hydrophobic residues, or with helices too far separated to be connected by the intervening linking region, are discarded. Then, we attempt to connect the two helical fragments by using a robotics-based loop-closure algorithm. When loop closure is feasible, the algorithm generates an ensemble of viable interconnecting loops. After energy minimization and clustering, we use a representative set of conformations for further assembly with the remaining helices, adding one helix at a time. To efficiently sample the conformational space, the order of assembly generally proceeds from the pair of helices connected by the shortest loop, followed by joining one of its adjacent helices, always proceeding with the shorter connecting loop. We tested MATCHSTIX on 28 helical proteins each containing up to 5 helices and found it to heavily sample native-like conformations. The average RMSD of the best conformations for the 17 helix-bundle proteins that have 2 or 3 helices is less than 2 Å; errors increase somewhat for proteins containing more helices. Native-like states are even more densely sampled when disulfide bonds are known and imposed as restraints. We conclude that, at least for helical proteins, if the secondary structures are known, this rapid rigid-body maximization of hydrophobic interactions can lead to small ensembles of highly native-like structures. It may be useful for protein structure prediction.
One of critical difficulties of molecular dynamics (MD) simulations in protein structure refinement is that the physics-based energy landscape lacks of middle-range funnel to guide non-native conformations towards near-native states. We propose to use the target model as probe to identify fragmental analogs from PDB. The distance maps are then used to re-shape the MD energy funnel. The protocol was tested on 181 benchmarking and 26 CASP targets. It was found that structure models of correct folds with TM-score>0.5 can be often pulled closer to native with higher GDT-HA score; but improvement for the models of incorrect folds (TM-score<0.5) are much less pronounced. These data indicate that template-based fragmental distance maps essentially re-shaped the MD energy landscape from golf-course-like to funnel-like ones in the successfully-refined targets with a radius of TM-score~0.5. These results demonstrate a new avenue to improve high-resolution structures by combining knowledge-based template information with physics-based MD simulations.
Protein structure prediction; protein structure refinement; distance restraints; hydrogen bonding; molecular dynamics
The FALC-Loop web server provides an online interface for protein loop modeling by employing an ab initio loop modeling method called FALC (fragment assembly and analytical loop closure). The server may be used to construct loop regions in homology modeling, to refine unreliable loop regions in experimental structures or to model segments of designed sequences. The FALC method is computationally less expensive than typical ab initio methods because the conformational search space is effectively reduced by the use of fragments derived from a structure database. The analytical loop closure algorithm allows efficient search for loop conformations that fit into the protein framework starting from the fragment-assembled structures. The FALC method shows prediction accuracy comparable to other state-of-the-art loop modeling methods. Top-ranked model structures can be visualized on the web server, and an ensemble of loop structures can be downloaded for further analysis. The web server can be freely accessed at http://falc-loop.seoklab.org/.
We explored the energy-parameter space of our coarse-grained UNRES force field for large-scale ab initio simulations of protein folding, to obtain good initial approximations for hierarchical optimization of the force field with new virtual-bond-angle bending and side-chain-rotamer potentials which we recently introduced to replace the statistical potentials. 100 sets of energy-term weights were generated randomly, and good sets were selected by carrying out replica-exchange molecular dynamics simulations of two peptides with a minimal α-helical and a minimal β-hairpin fold, respectively: the tryptophan cage (PDB code: 1L2Y) and tryptophan zipper (PDB code: 1LE1). Eight sets of parameters produced native-like structures of these two peptides. These eight sets were tested on two larger proteins: the engrailed homeodomain (PDB code: 1ENH) and FBP WW domain (PDB code: 1E0L); two sets were found to produce native-like conformations of these proteins. These two sets were tested further on a larger set of nine proteins with α or α + β structure and found to locate native-like structures of most of them. These results demonstrate that, in addition to finding reasonable initial starting points for optimization, an extensive search of parameter space is a powerful method to produce a transferable force field.
protein folding; thermodynamic hypothesis; coarse-grained models; UNRES force field; force-field parameterization
Although most folding intermediates escape detection, their characterization is crucial to the elucidation of folding mechanisms. Here we outline a powerful strategy to populate partially unfolded intermediates: A buried aliphatic residue is substituted with a charged residue (e.g., Leu→Glu−) to destabilize and unfold a specific region of the protein. We apply this strategy to Ubiquitin, reversibly trapping a folding intermediate in which the β5 strand is unfolded. The intermediate refolds to a native-like structure upon charge neutralization under mildly acidic conditions. Characterization of the trapped intermediate using NMR and hydrogen exchange methods identifies a second folding intermediate and reveals the order and free energies of the two major folding events on the native side of the rate-limiting step. This general strategy may be combined with other methods and have broad applications in the study of protein folding and other reactions that require trapping of high energy states.
Psi-analysis; ubiquitin; protein engineering; native-state hydrogen exchange; small angle X-ray scattering; NMR
Short fragments of proteins are fundamental starting points in various structure prediction applications, such as in fragment based loop modeling methods but also in various full structure build-up procedures. The applicability and performance of these approaches depend on the availability of short fragments in structure databanks.
We studied the representation of protein loop fragments up to 14 residues in length. All possible query fragments found in sequence databases (Sequence Space) were clustered and cross referenced with available structural fragments in Protein Data Bank (Structure Space). We found that the expansion of PDB in the last few years resulted in a dense coverage of loop conformational fragments. For each loops of length 8 in the current Sequence Space there is at least one loop in Structure Space with 50% or higher sequence identity. By correlating sequence and structure clusters of loops we found that a 50% sequence identity generally guarantees structural similarity. These percentages of coverage at 50% sequence cutoff drop to 96, 94, 68, 53, 33 and 13% for loops of length 9, 10, 11, 12, 13, and 14, respectively. There is not a single loop in the current Sequence Space at any length up to 14 residues that is not matched with a conformational segment that shares at least 20% sequence identity. This minimum observed identity is 40% for loops of 12 residues or shorter and is as high as 50% for 10 residue or shorter loops. We also assessed the impact of rapidly growing sequence databanks on the estimated number of new loop conformations and found that while the number of sequentially unique sequence segments increased about six folds during the last five years there are almost no unique conformational segments among these up to 12 residues long fragments.
The results suggest that fragment based prediction approaches are not limited any more by the completeness of fragments in databanks but rather by the effective scoring and search algorithms to locate them. The current favorable coverage and trends observed will be further accentuated with the progress of Protein Structure Initiative that targets new protein folds and ultimately aims at providing an exhaustive coverage of the structure space.
A simple model is presented that describes general features of protein folding, in good agreement with experimental results and detailed all-atom simulations. Starting from microscopic physics, and with no free parameters, this model predicts that protein folding occurs remarkably quickly because native-like states are kinetic hubs. A hub-like network arises naturally out of microscopic physical concerns, specifically the kinetic longevity of native contacts during a search of globular conformations. The model predicts folding times scaling as τf ~ eξN in the number of residues, but because the model shows ξ is small, the folding times are much faster than Levinthal’s approximation. Importantly, the folding timescale is found to be small due to the topology and structure of the network. We show explicitly how our model agrees with generic experimental features of the folding process, including the scaling of τf with N, two-state thermodynamics, a sharp peak in CV, and native-state fluctuations.
Protein Folding; Kinetic Hub; Master Equation; Markov State Model; Analytical; Folding Time
The “protein folding problem” consists of three closely related puzzles: (a) What is the folding code? (b) What is the folding mechanism? (c) Can we predict the native structure of a protein from its amino acid sequence? Once regarded as a grand challenge, protein folding has seen great progress in recent years. Now, foldable proteins and nonbiological polymers are being designed routinely and moving toward successful applications. The structures of small proteins are now often well predicted by computer methods. And, there is now a testable explanation for how a protein can fold so quickly: A protein solves its large global optimization problem as a series of smaller local optimization problems, growing and assembling the native structure from peptide fragments, local structures first.
structure prediction; funnel energy landscapes; CASP; folding code; folding kinetics
The discovery that RNA molecules can fold into complex structures and carry out diverse cellular roles has led to interest in developing tools for modeling RNA tertiary structure. While significant progress has been made in establishing that the RNA backbone is rotameric, few libraries of discrete conformations specifically for use in RNA modeling have been validated. Here, we present six libraries of discrete RNA conformations based on a simplified pseudo-torsional notation of the RNA backbone, comparable to phi and psi in the protein backbone. We evaluate the ability of each library to represent single nucleotide backbone conformations and we show how individual library fragments can be assembled into dinucleotides that are consistent with established RNA backbone descriptors spanning from sugar to sugar. We then use each library to build all-atom models of 20 test folds and we show how the composition of a fragment library can limit model quality. Despite the limitations inherent in using discretized libraries, we find that several hundred discrete fragments can rebuild RNA folds up to 174 nucleotides in length with atomic-level accuracy (<1.5Å RMSD). We anticipate the libraries presented here could easily be incorporated into RNA structural modeling, analysis, or refinement tools.
RNA structure; RNA backbone conformation; RNA fragment library; RNA modeling
Many non-coding RNAs fold into complex three-dimensional structures, yet the self-assembly of RNA structure is hampered by mispairing, weak tertiary interactions, electrostatic barriers, and the frequent requirement that the 5′ and 3′ ends of the transcript interact. This rugged free energy landscape for RNA folding means that some RNA molecules in a population rapidly form their native structure, while many others become kinetically trapped in misfolded conformations. Transient binding of RNA chaperone proteins destabilize misfolded intermediates and lower the transition states between conformations, producing a smoother landscape that increases the rate of folding and the probability that a molecule will find the native structure. DEAD-box proteins couple the chemical potential of ATP hydrolysis with repetitive cycles of RNA binding and release, expanding the range of conditions under which they can refold RNA structures.
RNA folding; kinetic partitioning; ribozyme; RNA chaperone; DEAD-box
Fragment assembly is a powerful method of protein structure prediction that builds protein models from a pool of candidate fragments taken from known structures. Stochastic sampling is subsequently used to refine the models. The structures are first represented as coarse-grained models and then as all-atom models for computational efficiency. Many models have to be generated independently due to the stochastic nature of the sampling methods used to search for the global minimum in a complex energy landscape. In this paper we present , a fragment-based approach which shares information between the generated models and steers the search towards native-like regions. A distribution over fragments is estimated from a pool of low energy all-atom models. This iteratively-refined distribution is used to guide the selection of fragments during the building of models for subsequent rounds of structure prediction. The use of an estimation of distribution algorithm enabled to reach lower energy levels and to generate a higher percentage of near-native models. uses an all-atom energy function and produces models with atomic resolution. We observed an improvement in energy-driven blind selection of models on a benchmark of in comparison with the AbInitioRelax protocol.
Flexible peptides that fold upon binding to another protein molecule mediate a large number of regulatory interactions in the living cell and may provide highly specific recognition modules. We present Rosetta FlexPepDock ab-initio, a protocol for simultaneous docking and de-novo folding of peptides, starting from an approximate specification of the peptide binding site. Using the Rosetta fragments library and a coarse-grained structural representation of the peptide and the receptor, FlexPepDock ab-initio samples efficiently and simultaneously the space of possible peptide backbone conformations and rigid-body orientations over the receptor surface of a given binding site. The subsequent all-atom refinement of the coarse-grained models includes full side-chain modeling of both the receptor and the peptide, resulting in high-resolution models in which key side-chain interactions are recapitulated. The protocol was applied to a benchmark in which peptides were modeled over receptors in either their bound backbone conformations or in their free, unbound form. Near-native peptide conformations were identified in 18/26 of the bound cases and 7/14 of the unbound cases. The protocol performs well on peptides from various classes of secondary structures, including coiled peptides with unusual turns and kinks. The results presented here significantly extend the scope of state-of-the-art methods for high-resolution peptide modeling, which can now be applied to a wide variety of peptide-protein interactions where no prior information about the peptide backbone conformation is available, enabling detailed structure-based studies and manipulation of those interactions.