|Home | About | Journals | Submit | Contact Us | Français|
We perform all-atom computer simulations on nearly one hundred 6-, 8-, 10-, and 12-mer peptide fragments of protein G, and look for stable states. We simulated by replica-exchange molecular dynamics using Amber7 with the parm96 force-field and a GB/SA (generalized-Born/solvent accessible) implicit solvent model. We find that useful diagnostics for identifying stable converged structures are the conformational entropy and free energy of each state. A large gap in the ground-state free-energy, and a low entropy indicate convergence to a single preferred peptide conformation. We find that a non-negligible fraction of such structures have some native-like character. Such physics-based modeling may be useful for identifying early nuclei in folding kinetics and for assisting in protein-structure prediction methods that utilize the assembly of peptide fragments.
A long-standing goal of computational biology has been to devise a computer algorithm that takes, as input, an amino acid sequence and gives, as output, the three-dimensional native structure of a protein. A main motivation is to make drug discovery faster and more efficient by replacing slow expensive structural biology experiments with fast cheap computer simulations. There are many successful bioinformatics approaches to this problem.1–5 Those methods draw heavily from knowledge bases of known protein native structures. Our interest here is in whether purely physical all-atom force field models are capable of identifying native-like starting points from small peptide fragments.
Short peptide fragments of proteins often have intrinsic propensities for the formation of their native conformations. NMR experiments6 show that long peptide fragments have native-like conformations.7–11 Some short peptides in solution have also been shown to adopt their native secondary structures: α helices12, 13 and β hairpins.14–18 As a consequence, peptide conformational propensities that are taken from the protein databank (PDB) are now widely used in protein-structure prediction algorithms. Most current protein structure prediction methods make some use of database-derived conformational preferences. A popular set of peptide fragment conformations is the I-sites library of David Baker and his coworkers19, 20 From the recent CASP (Critical Assessment of Techniques for Structure Prediction) protein structure prediction competition, it was noted that most of the successful de novo methods start with small fragments21, 22 which are then combined together into a predicted tertiary structure.
However, it would be desirable to achieve high-resolution protein structure prediction in models that are purely physics-based, for various reasons. Such predictions would not rely on information contained in protein structure databases. First, it would put our understanding of protein structures and driving forces on a deeper and more physical foundation. For example, such methods could elucidate the physical routes of protein folding. Second, it would allow the prediction of non-native states, too, those that are of interest for protein folding kinetics and stability. Third, it would allow us to treat induced fit binding of ligands, or other conformational changes.
There is good evidence that physical models can capture these conformational propensities of peptides. Simple physical models can reproduce the structural biases of certain peptide fragments.23–26 To date, however, such studies have largely focused on selected peptides that are expected to fold. Moreover, several models of protein folding kinetics are premised on the idea that folding routes begin with metastable structures of small peptide fragments.27–30 In recent work Ho et al31 simulated 133 peptide 8-mer fragments from six different proteins. In that work, it was found that more than 30 percent of the peptide fragments converge to a preferred structure, some of which are native-like. In the present study, we extend beyond that work, through a systematic exploration of different fragment chain lengths
The present study provides a test of physics-based all-atom simulations – the quality of the force field and solvation model, and the adequacy of current typical levels of sampling. There are well known problems with commonly used molecular mechanics force fields. Yoda et al.32 conducted multicanonical simulations of several small peptides (the α-helical C-peptide of ribonuclease A and the C-terminal β-hairpin of protein G) by using six common force fields (AMBER94, AMBER96, AMBER99, CHARMM22, OPLS/AA/L, and GROM0S96) in explicit solvent and concluded that all of these force fields have different propensities to form secondary structures because of the differences in backbone torsional energies. Also, it is still largely an open question as to how well these implicit solvent models can predict the thermodynamics as well as the kinetics of protein folding. Zhou33 tested different combinations of force fields and solvation models on the C-terminal β-hairpin of protein G. He found the best balanced combination to be AMBER96/GBSA, so we use that here.
Here, we study 26 6-mer peptide fragments from protein G, 25 8-mers, 24 10-mers and 23 12-mers. We use replica-exchange molecular dynamics sampling34 in Amber735, with the parm96 parameters36 and the GB/SA implicit solvent model of Tsui and Case37. We are interested in whether this physical model can identify native-like secondary structures in peptide fragments. We systematically generate a series of peptide fragments with overlapping sequences from the original protein sequence. Neighboring fragments have a 4–10 residue overlap (and two-residue gap). We simulate each peptide using 16 replicas for 5 ns/replica.
We utilized replica-exchange molecular dynamics (REMD)34 using a custom PERL script wrapper (http://www.dillgroup.ucsf.edu/~jchodera/code/rex) around the SANDER molecular dynamics program for the Amber7 molecular-modeling package.35 REMD periodically attempts to exchange conformations between independent molecular dynamics simulations running in parallel at different temperatures, based on a Metropolis-like criterion. This allows individual replicas to heat up to overcome barriers and then cool back down to temperatures of interests. REMD has two advantages. It explores more conformational space then conventional molecular dynamics techniques.38 And, REMD samples from the canonical ensemble at each temperature. This sampling gives proper estimates of free energies, not just energies. We used 16 replicas exponentially spaced between 270 K and 690 K. The probability of exchange-acceptance was approximately 50%. Exchanges were attempted every 1 ps. Between exchange attempts energy-conserving molecular dynamics was used with a 2 fs time step. After each exchange attempt the velocities were randomized from the appropriate Maxwell-Boltzmann distributions to ensure sampling from the canonical ensemble at the appropriate temperature.
The protein is chopped into overlapping fragments 6 to 12 residues in length. Fragments were modeled with the Amber Parm96 force field36 with GB implicit solvent model and surface area penalty term of 5 cal · mol−1 · Å−2 of Tsui and Case.37 All fragments were capped at N and C termini with acetyl and N-methylamine blocking groups to avoid undue influence from the zwitterionic termini. The fragments were initialized in extended state. The bonds to hydrogens were constrained with the SHAKE.39 Simulations were run for 5 ns per replica. Configurations were stored every 1 ps for analysis.
The results were analyzed by the weighted histogram analysis (WHAM).40, 41 In order to extract thermodynamic observables for a target temperature T we must reweight the configurations taken from each temperature Tk in order to combine them into a representative ensemble.41
We first calculate the dimensionless free energy fk for each replica k. We start with an approximate value for fk and calculate the density of states ΩkE with energy E in replica k as
where NkE is number of sampled configurations with energy E from replica k. inverse temperature, kB Boltzmann constant and Nkl number of configurations of replica k at temperature Tl. From the updated density of states we can calculate an updated estimate of dimensionless free energy by
We iterate equations (1) and (2) until the free energy converges. Then we can use the these data to reweight the relative free energy profile F of state i to the inverse target temperature or we can calculate the estimator of the expectation for any observable.41
To analyze our date, we form discrete bins of the backbone degrees of freedom31. This process has a long history, dating back to the original work of Ramachandran et al,42 who divided the backbone − ψ angles into three district regions, which are known as α, αL and β We describe the conformation of the peptide backbone as a string of discrete mesostates that we call a mesostring. Each mesostring describes a state that is separated by an energy barrier from other mesostrings. This means that each mesostring corresponds to local minima in the conformation space of the peptide backbone. When we know all mesostrings it is easy to find the one with lowest energy and get structure from the lowest energy basin. This partitioning in the terms of discrete regions in the backbone angles has been observed in a molecular dynamics simulation of an α-helical peptide.43 For mesostring analysis we cannot use the database analysis to define the boundaries of the backbone mesostates because current force fields cannot replicate the database distribution of − ψ angles.31 Ho and coworkers31 define the boundaries of the backbone mesostates in the terms of the same force field we use here. They ran replica-exchange simulations of the alanine dipeptide and the glycine dipeptide to define the boundaries of different mesostates. They break up the Ramachandran plot in terms of the following mesostates:
and for glycin
With the use of dimensionless free energies fk (Eq. (2)) we reweight the free energy profile Fi of mesostring i at the inverse target temperature β as
After using WHAM to calculate the relative free energies Fi of the mesostrings i we calculate the probabilities pi of this mesostring by
When each simulation is completed, each fragment will have different populations of the various mesostrings and also different free energies. The ground mesostring is the mesostate with the highest population and the lowest free energy. We then determine whether each peptide finds a metastable structure. We define the existence of the structure in the fragment in terms of two properties of mesostring. First, we use the probability pi of the ground mesostring. Second, we use the free energy gap ΔF between the ground mesostring and the next mesostring. When the ground mesostring is nearly identical to next mesostring, it indicates a lack of preference of the force field for a particular structure. If the most populated (ground mesostring) differs by only one mesostate, we group them into a consensus mesostring, which contains one indefinite mesostate signified by [−]. When we group similar mesostrings into consensus mesostring we calculate the free energy difference to another mesostring j by
The backbone entropy is calculated using the Boltzmann formula
The backbone entropy is useful for measuring for a given fragment the sharpness of the distribution of probabilities of the mesostring. The more peaked the distribution is and, thus the more favored the mesostring is, the lower is the backone entropy. In this way, the backbone entropy indicates whether any one conformation is substantially favored over the others for given fragment. The fragments having low mesostring entropy can be considered as early folding nuclei to initiate folding.
The three-dimensional structure of a protein may be compactly represented as a symmetrical, square matrix of pairwise, inter-residue contacts, called the contact map.44,45 The contact map provides useful information about the protein's secondary structure and it also captures non-local interactions giving clues to its tertiary structure.
We define that two residues xi and xj in a protein are in a contact if the distance dij between α-carbon atoms of the residues xi and xj is lower then some threshold value. In our case we chose 7 Å. Distance dij is defined as
where ri and rj are the coordinates of α-carbon atoms. A contact map for protein with N residues is an N × N binary matrix S in which the elements Sij are defined as
Note that we require a minimum sequence distance of 4 to call it a contact.
In this study, we chopped protein G in 26 peptide 6-mer fragments, 25 8-mers, 24 10-mers and 23 12-mers. We systematically generate a series of peptide fragments with overlapping sequences from the original protein sequence. Neighboring fragments have a 4–10 residue overlap (and a two-residue gap). Table 1 shows the sequences of the fragments. We simulated each peptide using 16 replicas for 5 ns per replica. We clustered our conformations analyzed them using mesostring analysis. We looked for structural biases in each peptide. We used two criteria. First, we use the free energy gap ΔF between the ground mesostring and the next mesostring with higher free energy. Second, we use the probability p1 to determine the population of the ground mesostring. We identified a stable structure if ΔF > 0.6 kcal/mol and p1 > 38%, which is slightly differently then was used previously by Ho et al.31 We used a 1 ns window for analyzing the mesostrings. Tables 2–5 show data for different fragments at 4 ns from start of simulation in 1 ns window. The lines in bold indicate stable structures. A key finding, consistent with that of Ho et al is the persistence of substantial stable structure even in such short peptides, studied here over a more extensive set of peptides and over a systematic series of different chain lengths: in 27% of 6-mer fragments, 20% of 8-mers, 17% of 10-mers and 30% of 12-mers. We classified the structures of stable fragments using the definitions of Ho et al.31
Out of 98 simulated fragments, we found 22 that have structural preferences. We found six structured 6-mers each form a helical-turn, and one other forms a reverse turn. Similarly, for 8-mers, three fragments for a helical turn and one other forms a reverse turn. For 10-mers, all but one is in a helical turn structure. The 12-mers can form more complex structures, often a combination of a helical-turn and reverse-turn. Only one of the 12-mers has a stable helical structure. While it has been generally supposed that peptides this short are unlikely to have stable structure, our work is consistent with the conclusions from the previous more limited studies of Ho et al.
In general there are 3 places in protein G where we find structured fragments: the N-terminal β-hairpin, the helix, and the C-terminal β-hairpin. At the N-terminal, the turn in the β-hairpin is predicted by the ground state of the 8-mer called frag4. For several other fragments, this turn is among the highly populated structures. Within the helix, we find two 6-mers and one 10-mer with very nativelike structure. Two 12-mers are also stable, but with non-native structures. In the C-terminal β-hairpin, all our stable fragments adopt a helical-turn, the 8-mer of which is the most stable. The isolated C-terminal β-hairpin has been found experimentally to be stable,14 where this stability is reflected in the structural bias found in the peptide fragments of the hairpin-turn. These structured fragments also are consistent with a study of Minor and Kim,46 who replaced the α-helix sequence with a sequence based on the C-terminal hairpin. The mutant was able to fold into the same topology, showing that there is a helical propensity in the C-terminal hairpin. In this study we find helical-turns in both the α-helix and the turn of the C-terminal hairpin, which demonstrates the interchangeability of these two sequences in our simulations. Our 6-mers frag10 and frag11 and our 10-mer frag14 all show highly native-like helical structures, with very low RMSD to native and low entropy.
On figures 1–4 we plotted contact maps for native protein G and stable fragments of different sizes. We found a substantial number of stable fragments in the right places and they predict at least some of the contacts correctly. We also see some stable fragments in the places where native protein G does not have contacts. These tend to be locations where the backbone is transitioning from one structural element to another.
We also tested the proposition that keeping the five best mesostrings for a given peptide, based on our entropy and free energy analysis, might capture native-like structures in the fragments. Results are shown in figures 5–8. Figure 5 shows a 10-mer helix for which all the top conformations are native-like. In figure 6 we show structures of a 12-mer in the C-terminal β-hairpin. Here, we find a competition between helical-turn structures and the correct hairpin. The helix population is about 40% of all mesostrings, the hairpin is about 50% and there is a small population of other conformations. In short, in this case, by keeping the top 2 or 3 of our best conformers, they have the potential to ultimately grow into the native structure. Figures 7 and and88 show an 8-mer and 12-mer from the N-terminal hairpin. In this case, all the top conformations form the correct hairpin. Out of our total of 98 simulated peptides, we find both the correct native topology and structures that are less than 3 Angstroms RMSD from the native in 3 6-mers (frag4, frag10 and frag14), one 8-mer (frag4) and one 10-mer (frag14). Other fragments meet one criterion or the other, but not both. Hence, we find that these short physical simulations have a significant ability to identify useful nativelike structures.
In this work, we have applied replica-exchange molecular dynamics using AMBER96 force field with the GB/SA solvent model to simulate protein fragments 6–12 residues in length. Out of simulated 98 fragments, 22 were structured. Despite the fact that the simulations are short in time and are on peptides that are very short in length, nevertheless, these physics-based computations are capable of picking out some native-like structures. This may be useful for simulations of protein folding kinetics, or for physics-based native protein structure predictions.
This work was supported by the Slovenian Research Agency (Research Program P1–0201 and P1–0010) and NIH grant GM34993. T. U. was supported through the Young Researcher Program.