|Home | About | Journals | Submit | Contact Us | Français|
One of critical difficulties of molecular dynamics (MD) simulations in protein structure refinement is that the physics-based energy landscape lacks of middle-range funnel to guide non-native conformations towards near-native states. We propose to use the target model as probe to identify fragmental analogs from PDB. The distance maps are then used to re-shape the MD energy funnel. The protocol was tested on 181 benchmarking and 26 CASP targets. It was found that structure models of correct folds with TM-score>0.5 can be often pulled closer to native with higher GDT-HA score; but improvement for the models of incorrect folds (TM-score<0.5) are much less pronounced. These data indicate that template-based fragmental distance maps essentially re-shaped the MD energy landscape from golf-course-like to funnel-like ones in the successfully-refined targets with a radius of TM-score~0.5. These results demonstrate a new avenue to improve high-resolution structures by combining knowledge-based template information with physics-based MD simulations.
Template based modeling (TBM) represents the most accurate method in protein structure prediction. In traditional TBM, the structural model is built by aligning the query sequence to a single protein template and then copying the structure information from the template in the aligned regions (Sali and Blundell, 1993). Thus, the final structural models are often closer to the template than to the native structure (Tramontano and Morea, 2003). In the contemporary TBM, multiple templates are often identified through meta-threading techniques (Fischer, 2003; Ginalski et al., 2003; Wu and Zhang, 2007) and full-length models are built by combining the structural fragments/restraints from multiple templates (Cheng, 2008; Raman et al., 2009; Zhang et al., 2010; Zhang, 2009a). Although the multiple template approach can build models with topology closer to the native structure than individual templates, the assembled structures often contain significant local distortions, including steric clashes, unphysical phi/psi angles and irregular H-hydrogen bonding networks, which render the structure models less useful for high-resolution functional analysis (Arakaki et al., 2004; Keedy et al., 2009). How to refine the protein structure closer to the native and meanwhile keep the physically meaningful atomic details of local structure remains a significantly unsolved problem in the field of protein structure predictions (Zhang, 2009b).
Different from the template-based structure assembly simulations that are usually implemented in reduced-level modeling, molecular dynamics (MD) simulations try to relocate every protein atom following Newton’s laws of motion (Alder and Wainwright, 1957). Due to the advantage of direct sampling of protein atoms as guided by physics-based force field, MD simulations have been widely used in the atomic-level protein structure refinements (Chen and Brooks, 2007; Fan and Mark, 2004; Floudas, 2007; Lee et al., 2001). Except for some isolated instances, however, no systematic structural improvement was achieved (Lee et al., 2001). Although it is efficient for removing steric clashes, MD simulations without restraints often drive the structure away from the native state, especially when relaxing the overly compact structural models, e.g. the structures constructed by the recombination of multiple homologous templates (Zhang, 2007, 2009b).
Recently, Zhu et al. (Zhu et al., 2008) performed replica-exchange molecular dynamics (REMD) simulations to refine 21 protein structures built by comparative modeling. The authors found that the REMD simulations could produce structures with a lower RMSD than the initial models and the best in top five models has a RMSD improvement of 0.24 Å in the secondary structure region. Although encouraging, the experiment highlights a key issue of physics-based structure refinements, i.e. no atomic potential could distinguish the near-native structures from non-native structures. However, the energy of the native structures was often found to be lower than all structure decoys. These data indicate that the current energy landscape is similar to a golf court with the native state as the deepest hole but lacks of a middle-range funnel that could guide the simulation to the native state (Zhang, 2009b).
Another noteworthy observation is recently made by Summa and Levitt (Summa and Levitt, 2007) who exploited atomic potentials from AMBER99, OPLS-AA, GROMOS96, and ENCAD, on the refinement of 75 proteins by in vacuo energy minmization. The authors found that a knowledge-based atomic contact potential outperformed all the traditional physics-based potentials by moving most protein models closer to the native state, while the physical potentials, except for AMBER99, essentially drove the decoys away from the native. The vacuum environment without solvent may be part of the reason for the failure of the molecular mechanics minimization. But this observation demonstrated the potential of combining knowledge-based potentials with physics-based force field to improve the funnel shape of the energy landscape of protein folding force field (Wroblewska et al., 2008).
In this work, we will systematically examine the ability of molecular dynamics simulations to refine protein structural models and check in particular the possibility of re-shaping the middle-range funnel of physics-based energy landscapes. We developed a new Fragment Guided Molecular Dynamics (FG-MD) algorithm, which combines the physical-based force field AMBER99 (Wang et al., 2000) with knowledge-based hydrogen-bonding and repulsive potentials. Distance maps taken from high-resolution experimental fragments are used as restraints to guide the simulated annealing MD simulations. FG-MD was extensively tested on both benchmark and CASP refinement experiments and demonstrated significant potential in atomic-level protein structure refinements.
The flowchart of FG-MD is showed in Figure 1, which consists of three steps of fragment structure identification, simulated annealing MD refinement simulation, and final model selection (see METHODS for detailed descriptions).
To benchmark the performance of FG-MD approach, we collected a test set of 181 non-redundant proteins from the PDB which are non-homologous to the training proteins of our MD force field. It includes 82 α, 28 β and 71 α/β single-domain proteins ranging in length from 64 to 222 residues. The initial structural models were generated by the I-TASSER pipeline, which represents a typical multiple template reassembly approach (Roy et al., 2010; Zhang, 2007). Although the I-TASSER assembly simulations have drawn the threading templates significantly closer to the native (with the average RMSD reduced by 1.15 Å and average TM-score increased by 12% compared to the best threading template), the structure models were found overly compact and have considerable unphysical distortions in the local structures. On average, there are 50 steric clashes between the heavy atoms of the models and only 48.07% of native H-bonds are retrieved on the I-TASSER models (Table 1).
To examine in detail the effect of various energy terms and spatial restraints, we split FG-MD into five different runs:
Table 1 summarizes the average results on the 181 target proteins. First, the MD simulations show an apparent power for removing the steric clashes between all heavy atoms (with the average number of clashes per protein reducing from 50 to 2), due to the strong repulsive term of Lennard-Jones potential in AMBER99 force field and the knowledge-based Cα repulsive term. However, the quality of the global topology as measured by GDT-HA (Zemla, 2003) and TM-score (Zhang and Skolnick, 2004) was considerably degraded, i.e. GDT-HA reduces by 30.8% and TM-score by 12.3%; the HB-score is also reduced by 31%. The RMSD of the initial models was increased by 0.75 Å. We note that running MD simulations without the knowledge-based Cα repulsive potential results in models with similar RMSD, GDT-HA, TM- and HB-scores. But the additional repulsive potential significantly speeds up the convergence of the MD relaxing simulations.
The MD+HB approach was formulated by using backbone H-bonding optimization with a newly developed knowledge-based HB potential (Figure 8). Because part of the native hydrogen bonding network was recovered, the corresponding HB-score increases by 45% (from 0.3668 to 0.5312). The GDT-HA and TM-scores are also increased by 7.7% and 3.2% respectively and RMSD decreases by 0.17 Å, which demonstrate a correlation between the hydrogen bonding network and the global topology score, especially for beta-proteins due to the long-range H-bonding between beta strands.
Although the models were improved by MD+HB, the global topology was still further away from the native than the initial models as judged by the GDT-HA and TM-score and RMSD. We therefore applied the distance maps collected from the starting models to guide the MD simulations with the purpose to constrain the simulation near the initial structures. Indeed, the restraints improved GDT-HA, TM-score and RMSD to a similar level of the starting models (GDT-HA: 0.4901 vs 0.4908, TM-score: 0.7163 vs 0.7164, and 4.971 vs 4.980 Å). But HB-score was slightly lower than that by MD+HB, due to the bending to the starting models which have a distorted H-bonding network.
In the MD+HB+DRM+DRT simulation, we added the distance restraints taken from the high-resolution PDB structures which are closest to the initial model as searched by the structure alignment program TM-align (Zhang and Skolnick, 2005). Because the experimental structures have ideal local structures with standard H-bonding networks, we anticipated that the inclusion of the template-based restraints could improve the quality of the local structures; this was indeed verified by the increased HB-score and a slight improvement in the TM- and GDT-HA scores and RMSD.
In the MD+HB+DRM+DRFG simulation, we split the sequences into smaller fragments and used the substructures of the initial model as probes to search for high-resolution template structures from the PDB. The distance map restraints taken from the fragmental templates were then incorporated into the MD simulations. Overall, the restraints from local fragments outperformed those from global templates, and better TM-score, GDT-HA, RMSD and HB-scores were achieved in the final model structures.
Finally, in the MD+HB+DRM+DRT+DRFG, we combined the distance restraints taken from both global templates and local structural fragments. The models generated by this approach have achieved the highest TM-score, GDT-HA and HB-scores and lowest RMSD among all the simulations (Table 1).
In Figures 2A–C, we presented a scattering distribution of GDT-HA, TM-score and HB-score improvements versus TM-score of the starting models. While the points were scattered both below and above the dashed reference line, there were obviously more proteins with improved scores than that with deteriorated ones. The average improvement was showed in the solid line in each plot. For 181 benchmarking targets, 71% (128/181) of the targets have a GDT-HA improvement, 65% (117/181) of the targets have a TM-score improvement, and 92% (167/181) of the targets has a HB-score improvement, where a detailed histogram of the score difference is showed in Figure S1.
While the quality improvement of local structures demonstrated by the HB-score seems to occur on all range of protein models, the most significant improvement of GDT-HA and TM-score was observed for the models of good starting structures (e.g. TM-score >0.5). For the models of incorrect topology (e.g. TM-score<0.5), the GDT-HA and TM-score improvements by FG-MD are less pronounced. This may indicate that the FG-MD force field landscape have a sensitive funnel shape in the region of TM-score >0.5; within the region of TM-score <0.5, however, the correlations of energy and TM-score are much weaker and the influence of local geometry repacking on the global topology refinement therefore tends to be randomized.
As an illustration, in Figure 2D we calculated the AMBER99 and FG-MD energy force fields based on the models from a successfully refinement example of 2bwfA, which had the TM-score improved from 0.798 to 0.804. For this target, 77 decoy models with TM-score ranging from 0.17 to 0.84 were taken from the I-TASSER simulation trajectory; 200 refinement models were then generated by MD and FG-MD simulations separately for each of the 77 initial models. The corresponding AMBER99 (black circles) and FG-MD (red squares) energy potentials for the 77×200 refined models are plotted versus the TM-score. As shown by the fitting curves (the median of the 10 lowest energy models per TM-score bin), there was almost no correlations between AMBER99 and the global topology. After the introduction of the better fragment based distance restraints (with a DMRMSD 0.036Å lower than that of the initial model), there appears an apparent funnel-like shape starting from TM-score~0.5 in the FG-MD energy landscape, which is critical to the success of structure refinement of this example.
Although encouraging, the funnel-like landscape was not always achieved in the FG-MD force field even when TM-score of the initial model is >0.5. The re-shaping of energy landscape was found to strongly rely on the quality of fragment templates although other energy terms in FG-MD contribute as well. Among the successfully improved targets, the majority of the targets were found to have a funnel-like landscape similar to Figure 2D (with lower energy in the near-native regions). But for the deteriorated targets, most of targets have still an energy landscape which lacks of energy-TM-score correlations. In Figure S2, we present such example from 1z3e, where the DMRMSD of the best fragment template is 0.1 Å worse than the initial model and the FG-MD energy landscape shifts the lowest energy away from the native state, compared with the AMBER99 potential. Consequently, TM-score and GDT-HA of the initial model were deteriorated by 0.0074 and 0.0089, respectively, in this example.
Except for the adjustment of local structures and H-bonding networks, the major driven force for the global structural refinement of FG-MD is the distance map restraints taken from the initial model, the global and fragmental templates as searched by TM-align from the high-resolution PDB structures. Generally, the distance maps from the initial models do not directly contribute to improving the topology of protein models because these restraints do not contain new information to the initial structures.
In Figure 3A, we plot the histogram of ΔTM-scores that is defined as the TM-score of the TM-align templates subtracting the TM-score of the initial models, where a positive ΔTM-score value indicates a better quality of template structures than the initial models. Since the initial models were used as probes in the template searching, most of the templates are closer to the initial models than to the native, i.e. there are more templates with a TM-score lower than that of the initial models. This data is consistent with the observation we obtained in the previous study (see Figure 4A of Ref (Zhang and Skolnick, 2005)). However, since there are much more appropriate fragmental templates than the global ones for the given initial models, we obtained a better quality of fragment templates (see, the red bars in Figure 3A). Overall, 24% of fragment templates have a higher TM-score than the corresponding initial models and only 4% of global templates do so.
In Figure 3B, we present the distribution of ΔDMRMSD which is defined as the distance map root-mean-squared-deviation (DMRMSD, Eq. 6) from initial models subtracting the DMRMSD from the templates, where a positive ΔDMRMSD indicates a more accurate distance map restraint from templates than that from initial models. Again, most of the restraints (90%) from the global template have a lower accuracy than that from the initial templates. Interestingly, there are slightly more distance restraints from the fragment templates (52%) which have a better accuracy than that from the initial models. Since the average TM-score of the fragment templates to the native is still lower than that of the initial models, this excess in distance maps is probably attributed to the more protein-like and regular secondary structures in the experimental structures, which constitutes the major driven force for the FG-MD to refine the high-resolution protein structures.
To examine the detailed data in different categories, we split the targets into “successful” (TM-score increased after refinement) and “failed” (TM-score decreased after refinement simulation) cases, and compared the restraints from fragment and global templates separately as showed in Figure S3. For the successful cases, 56% (65/117) of the targets have better fragments (ΔDMRMSD>0) and 10% (12/117) of the targets have better global templates (ΔDMRMSD>0). Whereas for failed cases, only 44% (28/64) of the targets have better fragment (ΔDMRMSD>0) and 2% (1/64) of the targets have better global templates (ΔDMRMSD>0). On average, the fragment templates are always better than global templates for both successful and failed cases. However, only in the successful cases, the quality of restraints outperformed that from initial models.
In Figure 4, we look into the details of a typical refinement example from the E. coli 6-hydroxymethyl-7,8-dihydropterin pyrophosphokinase (PDB ID: 1eqmA). Figures 4A and 4B are the long-range distance maps (|i−j|>10) from the initial model and that from fragmental templates, respectively, versus the distance map of the native structure, where only the residue pairs with a distance <8Å are presented. While all 199 restraints from the initial model appeared in that from the fragment structures, there were 57 new distance restraints predicted only by fragments (circles). If we define a correct distance restraint as that with where Dij and are distances from model and native, there are 91 and 188 correct restraints from initial model and fragmental templates, respectively. Thus, the fragment-based restraints outperform the initial models in both accuracy (188/256 vs. 91/199) and coverage. As a result, the GDT-HA/TM/HB-score of the initial models were increased from 0.573/0.791/0.391 to 0.6/0.796/0.529 (Figure 4C).
Figure 5 shows three additional, typical examples with major improvements from loop, helix and strand regions, respectively. For 1elw, a high quality fragment at the loop region (19I-66L) with a DMRMSD of 0.228 Å drew the RMSD/DMRMSD of the corresponding region in the initial model from 0.439/0.306 Å to 0.365/0.211 Å, which results in the GDT-HA/TM-score of the global model increasing from 0.686/0.885 to 0.720/0.898. Similar improvement was observed for the other two proteins with PDB ID 1hnl and 1djr, one with improvements occurring in the helical region and another in the stranded region, both attributing to the better fragmental templates identified in the corresponding regions (Rows 2 and 3 in Figure 5).
The structure refinement category was a new addition to CASP since CASP8 (MacCallum et al., 2009). In this category, predictors were given starting models that had been generated by the CASP structural prediction servers and judged by organizers to be among the best for each targets, and requested to refine the models in a blind mode. We tested FG-MD on refining all the 12 target proteins from the CASP8 refinement experiment. Although the FG-MD run was performed after the CASP8, we note that all templates proteins deposited in PDB after CASP8 were excluded to ensure that we are in strictly the same modeling condition as the CASP8 blind predictors.
The FG-MD refinement result is summarized in the upper part of Table 2 together with the best five groups as ranked by the cumulative GDT-HA score of refined models for all the 12 models. A full list of 25 groups is given in Table S1. As illustrated in the overall results, FG-MD is the only method which could drive the initial protein models closer to the experiment structure according to both the cumulative GDT-HA and TM-scores and the average RMSD. Since some groups submitted fewer proteins, we also calculated the average TM-score and GDT-HA on the submitted proteins and found that none of the CASP8 groups have a TM-score/GDT-HA/RMSD better than the initial models, which indeed highlights the difficulty in refining global topology of protein models.
Overall, the GDT-HA/TM-score are 1.2%/1.6% higher and RMSD is 0.164 Å lower than that by the second best LEE group. The HB-score of the FG-MD models was ranked the third position following SAM-T08-human and LevittGroup. The average number of steric clash is 0.1, lower than the models from all other top five groups. A comparison of the FG-MD models with the initial models for all 12 CASP8 protein targets are listed in Figure 6A, where FG-MD improve the GDT-HA and TM-score in 9 out of 12 cases, HB-score in 10 out of 12 cases. Again, the most significant improvement were observed in the region of TM-score larger than 0.5.
In the CASP8 experiment (MacCallum et al., 2009), the assessor also used the MolProbity score (Chen et al., 2010) to count for the physically unfavorable steric overlaps, rotamer and Ramachandran outliers. We listed the MolProbity score in the last column of Table 2 and Figure 6B. FG-MD improves the MolProbity score of the initial models for all but two targets with an average reduction by 5%.
In Figures 6C and 6D, we present two representative examples of the CASP8 refinements. For Target TR432, the improvements were scattered among the entire sequence including loop and regular secondary structures, which result in a 3.2% in GDT-HA, 0.9% increase in TM-score, 11.1% in HB-score and 8.8% in MolProbity score (Figure 6C). For TR461, however, the improvement is mainly in the beta strand region, which result in an increase of GDT-HA, TM-score and HB-score by 2.0%, 0.2%, and 8.7%, respectively (Figure 6D).
We participated (as “ZHANG”) in the structure refinement section in the CASP9 experiment which contained 14 protein targets with length from 69 to 159 residues. 2/5/7 targets belong to α/β/αβ proteins. A summary of the refinement results for the top five groups is listed in the lower part of Table 2 according to the cumulative GDT-HA score. A complete list of the CASP9 groups is listed in Table S2. We note that the algorithm generating the ZHANG models in CASP9 was not identical to the FG-MD reported in this work since the models submitted in CASP9 were further refined by MODELLER (Sali and Blundell, 1993), where the FG-MD refined model was used as the single template and then a simple AUTOMODEL step was performed in MODELLER. The reason for us to use MODELLER was that it could slightly improve the rotamer torsion angles according to our benchmark test. However, we found that MODELLER refinements could result in a significant increase of H-atom clashes and therefore deteriorated the MolProbity score. Therefore, we modified the FG-MD algorithm without using MODELLER in this work. In Table 2, we also listed the models by the current version of the FG-MD program. All the FG-MD refined models for the CASP9 targets, together with that for CASP8 targets and the 181 I-TASSER models, were uploaded at http://zhanglab.ccmb.med.umich.edu/FG-MD/.
The groups in the lower part of Table 2 were ranked based on the cumulative GDT-HA score of the first model. Only two groups, ZHANG and SEOK in CASP9, could refine the initial structures based on GDT-HA and TM-scores. However, both ZHANG and SEOK models have the MolProbity score higher than the initial models, indicating a deterioration of the local structure qualities. Especially, SEOK models showed a significant number of heavy atomic overlaps (Column 7). After removing the MODELLER refinement step, FG-MD improves the MolProbity score by 39% compared to ZHANG, which is 15% lower than the initial models. Overall, the FG-MD models demonstrate improvement in all aspects of GDT-HA, TM-score, RMSD, HB-score, heavy atom clash and MolProbity score.
In Figures 7A, B, C and D, we present the improvement of GDT-HA, TM-score, HB-score and MolProbity score over the initial models for all individual targets by ZHANG and FG-MD. Again, the results for ZHANG and FG-MD are similar for GDT-HA, TM and HB-scores; but FG-MD without the MODELLER step reduces significantly the MolProbity score over the ZHANG models. There are 11/10/10/5 out of 14 cases, the ZHANG models show improvement over the initial models according to GDT/TM/HB/MolProbity scores, and the FG-MD models do so in 12/10/5/13 cases.
Figures 7E and 7F show two typical examples of ZHANG models in CASP9. For TR614, the major improvement is in region (82N-91E) although minor improvement were scattered along the whole chain (Figure 7E), which resulted a final model of GDT-HA score 0.535, 1.7% higher than the initial model. For the second target TR530, the major refinement occurs at (49R-57E and 108D-115K), which results in a final model of GDT-HA score 0.709, 2.6% higher than the initial model (Figure 7F).
Atomic-level protein structural refinement represents a significantly unsolved problem in protein structure prediction. Most of the current methods based on molecular dynamics simulations drive the structural models away from the native state, mainly because of the lack of long-range correlation between the topology and the energy of the physics-based atomic potentials (Summa and Levitt, 2007; Zhang, 2009b; Zhu et al., 2008). The multiple-template based homology modeling has the potential to improve the structure of the threading templates, a fundamental issue is that the local structures of the resultant model can be seriously distorted (Fischer, 2003; Zhang, 2007). In this work, we developed a new protocol of FG-MD for high-resolution and atomic-level protein structure refinements, where spatial restraints from fragmental templates were exploited to re-shape the energy funnel of the simulated annealing MD simulations. The knowledge-based hydrogen-bonding potential is incorporated for improving the local structure refinements.
FG-MD was tested in a large-scale set of 181 benchmark proteins with initial models generated by the I-TASSER template structural reassembly approach. It was found that the distance map restraints extracted from fragmental templates has a significant higher accuracy than that obtained from the global templates; the accuracy is also higher on average than that from the initial models. As a result, progressive improvements were observed when hydrogen-bonding energy term and the distance map restraints were separately introduced to guide the refinement simulations. The average GDT-HA score of the FG-MD models was found 0.7 units higher than that of the initial models, and RMSD was reduced by 0.031 Å. The majority of the improvements happened for the initial models of correct topology, i.e. TM-score >0.5. This corresponds to a funnel shape improvement of the physics-based energy landscape from a golf-course like shape to a funnel like shape with an approximate radius of TM-score=0.5, which have been seen in most of the successfully refined targets. As part of local structure measurements, the H-bonding score of the FG-MD models increases by 12% and the number of steric clashes is reduced from 50 to 2, mainly due to the introduction of the hydrogen-bonding and repulsive energy terms in the MD simulations.
The FG-MD method was also tested in the blind test of CASP refinement experiments where starting models were generated by a variety of structure assembly methods. FG-MD was among the very few methods which could consistently bring the initial models closer to the native structure as assessed by the improved GDT-HA and TM-scores. The local structural quality of the H-bonding score, the number of steric clashes and the MolProbity score were also significantly improved, compared to the initial models. These data demonstrate a promising approach of atomic protein structure refinements by using analogical fragment templates from other experimentally solved protein structures.
The FG-MD protocol is showed in Figure 1. Starting from a target protein structure, the sequence was split into separate secondary structure elements (SSEs). The substructures of every three consecutive SSEs, together with the full-length structure, were used as probes to search through a non-redundant PDB library by TM-align (Zhang and Skolnick, 2005) for structure fragments closest to the target. The top 20 template structures with highest TM-scores (Zhang and Skolnick, 2004) were used to collect spatial restraints. Simulated annealing molecular dynamics simulations were implemented using modified LAMMPS (Plimpton, 1995) as guided by the distance map restraints and a knowledge-based hydrogen bonding potential. The final refined models were selected based on the sum of Z-score of hydrogen bonds, Z-score of the number of steric clashes and Z-score of FG-MD energy. The procedure is fully automated and the running time for each refinement target is less than 2 hour at a 2.4 GHz CPU. The FG-MD server is freely available at http://zhanglab.ccmb.med.umich.edu/FG-MD.
The FG-MD potential contains four energy terms.
The Cα distance maps were collected from three sources of initial models, global structure templates, and fragmental structure templates. The Cα distance restraint potential is written as
where rij is the distance between ith and jth Cα atoms. , and are the distance maps from the initial model, global structure template, and fragmental template, respectively. k1, k2, and k3 are the corresponding force constants with the value equal to 0.5, 0.5 and 2.0 Kcal/mole, respectively. TM-align was used to build the structural alignment between the template structures and the initial model. We found that restraints from residues in close distance are most efficient and thus we only considered the distance map with rij below 15 Å from the top 20 templates. The force parameters (and others shown below) were decided on the performance of FG-MD on an independent training set of proteins.
The definition of backbone hydrogen bond is illustrated in Figure 8A. A knowledge-based, explicit hydrogen bonding potential is constructed as:
where dij is the distance between hydrogen of the donor and oxygen of the acceptor. α is the angle of N-H-O and β is the angle of C-O-H (Figure 8A). The standard values of the H-O distance, N-H-O and C-O-H angles are derived from the statistics average of 1,383 non-redundant, high-resolution experimental structures (see Figures 8B–D). This protein set was constructed with the PISCES server (Wang and Dunbrack, 2003), with a percentage identity cutoff 20%, a resolution cutoff 1.6 Å, and a R-factor cutoff 0.25 Å. The average result is d0=1.95±0.17 Å, α0=160.0±12.2 degree and β0=150.0±17.5 degree. Because the fluctuations of the experimental values are relatively small, we took the average value with a harmonic restraint for the hydrogen bonding in Eq. 2. K4, k5 and k6 are the force constant with values equal to 2.0, 0.5 and 0.5, respectively. Only short-range H-binding potential is considered with a cutoff of dij ≤ 3Å.
The Cα repulsive potential was designed to rapidly relax compact structural models with severe Cα clashes, i.e.
where the force constant k = 200 Kcal/mole.
The standard AMBER99 force field (Wang et al., 2000) was used for local conformation stiffness:
where r, θ and ϕ are bond length, bond angle and torsion angle, respectively; and req, θeq and γ are the corresponding equilibrium values. Kr, Kθ and Vn are the force constants for bond length, bond angle and torsion angle, respectively. Aij and Bij are the Lennard-Jones parameters; qi and qj are the partial charge of atom i and j. Rij are the distance between atom pair i and j.
Although the root-mean-squared deviation (RMSD) between two structures was often used to measure the modeling accuracy, the value can be very sensitive to the local structural variations. In this work, we mainly used GDT-HA score (Zemla, 2003) and TM-score (Zhang and Skolnick, 2004) to evaluate the global topology refinement of the structural models. The GDT-HA score counts the average percentage of residues with Cα distance from the native below 0.5, 1, 2, and 4 Å, respectively, after the optimal structure superposition. The TM-score is defined as
where LN is the length of the native structure, LT is the number of common residues appearing in both compared structures, di is the distance between the ith pair of residues and d0 is a scale to normalize the match difference. Both GDT-HA and TM-score lie in [0, 1] with higher values indicating better accuracy. But GDT-HA score focuses more on the accuracy of local structure (d < 4 Å) while TM-score counts the residues of all structures. Another difference is that TM-score is length independent with TM-score >0.5 indicate proteins of the same fold (Xu and Zhang, 2010). Historically, the GDT-HA score was used as the standard topology measurement for high-resolution protein structure prediction and refinements in CASP experiments (Cozzetto et al., 2009; Keedy et al., 2009; Kopp et al., 2007; MacCallum et al., 2009).
The RMSD of Cα distance map, called DMRMSD, is introduced to evaluate the accuracy of distance restraints:
where Di is the ith distance restraints collected from structural models and is the corresponding distance on the native structure. n is the total number of collected distance restraints.
The accuracy of the hydrogen bonding network (or secondary structures) of the protein models is evaluated by the HB-score:
The hydrogen bonds in full atomic structures are defined by HBPLUS (McDonald and Thornton, 1994).
To count the unphysical overlaps of the model structures, we define a steric clash between two heavy atoms if the distance is less than 80% of the sum of Van der Waals radius, which is taken from Amber99 force field (Cornell et al., 1996).
The MolProbity score is calculated as (Chen et al., 2010):
where clashscore counts the number of unfavorable steric overlaps ≥ 0.4 Å including H-atoms, rota_out and rama_iffy are the percentages of the outliers of the side-chain rotamers and the backbone torsion angles, respectively. Lower MolProbity scores indicate more physically realistic models.
The project is supported in part NSF Career Award (DBI 1027394), and the National Institute of General Medical Sciences (GM083107, GM084222).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.