Benchmark of FG-MD on structural refinement of I-TASSER models
The flowchart of FG-MD is showed in , which consists of three steps of fragment structure identification, simulated annealing MD refinement simulation, and final model selection (see METHODS for detailed descriptions).
Flow chart of the FG-MD protocol.
To benchmark the performance of FG-MD approach, we collected a test set of 181 non-redundant proteins from the PDB which are non-homologous to the training proteins of our MD force field. It includes 82 α, 28 β and 71 α/β single-domain proteins ranging in length from 64 to 222 residues. The initial structural models were generated by the I-TASSER pipeline, which represents a typical multiple template reassembly approach (Roy et al., 2010
; Zhang, 2007
). Although the I-TASSER assembly simulations have drawn the threading templates significantly closer to the native (with the average RMSD reduced by 1.15 Å and average TM-score increased by 12% compared to the best threading template), the structure models were found overly compact and have considerable unphysical distortions in the local structures. On average, there are 50 steric clashes between the heavy atoms of the models and only 48.07% of native H-bonds are retrieved on the I-TASSER models ().
Summary of FG-MD refinement on a benchmark set of 181 proteins.
To examine in detail the effect of various energy terms and spatial restraints, we split FG-MD into five different runs:
- MD: Simulated annealing MD simulations using AMBER99 force field and a knowledge-based Cα repulsive potential (Eqs. 3 and 4);
- MD+HB: MD simulations with hydrogen-bonding optimization guided by a knowledge-based HB potential (Eq. 2);
- MD+HB+DRM: MD+HB simulation guided by the distance restraints taken from the initial I-TASSER model (First term of Eq. 1);
- MD+HB+DRM+DRT: MD+HB+DRM simulation with additional restraints taken from global template searched by TM-align from the PDB (Second term of Eq. 1).
- MD+HB+DRM+DRFG: MD+HB+DRM simulation with additional restraints taken from structural fragments searched by TM-align from the PDB (Third term of Eq. 1).
- MD+HB+DRM+DRT+DRFG: MD+HB+DRM simulation with additional distance restraints taken from both global templates and local structural fragments searched from the PDB.
summarizes the average results on the 181 target proteins. First, the MD simulations show an apparent power for removing the steric clashes between all heavy atoms (with the average number of clashes per protein reducing from 50 to 2), due to the strong repulsive term of Lennard-Jones potential in AMBER99 force field and the knowledge-based Cα
repulsive term. However, the quality of the global topology as measured by GDT-HA (Zemla, 2003
) and TM-score (Zhang and Skolnick, 2004
) was considerably degraded, i.e. GDT-HA reduces by 30.8% and TM-score by 12.3%; the HB-score is also reduced by 31%. The RMSD of the initial models was increased by 0.75 Å. We note that running MD simulations without the knowledge-based Cα
repulsive potential results in models with similar RMSD, GDT-HA, TM- and HB-scores. But the additional repulsive potential significantly speeds up the convergence of the MD relaxing simulations.
The MD+HB approach was formulated by using backbone H-bonding optimization with a newly developed knowledge-based HB potential (). Because part of the native hydrogen bonding network was recovered, the corresponding HB-score increases by 45% (from 0.3668 to 0.5312). The GDT-HA and TM-scores are also increased by 7.7% and 3.2% respectively and RMSD decreases by 0.17 Å, which demonstrate a correlation between the hydrogen bonding network and the global topology score, especially for beta-proteins due to the long-range H-bonding between beta strands.
The illustration and statistics of the hydrogen-bonding potential used in FG-MD.
Although the models were improved by MD+HB, the global topology was still further away from the native than the initial models as judged by the GDT-HA and TM-score and RMSD. We therefore applied the distance maps collected from the starting models to guide the MD simulations with the purpose to constrain the simulation near the initial structures. Indeed, the restraints improved GDT-HA, TM-score and RMSD to a similar level of the starting models (GDT-HA: 0.4901 vs 0.4908, TM-score: 0.7163 vs 0.7164, and 4.971 vs 4.980 Å). But HB-score was slightly lower than that by MD+HB, due to the bending to the starting models which have a distorted H-bonding network.
In the MD+HB+DRM+DRT simulation, we added the distance restraints taken from the high-resolution PDB structures which are closest to the initial model as searched by the structure alignment program TM-align (Zhang and Skolnick, 2005
). Because the experimental structures have ideal local structures with standard H-bonding networks, we anticipated that the inclusion of the template-based restraints could improve the quality of the local structures; this was indeed verified by the increased HB-score and a slight improvement in the TM- and GDT-HA scores and RMSD.
In the MD+HB+DRM+DRFG simulation, we split the sequences into smaller fragments and used the substructures of the initial model as probes to search for high-resolution template structures from the PDB. The distance map restraints taken from the fragmental templates were then incorporated into the MD simulations. Overall, the restraints from local fragments outperformed those from global templates, and better TM-score, GDT-HA, RMSD and HB-scores were achieved in the final model structures.
Finally, in the MD+HB+DRM+DRT+DRFG, we combined the distance restraints taken from both global templates and local structural fragments. The models generated by this approach have achieved the highest TM-score, GDT-HA and HB-scores and lowest RMSD among all the simulations ().
In , we presented a scattering distribution of GDT-HA, TM-score and HB-score improvements versus TM-score of the starting models. While the points were scattered both below and above the dashed reference line, there were obviously more proteins with improved scores than that with deteriorated ones. The average improvement was showed in the solid line in each plot. For 181 benchmarking targets, 71% (128/181) of the targets have a GDT-HA improvement, 65% (117/181) of the targets have a TM-score improvement, and 92% (167/181) of the targets has a HB-score improvement, where a detailed histogram of the score difference is showed in Figure S1
Figure 2 Scattering plot of the FG-MD improvements vs. TM-score of initial models: (A), GDT-HA; (B) TM-score; (C) HB-score. (D) AMBER99 (black circles) and FG-MD (grey squares) energy vs. TM-score for 77×200 refined models by MD and FM-MD simulations. (more ...)
While the quality improvement of local structures demonstrated by the HB-score seems to occur on all range of protein models, the most significant improvement of GDT-HA and TM-score was observed for the models of good starting structures (e.g. TM-score >0.5). For the models of incorrect topology (e.g. TM-score<0.5), the GDT-HA and TM-score improvements by FG-MD are less pronounced. This may indicate that the FG-MD force field landscape have a sensitive funnel shape in the region of TM-score >0.5; within the region of TM-score <0.5, however, the correlations of energy and TM-score are much weaker and the influence of local geometry repacking on the global topology refinement therefore tends to be randomized.
As an illustration, in we calculated the AMBER99 and FG-MD energy force fields based on the models from a successfully refinement example of 2bwfA, which had the TM-score improved from 0.798 to 0.804. For this target, 77 decoy models with TM-score ranging from 0.17 to 0.84 were taken from the I-TASSER simulation trajectory; 200 refinement models were then generated by MD and FG-MD simulations separately for each of the 77 initial models. The corresponding AMBER99 (black circles) and FG-MD (red squares) energy potentials for the 77×200 refined models are plotted versus the TM-score. As shown by the fitting curves (the median of the 10 lowest energy models per TM-score bin), there was almost no correlations between AMBER99 and the global topology. After the introduction of the better fragment based distance restraints (with a DMRMSD 0.036Å lower than that of the initial model), there appears an apparent funnel-like shape starting from TM-score~0.5 in the FG-MD energy landscape, which is critical to the success of structure refinement of this example.
Although encouraging, the funnel-like landscape was not always achieved in the FG-MD force field even when TM-score of the initial model is >0.5. The re-shaping of energy landscape was found to strongly rely on the quality of fragment templates although other energy terms in FG-MD contribute as well. Among the successfully improved targets, the majority of the targets were found to have a funnel-like landscape similar to (with lower energy in the near-native regions). But for the deteriorated targets, most of targets have still an energy landscape which lacks of energy-TM-score correlations. In Figure S2
, we present such example from 1z3e, where the DMRMSD of the best fragment template is 0.1 Å worse than the initial model and the FG-MD energy landscape shifts the lowest energy away from the native state, compared with the AMBER99 potential. Consequently, TM-score and GDT-HA of the initial model were deteriorated by 0.0074 and 0.0089, respectively, in this example.
Why can FG-MD refine the global topology of the protein models?
Except for the adjustment of local structures and H-bonding networks, the major driven force for the global structural refinement of FG-MD is the distance map restraints taken from the initial model, the global and fragmental templates as searched by TM-align from the high-resolution PDB structures. Generally, the distance maps from the initial models do not directly contribute to improving the topology of protein models because these restraints do not contain new information to the initial structures.
In , we plot the histogram of ΔTM-scores that is defined as the TM-score of the TM-align templates subtracting the TM-score of the initial models, where a positive ΔTM-score value indicates a better quality of template structures than the initial models. Since the initial models were used as probes in the template searching, most of the templates are closer to the initial models than to the native, i.e. there are more templates with a TM-score lower than that of the initial models. This data is consistent with the observation we obtained in the previous study (see Figure 4A of Ref (Zhang and Skolnick, 2005
)). However, since there are much more appropriate fragmental templates than the global ones for the given initial models, we obtained a better quality of fragment templates (see, the red bars in ). Overall, 24% of fragment templates have a higher TM-score than the corresponding initial models and only 4% of global templates do so.
Figure 3 Topology and restraint accuracy from global and fragment templates. (A) Histogram of ΔTM-score of global templates (black) and fragmental templates (grey). (B) Histogram of ΔDMRMSD of distance map restraints taken from the global templates (more ...)
In , we present the distribution of ΔDMRMSD which is defined as the distance map root-mean-squared-deviation (DMRMSD, Eq. 6
) from initial models subtracting the DMRMSD from the templates, where a positive ΔDMRMSD indicates a more accurate distance map restraint from templates than that from initial models. Again, most of the restraints (90%) from the global template have a lower accuracy than that from the initial templates. Interestingly, there are slightly more distance restraints from the fragment templates (52%) which have a better accuracy than that from the initial models. Since the average TM-score of the fragment templates to the native is still lower than that of the initial models, this excess in distance maps is probably attributed to the more protein-like and regular secondary structures in the experimental structures, which constitutes the major driven force for the FG-MD to refine the high-resolution protein structures.
To examine the detailed data in different categories, we split the targets into “successful” (TM-score increased after refinement) and “failed” (TM-score decreased after refinement simulation) cases, and compared the restraints from fragment and global templates separately as showed in Figure S3
. For the successful cases, 56% (65/117) of the targets have better fragments (ΔDMRMSD>0) and 10% (12/117) of the targets have better global templates (ΔDMRMSD>0). Whereas for failed cases, only 44% (28/64) of the targets have better fragment (ΔDMRMSD>0) and 2% (1/64) of the targets have better global templates (ΔDMRMSD>0). On average, the fragment templates are always better than global templates for both successful and failed cases. However, only in the successful cases, the quality of restraints outperformed that from initial models.
In , we look into the details of a typical refinement example from the E. coli
6-hydroxymethyl-7,8-dihydropterin pyrophosphokinase (PDB ID: 1eqmA). are the long-range distance maps (|i
|>10) from the initial model and that from fragmental templates, respectively, versus the distance map of the native structure, where only the residue pairs with a distance <8Å are presented. While all 199 restraints from the initial model appeared in that from the fragment structures, there were 57 new distance restraints predicted only by fragments (circles). If we define a correct distance restraint as that with
are distances from model and native, there are 91 and 188 correct restraints from initial model and fragmental templates, respectively. Thus, the fragment-based restraints outperform the initial models in both accuracy (188/256 vs. 91/199) and coverage. As a result, the GDT-HA/TM/HB-score of the initial models were increased from 0.573/0.791/0.391 to 0.6/0.796/0.529 ().
Figure 4 An example of FG-MD refinement on 1eqm. (A) Long range distance map from initial model vs. that from native (|i−j|>10); (B) Long range distance map from fragments vs. that from native. Only the distances below 8 Å are shown. Distance (more ...)
shows three additional, typical examples with major improvements from loop, helix and strand regions, respectively. For 1elw, a high quality fragment at the loop region (19I-66L) with a DMRMSD of 0.228 Å drew the RMSD/DMRMSD of the corresponding region in the initial model from 0.439/0.306 Å to 0.365/0.211 Å, which results in the GDT-HA/TM-score of the global model increasing from 0.686/0.885 to 0.720/0.898. Similar improvement was observed for the other two proteins with PDB ID 1hnl and 1djr, one with improvements occurring in the helical region and another in the stranded region, both attributing to the better fragmental templates identified in the corresponding regions (Rows 2 and 3 in ).
Figure 5 Three examples of refinement by FG-MD simulations in the benchmark test set. The superposition of fragmental template (Column 1), initial model (Column 2), refined model (Column 3) on the experimental structure in the fragmental region. Column 4 is the (more ...)
Performance of FG-MD on CASP8 refinement targets
The structure refinement category was a new addition to CASP since CASP8 (MacCallum et al., 2009
). In this category, predictors were given starting models that had been generated by the CASP structural prediction servers and judged by organizers to be among the best for each targets, and requested to refine the models in a blind mode. We tested FG-MD on refining all the 12 target proteins from the CASP8 refinement experiment. Although the FG-MD run was performed after the CASP8, we note that all templates proteins deposited in PDB after CASP8 were excluded to ensure that we are in strictly the same modeling condition as the CASP8 blind predictors.
The FG-MD refinement result is summarized in the upper part of together with the best five groups as ranked by the cumulative GDT-HA score of refined models for all the 12 models. A full list of 25 groups is given in Table S1
. As illustrated in the overall results, FG-MD is the only method which could drive the initial protein models closer to the experiment structure according to both the cumulative GDT-HA and TM-scores and the average RMSD. Since some groups submitted fewer proteins, we also calculated the average TM-score and GDT-HA on the submitted proteins and found that none of the CASP8 groups have a TM-score/GDT-HA/RMSD better than the initial models, which indeed highlights the difficulty in refining global topology of protein models.
Overall, the GDT-HA/TM-score are 1.2%/1.6% higher and RMSD is 0.164 Å lower than that by the second best LEE group. The HB-score of the FG-MD models was ranked the third position following SAM-T08-human and LevittGroup. The average number of steric clash is 0.1, lower than the models from all other top five groups. A comparison of the FG-MD models with the initial models for all 12 CASP8 protein targets are listed in , where FG-MD improve the GDT-HA and TM-score in 9 out of 12 cases, HB-score in 10 out of 12 cases. Again, the most significant improvement were observed in the region of TM-score larger than 0.5.
Figure 6 FG-MD structural refinement results on CASP8 targets. (A) Scattering plot of improvements on TM-score, GDT-HA and HB-score. (B) Scattering plot of MolProbity improvements. (C) Structural superposition of the initial model (green) and refined model (blue) (more ...)
In the CASP8 experiment (MacCallum et al., 2009
), the assessor also used the MolProbity score (Chen et al., 2010
) to count for the physically unfavorable steric overlaps, rotamer and Ramachandran outliers. We listed the MolProbity score in the last column of and . FG-MD improves the MolProbity score of the initial models for all but two targets with an average reduction by 5%.
In , we present two representative examples of the CASP8 refinements. For Target TR432, the improvements were scattered among the entire sequence including loop and regular secondary structures, which result in a 3.2% in GDT-HA, 0.9% increase in TM-score, 11.1% in HB-score and 8.8% in MolProbity score (). For TR461, however, the improvement is mainly in the beta strand region, which result in an increase of GDT-HA, TM-score and HB-score by 2.0%, 0.2%, and 8.7%, respectively ().
Blind test of FG-MD in CASP9 refinement experiment
We participated (as “ZHANG”) in the structure refinement section in the CASP9 experiment which contained 14 protein targets with length from 69 to 159 residues. 2/5/7 targets belong to α/β/αβ proteins. A summary of the refinement results for the top five groups is listed in the lower part of according to the cumulative GDT-HA score. A complete list of the CASP9 groups is listed in Table S2
. We note that the algorithm generating the ZHANG models in CASP9 was not identical to the FG-MD reported in this work since the models submitted in CASP9 were further refined by MODELLER (Sali and Blundell, 1993
), where the FG-MD refined model was used as the single template and then a simple AUTOMODEL step was performed in MODELLER. The reason for us to use MODELLER was that it could slightly improve the rotamer torsion angles according to our benchmark test. However, we found that MODELLER refinements could result in a significant increase of H-atom clashes and therefore deteriorated the MolProbity score. Therefore, we modified the FG-MD algorithm without using MODELLER in this work. In , we also listed the models by the current version of the FG-MD program. All the FG-MD refined models for the CASP9 targets, together with that for CASP8 targets and the 181 I-TASSER models, were uploaded at http://zhanglab.ccmb.med.umich.edu/FG-MD/
The groups in the lower part of were ranked based on the cumulative GDT-HA score of the first model. Only two groups, ZHANG and SEOK in CASP9, could refine the initial structures based on GDT-HA and TM-scores. However, both ZHANG and SEOK models have the MolProbity score higher than the initial models, indicating a deterioration of the local structure qualities. Especially, SEOK models showed a significant number of heavy atomic overlaps (Column 7). After removing the MODELLER refinement step, FG-MD improves the MolProbity score by 39% compared to ZHANG, which is 15% lower than the initial models. Overall, the FG-MD models demonstrate improvement in all aspects of GDT-HA, TM-score, RMSD, HB-score, heavy atom clash and MolProbity score.
In , we present the improvement of GDT-HA, TM-score, HB-score and MolProbity score over the initial models for all individual targets by ZHANG and FG-MD. Again, the results for ZHANG and FG-MD are similar for GDT-HA, TM and HB-scores; but FG-MD without the MODELLER step reduces significantly the MolProbity score over the ZHANG models. There are 11/10/10/5 out of 14 cases, the ZHANG models show improvement over the initial models according to GDT/TM/HB/MolProbity scores, and the FG-MD models do so in 12/10/5/13 cases.
Figure 7 Structural refinement results on the CASP9 targets. (A) GDT-HA score improvements. (B) TM-score improvements. (A) HB-score improvements. (A) MolProbity score improvements. (E) Structural superposition of the initial model and final model on the native (more ...)
show two typical examples of ZHANG models in CASP9. For TR614, the major improvement is in region (82N-91E) although minor improvement were scattered along the whole chain (), which resulted a final model of GDT-HA score 0.535, 1.7% higher than the initial model. For the second target TR530, the major refinement occurs at (49R-57E and 108D-115K), which results in a final model of GDT-HA score 0.709, 2.6% higher than the initial model ().