|Home | About | Journals | Submit | Contact Us | Français|
Molecular modeling of proteins including homology modeling, structure determination, and knowledge-based protein design requires tools to evaluate and refine three-dimensional protein structures. Steric clash is one of the artifacts prevalent in low-resolution structures and homology models. Steric clashes arise due to the unnatural overlap of any two non-bonding atoms in a protein structure. Usually, removal of severe steric clashes in some structures is challenging since many existing refinement programs do not accept structures with severe steric clashes. Here, we present a quantitative approach of identifying steric clashes in proteins by defining clashes based on the Van der Waals repulsion energy of the clashing atoms. We also define a metric for quantitative estimation of the severity of clashes in proteins by performing statistical analysis of clashes in high-resolution protein structures. We describe a rapid, automated and robust protocol, Chiron, which efficiently resolves severe clashes in low-resolution structures and homology models with minimal perturbation in the protein backbone. Benchmark studies highlight the efficiency and robustness of Chiron compared to other widely used methods. We provide Chiron as an automated web server to evaluate and resolve clashes in protein structures that can be further used for more accurate protein design.
The role of molecular modeling of proteins in structural and molecular biology is steadily increasing due to its usefulness in generating experimentally testable hypotheses for understanding protein function. Molecular modeling techniques are also widely used for protein design1 and fitting all-atom models to low-resolution data (e.g. electron microscopy2 and small-angle X-ray scattering3). The ever-expanding database of protein 3D structures has made comparative modeling a viable option to model proteins of unknown structure.4 However, all-atom structural modeling of proteins requires stringent quality control to ensure that the structures are physically accurate. Additionally, protein design and comparative modeling require several refinement steps to culminate in a usable structural model. Steric clash, characterized by unphysical overlap of newly positioned side-chain atoms with other side-chain and backbone atoms, is one of the prevalent artifacts in protein structures of low resolution.
Current state-of-the-art tools for protein quality control identify clashes qualitatively, precluding an understanding of their possible energetic effects on protein structure. For instance, WHAT_CHECK5,6 and Molprobity7, commonly used in protein quality control, report a steric clash based on distances between two atoms with a distance cutoff for overlap set to 0.4 Å. However, the energetic penalty of such an overlap varies widely depending on the types of atoms involved in the clash (0-10 kcal/mol). We observe that low energy clashes are present even in high-resolution structures, however the number of severe clashes is very low. Thus, in order to correctly identify severe clashes, it is important to develop a quantitative measure to evaluate the effect of clashes present in a protein, and also it is necessary to measure the extent of clashes seen in high-resolution crystal structures.
Several tools have emerged for resolution of such clashes upon identification. Steepest descent/Conjugate gradient minimization using all-atom Molecular Mechanics force fields is the most widely used method to resolve clashes in a protein structure before using the structure for further studies. However, minimization using Molecular Mechanics may not resolve severe clashes in some cases hampering subsequent Molecular Dynamics simulations. Molecular modeling tools like Rosetta are the alternate avenues for refining structures with severe clashes. These tools use knowledge-based potentials and small backbone moves to resolve clashes. However, these methods work best with smaller proteins (less than 250 residues in size).8 Tools like MMTSB9 and PULCHRA10 have emerged for structure refinement and for reconstruction of all-atom representation of proteins from Cα traces, which includes removal of clashes during refinement. In the current study, we present a method for quantitative estimation and if required, resolution of clashes in a given protein structure. To accomplish the above, we developed a protocol using discrete molecular dynamics (DMD) simulations.11,12 We also show that our protocol is more robust in comparison to other state-of-the-art tools widely used by the protein structural modeling community.
We define a steric clash in a protein as any atomic overlap resulting in Van der Waals repulsion energy greater than 0.3 kcal/mol (0.5 kBT), except i) when the atoms are bonded, ii) when the atoms form a disulfide bond or a hydrogen bond (i.e. the heavy atoms are involved in the hydrogen bond; we assign the Van der Waals radius of hydrogen to be zero), iii) when the atoms involved are backbone atoms and have separation of 2 residues (in order to accommodate the formation of tight turns). We calculate the Van der Waals repulsion energy using the non-bonded parameters from the CHARMM19 force field13, which are identical to CNS14 parameters except for carboxyl oxygen atoms. Since clashes are local structural artifacts, we reduce the search time and space by restricting the search to the local environment of a given atom. We determine clashes using the above definition by constructing a grid around the protein with the dimension of each cell larger than the largest Van der Waals interaction distance between any two atom pairs (~4.5 Å) and walking along the chain to check if the overlap of the atom under consideration with the heavy atoms in the same or adjacent cells leads to a clash. We then define ‘contacts’ as the number of such overlaps tested. The clash-energy of a protein is the sum of Van der Waals repulsion energy of all the clashes in the protein’s structure. In order to arrive at a descriptor independent of protein size, we define the clash-score, which is the clash-energy divided by the number of contacts. Thus, clash-score describes the clashes present in a protein-structure, but is independent of the size of the protein. To estimate the permissible Van der Waals repulsion in a given structure, we determine the clash-scores of high-resolution crystal structures (see below). The distribution of these clash-scores indicates the extent of clashes permissible in proteins as a consequence of tight packing. A clash-score that is deviant from the distribution for high-resolution structures would then point to clashes that are artifacts of model building rather than those inherent to the protein structure. Clash-score is acceptable if it is less than one standard deviation away from the mean on the higher side of the distribution of clash-score of high-resolution dataset of structures (which would include ~84% of the proteins in the dataset). From the distribution of clash scores of structures from the high-resolution dataset, we calculate the acceptable clash-score to be 0.02 kcal.mol−1.contact−1 (Fig. 1 B).
In order to understand the extent of clashes in protein structures and arrive at an acceptable clash-score, we constructed datasets of protein structures of various resolutions. We obtained two sets of protein structures from the Protein Data Bank (PDB)15 and one set of protein structures from Swiss-model repository.16 The sets obtained from PDB correspond to a high-resolution set (0-2.5 Å) and a low-resolution set (2.5-3.5 Å). The high-resolution set was used to arrive at the acceptable clash-score. Our high-resolution dataset comprises protein structures determined by X-ray crystallography with a reported resolution less than 2.5 Å. Other than protein chains, these structures did not contain any other biomolecules (i.e. ligands/DNA/RNA). We then split these structures into individual peptide chains and clustered them based on sequence similarity. We used individual chains because we wanted clash statistics of globular proteins and not interfaces. We considered only one representative chain from each cluster of sequences that were at least 80% similar to each other, thus creating a dataset of 4495 unique chains. We further filtered the dataset based on radius of gyration to remove non-globular peptides/proteins from the dataset. The final dataset consisted of 4311 single chains at least 25 residues long. We used Medusa17,18 to accurately place any missing side-chain atoms in these structures.
We obtained a low-resolution dataset from PDB in order to explore if clash-score was worse in low-resolution structures compared to high-resolution structures (Results). The lower-resolution dataset contains 2942 unique protein structures determined using X-ray crystallography with a resolution between 2.5Å and 3.5Å. In addition to these two datasets, we obtained a set of 1000 homology models from the Swiss-model repository of random swiss-model entries (using the CGI-perl script provided by expasy: http://www.expasy.org/cgi-bin/get-random-entry.pl?S). We filtered these structural models based on radius of gyration resulting in a final dataset of 931 structural models.
The DMD simulation methodology is described in detail elsewhere.11,12 DMD is a special type of molecular dynamics (MD) algorithm, which uses square-well potentials instead of continuous potentials. Thus, in DMD we seek to solve the ballistic equations of motion instead of Newtonian equations of motion in a system of particles. We use CHARMM19 non-bonded potentials13, EEF1 implicit solvation parameters19 and geometry-based hydrogen bond potentials in DMD11 to model various macromolecular interactions. The time unit of the all-atom DMD simulations is ~50 femtosecond (fs) and the temperature is maintained using Anderson’s thermostat.20 The rate of velocity rescaling (for maintaining temperature) depends on the simulation we perform (as shown in Fig. 2); in the present study we used either 200 ps−1 or 4 ps−1 as the rescaling rate.
We compared the performance of our technique to resolve clashes with that of Rosetta1, which is used widely for protein design and refinement. We used two different protocols for minimization of protein structures using Rosetta. In the first minimization protocol, we used the “fast relax” flag with 4 fast relaxation repeat steps per cycle. We also performed “constrained relax”, where all the backbone atoms were constrained to their initial coordinates. We performed 100 independent iterations for minimization using either protocol. For each of such independent computation we used a unique random seed to initiate the minimization routine.
We used the recently ported CHARMM13 force field in GROMACS21 simulation package for performing protein energy minimization.22 In our protocol we performed an initial conjugate gradient (CG) minimization for 1000 steps, where we considered maximum force less than 200 kJmol−1nm−1 as criteria for convergence. If the clash-score of the protein had not decreased below the acceptable clash-score during CG, we performed 2 ps MD simulation at 240 K, before an equilibrium MD simulation for 2 ps at 300 K. We did not use any cut-offs in these simulations, and used OBC (Onufriev, Bashford, Case) implicit solvation.23 We used the temperature scaling method of Ref. 24, with a time constant of 0.5 ps. We also attempted to implement a lower time constant of 0.05 ps, to mimic our DMD simulations with high rate of temperature scaling, but the results did not differ significantly between time constant of 0.5 ps and 0.05 ps in our simulations (data not shown).
To test the performance of various programs used in minimizing clashes, we used several structures from low-resolution and Swiss-model datasets. We used 20 structures from our low-resolution PDB dataset with the worst clash-scores and 50 structures from our Swiss-model dataset of homology models (25 structures less than 250 residues in length and 25 structures more than 250 residues in length) to test the ability of different programs to minimize clashes.
We define buried residues (buried core of the protein) as those that have a solvent accessible surface area (SASA) of zero Å2. We calculated SASA using the method developed by LeGrand and Merz25, but with 1024 dots on surface of each atom instead of 256 dots. We first constructed all possible hydrogen bonds in a given protein structure using the orientation-dependant hydrogen bonding potential, which is part of Medusa17,18 and iteratively checked if each of the polar atoms in the buried residues was involved in a hydrogen bond; those that did not form hydrogen bonds were counted towards buried unsatisfied donors/acceptors.
To assess the geometry of sidechains in the input structure and the structures obtained by various minimization techniques, we define a metric based on Dunbrack rotamer library26. For each residue in a given structure (except glycines and alanines), we determine the closest rotamer in the rotamer library and calculate the Z-score of the side-chain to the identified rotamer as shown below:
where χi is the ith χ angle for a given sidechain, meanχ and sdχ are obtained from the Dunbrack library for the rotamer that is closest to a given sidechain. We report the above Z-score, averaged over all the sidechains in a given protein structure (Supplementary Table V).
We explore the prevalence of clashes in protein structures and ask if the extent of clashes in a protein structure depends on the quality of the protein structure. Using our energetic definition of clashes (Methods), we first study the distribution of the energy of different clashes, setting the minimum clash-energy as half kBT (0.3 kcal/mol). We find an exponential distribution of the energy of clashes in crystal structures of proteins: most of the clashes in a structure correspond to very low repulsion energy (0.3-0.6 kcal, Fig. 1 A), and the probability of finding clashes of higher energies decreases exponentially (Fig. 1 A, inset). The clash energy distribution confirms that severe clashes are very rare in protein structures. However, a lot of low-energy clashes are observed, which may be required for close packing of the protein core and also for the formation of hydrogen bonds. Even though we exclude hydrogen-bonded atoms in enumerating clashes, during the formation of a hydrogen bond, adjacent atoms are also brought very close to each other. We also observe that the probability of finding high-energy clashes is higher in low-resolution crystal structures compared to high-resolution crystal structures (Fig. 1A, inset)
In order to obtain an aggregate clash-score for a structure, we sum up the energies of all clashes and divide it by total number of ‘contacts’. Contacts include all possible local overlaps in the protein (Methods). The clash-score is thus a single number that describes the maximum permissible extent of clashes in a protein structure. In order to determine whether the clash-score of a given protein is dependent on the resolution of its crystal structure, we compute the clash-score for all the protein structures in our datasets. Since the clash-score itself is a sum of independent variables (the energy of different clashes in a protein), the clash-scores of proteins in a dataset are expected to form a normal distribution. As expected, the clash-scores of the three datasets (high-resolution, low-resolution and homology models) fit well to normal distributions (Fig. 1 B). We find that the clash score of a given protein structure is strongly dependent on its resolution (Fig. 1 B). High-resolution crystal structures feature low clash scores, while the mean clash score of lower resolution structures is significantly higher than that of high-resolution crystal structures with the homology models having the highest clash-scores in our three datasets (Fig. 1 B). These results indicate that the increased clash-scores in lower resolution structures is more the consequence of model building than being an inherent property of the protein structure. Furthermore, distribution of clash-energies in a protein show that low-resolution structures on an average feature more number of high-energy clashes compared to high-resolution structures (Fig. 1 A, inset).
Given the differences in distribution of clash-scores of different datasets, we arrived at an acceptable clash-score as the mean plus one standard deviation of the distribution of clash-scores of high-resolution structures. From the distribution of clash-scores for high and low resolution crystal structures and homology models (Fig. 1), it is clear that we require a tool that reduces the clash-score to be within the permissible value. To check if side-chain repacking alone can reduce abnormal clash-scores, we performed side-chain optimization using SCWRL27 on 20 structures from our low-resolution PDB dataset with the worst clash-scores (Supplementary Table I). We find that just side-chain repacking is not enough to improve the clash-score in these structures, and in many cases, optimizing side-chain rotamers results in higher clash-scores compared to input structures. Thus, small perturbations in the protein-backbone may be required for resolving clashes in low-resolution structures.
To address the need for removing severe clashes in protein structures with minimal perturbations in the protein backbone, we develop a protocol using discrete molecular dynamics (DMD) simulations12 (Fig. 2). Severe clashes have high values of Lennard-Jones potential, the release of which results in high velocities of clashing atoms. The high velocities in the simulation result in a rapidly expanding simulation system with bonds between atoms being broken. Thus, we need to quench the high velocities arising due to clashes, which we achieve using a high heat exchange rate of the solute (protein) with the bath. In our simulations, we maintain the temperature by periodically scaling the velocities of particles according to Maxwell’s distribution. By using a high heat exchange rate (or the frequency of rescaling the velocities) we ensure that the high velocities of clashing atoms are quenched. In this protocol, we rescale the velocities of all particles every 5 fs instead of ~1-5 ps commonly implemented in equilibrium MD simulations. To minimize deviation from the initial structure in these simulations, we constrain the backbone and Cβatoms to their initial positions using a harmonic potential. In some cases (like enzyme active sites), the rotameric state of certain residues is important for protein function, and those may not be involved in any clashes. In such cases, we could also constrain the side-chain atoms to their initial positions in order to maintain the initial rotameric orientations of side-chain atoms. In our protocol (Fig. 2), we first perform a short DMD simulation (10 ps) at a higher temperature (0.7 ε/kB, roughly corresponds to 350 K; ε/kB is the reduced unit of temperature11), and then compute clash-score of every snapshot in the simulation trajectory to see if the system has converged to feature a clash-score less than or equal to the acceptable clash-score (Fig. 2). We report as the minimized structure, the first snapshot in the trajectory that has an acceptable clash-score. If the clash-score is not low enough during the first simulation cycle, we perform another cycle of DMD simulation at 0.5 ε/kB, to quench the system. The “quench” approach aids in the formation of contacts and hydrogen bonds in the protein core that might remain unsatisfied at higher temperatures. We repeat the alternating cycles of simulations at 0.7 ε/kB and 0.5 ε/kB until a structure with acceptable clash-score is obtained.
We benchmark our protocol by attempting to resolve clashes in protein structures with high clash-scores. We select 20 structures from our low-resolution PDB dataset with the worst clash-scores. We reduce the clash-score of all these structures within the range of high-resolution structures (less than 0.02 kcal.mol−1.contact−1), with Cα RMSD of utmost 1 Å with respect to the corresponding initial structures (Table I, Supplementary Table II; Fig. 3 shows one of the structures). The backbone and all-atom RMSD follow similar trends, but as expected, they are slightly higher than Cα RMSD (Supplementary Table III). Some of these structures initially have severe clashes that cannot be acceptable for MD simulations (see below), but we can robustly minimize those clashes using DMD and make the structures acceptable for MD simulations. In order to ensure that stabilizing interactions in the core are not sacrificed for the sake of lowered clash energy, we examine the hydrogen bonds in the core of the protein. We find that in most of the cases, the number of unsatisfied hydrogen bond donors/acceptors decrease, and the core seldom has perturbation that causes significant loss of contacts (Table II, Supplementary Table IV). When we compare the initial and DMD-minimized structures, we notice that slight perturbations in the backbone ensure that the clashes are resolved while contacts in the core are maintained intact (Fig. 3). In order to ensure that the clash minimization does not result in distortion of side-chain geometry, we calculate the Z-score of χ angles of each side-chain (Supplementary Table V). On average, we find the Z-scores of DMD minimized structures to be close to or less than the Z-scores of the input structures. We would expect the Z-scores of DMD-minimized structures to be close to the input structures since we constrain the Cβ atoms in our simulations, which maintains the side-chains close to the initial rotameric states.
From our benchmarking simulations, we conclude that our protocol based on DMD can be used as a tool for rapid and efficient minimization of steric clashes in protein structures. Our protocol (Fig. 2) is available as a web-based application – Chiron (http://chiron.dokhlab.org). Chiron accepts protein-structures and attempts to resolve clashes with minimum backbone perturbation relative to the input structure. In the web server, any given input structure is first examined for missing atoms – missing side-chain atoms are reconstructed using Medusa17,18, which uses Dunbrack rotamer libraries27. Chiron then computes the clash-score to determine if the structure requires minimization (if the clash-score is greater than 0.02). Chiron then iteratively runs constrained DMD simulations till the clash-score is acceptable. Chiron also accepts protein-structures with ligands (designated as HETATM in the input PDB files). The user can choose the ligands to be included in determination of clash-score and for further minimization. When ligands are included in an input structure, we reconstruct side-chains in context of the ligands in order to ensure that the reconstruction does not introduce clashes with the ligands.
We now compare our approach in minimizing clashes in protein structures to other widely used simulation programs using the publicly available versions of the respective programs.
We use Rosetta, one of the widely used tools for protein design, to minimize clashes in protein structures to benchmark DMD. Rosetta uses a knowledge-based energy function for treating inter-atomic interactions in proteins.1 We choose Rosetta for comparison with DMD since both the tools use the same bonded and non-bonded parameters derived from the CHARMM force field.13 We use two different protocols for clash minimization using Rosetta (see Methods) and report that we are able to resolve clashes in 8 out of 10 test cases with Rosetta, but with a higher Cα RMSD relative to the starting structure when compared to DMD (Table I, gray cells). We observe that for the structures for which Rosetta is able to successfully minimize clashes, the clash-scores are much lesser than those obtained by DMD or all-atom conjugate gradient (CG) minimization (see below). The lower clash-score of the final structure obtained using Rosetta is only a consequence of the simulation protocol: while DMD and CG/MD simulations are terminated as soon as the clash-score is less than the acceptable clash-score, the Rosetta protocol has a fixed routine which provides a single structure as an output for each iteration.
Rosetta also takes much longer CPU time to minimize clashes in the given protein structure (Table I) compared to DMD. Since Rosetta performs protein design by selectively replacing fragments of the given structure by determining the best fragment from a knowledge-driven fragment database, it is imperative that the protocol must be executed multiple times for statistically significant results. We perform 100 relaxation iterations-for every chosen protein in a parallel computing environment, with a unique random seed for every individual attempt. The runtime reported in Table I is the sum of times taken for all the independent iterations on the cluster. It is logical to compare runtimes in this manner because DMD performs minimization on a single processor in the times reported in Table I.
We also attempt to implement the “classic relax” protocol designed to perform more extensive relaxation employing small and shear moves of the protein backbone along with minimization and repacking of the side chain atoms. Since the classic relax protocol is extensive, we would expect it to take longer than the fast or constrained relax runs. Minimization of a 71 amino acid protein (1CTX) takes almost one day of CPU time, with no significant improvement in the final output compared to fast and constrained relaxation routines (data not shown). Hence, we do not perform classic relaxation of the remaining test cases. Overall, although we can use Rosetta to minimize the number of clashes in a given protein in 8 out of 10 cases tested, the downside of this approach is higher RMSD relative to the initial structure and longer computation time required for minimization compared to the other protocols tested. From the results obtained for the first ten PDB structures (Table I) and also from literature 8, we observe that Rosetta is not ideal for minimization of proteins longer than 250 residues. Hence, we do not use Rosetta to minimize the ten additional structures we consider for benchmarking DMD and CG/MD simulations (Supplementary Table II, IV).
We compare our results to those obtained using conjugate gradient (CG) minimization using all-atom forcefield (CHARMM). We perform CG minimization of the test set of proteins (Table I, Supplementary Table II; see methods for the protocol). We perform subsequent molecular dynamics (MD) simulations for those proteins whose clash scores are not reduced below the acceptable clash score using CG minimization alone. Most of the protein structures attain acceptable clash-scores with CG alone, which leads to much shorter time required for minimization compared to DMD or Rosetta. The quality of the structure is also maintained in these cases. Thus, CG is an efficient way to remove clashes in most protein structures. If the structures require subsequent MD simulations for complete minimization, the time taken is comparable to DMD.
However, in four out of twenty structures considered, CG does not converge because of unreasonable Van der Waals repulsion energy leading to high velocity, which in turn causes some of the bonds to break. We show these clashes before and after DMD minimization for one such structure (PDB ID 1GFF, Fig. 4) that does not converge due to severe clashes leading to infinite forces in the simulation system. We are able to perform DMD simulations with these structures because we employ soft potentials for non-bonded interactions coupled with frequent rescaling of velocities. Soft-core potentials are an available option in MD forcefields too, but have been implemented for free energy perturbation simulations, but not for use in CG or MD to resolve severe steric clashes. Thus, existing minimization protocol using CG/MD method is not robust when applied to structures with severe clashes, which can be resolved using DMD. Furthermore, in all the four cases in which CG/MD does not converge, the DMD minimized structures are acceptable for subsequent MD simulations.
Minimization of steric clashes with minimal backbone perturbation need not imply a final structure that has formed proper contacts in its buried core. In order to estimate the quality of the structures generated by the different relaxation techniques we use, we enumerate the number of unsatisfied hydrogen bond donors/acceptors in the buried core before and after minimization, which informs us if all polar contacts in the core are well formed. Since a very small number of unsatisfied hydrogen bond donors/acceptors are seen in the core of high-resolution crystal structures, lesser the number of unsatisfied hydrogen bond donors/acceptors in the core, better is the quality of the structure, provided the steric overlaps are absent. In our benchmark set, we find that the numbers of buried unsatisfied hydrogen bond donors/acceptors of the test set of proteins are either maintained or decreased in 19/20, 6/10 and 15/16 cases for DMD, Rosetta and CG/MD respectively (Table II, Supplementary Table IV; CG/MD does not converge in 4/20 cases). If we were to categorize these unsatisfied partners as backbone or sidechain atoms, we find that all the methods are much better at forming new backbone hydrogen bonds, but lose many hydrogen bonds involving side chain atoms. This effect could be due to removal of clashes involving side chain, while at the same time stabilizing secondary structural elements in the protein. We conclude from the analysis of unsatisfied hydrogen bond donors/acceptors that all protocols we test (DMD, Rosetta and CG/MD) maintain the hydrogen bonds in the core of the protein, and thus provide final structures that can be compared on the basis of backbone RMSD to initial structures.
In order to test whether severe steric clashes are the cause for failure of CG during minimization of four of the twenty structures we use for benchmarking, we use the structure generated by DMD upon minimization to run a 2 ps MD simulation using CHARMM in GROMACS. These simulations proceed normally, further establishing the robustness of DMD in minimizing steric clashes. The reason DMD succeeds in minimizing steric clashes in cases where CG fails is that DMD automatically defines soft potentials between clashing atoms, thus accommodating even structures with severe clashes for minimization.
To further test the different protocols, we perform minimization simulations on 25 structures of proteins smaller than 250 residues with the worst clash scores from our swiss-model dataset (see methods). We observe a trend (Supplementary Table VI) similar to that of the low resolution crystal structures: minimization using Rosetta takes longer with the minimized structure having higher RMSD to the starting structure, while DMD and CG/MD are comparable in terms of RMSD, while CG minimization alone is faster. However, there are three structures (Q03EE3, A6L8G0, P49048) where CG/MD does not converge due to severe clashes in the starting structures. Thus, simulations on homology models reiterate that DMD protocol is robust in accepting structures with severe clashes. To evaluate the robustness of these protocols in minimizing larger structures, we perform DMD and CG/MD simulations on 25 structures that are more than 250 residues long with worst clash-scores (Supplementary Table VII). We notice that CG/MD does not converge on a majority of these larger structures, mainly because of the poor quality of these structures. Most of these structures feature bond-lengths that are more than ten standard deviations away from mean, causing CG to fail, and hence we do not use these simulations for arriving at any conclusions.
Our results clearly indicate that Chiron is able to resolve unnatural clashes in low-resolution crystal structures and homology models efficiently with minimum deviation of the protein backbone from the initial structure. However, attempting to refine homology models using simulations may drive the models away from the native state while featuring an RMSD of 1 Å with respect to the input structure. Hence it is important to ensure that the minimization protocol we employ does not result in a structure that is farther away from the native state compared to the initial structure. To verify whether Chiron drives a given structure away from its native state upon minimization, we consider five homology models from CASP8 predictions featuring the worst clash-scores, to compare against their native structures that are now available in the PDB. We perform refinement and analyses similar to those described above on these five models and observe that Chiron is able to resolve clashes from the homology models within 1 Å of the initial structure and yet not drift away from the native structure (Supplementary Tables VIII and IX). We also observe that the other methods we tested present similar trend with respect to the Cα-RMSD from the native state after refinement (Supplementary Tables VIII and IX). We can conclude from this analysis that clash-refinement using Chiron does not drive the structures further away from the native state.
The purpose of the current study is to quantitatively characterize the energetic effect of steric clashes on protein structures. Our quantitative definition of clashes and systematic analysis on protein-structures of varying quality reveal that indeed steric clashes are present in all structures including high-resolution crystal structures. Based on our findings, we developed a precise metric using clash-scores of high-resolution structures to quantify the quality of a structure with respect to clashes. To resolve severe clashes robustly, we propose a new dynamics based approach that uses high temperature, high heat exchange simulations to quench high velocities arising due to clashes and. As a part of our protocol, we have successfully used soft-core potentials for clash minimization, which have so far been implemented only for free energy calculations and for enhanced sampling in traditional MD simulations.
As evidenced by our analyses, steric clashes are common structural artifacts of homology models and low-resolution crystal structures. Our elucidation of clashes in a protein structure and the distribution of clash-scores in different proteins enable one to easily determine if a given protein features abnormal steric overlaps. Thus, this study provides an important quality control parameter for protein structure that is based on data from high-resolution crystal structures. Once such abnormality has been identified, one would like to refine their structure to ensure acceptable clash-scores. Though all protein modeling programs have some techniques to resolve steric clashes, we demonstrate that they either take too long or do not converge for structures with severe clashes.
To address the lack of tools to minimize clashes in proteins, we have developed a robust and automated methodology using DMD. Implementing soft-core potentials for clashing atoms in combination with high heat-exchange coefficients, we are able to resolve clashes in proteins with severe clashes. In the process, we also ensure minimum perturbation to the protein backbone. With respect to clash minimization, DMD outperforms Rosetta in terms of RMSD and time taken for minimization. While the CG/MD protocol is faster in minimizing many structures, severe clashes cause the currently available CG/MD protocol to fail, establishing the robustness of DMD. Given these advantages over available programs, we believe that our methodology, available as an easy-to-use web server (http://chiron.dokhlab.org) provides an accessible platform to refine protein structures. We acknowledge that the currently available programs may upon customization be efficient in minimization of clashes. On the other hand, since DMD (and hence Chiron) is transferrable to proteins with any extent of clashes, the users are not required to customize Chiron for their specific structural models.
This work was supported by the American Heart Association [grant number 09PRE2090068 to S. R.]; the University of North Carolina Research Council [to F.D.]; and the National Institutes of Health [grant numbers R01GM080742, ARRA supplements GM080742-03S1, GM066940-06S1 to N. V. D]. The authors acknowledge members of Brian Kuhlman’s lab for discussions regarding Rosetta protocols.