The majority of protein crystal structures are solved in the resolution range 1.7–2.8 Å, a range in which the diffraction experiment does not present sufficient information to accurately place individual atoms without additional chemical information. Electron-density peaks specifically for H atoms are not observed in this resolution range owing to a low signal-to-noise ratio. Therefore, H atoms are usually not explicitly included in molecular models of protein crystal structures. A molecular model without explicit coordinates for H atoms is denoted as a united-atom model, in contrast to an all-atom model. United-atom models are frequently insufficient for molecular modeling and computational chemistry applications (such as structure-based virtual screening or lead optimization). How is the gap bridged between current best crystallographic practices and the requirements of these other disciplines for all-atom structures that include hydrogen coordinates?
A brief history of the use of H atoms and chemical restraints in protein crystal structure refinement is useful before answering this question. Jensen and coworkers (Watenpaugh et al.) first demonstrated that moderate-resolution protein crystal structures could benefit from the reciprocal-space refinement techniques developed for use with crystal structures of small molecules at atomic resolution. They recognized the necessity of using additional chemical information combined with reciprocal-space refinement to accurately determine atomic positions in this situation.
A complete system of geometric restraints was devised for the first widely used protein reciprocal-space refinement program, PROLSQ (Konnert & Hendrickson, 1980; Hendrickson, 1985). H atoms were not explicitly considered in this system.
The introduction of simulated-annealing refinement led to the widespread adoption of the program X-PLOR. This program featured geometric restraints based on the CHARMM force field (Brünger et al.; Brünger & Karplus, 1988). Originally, use of this force field required an all-atom model. CHARMM-based restraints evolved in a way that removed the requirement for hydrogen coordinates. This change was associated with an alteration in the representation of nonbonded contacts from a Lennard–Jones potential to a much simpler repulsive function and the elimination of the use of electrostatic potentials. These modifications were partially motivated by electrostatic artifacts that were introduced into the structural results owing to the lack of an implicit solvent model. In addition, the long time required for computation of the complete set of nonbonded interactions was a significant impediment to the refinement of large crystal structures (Nilges et al.; Weis et al.). By the time that X-PLOR was superseded by the program CNS (Brünger et al.), any requirement for explicit H-atom coordinates for protein crystallographic refinement had been eliminated. However, the capability to apply an electrostatic model and more complete nonbonded interactions in an all-atom model remained an essential part of CNS for the determination of structures from NMR data (Linge et al.).
Engh & Huber (1991) brought important additional information to the definition of the geometry for protein crystal structures. Their survey of bond lengths and angles observed in small peptide crystal structures at high resolution has been uniformly adopted as a standard against which protein crystal structure models are judged. It has also become the basis for the restraint system in all of the major refinement programs.
Recent developments indicate an interest among crystallographers in the application of more complex descriptions of molecular geometry in refinement to aid in producing better models. The refinement programs REFMAC (Murshudov et al.) and PHENIX (Afonine et al.) may be employed with ‘riding H atoms’, even though the ultimate result to be deposited is a united-atom model. (Riding H atoms are those H atoms whose positions can be determined unambiguously from the positions of the non-H atoms; for example, the H atom attached to the Oγ of a serine residue is not a riding H atom since its position depends on the torsion angle of the Cβ–Oγ bond, while the H atom on the Cα atom of an amino acid is a riding H atom, since all torsion angles affecting its position are determined by non-H atom coordinates.) The advantages of a restraint scheme in which geometric target values for a residue depend on the torsion-angle conformation of the residue backbone have recently been demonstrated (Tronrud et al.). Brünger and coworkers (Fenn et al.; Schnieders et al.) have combined the all-atom force field AMOEBA with a new refinement scheme and have described the advantages of a more complex molecular description that includes the calculation of electrostatic interactions between protein atoms. Additional recent innovations in the use of geometric information in refinement include the use of deformable elastic network refinement (Schröder et al.), hydropathic force-field terms (Koparde et al.) and jelly-body restraints (Murshudov et al.).
Structure-validation tools for protein geometry, partially based on the Engh & Huber standard, are available in several widely used computer programs, most notably PROCHECK (Laskowski et al.), (Hooft, Vriend et al.), (Feng et al.) and SFCHECK (Vaguine et al.). These programs address close nonbonded contacts largely from a united-atom perspective. More recently, the structure-validation programs Reduce (Davis et al.; Chen, Arendall et al.) have become important and popular additions to the toolkit of protein crystallographers. They are based on the concept that better judgments can be made as to the correct positioning of certain groups in the model after the addition of H atoms to a united-atom protein crystal structure and after observing their interactions. Within their software system, interpenetration of van der Waals molecular surfaces by 0.4 Å or more constitutes a clash. The authors flatly state that ‘Such large overlaps cannot occur in the actual molecule, but mean that at least one of the two atoms is modeled incorrectly’ (Chen, Arendall et al.).
At this point, the question of the source of all-atom models needed for computational work can be addressed more clearly. Currently, such all-atom models are produced by adding H atoms to the united-atom models produced by crystallography. For water molecules and for protein H atoms whose position is subject to some degree of freedom, i.e. non-riding H atoms, either a force-field-dependent or a rule-based method is employed to determine the positions of these H atoms in order to avoid close nonbonded contacts and to form hydrogen bonds as appropriate. Nevertheless, when H atoms are added in this way to a very large majority of protein crystal structures deposited in the Protein Data Bank (Berman et al.), multiple close nonbonded contacts between atoms are observed. One goal of this work is to document this observation and to try to understand why such interactions occur, the recent focus on protein structure validation with H atoms present notwithstanding.
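The 0.4 Å overlap criterion for a clash quoted earlier can be illustrated with a minimal sketch. The radii below are commonly tabulated Bondi-type values and the function names are hypothetical; they are assumptions made for illustration and are not the parameters or code of any validation program cited here. A real tool would also need to exclude covalently bonded atoms and to treat hydrogen-bonded pairs separately, which this sketch does not attempt.

```python
import math

# Approximate van der Waals radii in Å (illustrative assumption, not taken
# from any of the programs discussed in the text).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}

# Overlap threshold in Å at or above which a contact is called a clash.
CLASH_OVERLAP = 0.4


def vdw_overlap(elem_a, xyz_a, elem_b, xyz_b):
    """Return how far two van der Waals spheres interpenetrate, in Å.

    A positive value means the spheres overlap; a negative value means
    there is a gap between them.
    """
    distance = math.dist(xyz_a, xyz_b)
    return VDW_RADII[elem_a] + VDW_RADII[elem_b] - distance


def is_clash(elem_a, xyz_a, elem_b, xyz_b, threshold=CLASH_OVERLAP):
    """Apply the overlap criterion to a single nonbonded atom pair."""
    return vdw_overlap(elem_a, xyz_a, elem_b, xyz_b) >= threshold


if __name__ == "__main__":
    # An H atom 2.0 Å from an O atom and 2.7 Å from a C atom.
    h_atom = ("H", (0.0, 0.0, 0.0))
    o_atom = ("O", (2.0, 0.0, 0.0))
    c_atom = ("C", (0.0, 2.7, 0.0))
    print(is_clash(*h_atom, *o_atom))
    print(is_clash(*h_atom, *c_atom))
```

Under these assumed radii the H···O pair overlaps by considerably more than the threshold, whereas the H···C pair overlaps only slightly and is not flagged.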
The usual remedy in computational chemistry to these high-energy close contacts is to minimize the coordinates of the all-atom model against a force field, with non-H atoms restrained to their positions in the crystallography-derived model so that they do not deviate too far from their experimentally determined positions. This solution is less than ideal, because the method produces no feedback as to whether the all-atom model is still consistent with the experimental data. In other words, one does not know how far is too far. This procedure could be especially dangerous if the original clashes were caused by atoms that were significantly misplaced.
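To make the restrained-minimization remedy concrete, the following sketch shows one common way such a positional restraint can be expressed: a harmonic penalty added to the force-field energy. The function names and the force constant `k` are hypothetical and chosen only for illustration. The sketch also reports the largest displacement from the crystallographic positions, which is the quantity for which the procedure itself supplies no experimentally grounded limit.

```python
import math


def positional_restraint_energy(coords, reference, k=1.0):
    """Harmonic penalty k * sum_i |x_i - x_i_ref|^2 over restrained atoms.

    coords and reference are equal-length sequences of (x, y, z) tuples in Å;
    k is an assumed force constant in energy units per Å^2.
    """
    return k * sum(
        (a - b) ** 2
        for xyz, ref in zip(coords, reference)
        for a, b in zip(xyz, ref)
    )


def max_displacement(coords, reference):
    """Largest distance in Å between any atom and its reference position."""
    return max(math.dist(xyz, ref) for xyz, ref in zip(coords, reference))


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    minimized = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]
    print(positional_restraint_energy(minimized, crystal, k=2.0))
    print(max_displacement(minimized, crystal))
```

The penalty limits how far atoms move, but nothing in it refers to the diffraction data, so a small displacement is not by itself evidence that the minimized model still agrees with experiment.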
The refinement program PrimeX was implemented partially in response to these issues. It applies well established methods of protein crystal structure refinement (Bell et al.) combined with the all-atom OPLS force field (Jorgensen et al.; Kaminski et al.; Banks et al.) for geometric restraints. Aside from the presence or absence of H atoms in the model, these OPLS-based restraints differ in two specific respects from what have become the traditional restraint systems: (i) a Lennard–Jones description of both the attractive and repulsive components of van der Waals interactions replaces the simpler repulsive term of most Engh and Huber-based restraints and (ii) electrostatic interactions are treated, including a Surface Generalized Born model to account for implicit solvent effects (Ghosh et al.; Gallicchio et al.; Zhu et al.; Li, Abel et al.). The net effect of these differences is very significant. In a simpler restraint system, the bond-length targets are each a function of a single parameter according to the atom types involved in the bond. A similar situation occurs for bond angles. However, the bond-length and bond-angle targets specified by OPLS are a function of several parameters that can all affect a single bond length or bond angle. In other words, the restraint target for a particular bond length (or angle) is contingent on the local environment of the atoms involved. Touw & Vriend (2010) have shown that at least one type of protein bond angle is a complex function of the local environment and is not well described by a single Engh & Huber (1991) target angle. The target geometric values in the well characterized restraint system of Karplus and coworkers depend on the local backbone conformation of the protein (Tronrud et al.). That any particular force field can reproduce all such dependencies remains to be demonstrated, but potentially a force-field-based restraint system can more effectively adapt to local environments than current protein crystallography restraint systems.
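Difference (i) can be illustrated by comparing the two functional forms for a single atom pair. This is a sketch only: the Lennard–Jones expression is the standard 12-6 form written in terms of the well depth and the minimum-energy distance, while the quartic repulsive function is one generic purely repulsive form chosen here as an assumption. Neither the function names nor the parameter values are taken from OPLS, PrimeX or any other program discussed in the text.

```python
def lennard_jones(r, epsilon, r_min):
    """12-6 Lennard-Jones energy with well depth epsilon at distance r_min.

    Repulsive at short range, attractive beyond r_min, tending to zero at
    large separation.
    """
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


def soft_repulsion(r, k, r_min):
    """Purely repulsive quartic penalty (illustrative assumption).

    Zero whenever r >= r_min, so it carries no attractive component and
    cannot favor one non-overlapping arrangement over another.
    """
    overlap = max(0.0, r_min ** 2 - r ** 2)
    return k * overlap ** 2


if __name__ == "__main__":
    # Assumed parameters: epsilon in kcal/mol, distances in Å.
    epsilon, k, r_min = 0.1, 1.0, 3.5
    for r in (3.0, 3.5, 4.0, 5.0):
        print(r, lennard_jones(r, epsilon, r_min), soft_repulsion(r, k, r_min))
```

Both terms penalize contacts shorter than the minimum-energy distance, but only the Lennard–Jones term distinguishes a well packed contact from a loosely packed one.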
Refinement of protein crystal structures with an all-atom model and a complete force field does much more than avoid errors whose remediation may seriously degrade the accuracy of the coordinates. The more detailed accounting for nonbonded interactions within the protein used in PrimeX can also produce a direct positive effect during refinement. While even small changes in the structure near a ligand-binding site can be critical for structure-based drug discovery, examples are presented to show how refinement with an all-atom model can result in large coordinate improvements at such sites.