The object of this study was twofold: to arrive at reliable and challenging sets of solid-state structures with which to validate OMEGA, and to examine the effectiveness of its default parameters on these well-chosen solid-state structures (not to exhaustively evaluate a large number of parameter combinations in OMEGA for their efficacy). The OMEGA algorithm presented here combines knowledge-based and first-principles approaches to conformer generation, so it can be described as systematic and rule based. The knowledge-based part is the torsion library, while the fragment library, ensemble buildup, and sampling are all performed on a first-principles basis.
We have developed a gold standard set of PDB ligand structures by paying particular attention to identifying structures that are good models for their electron density, an approach rarely taken in the literature. Many of the properties of a model that are used here are ignored or misinterpreted in other publications in this area. For example, based on the published literature, it is commonly believed that selecting a cocrystal structure with a resolution below some cutoff value (for example 2 Å) ensures a good quality ligand structure. That this is clearly mistaken can easily be seen by inspecting the ligand models for two structures, both of 2 Å resolution, 1NHU and 1IY7 (see Figure ).
Electron density for two ligands both solved at 2 Å resolution; 1NHU on the left, 1IY7 on the right.
The 1IY7 ligand model is clearly a good model of complete density, while the 1NHU model is an interpretation of partial density that is obviously poor (the deposited ligand coordinates show severe atomic clashes). In two other cases at 2 Å resolution or better, 1ATL and 1ETA, there is no significant density for the ligand at all (even when viewed at 1σ) making the deposited coordinates at best highly speculative educated guesses. Therefore, resolution of the parent structure alone is no guide to the quality of a ligand’s conformation but should only be used as one criterion among many. Cases like 1NHU, 1ATL, and 1ETA also caution against using B-factors as a representation of thermal mobility in a structure. If there is no density for a set of atoms, what physical meaning is there in the B-factors for those atoms?
Models built from data collected to at least 2.7 Å resolution have a parameter-to-data-point ratio of at least 1, allowing the model to be well constructed. The confidence in the fit of a model at the global level is increased if the difference between R_free and R (ΔR) is low (<0.05 herein), as a large ΔR is indicative of an overfit, though not necessarily poor, model. A low ΔR also means that the local measures of fit, RSCC and RSR, are meaningful, which is not the case for overfit models (good values of RSCC and RSR can be obtained for poor fits when ΔR is large). By ensuring that the structures in the set all have low experimental error in their atomic coordinates (DPI), we can use atom-based metrics like rmsd, take appropriate account of this experimental error, and still generate meaningful measures of performance. The RSCC and RSR metrics often delineate ligands that show good fits to their local density from those that do not, and we thereby avoid many of the problems with structures from other publications. Cases like 1NHU and 1ATL are easily identified by these fit criteria as unsuitable for inclusion in conformation generator validation sets (or any other kind of validation set), as the coordinates are not supported by the experimental data. For example, in 1NHU the RSCCs for the two versions of the ligand in the unit cell are 0.768 and 0.744, and the RSRs are 0.27 and 0.25; while in 1ATL, the ligand RSCCs are 0.709 and 0.722, and the RSRs are 0.35 and 0.35, greatly exceeding the cutoffs used in this work. Unfortunately there are a number of poorly fit molecules like 1NHU in previously published data sets,(33
) and their presence in these sets only weakens the conclusions that can be drawn from them. Any ligand conformation from the PDB that shows intramolecular atomic clashes represents an error on the part of the crystallographer, and such structures are easily avoided by the use of the RSCC, RSR, and OWAB criteria used here. Further, by paying attention to the physicochemical properties and graph diversity of the ligands, we ensure a reasonable level of independence among the molecules in our data set. That this factor is often ignored is easily seen in the example of the Kirchmair set,(17
) which contains no less than 50 duplicate molecules. Such a level of duplication is likely to bias the results obtained from that data set.
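The sequence of quality filters described above can be summarized in a short sketch. The RSCC and RSR cutoff values below are illustrative assumptions (only the ΔR < 0.05 and 2.7 Å resolution limits are stated explicitly in the text), and the `LigandModel` class and field names are ours:

```python
from dataclasses import dataclass

@dataclass
class LigandModel:
    pdb_id: str
    resolution: float  # Å
    delta_r: float     # R_free - R
    rscc: float        # real-space correlation coefficient
    rsr: float         # real-space R factor

def passes_quality_filters(m: LigandModel,
                           max_resolution: float = 2.7,
                           max_delta_r: float = 0.05,
                           min_rscc: float = 0.9,
                           max_rsr: float = 0.2) -> bool:
    """Apply the global-fit criteria (resolution, delta-R) and the
    local-fit criteria (RSCC, RSR) in sequence. The RSCC/RSR cutoffs
    here are illustrative placeholders, not the published values."""
    return (m.resolution <= max_resolution
            and m.delta_r <= max_delta_r
            and m.rscc >= min_rscc
            and m.rsr <= max_rsr)

# 1NHU's ligand (RSCC ~0.768, RSR ~0.27) fails on the local-fit criteria
# despite its 2 Å resolution
bad = LigandModel("1NHU", 2.0, 0.04, 0.768, 0.27)
print(passes_quality_filters(bad))  # False
```

Note that a structure must clear every criterion; this is why the combined filters remove so many structures even though each cutoff on its own seems benign.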
While individually all the criteria we deploy seem reasonable and even relatively benign, combined they present a significant hurdle for a structure to surmount. Even though we began assembling our validation set from three large data sets, two of which had already been selected for validating conformer generators, we found it impossible to assemble a substantial set of well-solved structures. Applying relatively loose criteria for the quality of the crystallographic models at a global level removed around 75% of the starting structures. Overall, more than 90% of all the input structures failed our filtering criteria, a surprisingly high level. Figure illustrates the attrition rates for each of the three databases used and shows that they are quite similar, which was unexpected. Given that the Kirchmair and Sadowski sets were selected with the explicit goal of testing conformer generators, we expected lower attrition rates for these two sets than for PDBbind, which is simply a collation of cocrystal structures for which a published binding affinity exists. However, the percentages of surviving structures are quite similar: 5.3% for the Kirchmair set, 6.4% for PDBbind, and 8.1% for the Sadowski set. These very low survival rates emphasize that the number of structures in the PDB as a whole suitable for this sort of study is very small, as has been seen in a study using PDB structures for docking validation.(37
) However, given that the PDB has recently made deposition of structure factors along with coordinates a requirement, we hope that this situation will improve quickly in the future.
The most commonly used metrics in conformer generation studies are based on comparing each conformer in the set to the experimental conformation using some atom-based geometric measure such as rmsd. Metrics like rmsd are used almost without exception in conformer reproduction studies, probably because they are relatively easy to understand and require no specialized applications to calculate. There are, however, a number of objections to the use of rmsd as a metric of quality: it has no upper bound; it scales with molecular size (so that an rmsd of 2 Å for a molecule of 6 heavy atoms means something quite different from an rmsd of 2 Å for a molecule of 60 heavy atoms); and it can give an inaccurate picture of the overall quality of a prediction.(38
) The most serious of the problems with rmsd may be that it does not directly compare a prediction with an experimental value but rather compares a prediction or model (the conformer set) with another model (the atomic positions in the crystal structure). This problem has been eloquently discussed in a paper by Yusuf et al.(39
) in which the authors advocate comparing the experimental data (electron density) with calculated density derived from a docking pose (or computed conformation) using the RSR metric (which is bounded by 0 and 1). These objections notwithstanding, the large body of published literature using rmsd militates against abandoning it as one metric of quality in a validation study, but it should not be the only one. A complementary approach, pace Yusuf et al., is to use a bounded metric that is not derived directly from the atom positions in the two conformers being compared. One recent approach in this vein is the comparison of the overall shape of the experimental conformation to the shape of a docking pose using the shape Tanimoto metric employed by Warren et al.(40
We have extended their shape-based comparison to include an additional term (the color Tanimoto) that compares the alignment of functional groups between the conformers. The score representing this combination of shape matching (shape Tanimoto) and functional group matching (color Tanimoto) is known as the Tanimoto combo (TC). Since it represents the match of both shape and functional groups in space, TC allows greater discrimination between poses than shape alone. Use of a metric like TC avoids several major problems with rmsd: while rmsd has no defined range, TC is, by definition, bounded by 0 and 2, so comparisons using TC are independent of molecular size; large rmsds can arise from differences in the conformations of only small parts of the molecule, while TC is not as sensitive to such divergences; TC provides extra weight for matching chemical functionality (the color Tanimoto term represents the matching of the chemical features only), while rmsd weights the matching of all atoms equally; and the Gaussian representations of molecular properties used in calculating TC are “soft”, so that the significance of results is less affected by experimental uncertainties in atomic positions (though it is more difficult to quantitatively correct for their effect on TC). While any use of a cutoff value for good reproduction is difficult, we find that if TC is below 1.0 the reproduction of the experimental pose is always bad, and if TC is above 1.5 the reproduction is almost always satisfactory.
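The TC cutoffs quoted above (below 1.0 always bad, above 1.5 almost always satisfactory) amount to a three-way classification with an indeterminate middle band; a minimal sketch, with the function name and band labels being our own:

```python
def classify_reproduction(tanimoto_combo: float) -> str:
    """Classify pose reproduction by Tanimoto combo (TC). TC is the
    sum of the shape Tanimoto and color Tanimoto, each in [0, 1],
    so it is bounded by 0 and 2."""
    if not 0.0 <= tanimoto_combo <= 2.0:
        raise ValueError("TC must lie in [0, 2]")
    if tanimoto_combo < 1.0:
        return "bad"          # reproduction is always poor below 1.0
    if tanimoto_combo > 1.5:
        return "satisfactory"  # almost always a good reproduction
    return "indeterminate"     # the cutoffs say nothing in between

print(classify_reproduction(1.7))  # satisfactory
```

Because TC is bounded, the same cutoffs apply regardless of molecular size, which is not true of an rmsd threshold.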
Another major issue with the common use of atom-based measures like rmsd, RDE, etc., is that no account is taken of experimental coordinate error in the structures being reproduced. To our knowledge, this is the first work in which reported rmsds are corrected for the atomic coordinate precision of the structure being reproduced. We allow for coordinate error or uncertainty by using, as a metric of quality, either the maximum of the rmsd and the uncertainty or the difference between them. The second of these can be considered an estimate of the level of computational noise introduced by the conformer generation process atop the existing experimental noise. While careful selection of our data set meant that the corrected and uncorrected results did not differ significantly, correcting rmsds by coordinate uncertainty as outlined herein allows the future use of interesting structures with poorer coordinate precision than those used in this study. Clearly more sophisticated approaches to the problem of experimental noise in PDB ligand structures are possible, among them calculating a set of conformers that all fit the electron density equally well within some limit and comparing these with the set computed by OMEGA. This would more realistically reflect the fact that a crystal structure is an average over time and space, so a small molecule is likely to be found in a number of slightly different conformations in a solid-state structure.
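A sketch of the rmsd metric and the two coordinate-error corrections just described. The function names and the zero floor on the difference are our assumptions, and a real comparison would first optimally superpose the two conformers (e.g., via the Kabsch algorithm) before computing the rmsd:

```python
import math

def heavy_atom_rmsd(coords_a, coords_b):
    """rmsd between two equal-length lists of (x, y, z) coordinates,
    assumed already optimally superposed."""
    assert len(coords_a) == len(coords_b) and coords_a
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

def rmsd_floor(rmsd: float, dpi: float) -> float:
    """Correction 1: an rmsd below the coordinate uncertainty (DPI)
    is not meaningful, so report at least the uncertainty."""
    return max(rmsd, dpi)

def rmsd_excess(rmsd: float, dpi: float) -> float:
    """Correction 2: the computational noise added by the conformer
    generator atop the experimental noise (floored at zero here)."""
    return max(rmsd - dpi, 0.0)

print(rmsd_floor(0.75, 0.25), rmsd_excess(0.75, 0.25))  # 0.75 0.5
```

When the rmsd falls below the DPI, `rmsd_floor` reports the uncertainty itself and `rmsd_excess` reports zero: the reproduction is then indistinguishable from the experimental coordinates within their own error.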
The PDB-derived data set used here, while of good quality, is relatively small (197 structures), so a possible concern is that the results generated are not robust indicators of future performance. We have addressed this issue by performing bootstrapping on our two metrics. We find that in both cases the 5% and 95% quantiles are close and that the standard deviations of the bootstrap means are small. We infer that our results are quite stable to changes in the composition of the data sets used and, therefore, can be considered reliable indicators of future performance on molecule sets of similar physicochemical properties. The confidence interval has another, related application: comparing performance. The usual practice in this area has been to compare an aggregate statistic, such as the mean or median results, from a number of different tools or parameter sets and to declare one superior, without any account of the errors in these terms. However, by using confidence intervals, we can quantitatively assign a probability that one tool or parameter set actually is superior; for example, based on data in this paper, it is over 90% likely that OMEGA with default parameters is better at reproducing small molecule structures from the CSD than from the PDB.
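The bootstrap procedure described above (resampling the data set with replacement, computing the metric's mean on each resample, and inspecting the 5% and 95% quantiles of those means) might be sketched as follows; the stdlib `random` and `statistics` modules and the illustrative values stand in for whatever tooling and data were actually used:

```python
import random
import statistics

def bootstrap_mean_quantiles(values, n_resamples=2000, seed=0):
    """Resample `values` with replacement, compute the mean of each
    resample, and return the 5% and 95% quantiles of those means.
    A narrow interval indicates the aggregate result is stable to
    changes in the composition of the data set."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(statistics.fmean(rng.choices(values, k=n))
                   for _ in range(n_resamples))
    lo = means[int(0.05 * n_resamples)]
    hi = means[int(0.95 * n_resamples)]
    return lo, hi

# Illustrative rmsd-like values, not data from the paper
sample = [0.4, 0.5, 0.6, 0.55, 0.45, 0.5, 0.65, 0.35, 0.5, 0.6]
lo, hi = bootstrap_mean_quantiles(sample)
print(lo, hi)
```

The same machinery supports the comparison described next: the fraction of resamples in which one tool's mean beats another's is a direct estimate of the probability that it is genuinely superior.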
The torsion library in OMEGA is based upon analysis of a number of crystal structures from the PDB, coupled with analysis of energy profiles for certain torsions in the MMFF94 force field. Therefore, the problem of overtraining the torsion library arises if many of the structures used to derive the torsion library entries are also in the test sets used in this study. This problem was addressed by the use of a naive torsion library containing no torsion-specific information at all. Comparison of the results from this naive library with the default one showed that the main impact of the torsion library is not to improve OMEGA’s ability to closely reproduce experimental structures but rather to reduce the size of the conformer ensemble, and the run time, required for good reproduction. This result reduces any possible concern about the effect of “overtraining” the torsion library so that it contains matches for many known structures. As above, the use of bootstrapping is key in interpreting the results; it is over 90% certain that the use of the default torsion library produces no improvement in reproduction of the PDB structures compared to the use of a naive torsion library.
While the purpose of this study was not to extensively compare parameter sets and versions of OMEGA, a comparison of previous versions of OMEGA with the current version (on the PDB ligand set) is provided in the Supporting Information.