The field of computational chemistry, particularly as applied to drug design, has become increasingly important in terms of the practical application of predictive modeling to pharmaceutical research and development. Tools for exploiting protein structures or sets of ligands known to bind particular targets can be used for binding-mode prediction, virtual screening, and prediction of activity. A serious weakness within the field is a lack of standards with respect to quantitative evaluation of methods, data set preparation, and data set sharing. Our goal should be to report new methods or comparative evaluations of methods in a manner that supports decision making for practical applications. Here we propose a modest beginning, with recommendations for requirements on statistical reporting, requirements for data sharing, and best practices for benchmark preparation and usage.
There are two fundamental premises in making such a proposal. First, we must believe that the goal of reporting new methods or evaluations of existing methods is to communicate the likely real-world performance of the methods in practical application to the problems they are intended to solve. Ideally, the specific relationship between methodological advances and performance benefits will be clear in such reports. Second, we must understand that methods of broad utility in pharmaceutical research are applied to predict things that are not known at the time the methods are applied. While this seems elementary, a substantial proportion of recent reports within the field run afoul of this observation in both subtle and unsubtle ways. Rejection of the first premise can reduce scientific reports to advertisements. Rejection (or simple misunderstanding) of the second premise can distort any conclusions as to practical utility.
This special issue of the Journal of Computer-Aided Molecular Design includes eleven papers, each of which makes a detailed study of at least one aspect of methodological evaluation [1–11]. The papers collected within this issue make the detailed case for the recommendations that follow; the recommendations are intended to provide guidance to editorial boards and reviewers of work submitted for publication in our field. In surveying the eleven papers, we feel there are three main areas of concern: data sharing, preparation of datasets, and reporting of results. Concerns within each area relate to three main subfields of molecular modeling, namely virtual screening, pose prediction, and affinity estimation, and to whether protein structural information is used or not. We describe the issues in each area and then present recommendations drawn from the papers herein.
Reports of new methods or evaluations of existing methods must include a commitment by the authors to make data publicly available except in cases where proprietary considerations prevent sharing. While the details are different across the spectrum of methods, the principle is the same: that sharing data promotes advancement of the field by ensuring study reproducibility and enhancing investigators’ ability to directly compare methods. However, the details of this matter a great deal, both for docking methods and for ligand-based methods. Docking will be used to briefly illustrate the problem. Many reports make claims of sharing data by, for example, providing a list of PDB codes for a set of protein–ligand complexes used in evaluating docking accuracy. In a very narrow sense, this might accommodate a notion of sharing. However, this is inadequate for four reasons:
As stated earlier, the ultimate goal is prediction of things that we do not already know. For retrospective studies to be of value, the central issue is the relationship between the information available to a method (the input) and the information to be predicted (the output). If knowledge of the output creeps into the input, either actively or passively, nominal test results may overestimate performance. Also, if the relationship between input and output in a test data set does not accurately reflect, in character or difficulty, the operational application of the method to be tested, the nominal reported performance might be unrelated to real-world performance. Here, we will briefly frame the issue by discussing the differences between the operational use of methods and the construction of tests to measure and document their effectiveness, both for protein-structure-based methods, e.g. docking, and for ligand-based methods in their areas of application.
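One passive route by which knowledge of the answers can creep into a retrospective test is analog bias: test ligands that are near-duplicates of molecules already available to the method. A minimal, hypothetical sketch of one safeguard is shown below; it represents molecules as sets of structural features and discards test molecules too similar to any reference molecule. The feature sets, function names, and threshold are illustrative only, not drawn from any paper in this issue.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two feature sets:
    |intersection| / |union|. Two empty sets count as identical."""
    a, b = set(a), set(b)
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 1.0

def filter_leakage(reference, test, threshold=0.7):
    """Keep only test molecules whose maximum similarity to any
    reference molecule falls below the threshold."""
    kept = []
    for t in test:
        if max(tanimoto(t, r) for r in reference) < threshold:
            kept.append(t)
    return kept

# Illustrative feature sets (stand-ins for real fingerprints):
reference = [{"C", "N", "ring6"}, {"C", "O"}]
candidates = [{"C", "N", "ring6", "F"},  # near-analog of reference[0]
              {"S", "P"}]                # dissimilar to both
clean_test = filter_leakage(reference, candidates)
```

In this toy example the first candidate shares three of four features with a reference molecule (Tanimoto 0.75) and is removed, while the dissimilar second candidate survives. Real applications would use chemical fingerprints and a threshold chosen for the fingerprint type.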
The descriptions of test case construction above involve different degrees of challenge in proportion to the amount of information provided to a method. A problem often encountered in reviewing or reading papers is that methods are claimed to use a lower level of information concerning the answers than is actually the case. This is seldom intentional, no matter the provocation to believe otherwise, but rather a reflection of the difficulty of preparing a 'clean' test.
Even within the constraints outlined above, data set preparation and parameter selection can yield a wide range of results. This diversity is acceptable when it illuminates which choices are of most benefit to users of the different methods. However, without strong requirements for data sharing (the subject of the previous section), this benefit will be diluted. Further, without baseline requirements for statistical reporting (the subject of the next section), this diversity will lead to an unacceptable degree of incomparability between different reports.
The issues surrounding what to report are substantially in dispute, and this has led to an alarming inability to compare multiple studies, except in the case where all primary data are available and where one is willing to make an independent analysis. Here there seem to be two schools of thought. The first is that molecular modeling is a special enterprise, distinct and different from other efforts at prediction. As such it is seen as a part of the process to select or invent measures that illustrate a particular point. The second school holds that molecular modeling is in fact similar to many other areas of science and commerce and that by ignoring standard practices in other, more established, fields, we do a disservice to modeling.
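The second school's point can be made concrete: standard devices from other predictive fields, such as an ROC AUC accompanied by a bootstrap confidence interval, are straightforward to compute for a virtual-screening experiment. The sketch below uses only the Python standard library; the function names and the simulated scores are ours, not taken from any of the contributed papers.

```python
import random

def roc_auc(actives, decoys):
    """ROC AUC via the Mann-Whitney statistic: the probability that a
    randomly chosen active scores higher than a randomly chosen decoy,
    with ties counted as half."""
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))

def bootstrap_ci(actives, decoys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC, resampling
    actives and decoys independently with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        a = [rng.choice(actives) for _ in actives]
        d = [rng.choice(decoys) for _ in decoys]
        stats.append(roc_auc(a, d))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Illustrative scores from a hypothetical screen:
actives = [0.9, 0.8, 0.7, 0.4]
decoys = [0.6, 0.5, 0.3, 0.2]
auc = roc_auc(actives, decoys)
ci_lo, ci_hi = bootstrap_ci(actives, decoys, n_boot=500)
```

Reporting the interval alongside the point estimate, rather than the AUC alone, is precisely the kind of baseline statistical practice that makes independent studies comparable.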
Molecular modeling is a relatively young field. As such, its growing pains include the slow development of standards. Our hope for this special issue of JCAMD is that with the help of the arguments made in the contributed papers, the modest recommendations made here will form the kernel of standards that will help us as a community to both improve the methods we develop and to reduce the disparity between reported performance and operational performance.
The authors gratefully acknowledge the valuable contributions of Marti Head and Terry Stouch, who participated as discussion leaders during the symposium on the "Evaluation of Computational Methods" at the 234th American Chemical Society meeting that led to the development of this special issue. Dr. Jain also acknowledges the NIH for partial funding of the work (grant GM070481).
Ajay N. Jain, Email: ajain/at/jainlab.org.
Anthony Nicholls, Email: anthony/at/eyesopen.com.