We have shown that combining a simple molecular descriptor with effective statistical methods can classify molecules for mutagenicity quickly and accurately. Performance on an external test set was competitive with that of the LAZAR method, a recently described mutagenicity prediction method. Beyond accuracy, we showed that the highly ranked features of the model can be interpreted as simple molecular features that agree with known toxicology.
While performance was good, it was clearly not optimal: the best accuracy value was only 0.770. One possible reason is variation in the way the Ames test is performed in different laboratories. Both the Bursi and CPDB data sets contain data from multiple laboratories, and interlaboratory Ames test error has been estimated at about 15% [18], corresponding to a maximal possible accuracy of 0.850.
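The link between a 15% interlaboratory error and a 0.850 accuracy ceiling can be illustrated with a small simulation. This is a sketch under a simplifying assumption of ours, not the method behind the cited estimate: each reference Ames label is treated as flipped independently with probability 0.15, and a hypothetical perfect classifier is scored against those noisy labels. The function name `ceiling_accuracy` is illustrative only.

```python
import random


def ceiling_accuracy(error_rate: float, n: int = 100_000, seed: int = 0) -> float:
    """Accuracy of a perfect classifier measured against noisy reference labels.

    Assumption (for illustration): each observed Ames label disagrees with the
    true label independently with probability `error_rate`.
    """
    rng = random.Random(seed)
    true_labels = [rng.random() < 0.5 for _ in range(n)]
    # Flip each reference label with probability error_rate.
    observed = [t if rng.random() >= error_rate else not t for t in true_labels]
    # A perfect model always predicts the true label.
    predictions = true_labels
    agree = sum(p == o for p, o in zip(predictions, observed))
    return agree / n


print(ceiling_accuracy(0.15))
```

With `error_rate=0.15` the measured agreement comes out close to 1 - 0.15 = 0.85, even though the simulated model never makes a mistake.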
Therefore, the best performance achievable in practice on these data sets is likely well below the theoretical maximum. In addition, better performance might have been obtained with a more complex descriptor. We have shown that the descriptor used was able to capture many of the important molecular features involved in mutagenicity, such as nitro and nitroso groups. However, some other important mutagenic features, such as the three-membered epoxide and aziridine rings, would be missed by the two-vertex descriptor used. To address this issue, we experimented with substructures larger than atom pairs, but our preliminary work in this area did not find an increase in accuracy. For example, we tried using the set of connected “atom triples”, in effect adding an edge to the atom pair substructure. This increased the number of distinct features derived from the training data set from 9,634 to 1,836,781, yet performance, as measured by 2-fold cross-validation, did not improve. Despite this result, it seems quite possible that a method for efficiently selecting discriminative substructures would lead to improved results.
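Why a three-membered ring can slip past small path-based substructures can be seen on a toy graph. The sketch below assumes that an atom pair is two bonded atoms labelled only by element and that an atom triple is a connected three-atom path; real descriptors would usually also encode bond order or aromaticity, and the helper names and graph encoding are hypothetical. It compares ethylene oxide (an epoxide) with diethyl ether, an open chain with no ring.

```python
from itertools import combinations

# Toy molecular graphs (hypothetical encoding): atom index -> element,
# plus an undirected list of bonds between atom indices.
EPOXIDE_ATOMS = {0: "C", 1: "C", 2: "O"}                # ethylene oxide
EPOXIDE_BONDS = [(0, 1), (1, 2), (2, 0)]                # three-membered ring
ETHER_ATOMS = {0: "C", 1: "C", 2: "O", 3: "C", 4: "C"}  # diethyl ether
ETHER_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4)]          # open chain


def adjacency(bonds):
    """Map each atom index to the set of atoms bonded to it."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def atom_pairs(atoms, bonds):
    """Two bonded atoms; labels sorted so C-O and O-C are one feature."""
    return {tuple(sorted((atoms[a], atoms[b]))) for a, b in bonds}


def atom_triples(atoms, bonds):
    """Connected three-atom paths end-centre-end (a pair plus one edge)."""
    feats = set()
    for centre, nbrs in adjacency(bonds).items():
        for a, b in combinations(sorted(nbrs), 2):
            first, last = sorted((atoms[a], atoms[b]))
            feats.add((first, atoms[centre], last))
    return feats


for name, atoms, bonds in [("epoxide", EPOXIDE_ATOMS, EPOXIDE_BONDS),
                           ("ether", ETHER_ATOMS, ETHER_BONDS)]:
    print(name, sorted(atom_pairs(atoms, bonds)), sorted(atom_triples(atoms, bonds)))
```

In this sketch both molecules yield identical pair sets and identical triple sets, because a two-edge path never records that the end atoms of the ring are also bonded to each other; distinguishing the epoxide would need a substructure that includes the ring-closing edge.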
The results on approved drugs showed that the method predicted mutagenicity for several compounds not present in the training set that are also suspected carcinogens. We were surprised that such a large percentage of approved drugs (21–22%) were predicted to be mutagens; we had expected that relatively few known drugs would be classified as mutagens. Several of the predictions were confirmed to be either mutagens or possible carcinogens, as described above. Also, of the 110 drugs (out of 962 total) for which an Ames test result was available, 30 (27%) were Ames positive. This shows that at least 3% (30 of 962) of the drug data set are mutagens and suggests that a significant number of approved drugs may give a positive Ames test result.
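The two percentages come from the same 30 positives divided by different denominators. The 27% applies only to drugs that were tested, while 30/962 is a floor for the whole set, since untested drugs may also be mutagens. The snippet below just reproduces that arithmetic from the counts quoted in the text.

```python
# Counts quoted in the text.
total_drugs = 962
with_ames_result = 110
ames_positive = 30

# Share of the tested drugs that were Ames positive.
positive_rate_tested = ames_positive / with_ames_result
# Lower bound for the full set: untested drugs could also be mutagens,
# so the 30 known positives are a floor, not an estimate.
lower_bound_all = ames_positive / total_drugs

print(f"{positive_rate_tested:.0%} of tested drugs were Ames positive")  # 27%
print(f"at least {lower_bound_all:.1%} of all drugs are Ames positive")  # 3.1%
```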
The figure comparing Rulefit scores for mutagen and nonmutagen drugs indicates that the Rulefit score may be useful for filtering likely mutagens from a screening library such as ZINC. It shows a clear difference between the scores of the two groups. If a Rulefit score of zero were used as the filtering threshold, the figure suggests that 90% of nonmutagens would be retained while 70% of mutagens would be discarded.
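Using a score threshold as a filter amounts to counting, per class, which molecules fall on each side of it. A minimal sketch, assuming higher Rulefit scores indicate likely mutagens and that a molecule is discarded when its score is above the threshold; the function `filter_summary` and the scores below are invented for illustration and are not the drug data behind the figure.

```python
from typing import Sequence


def filter_summary(scores: Sequence[float], is_mutagen: Sequence[bool],
                   threshold: float = 0.0) -> tuple[float, float]:
    """Return (fraction of nonmutagens kept, fraction of mutagens discarded).

    Assumes higher scores mean "more likely mutagenic" and that a molecule
    is discarded when its score is strictly above the threshold.
    """
    kept_non = total_non = discarded_mut = total_mut = 0
    for score, mutagen in zip(scores, is_mutagen):
        discarded = score > threshold
        if mutagen:
            total_mut += 1
            if discarded:
                discarded_mut += 1
        else:
            total_non += 1
            if not discarded:
                kept_non += 1
    return kept_non / total_non, discarded_mut / total_mut


# Invented example scores: five nonmutagens followed by five mutagens.
scores = [-1.2, -0.8, -0.3, 0.4, -0.5, 0.9, 1.5, -0.2, 2.1, 0.7]
labels = [False] * 5 + [True] * 5
print(filter_summary(scores, labels))  # (0.8, 0.8)
```

Sweeping `threshold` trades the two fractions against each other: raising it keeps more nonmutagens but lets more mutagens through.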
When using this method for virtual screening for mutagenicity, it would probably be most practical to treat the highest-scoring predictions as the most reliable. Considering the specific molecular features responsible for a high model score may help to corroborate the prediction against known chemistry. It may also be possible to exploit such knowledge to modify candidate molecules in order to optimize their properties. A very simple example of this is suggested by the figure in which the addition of a trifluoro group to a mutagenic benzene compound results in a nonmutagen.
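Corroborating a high score against known chemistry is easier when the score is broken down into the rules that produced it. The sketch below assumes a RuleFit-style model in which the score is an intercept plus the weights of the rules a molecule satisfies, with each rule simplified to a set of atom-pair features that must all be present (actual RuleFit rules come from tree splits, can also test for absence, and are combined with linear terms). All rule contents, weights, and molecule names are hypothetical.

```python
# Hypothetical fitted model: intercept plus (required features, weight) rules.
INTERCEPT = -0.5
RULES = [
    (frozenset({"C-N", "N-O"}), 1.6),  # invented rule: nitro-like atom pairs
    (frozenset({"N-N"}), 0.9),         # invented rule: N-N bonded pair
    (frozenset({"C-F"}), -0.7),        # invented rule: C-F pair lowers the score
]


def explain(features: set[str]) -> tuple[float, list[tuple[list[str], float]]]:
    """Return the model score and the fired rules, largest |weight| first."""
    fired = [(sorted(rule), w) for rule, w in RULES if rule <= features]
    score = INTERCEPT + sum(w for _, w in fired)
    fired.sort(key=lambda item: abs(item[1]), reverse=True)
    return score, fired


# Hypothetical screening candidates described by their atom-pair features.
molecules = {
    "mol_a": {"C-C", "C-N", "N-O"},
    "mol_b": {"C-C", "C-N", "N-O", "C-F"},
    "mol_c": {"C-C", "C-O"},
}

# Rank candidates so the highest-scoring predictions come first.
ranked = sorted(molecules, key=lambda m: explain(molecules[m])[0], reverse=True)
for name in ranked:
    score, fired = explain(molecules[name])
    print(f"{name}: score={score:.2f}, rules={fired}")
```

In this toy setup, adding the C-F feature to `mol_a` (giving `mol_b`) fires a negative-weight rule and lowers the score, which is the kind of structure-guided modification described above; whether such a change actually reduces mutagenicity must come from the fitted model and experimental data.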