Computational prediction of genotoxicity based on molecular structure, or structure–activity relationships (SAR), has met with a measure of success due in part to the availability of large public datasets for modeling. In particular, data from Ames Salmonella assays have been used to build some of the more successful and widely used SAR predictive models [see e.g., Kazius et al., 2005; Yang et al., 2008]. Genotoxicity is also easier to predict than, say, hepatotoxicity or nephrotoxicity, owing to the multitarget, multimechanistic nature of those organ toxicities. True genotoxicity, in which the chemical interacts directly with cellular DNA, should be predictable based solely on chemical reactivity and the physicochemical properties of a molecule, particularly in the case of DNA alkylators or electrophilic compounds. This accounts for the partial success of programs such as Derek for Windows and MC4PC, among others, which rely primarily on recognition of what are termed “biophores” or “structural alerts.” However, all SAR programs generate “false positives” (nonspecificity) and “false negatives” (insensitivity) for a complex set of reasons. These center on factors such as the chemical “space” around the alerting structural feature, which can hinder the reactive center; the need for metabolism to activate a chemical to its reactive form; and whether, or to what extent, a chemical can enter a cell given its size, lipophilicity, and so on. Quantitative SAR (QSAR) approaches, in which various physicochemical parameters of a series of compounds are used to identify relationships to genotoxicity and genotoxic potency, have also had some success, particularly when applied to prediction within a series of structurally similar congeners.
It could be argued that the great majority of directly DNA-reactive chemical moieties have already been identified and that, despite improvement of the learning sets on which programs such as Derek for Windows and MC4PC are based, only incremental increases in sensitivity and specificity may be possible. However, this view does not take into consideration more recently identified factors such as noncovalent DNA interaction and interference with critical DNA-metabolizing proteins such as topoisomerases and DNA polymerases [Snyder and Strekowski, 1999; Snyder, 2000; Snyder and Arnone, 2002; Snyder et al., 2004; and references therein], all of which can clearly contribute to genotoxicity but for which, as yet, insufficient peer-reviewed data have been generated for extensive modeling in SAR or QSAR learning sets. The strengths and limitations of both the SAR and QSAR approaches are discussed below, along with a perspective on how these technologies are being applied in practice within a regulatory setting and in industry.
In Silico Approaches in a Regulatory Context and at the U.S. FDA
QSAR-based computational toxicology (comptox) approaches are now used extensively around the world by regulatory agencies as an adjunct to safety assessment. The heaviest user today is probably the European Union, with its de-emphasis on animal testing and recently phased-in laws for regulating new and existing substances, such as the Registration, Evaluation, and Authorization of Chemicals (REACH) regulation, the Seventh Amendment (of the Cosmetics Directive), and the Screening Information Data Set (SIDS). At the same time, Health Canada has used SAR and weight-of-evidence approaches extensively to prioritize its Domestic Substances List, as directed by the Canadian Environmental Protection Act. In the United States, the Environmental Protection Agency (EPA) Office of Pollution Prevention and Toxics has been using SAR, weight-of-evidence, and analog approaches for over two decades under the Toxic Substances Control Act to evaluate the environmental safety of chemicals, including High Production Volume (HPV) chemicals. Finally, in the Pacific region, Japan is considering the use of computational toxicology under the Law Concerning the Evaluation of Chemical Substances and Regulation of Their Manufacture, whereas Australia has implemented the National Industrial Chemicals Notification and Assessment Scheme.
At the U.S. FDA, a dedicated team at the Center for Food Safety and Applied Nutrition (CFSAN) uses comptox approaches in an ongoing program to evaluate food contact substances for which minimal experimental data may be available, as part of the food contact notification program. Currently, the Center for Veterinary Medicine is considering how comptox approaches may benefit them. The Center for Drug Evaluation and Research (CDER) uses comptox models to help evaluate toxicities of drugs and drug-related substances, such as precursors, degradants, and contaminants. CDER has a draft Guidance for establishing the safety of genotoxic contaminants that includes the use of comptox (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm079235.pdf).
CDER contains an applied regulatory research group that has, for over 10 years, created toxicological and clinical databases, developed rules for quantifying toxicological and clinical endpoints, evaluated data mining and QSAR software, and developed toxicological and clinical effect prediction programs through collaborations with software companies [Matthews et al., 2000; Benz, 2007; Kruhlak et al., 2007]. In the last 2 years, the CDER comptox group has evolved from a group primarily doing basic research into an applied computational toxicology consulting service that supplies comptox evaluations of drugs, metabolites, contaminants, excipients, degradants, and so forth to FDA/CDER safety reviewers in preclinical, clinical, postmarket, and compliance groups.
The overall goal of the CDER computational toxicology program is to develop the capability to accurately predict, with in silico software, all toxicological and clinical effect endpoints of interest to the U.S. FDA/CDER. The ultimate goal is to substantially reduce the need for animal toxicological testing and human clinical trials in establishing the safety of FDA/CDER-regulated chemical substances. Overall, the group advocates the use of computer predictions of toxicological and clinical effects to inform and provide valuable decision support for regulatory actions. Computational toxicology information can play an important role when a regulatory decision must be made without all of the desired safety information for the chemicals under consideration; when the result of a safety study is equivocal, predictions can also be provided for related endpoints or chemicals.
Change at U.S. FDA/CDER
The U.S. FDA published a “white paper” on March 16, 2004, describing the Agency’s Critical Path Initiative (www.fda.gov/ScienceResearch/SpecialTopics/CriticalPathInitiative/CriticalPathOpportunitiesReports/ucm077262.htm). In this document, FDA stated that “not enough applied scientific work has been done to create new tools to get fundamentally better answers about how the safety and effectiveness of new products can be demonstrated, in faster time frames, with more certainty, and at lower costs.” FDA further stated that “a new product development toolkit—containing powerful new scientific and technical methods such as animal or computer-based predictive models, biomarkers for safety and effectiveness, and new clinical evaluation techniques—is urgently needed to improve predictability and efficiency along the critical path from laboratory concept to commercial product.”
Today, as part of the U.S. FDA Critical Path Initiative, change has been officially institutionalized. The Office of Critical Path Programs now exists within the FDA Office of the Commissioner, and the Office of Translational Science has become a component of FDA/CDER. In addition, the Critical Path Institute, a private organization, is working toward the same ends. Within CDER there are active working groups such as the Computational Science Center Committee, the Research and Development Computing Advisory Board, and the Data Mining Council. New resources have been provided to power change, including newly opened, state-of-the-art production and research computer data centers and a multiparallel High Performance Grid Computing Center, FDA’s new “supercomputer.”
FDA/CDER is working to bring about an orderly transition to new testing paradigms for establishing drug safety. This is being accomplished through: (1) education, with symposia, seminars, publications, and training (all of which have been instituted for the CDER computational toxicology service, resulting in well over a 10-fold increase in requests for consultations over the last year); (2) evaluation and buy-in by CDER safety personnel; (3) internal marketing as a major component of the process; and, finally, (4) written directives, including internal MaPPs (Manuals of Policies and Procedures) and public Guidance that will be established with public input.
For comptox in particular, evaluation and buy-in are primarily being accomplished through the establishment of a Computational Toxicology Subcommittee (CTSC), formally a subcommittee of the U.S. FDA/CDER Office of New Drugs’ Pharmacology/Toxicology Coordinating Committee (PTCC). The official mission of the PTCC/CTSC is to disseminate appropriate guidance to CDER review staff and the pharmaceutical industry on the assessment of computational toxicology studies, and to serve as a resource to the PTCC and to the Center on scientific and regulatory aspects of computational toxicology issues.
Predicting Rodent Carcinogenicity and Genetic Toxicology with QSAR Computational Toxicology
QSAR computational toxicology is useful for predicting the results of in vitro and animal preclinical tests when no actual test data are available, when laboratory data are equivocal, or when additional decision support is needed. This methodology is less expensive and faster than traditional means of safety evaluation, can provide decision support information and suggestions of mechanism/mode of action, and, in general, goes well beyond the traditional use of Ashby–Tennant structural alerts [Ashby, 1985]. “Seat of the pants” QSAR, visually looking for Ashby–Tennant structural alerts, is a 25-year-old procedure that has served us well but is now obsolete for a number of reasons: (1) Ashby and Tennant did not have many pharmaceutical data to consider, and pharmaceutical-specific structural alerts exist; (2) a simple examination of a molecule for Ashby–Tennant alerts does not take into consideration the effect of other chemical groups in the same molecule that modify the activity of the alert (modulators); (3) a computer can consistently and exhaustively look for all alerts and modulators, which is difficult for humans to do by simple visual inspection of a molecule as complex as a typical pharmaceutical (see the sketch below); (4) there are other highly valuable ways to perform QSAR analyses that do not involve examining the atom connectivity per se; and (5) there are characteristic structural alerts for all toxicological and adverse human clinical endpoints, not just Salmonella mutagenesis.
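To illustrate point (3), the following minimal sketch shows what an automated, exhaustive alert screen looks like in practice. It assumes the open-source RDKit toolkit, and the SMARTS patterns and alert names are illustrative placeholders, not the Ashby–Tennant set or any program’s actual knowledge base:

```python
# Minimal structural-alert screen (illustrative patterns only).
from rdkit import Chem

ALERTS = {
    "primary aromatic amine": "[NX3;H2][c]",
    "aromatic nitro":         "[c][N+](=O)[O-]",
    "epoxide":                "C1OC1",
    "alkyl halide":           "[CX4][Cl,Br,I]",
}

def screen(smiles: str) -> list[str]:
    """Return the names of all alerts matched by the molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    return [name for name, smarts in ALERTS.items()
            if mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))]

# 4-nitroaniline carries two of the example alerts.
print(screen("Nc1ccc(cc1)[N+](=O)[O-]"))
```

A production system layers hundreds of curated patterns, plus modulator logic, on this same matching primitive; the point is simply that the machine applies every pattern every time, which a human reviewer cannot guarantee.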
U.S. FDA/CDER has created many QSAR models to predict the ability of organic chemicals to cause cancer in rodents [Matthews and Contrera, 1998; Contrera et al., 2003; Matthews et al., 2008], as well as genetic toxicity at several endpoints [Contrera et al., 2005b; Matthews et al., 2006a; Contrera et al., 2008]. Carcinogenicity is predicted on the basis of a training set consisting of bioassay data from over 1,600 chemicals, on which over 25,000 individual records have been obtained. These data have been harvested from FDA archives, CDER Cancer Assessment Committee reports, NTP technical reports, IARC monographs, the L. Gold Carcinogenic Potency Data Base, and the published literature. Specific rodent carcinogenicity endpoints modeled at CDER are for male mice, female mice, male rats, female rats, rats (both genders pooled), and mice (both genders pooled).
Genetic toxicology prediction models used by CDER are based on laboratory data for 5,880 chemicals, with 27,498 individual records [15,691 mutation (57.1%), 8,783 clastogenicity (31.9%), 2,138 cell transformation (7.8%), and 886 DNA effects (3.2%)], with data obtained from the EPA GENE-TOX database, FDA archives, NTP technical reports, the published literature, and data sets collected by MultiCASE. There are currently nine QSAR models that are used in combination to predict rodent carcinogenicity, each based on a different genetic toxicity test (Salmonella typhimurium strains, Escherichia coli strains, fungal mutagenicity, Drosophila genetic toxicity, rodent mutation in vivo, Hprt in CHO and CHL cells, rodent micronucleus in vivo, rodent chromosome aberrations in vivo, and rat and human unscheduled DNA synthesis). The genetic toxicology models used to predict the current International Conference on Harmonisation (ICH) S2B battery are for bacterial mutagenicity, in vitro chromosome aberrations, mouse lymphoma Tk+/− mutation, and in vivo micronucleus.
Using the Results of More than One Computational Toxicology Program
A current focus of research at CDER is developing a paradigm for using the results of more than one computational toxicology program for any one endpoint [Contrera et al., 2007; Matthews et al., 2008]. CDER currently uses five in silico programs for estimating the toxic potential of diverse chemicals. Four make their predictions based on statistical correlations between chemical attributes and toxicity [MC4PC (www.multicase.com), SciQSAR (www.scimatics.com), BioEpisteme (www.prousresearch.com), and Model Applier (www.leadscope.com)]. The fifth predicts toxicity and adverse effects based on human expert rules [Derek for Windows (www.lhasalimited.org)]. CDER believes that using different prediction paradigms together and analyzing the results in a meta-analysis will give the best overall prediction. None of the currently commercially available computational toxicology programs has all the necessary functionalities; none has 100% coverage, sensitivity, and specificity; and all have some unique features in their strategies (i.e., they are not completely overlapping). Hence, the various models combined within consensus prediction strategies can achieve greater prediction accuracies.
A simple example of a decision support strategy is to use two or more comptox programs to make predictions independently and then combine the predictions in different ways, depending on the needs of the regulatory application, to tune the outcome toward higher specificity or sensitivity. To attain high sensitivity, call the overall result positive if any one program gives a positive prediction. To attain high specificity, call the overall result positive only if all programs give a positive prediction. In either case, the range of chemicals that can be predicted (coverage) is improved because different programs cover different parts of the chemical universe.
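As a concrete illustration of this logic, the sketch below combines independent program calls under the two rules just described. The program names and the three-valued call convention (positive, negative, or outside a program’s coverage) are assumptions made for the example:

```python
# Consensus combiner: "any-positive" rule for high sensitivity,
# "all-positive" rule for high specificity. Program names are placeholders.
from typing import Optional

def consensus(calls: dict[str, Optional[bool]], mode: str = "sensitivity") -> Optional[bool]:
    """calls maps program name -> True (positive), False (negative),
    or None (compound outside that program's coverage)."""
    covered = [v for v in calls.values() if v is not None]
    if not covered:
        return None                 # no program covers the chemical
    if mode == "sensitivity":
        return any(covered)         # positive if ANY program is positive
    if mode == "specificity":
        return all(covered)         # positive only if ALL programs agree
    raise ValueError(f"unknown mode: {mode}")

calls = {"program_A": True, "program_B": False, "program_C": None}
print(consensus(calls, "sensitivity"))   # True
print(consensus(calls, "specificity"))   # False
```

Note that a compound outside one program’s coverage can still receive an overall call from the remaining programs, which is how the combined coverage exceeds that of any single program.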
The Promise of Computational Toxicology
The near-term effects of the use of computational toxicology on regulatory review are that fewer problematic chemicals are submitted to regulatory agencies, decision support information is available, and concerns can be prioritized. However, there will be a continuing, though more limited, need for “classical” testing and review because computational toxicology is of limited use with truly novel molecular entities (lack of coverage): the software cannot make a prediction about something it has never seen before.
Using computational toxicology effectively can lead to greater efficiencies in the review process. When model predictions have a high degree of confidence, interpretability, and transparency, laboratory testing (and lengthy reviews) may not be needed. Similarly, if a submission reporting equivocal test results can be augmented by high-confidence model predictions, then additional testing potentially could be avoided. Comptox can also be used to rapidly determine whether postmarket toxicity is likely, thus triggering additional targeted studies, although further development of this capability is needed. Ultimately, QSAR computational toxicology methods, as well as other newly developed techniques, will be applied as a means of reduction, replacement, and refinement for longer, more expensive testing.
In Silico Approaches in the Pharmaceutical Industry: Current Practice and Future Directions
In their 2007 report entitled Toxicity Testing in the 21st Century, the National Research Council stressed the importance of in silico toxicity prediction in the future of safety assessment of drugs and chemicals [NRC, 2007]. Some of the major drivers for adoption of in silico approaches include the ethical issues surrounding widespread use of mammals, especially nonhuman primates, for toxicity testing; the high cost and long timelines associated with traditional toxicology testing paradigms; and the pressure to increase productivity in the pharmaceutical industry without compromising patient safety. For these and other reasons, in silico approaches have become a mainstay of modern drug discovery and development. In this respect, no other toxicological discipline has been more profoundly influenced by the growth and development of in silico approaches than genetic toxicology. In many cases, drug-induced DNA damage is governed by well-understood principles of chemical reactivity and physicochemical properties, and this has greatly facilitated development of structure-based in silico prediction models. Such models have been used in the pharmaceutical industry for many years, but the approaches to and applications of this technology have evolved alongside advances in modeling techniques, the state of the knowledge base, and the regulatory environment. This section reviews current approaches to in silico genotoxicity modeling in the pharmaceutical industry and looks forward to the future evolution of this technology. Three main issues are addressed: (1) the current state of in silico genotoxicity prediction in the pharmaceutical industry; (2) the ability of current approaches to meet the needs of industry in the 21st century; and (3) opportunities for improvement of in silico genotoxicity prediction.
Where Are We Now?
To assess the current state of in silico genotoxicity approaches, a web-based survey of genetic and computational toxicologists in the pharmaceutical industry was conducted. The survey comprised 10 questions dealing with how and when in silico approaches are used in decision making, the types of models used, and perceived shortcomings and challenges for improvement of in silico genotoxicity models. In all, 15 companies participated in the survey, representing both large pharmaceutical companies (80%) and smaller biotechnology companies (20%).
Almost all of the companies surveyed use in silico modeling of genetic toxicity endpoints as part of their decision-making process. The most common application is to inform and guide testing strategies for novel compounds and to prioritize compound testing based on level of concern. Approximately half of the responding companies indicated that in silico modeling is also used as a stand-alone approach for decision making. A good example of this approach is the use of negative in silico predictions to qualify process impurities in clinical and production batches of drugs without the need for biological testing [European Medicines Agency, 2006; U.S. Department of Health and Human Services, FDA, CDER, 2008].

Most companies make significant use of in silico genotoxicity prediction in both the discovery and development phases of the pharmaceutical pipeline. In silico modeling capabilities are most frequently deployed in toxicology and safety assessment groups, where the focus is largely on later-stage compounds. However, approximately half of the responding companies also leverage in silico modeling capabilities within medicinal chemistry to support early discovery efforts. Primary applications of in silico techniques in early drug discovery include virtual screening of planned compounds and prioritization of compounds for biological testing, conserving synthetic and testing resources for those compounds with the best chance of advancing into development. In the development phase, in silico approaches can be used prospectively to help direct synthetic routes, and retrospectively to characterize impurities and unique/disproportionate human metabolites, contributing to prioritization of testing and development of testing strategies for these entities.
Almost all companies use at least one commercially available in silico model, and approximately half of the respondents also use some type of proprietary in-house model to support genotoxicity screening. Among commercially available models, Derek for Windows (Lhasa) and MC4PC (MultiCASE) are the most frequently used (93 and 60% of respondents, respectively). Derek for Windows is a rule-based expert system that uses a library, or knowledge base, of structural alerts derived from the open literature and from user-contributed data to predict the activity of unknown compounds; it also allows users to implement custom alerts based on in-house data. MC4PC evaluates structure–activity data in a training set of molecules and identifies structural features that correlate with higher or lower mutagenic activity (biophores and biophobes, respectively); it then uses this information to derive ad hoc local QSAR models to predict the activity of unknowns. The most common approach to in-house model development is customization of existing commercial models (e.g., development of proprietary Derek for Windows alerts based on proprietary data). Some companies also develop models from scratch using modeling packages such as MDL-QSAR (MDL Information Systems) or the LeadScope Predictive Data Modeler (Leadscope). In-house model development encompasses both global models that broadly cover chemical space and local models focused on specific chemical classes. About 40% of the responding companies rely on a single model for prediction of genotoxicity; the others use multiple prediction models, applying either a weight-of-evidence approach or a more conservative single-hit approach to resolve conflicting model predictions.
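The statistical intuition behind MC4PC-style biophore/biophobe identification can be illustrated with a toy sketch; the actual MC4PC fragmentation scheme and statistics are proprietary and far more sophisticated, and the Morgan fingerprint bits and four-compound training set below are stand-ins for demonstration only:

```python
# Flag substructure features enriched among mutagens ("biophores")
# or among non-mutagens ("biophobes") in a labeled training set.
from collections import Counter
from rdkit import Chem
from rdkit.Chem import AllChem

# Toy training set: (SMILES, label) with 1 = mutagenic, 0 = non-mutagenic.
training = [("Nc1ccccc1", 1), ("O=[N+]([O-])c1ccccc1", 1),
            ("CCO", 0), ("CC(C)CO", 0)]

pos, neg = Counter(), Counter()
for smiles, label in training:
    mol = Chem.MolFromSmiles(smiles)
    bits = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
    (pos if label else neg).update(bits.GetOnBits())

for bit in set(pos) | set(neg):
    p, n = pos[bit], neg[bit]
    if p and not n:
        print(f"bit {bit}: candidate biophore (seen only in {p} mutagens)")
    elif n and not p:
        print(f"bit {bit}: candidate biophobe (seen only in {n} non-mutagens)")
```

A real system would, at minimum, require statistically meaningful feature counts and significance testing before declaring a feature a biophore.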
Is This Where We Need to Be?
Given the ever-increasing role in silico models play in decision support, it is appropriate to periodically challenge the adequacy and robustness of this approach. Not surprisingly, model accuracy stands out as the single biggest perceived challenge for in silico genotoxicity prediction, with ~40% of the survey respondents identifying high false positive and high false negative rates as the two highest priorities for improvement. Model transparency and interpretability were also viewed as areas for improvement. Transparency is particularly important from a medicinal chemistry standpoint because understanding the structural basis for predictions is essential for hypothesis-based optimization of chemistry away from positive genotoxicity results. In addition, adequate model transparency is important for assessing the reliability of model predictions. Poor extrapolation power was also viewed as a significant shortcoming of current prediction models; only one respondent felt that the ability to extrapolate beyond known chemical space was currently acceptable.
Each in silico genotoxicity prediction model has strengths and weaknesses that are largely determined both by the nature of the training data set and by the prediction challenge to which the model is applied. Clearly, accuracy, interpretability, and extrapolation are considered major weaknesses of current prediction models in large part because available models cannot adequately address specific decision support needs within industry. Hence, it is appropriate to ask: what are the underlying challenges to improvement in these areas? The two most commonly cited roadblocks to model improvement are the adequacy of existing training sets and insufficient mechanistic understanding. Available public databases are based largely on environmental and commodity chemicals, and drug-like compounds are poorly represented. Despite the relatively high degree of mechanistic understanding in the area of genotoxicity, many of the subtle structural features that modulate mutagenic activity are poorly understood, and the chemical rationale for differentiating active and inactive members of a given chemical series is often obscure. In addition to these two major challenges, a shortage of qualified people to develop novel models, as well as to run and interpret them, was also recognized as a potential barrier.
Where Do We Go from Here?
The results of the industry survey paint a picture of the current state of in silico genotoxicity prediction characterized by high importance and value accompanied by suboptimal performance and significant barriers to improvement. These same issues have plagued computational toxicology for many years, yet little progress has been made in overcoming these barriers, and substantive improvements in model performance have not been forthcoming. The question, then, is how we can get to the next level in genotoxicity prediction from where we stand today.
Expanding and diversifying model training sets to cover larger areas of chemical space is certainly needed, but achieving this in a publicly accessible manner has been problematic. Developing and testing novel chemistry is expensive and time-consuming, and pharmaceutical companies are understandably reluctant to give away the competitive advantage derived from such proprietary knowledge. One possibility for data sharing that has received some attention is precompetitive collaborative testing agreements focused on commonly used building blocks. This would provide a means for companies to share both the testing burden and the acquired knowledge without giving away proprietary information. Another possibility is the sharing of derived knowledge without sharing actual compound structures or data. As discussed previously, some member companies of the Lhasa consortium contribute knowledge in this form to improve existing Derek for Windows alerts or to create new ones. Within individual companies, opportunities certainly exist to expand training sets for proprietary use. One of the interesting results of the survey was that over 50% of the responding companies do not engage in any form of proprietary model development. This suggests that many companies are sitting on potentially large sets of structure–activity data that could be used to improve their internal predictive capabilities.
Toxicophore-based approaches to in silico prediction are generally only applicable within the structural domain of the training sets on which they are based. Therefore, even with expanded and diversified training sets, the ability of these models to predict genotoxicity of novel chemotypes will be somewhat limited.
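One pragmatic response is to gate predictions with an explicit applicability-domain check. The sketch below is a simplified illustration, not any vendor’s method: it flags query compounds whose nearest-neighbor Tanimoto similarity to the training set falls below a cutoff, signaling that a toxicophore-based prediction may be unreliable (the training SMILES and the 0.3 threshold are arbitrary assumptions):

```python
# Simple applicability-domain gate based on nearest-neighbor similarity.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def fingerprint(smiles: str):
    return AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), radius=2, nBits=2048)

# Stand-in training set; a real model would hold thousands of structures.
train_fps = [fingerprint(s) for s in ("Nc1ccccc1", "O=[N+]([O-])c1ccccc1", "C1OC1")]

def in_domain(query_smiles: str, cutoff: float = 0.3) -> bool:
    fp = fingerprint(query_smiles)
    return max(DataStructs.TanimotoSimilarity(fp, t) for t in train_fps) >= cutoff

print(in_domain("Nc1ccc(Cl)cc1"))  # close to a training chemotype -> True
print(in_domain("CCCCCCCCCC"))     # far from all training chemotypes -> likely False
```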
Ab Initio Molecular Models of Genotoxicity
Factors such as steric hindrance, electronic effects, stereochemistry, and planarity (to name a few) may, to some degree, be incorporated into conventional SAR prediction models based on sufficiently large datasets; however, these models will still be bounded by the chemical space of their training sets. One approach to circumvent this barrier is the development of models based on first principles of chemical behavior. Ab initio computations provide access to numerous parameters reflecting the potential for chemical reactivity, and hence DNA damage, that may be difficult to build into toxicophore-based models. These include factors such as localization of electron density, orbital energy levels, optimized geometries, and thermodynamic stability, which can be used to develop generalized models of DNA reactivity. Two examples of the use of quantum mechanical parameters in genotoxicity modeling are given below.
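As groundwork for those examples, the sketch below shows how such descriptors can be computed in practice, assuming the open-source psi4 package; the “pubchem:” geometry helper requires internet access, and aniline, the basis set, and the SCF level are arbitrary choices for illustration:

```python
# Compute frontier orbital energies as simple ab initio reactivity descriptors.
import psi4

psi4.set_memory("1 GB")
psi4.geometry("pubchem:aniline")        # fetch a 3D structure from PubChem
psi4.set_options({"basis": "6-31G*"})

_, wfn = psi4.energy("scf", return_wfn=True)
eps = wfn.epsilon_a().np                # alpha orbital energies (hartree)
homo = eps[wfn.nalpha() - 1]
lumo = eps[wfn.nalpha()]
print(f"HOMO = {homo:.4f} Eh, LUMO = {lumo:.4f} Eh, gap = {lumo - homo:.4f} Eh")
```

Descriptors such as the HOMO energy (ease of oxidation) and the HOMO–LUMO gap can then be fed into generalized reactivity models alongside conventional structural features.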
Aromatic amines are key structural elements in medicinal chemistry and constitute a pharmaceutically relevant class of potential genotoxicants and carcinogens. The mutagenic activity of primary aromatic amines has been attributed to transient formation of reactive nitrenium ions resulting from decomposition of a hydroxylamine metabolite. Ford and Griffin built a predictive model for aromatic amine mutagenicity based on semiempirical calculation of the thermodynamic stability of the putative nitrenium intermediate from a small series of aromatic amines encountered in cooked meat. We have recently extended this model to higher levels of theory (Hartree–Fock and density functional theory) and used it to evaluate a test set of 257 primary aromatic amines [Bentzien et al., 2010]. The nitrenium stability model exhibits good sensitivity and specificity and is not dependent on a predefined training set. The model provides a continuous spectrum of activity values that allows medicinal chemists to predict subtle trends in SAR during compound optimization.
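In outline, and as an illustrative formulation rather than a quotation of the cited methods, the relative nitrenium stability can be expressed as the energy change of an isodesmic reaction against a reference amine such as aniline:

$$\Delta E_{\mathrm{rel}} = \left[E(\mathrm{ArNH^{+}}) + E(\mathrm{C_{6}H_{5}NH_{2}})\right] - \left[E(\mathrm{ArNH_{2}}) + E(\mathrm{C_{6}H_{5}NH^{+}})\right]$$

where a more negative $\Delta E_{\mathrm{rel}}$ implies a more stable, more readily formed nitrenium ion and hence a greater predicted mutagenic potential.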
Computational prediction of genotoxicity related to noncovalent binding modes such as intercalation has been a challenging issue because this class of compounds is generally devoid of alerting structural moieties. Even more problematic than traditional planar fused-ring compounds are the so-called atypical intercalators, which are characterized by two to three unfused rings and the presence of one or more cationic centers, typically dialkylamines [Snyder, 2007]. To address this problem, Snyder et al. adapted a computational DNA-docking model that was originally developed to facilitate discovery and optimization of antiestrogenic compounds [Hendry et al., 1994]. In this model, energy-minimized structures of the test compounds were generated and computationally docked into unwound but structurally intact DNA. Using this approach, a high degree of concordance between docking energies and in vitro DNA intercalation potency was demonstrated, and it was shown that structural features differentiating genotoxic from nongenotoxic intercalators could be rationalized based on the binding energies.
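Evaluating such concordance computationally reduces to a rank correlation between docking scores and measured potencies, as in this trivial sketch (the numeric values are made-up placeholders, not data from the cited work):

```python
# Rank-correlate docking energies with measured intercalation potencies.
# More negative energy = tighter predicted binding, so a strong model
# yields a large negative Spearman rho. All values are placeholders.
from scipy.stats import spearmanr

docking_energy = [-9.8, -8.1, -7.4, -6.0, -5.2]  # kcal/mol
potency = [0.9, 0.7, 0.5, 0.2, 0.1]              # normalized in vitro potency

rho, p = spearmanr(docking_energy, potency)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```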
The examples described above illustrate the gains in predictive power that can be achieved through the application of advanced computational modeling based on first principles. Clearly, the initial development of such models requires significant knowledge of, and capabilities in, computational chemistry, and close collaboration between genetic toxicology and computational chemistry groups is essential for success. However, once developed, complex models such as these can often be made accessible to nonspecialist users via a user-friendly interface such as Pipeline Pilot (Accelrys). Although such models require more effort to develop and implement, the advanced predictive capabilities they offer should lead to the design of molecules with better genotoxicity profiles, resulting in fewer failures and lower overall resource costs in later stages of drug development. These types of advanced modeling approaches may well be a critical contributor to the future state of in silico genotoxicity prediction.