A symposium at the 40th anniversary of the Environmental Mutagen Society, held from October 24–28, 2009 in St. Louis, MO, surveyed the current status and future directions of genetic toxicology. This article summarizes the presentations and provides a perspective on the future. An abbreviated history is presented, highlighting the current standard battery of genotoxicity assays and persistent challenges. Application of computational toxicology to safety testing within a regulatory setting is discussed as a means for reducing the need for animal testing and human clinical trials, and current approaches and applications of in silico genotoxicity screening approaches across the pharmaceutical industry were surveyed and are reported here. The expanded use of toxicogenomics to illuminate mechanisms and bridge genotoxicity and carcinogenicity, and new public efforts to use high-throughput screening technologies to address the lack of toxicity evaluation for the backlog of thousands of industrial chemicals in the environment, are detailed. The Tox21 project involves coordinated efforts of four U.S. Government regulatory/research entities to use new and innovative assays to characterize key steps in toxicity pathways, including genotoxic and nongenotoxic mechanisms for carcinogenesis. Progress to date, highlighting preliminary test results from the National Toxicology Program, is summarized. Finally, an overview is presented of ToxCast™, a related research program of the U.S. Environmental Protection Agency, using a broad array of high-throughput and high-content technologies for toxicity profiling of environmental chemicals, and computational toxicology modeling. Progress and challenges, including the pressing need to incorporate metabolic activation capability, are summarized.
The field of Genetic Toxicology has experienced episodic periods of major advancement since the early days of Alexander Hollaender, former Director of the Biology Division at Oak Ridge National Laboratory and founder of the Environmental Mutagen Society. This growth has been spurred primarily through the establishment, evaluation, and refinement of various techniques to detect damage to genetic material, and the repair of such damage, in a variety of prokaryotic and eukaryotic cells and organisms [Waters and Auletta, 1981; Zeiger, 2004; Claxton et al., 2010]. Building on these scientific and technological advances, we have been moderately successful at piecing together the remarkable story of mutation, chromosome breakage, and the biological consequences of these events with respect to both hazard identification and cancer risk estimation. From a testing standpoint, the standard battery of Ames bacterial mutagenesis, in vitro cytogenetics, and in vivo genotoxicity assays has served us well despite some lingering and contentious doubts about particulars. Of all toxicities routinely examined in standard safety evaluations, genotoxicity is arguably the most amenable to reliable prediction and detection, yet it poses persistent challenges of interpretation in the context of human health risk assessment. Given recent advances in computational toxicology and the increasing use of high-throughput in vitro testing to probe putative toxicity pathways more broadly, cost-effectively, and efficiently than with standard in vitro or in vivo technologies, it is appropriate to ask whether the present genotoxicity testing paradigm best serves current needs, or whether it is time for some fundamental rethinking. Hidden amongst the throng of newer technologies, are there assays, in silico approaches, or some combination thereof that can create the foundation for more facile, efficient, and biologically relevant approaches?
Is it feasible that in silico approaches can be substantially improved to exceed 90% sensitivity and specificity with respect to a defined endpoint (mutagenicity, rodent carcinogenicity, germ cell mutagenesis, etc.), and how would such approaches be used? What is to be concluded about newer genomics-based technologies; can they provide the missing information to distinguish true genotoxicity from results arising indirectly from cytotoxicity, and is there a different molecular expression profile for such direct versus indirect genotoxins [Waters et al., in press]? Can genetic toxicology testing in the chemical or pharmaceutical industry rise above the suspicion that it may have limited utility and relevance with regard to human cancer risk? We will briefly examine where we are and where we should or could be headed, to set the stage for a broader discussion of these topics in the following pages. The remaining sections were contributed by presenters in a symposium entitled “Genetic Toxicology in the 21st Century” at the EMS 40th annual meeting held from October 24–28, 2009 in St. Louis, Missouri.
First, consider the current genetic toxicity test battery of Ames bacterial mutagenesis, in vitro cytogenetics, and an in vivo rodent genotoxicity assay. The Ames bacterial mutagenicity assay [McCann et al., 1975; McCann and Ames, 1976] has stood the test of time and has gained strong consensus as the assay of choice for prediction of mutagenicity and carcinogenicity. It has virtually no serious shortfalls when used in the testing of most chemical classes, it is robust and reproducible, and it is considered the best stand-alone assay for genotoxicity hazard identification, used widely for assessing mutagenicity of environmental, pharmaceutical, and industrial test articles [Claxton et al., 2010]. Although the assay is estimated to be only ~60% predictive of rodent carcinogenicity across a random selection of chemical classes, it is highly predictive for some classes: it approaches 100% predictivity for polycyclic aromatic hydrocarbons (PAHs) and aromatic amines, for example, but is only 20% predictive for chlorinated organics. However, it detects ~90% of known human carcinogens, most of which are trans-species rodent carcinogens [Kirkland et al., 2005, 2007]. This discrepancy in rodent versus human predictivity of the genotoxicity test battery became apparent as the chemical selection criteria for the standard 2-year National Cancer Institute/National Toxicology Program (NCI/NTP) Carcinogen Bioassay Program shifted from likely carcinogenicity based on expert judgment to high production volume and human exposure [Fung et al., 1995].
The most widely used assays for mammalian chromosome damage—chromosome aberrations in human peripheral blood lymphocytes or Chinese Hamster Ovary (CHO) cells and the Mouse Lymphoma Assay (MLA)—have, on the other hand, met serious challenges on many fronts. Although the mechanism for mutation induction in Salmonella [DeMarini, 2000] or E. coli is exquisitely clear, this is not the case for the cytogenetics assays. Positive responses can occur through any of a number of relevant processes, including direct DNA damage, delayed or inhibited repair, interference with repair-processing enzymes such as topoisomerase, or via artifacts of cytotoxicity, apoptosis, or severe cell culture conditions affecting pH, osmolality, and so forth. So problematic are these effects that major and prolonged discussion presently questions the value of continuing with these analyses [Kirkland et al., 2007]. A phenotype of positive mammalian clastogenicity but negative Ames mutagenesis and in vivo studies has been referred to as a “false positive” or “nonrelevant” finding when the weight of evidence indicates that the in vitro positive results occur under conditions not achievable in vivo, or through “indirect” mechanisms not expected to occur at lower doses [Kirkland et al., 2005]. There are continuing efforts to understand the mechanism of these in vitro positive results to help determine their significance.
Finally, the in vivo rodent bone marrow micronucleus (MN) assay provides the only mandated evaluation of in vivo genotoxicity for carcinogenicity as well as potential germ cell mutagenicity [Waters et al., 1994]. As with its in vitro counterpart, performing the in vivo assay using flow cytometry can provide for more robust statistical analyses and more reliable dose-response evaluation. Among marketed drugs, positive in vivo MN findings occur about one-third as frequently as positive in vitro cytogenetics findings. However, only 5 of the more than 400 drugs tested in the full battery were positive solely in the in vivo micronucleus assay (equivocal responses excluded) [Snyder, 2009]. These are special cases for which no explanation is readily available. In this dataset, however, of the 60 rodent bone marrow MN-positive/equivocal drugs, only 19 were Ames-positive, whereas the rest (41 of 60) were clastogenic as expected. On the other hand, 76 Ames and/or cytogenetics positives were negative in vivo, which might simply be attributable to failure to attain plasma concentrations high enough to elicit positive in vivo responses. Unfortunately, the supporting data offered in regulatory packages, which might allow a better understanding of mechanisms associated with positive (or negative) in vivo findings, are very difficult to obtain due to their often proprietary nature. With the present European Medicines Agency (EMEA) directives to reduce in vivo testing, this may become a moot point [EMEA, 1997]. It would be interesting, however, to know how other in vivo endpoints such as the Comet assay or alkaline elution would fare relative to the MN assay in this set of drugs.
What, if any, in vivo endpoints to use going forward is an open and active question. A battery of tests, such as the above, that includes an in vivo endpoint is presumed more sensitive in the detection of presumptive carcinogens; however, its specificity has been questioned in a series of articles [Kirkland et al., 2005, 2006; Kirkland and Speit, 2008]. Despite many years of discussion, a 2009 World Health Organization/International Program on Chemical Safety Harmonization Program [Eastmond et al., 2009] concluded that, “analyses of test batteries and their correlation with carcinogenicity have indicated that an optimal solution to this issue has not yet been found.”
The above considerations are the basis for ongoing discussions as to whether and how to alter the battery of in vitro and in vivo testing required for regulatory submission. The outcome of these deliberations may be a complete paradigm shift or may result in only incremental change.
Driven by regulatory concerns, the primary reason for conducting in vitro and in vivo genotoxicity analyses has been to try to predict which molecules are likely to be rodent and/or human carcinogens. The 2-year rodent cancer bioassay itself was originally established as a screen to identify potential carcinogens that would be further analyzed in human epidemiological studies [Bucher and Portier, 2004]. The rodent cancer bioassay has evolved as the primary means and “gold standard” for determining the carcinogenic potential of a chemical by providing the quantitative information on dose-response behavior required for risk assessments. However, the cancer bioassay requires the use of more than 800 mice and rats and the histopathological examination of more than 40 tissues. As a result, it is extremely costly, time consuming, and low-throughput. Based on protocols established by Gold and coworkers for assessing experimental sufficiency, it is estimated that only ~1500 chemicals have been studied to date [Gold et al., 2005]. Approximately 560 of the chemicals in this listing were tested by the NCI/NTP and are summarized within the Carcinogen Potency Data Base (CPDB) (see “Summary by Chemical of Carcinogenicity Results from Technical Reports of the NCI/NTP” at http://potency.berkeley.edu/pdfs/NCINTPchemical.pdf).
Out of the 400 chemicals tested as of 1995, 210 (52%) induced tumors in at least one organ of one sex of one species (of the two species, both sexes, typically used by NCI/NTP [Fung et al., 1995]). Only 92 chemicals (23%) were positive in both rats and mice, and thus by international criteria (i.e., those of IARC, the International Agency for Research on Cancer) would be considered likely to pose a carcinogenic hazard to humans. Based on this analysis, it was concluded that less than 5–10% of the 75,000 chemicals in commercial use might be reasonably anticipated to be carcinogenic to humans [Fung et al., 1995]. A study by Zeiger and Margolin estimated that ~20% of industrial organic chemicals are mutagenic, based on random testing of 100 such compounds in Salmonella.
More than 90% of the chemicals classified by IARC as known human carcinogens are mutagenic or genotoxic across numerous short-term tests and induce tumors at multiple sites in rodent species [Waters et al., 1999]. For such agents, the current genotoxicity test battery enables relatively simple, rapid, and inexpensive hazard identification by assessing chemically induced genetic damage. However, increasingly, studies are identifying substances for which rodent carcinogenicity has a prominent nongenotoxic component, results that have proven refractory to modeling or evaluation using traditional approaches. Thus, chemicals acting through nongenotoxic mechanisms are likely to be undetected by current genetic toxicity test batteries and, similarly, underrepresented by the current IARC classification process. Clearly, detection of nongenotoxic mechanisms for carcinogenicity extends beyond the purview of standard genotoxicity testing. To what extent, then, should the field of genetic toxicology be solely focused on, and defined by, the aim of predicting rodent and potential human carcinogenicity? These are questions that may be irrelevant to the actual practice of assessing chemical safety by various regulatory and industrial concerns, but they underscore the blurring of lines and distinctions among the subdisciplines of toxicology in recent years.
Hence, we face two overarching challenges as we consider the field of genetic toxicology in the 21st century. The first is to take full advantage of advances in genomics, biologically based and computational toxicology models, high-throughput and new assay technologies to improve our ability to assess the impacts of chemically induced genetic damage, in all its possible forms, on human health. The second is to use these technologies to accurately and reliably assess new and existing chemicals for genetic toxicity potential more efficiently, cost-effectively, and with less reliance on animal models. The vast numbers of biologically uncharacterized environmental, industrial, and novel pharmaceutical chemicals do not allow the testing of each for genotoxicity in the standard resource-intense battery. In addition, new technologies are coming on-line that allow for more mechanistically based biological associations of cellular systems, genomic responses, and human targets in relation to human carcinogenicity. Computational prediction of genotoxicity and carcinogenicity based on chemical structure, biological attributes, and/or physicochemical properties has proven of value when applied within its application boundaries, and can be applied to chemicals not yet synthesized. More sophisticated computational modeling of DNA interactions has also proven its utility in mechanistically well-defined problem areas. Toxicogenomics, which can broadly probe genomic responses of chemicals in relation to in vivo endpoints, is proving useful for illuminating and categorizing chemicals according to modes of action for carcinogenesis. Finally, medium- and high-throughput screening (HTS) technologies are becoming more affordable, feasible, and informative for broad-based toxicity profiling.
By virtue of their requirement for very small amounts of chemical, they provide a means to simultaneously and efficiently screen hundreds to thousands of compounds across a broad array of biochemical targets, cell-based systems, and even model organisms; however, they also require robust informatics and computational toxicology approaches. In the remainder of this article, we elaborate on each of the above-mentioned areas with specific examples relating to the field of genetic toxicity.
Computational prediction of genotoxicity based on molecular structure, or structure–activity relationships (SAR), has met with a measure of success due in part to the availability of large public datasets for modeling. In particular, data from Ames Salmonella assays have been used to build some of the more successful and widely used SAR predictive models [see e.g., Kazius et al., 2005; Yang et al., 2008]. Genotoxicity is more facile to predict than, say, hepatotoxicity or nephrotoxicity, owing to the multitarget, multimechanistic nature of those organ toxicities. True genotoxicity, in which the chemical interacts directly with cellular DNA, should be predictable based solely on chemical reactivity and the physicochemical properties of a molecule, particularly in the case of DNA alkylators or electrophilic compounds. This accounts for the partial success of programs such as Derek for Windows and MC4PC, among others, which rely primarily on recognition of what are termed “biophores” or “structural alerts.” However, all SAR programs generate “false positives” (nonspecificity) and “false negatives” (insensitivity) for a complex set of reasons. These center on factors such as the chemical “space” around the alerting structural feature, which could hinder that reactive center; the need for metabolism to activate a chemical to its reactive form; and whether or to what extent a chemical can enter a cell, given its size, lipophilicity, and so on. Quantitative SAR (QSAR) approaches, in which various physicochemical parameters of a series of compounds are used to identify relationships to genotoxicity and genotoxic potency, have also had some success, particularly when applied to prediction within a series of structurally similar congeners.
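The alert-plus-modulator logic described above can be illustrated with a deliberately schematic sketch. This is not how Derek for Windows or MC4PC actually work (real systems parse molecules and match SMARTS substructures); the alert fragments, the crude lipophilicity "permeability window," and all names below are hypothetical, chosen only to show how a structural alert can be downgraded by a modulating property.

```python
# Toy structural-alert ("biophore") screen. Purely illustrative:
# real SAR tools match parsed substructures, not SMILES substrings.

# Hypothetical alert fragments (illustrative, not validated patterns)
ALERTS = {
    "aromatic nitro": "c1ccccc1[N+](=O)[O-]",
    "aromatic amine": "c1ccccc1N",
    "epoxide": "C1OC1",
}

def screen(smiles: str, logp: float) -> dict:
    """Flag alerting fragments, then apply a crude modulator:
    if logP falls outside an assumed cell-permeability window,
    downgrade a positive call to equivocal."""
    hits = [name for name, frag in ALERTS.items() if frag in smiles]
    call = "positive" if hits else "negative"
    if hits and not (-0.5 <= logp <= 5.0):  # assumed permeability window
        call = "equivocal"  # alert present, but cellular uptake doubtful
    return {"alerts": hits, "call": call}

result = screen("c1ccccc1N", logp=1.2)  # aniline-like toy input
```

The point of the sketch is the second step: a purely alert-based screen would stop at `hits`, whereas modulators (steric hindrance, metabolism, permeability) are what separate modern programs from a visual Ashby–Tennant inspection.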
It could be argued that the great majority of directly DNA-reactive chemical moieties have already been identified and, despite improvement of the learning sets on which programs such as Derek for Windows and MC4PC are based, that only incremental increases in sensitivity and specificity might be possible. However, this thinking does not take into consideration those factors identified more recently such as noncovalent DNA interaction and interference with critical DNA metabolizing proteins such as topoisomerase and DNA polymerases [Snyder and Strekowski, 1999; Snyder, 2000, 2007; Snyder and Arnone, 2002; Snyder et al., 2004, 2006; and references therein], all of which can clearly contribute to genotoxicity but for which, as yet, insufficient peer-reviewed data have been generated for extensive modeling in SAR or QSAR learning sets. The strengths and limitations of both the SAR and QSAR approaches will be discussed below, along with a perspective on how these technologies are being applied in practice within a regulatory setting and in industry.
QSAR-based computational toxicology (comptox) approaches are now used extensively around the world by regulatory agencies as an adjunct to safety assessment. The heaviest user today is probably the European Union, with de-emphasis on animal testing and recently phased-in laws for regulating new and existing substances such as for the Registration, Evaluation, and Authorization of Chemicals (REACH), the Seventh Amendment (of the Cosmetics Directive), and the Screening Information Data Set (SIDS). At the same time, Health Canada has used SAR and weight-of-evidence approaches extensively to prioritize their Domestic Substances List, as directed by the Canadian Environmental Protection Act. In the United States, the Environmental Protection Agency (EPA) Office of Pollution Prevention and Toxics has been using SAR, weight-of-evidence, and analog approaches for over two decades under the Toxic Substances Control Act to evaluate the environmental safety of chemicals, including High Production Volume (HPV) chemicals. Finally, in the Pacific region, Japan is considering the use of computational toxicology under the Law Concerning the Evaluation of Chemical Substances and Regulation for their Manufacture, whereas Australia has implemented the National Industrial Chemicals Notification and Assessment Scheme.
At the U.S. FDA, a dedicated team at the Center for Food Safety and Applied Nutrition (CFSAN) uses comptox approaches in an ongoing program to evaluate food contact substances for which minimal experimental data may be available as part of the food contact notification program. Currently, the Center for Veterinary Medicine is considering how comptox approaches may benefit them. The Center for Drug Evaluation and Research (CDER) uses comptox models to help evaluate toxicities of drugs and drug-related substances, such as precursors, degradants, and contaminants. CDER has a draft Guidance for establishing the safety of genotoxic contaminants that includes the use of comptox (http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm079235.pdf).
CDER contains an applied regulatory research group that has for over 10 years created toxicological and clinical databases, developed rules for quantifying toxicological and clinical endpoints, evaluated data mining and QSAR software, and developed toxicological and clinical effect prediction programs through collaborations with software companies [Matthews et al., 2000; Benz, 2007; Kruhlak et al., 2007]. In the last 2 years, the CDER comptox group has evolved from one primarily doing basic research into an applied computational toxicology consulting service that supplies comptox evaluations for drugs, metabolites, contaminants, excipients, degradants, and so forth to FDA/CDER safety reviewers in preclinical, clinical, post-market, and compliance groups.
The overall goal of the CDER computational toxicology program is to develop capabilities to predict accurately with in silico software all toxicological and clinical effect endpoints of interest to the U.S. FDA/CDER. The ultimate goal is to substantially reduce the need for animal toxicological testing and human clinical trials in establishing the safety of FDA/CDER-regulated chemical substances. Overall, the group advocates use of computer predictions of toxicological and clinical effects to inform and provide valuable decision support of regulatory actions. Computational toxicology information can serve an important role when a regulatory decision must be made in the absence of all the safety information desired for chemicals being considered, or, when the result of a safety study is equivocal, predictions can be provided on related endpoints or chemicals.
The U.S. FDA published a “white paper” on March 16, 2004, describing the Agency’s Critical Path Initiative (www.fda.gov/ScienceResearch/SpecialTopics/CriticalPathInitiative/CriticalPathOpportunitiesReports/ucm077262.htm). In this article, FDA stated that “not enough applied scientific work has been done to create new tools to get fundamentally better answers about how the safety and effectiveness of new products can be demonstrated, in faster time frames, with more certainty, and at lower costs.” In this document, FDA further stated that “a new product development toolkit—containing powerful new scientific and technical methods such as animal or computer-based predictive models, biomarkers for safety and effectiveness, and new clinical evaluation techniques—is urgently needed to improve predictability and efficiency along the critical path from laboratory concept to commercial product.”
Today, as part of the U.S. FDA Critical Path Initiative, change has been officially institutionalized. The Office of Critical Path Programs now exists within the FDA Office of the Commissioner, and the Office of Translational Science has become a component of FDA/CDER. In addition, the Critical Path Institute, a private organization, is working toward the same ends. Within CDER there are active working groups such as the Computational Science Center Committee, the Research and Development Computing Advisory Board, and the Data Mining Council. New resources have been provided to power change, including newly opened, state-of-the-art production and research computer data centers, and a multiparallel High Performance Grid Computing Center, FDA’s new “super-computer.”
FDA/CDER is working to bring about an orderly transition to new testing paradigms to establish drug safety. This is being accomplished through: (1) education, with symposia, seminars, publications, and training (all of which have been instituted for the CDER computational toxicology service, resulting in well over a 10-fold increase in requests for consultations over the last year); (2) evaluation and buy-in by CDER safety personnel; (3) internal marketing as a major component of the process; and, finally, (4) written directives, including internal MaPPs (manuals of policies and procedures) and public Guidance that will be established with public input.
For comptox in particular, evaluation and buy-in is primarily being accomplished through the establishment of a Computational Toxicology Subcommittee (CTSC), a formal subcommittee of the U.S. FDA/CDER Office of New Drugs’ Pharmacology/Toxicology Coordinating Committee (PTCC). The official mission of the PTCC/CTSC is to disseminate appropriate guidance to CDER review staff and the pharmaceutical industry in the assessment of computational toxicology studies, and to serve as a resource to the PTCC and to the Center on scientific and regulatory aspects of computational toxicology issues.
QSAR computational toxicology is useful to predict the results of in vitro and animal preclinical tests when no actual test data are available, laboratory data are equivocal, or when additional decision support is needed. This methodology is less expensive and faster than traditional means of safety evaluation, can provide decision support information and suggestions of mechanism/mode of action, and, in general, goes well beyond the traditional use of Ashby–Tennant structural alerts [Ashby, 1985]. “Seat of the pants” QSAR, visually looking for Ashby–Tennant structural alerts, is a 25-year-old procedure that has served us well but is now obsolete due to a number of factors: (1) Ashby and Tennant did not have many pharmaceutical data to consider—pharmaceutical-specific structural alerts exist; (2) a simple examination of a molecule for Ashby–Tennant alerts does not take into consideration the effect of other chemical groups in the same molecule that modify the activity of the alert (modulators); (3) a computer can consistently and exhaustively look for all alerts and modulators; this is difficult for humans to do by simple visual inspection with a molecule as complex as a typical pharmaceutical; (4) there are other highly valuable ways to perform QSAR analyses that do not involve examining the atom connectivity per se; (5) there are characteristic structural alerts for all toxicological and adverse human clinical endpoints, not just Salmonella mutagenesis.
U.S. FDA/CDER has created many QSAR models to predict the ability of organic chemicals to cause cancer in rodents [Matthews and Contrera, 1998; Contrera et al., 2003, 2005a, 2007; Matthews et al., 2008], as well as genetic toxicity at several endpoints [Contrera et al., 2005b; Matthews et al., 2006a, b; Contrera et al., 2008]. Carcinogenicity is predicted on the basis of a training set consisting of bioassay data from over 1600 chemicals, on which over 25,000 individual records have been obtained. These data have been harvested from FDA archives, CDER Cancer Assessment Committee reports, NTP technical reports, IARC monographs, the L. Gold Carcinogenic Potency Data Base, and from the published literature. Specific rodent carcinogenicity endpoints modeled at CDER are for male mice, female mice, male rats, female rats, rats (both genders pooled), and mice (both genders pooled).
Genetic toxicology prediction models used by CDER are based on laboratory data on 5,880 chemicals, with 27,498 individual records [15,691 mutation (58.0%), 8,783 clastogenicity (31.9%), 2,138 cell transformation (7.8%), and 886 DNA effects (3.2%)], with data obtained from the EPA GENETOX database, FDA archives, NTP technical reports, the published literature, and data sets collected by MultiCASE. There are currently nine QSAR models that are used in combination to predict rodent carcinogenicity, each based on a different genetic toxicity test (Salmonella typhimurium strains, Escherichia coli strains, fungal mutagenicity, Drosophila genetic toxicity, rodent mutation in vivo, Hprt in CHO and CHL, rodent micronucleus in vivo, rodent chromosome aberrations in vivo, and rat and human unscheduled DNA synthesis). Genetic toxicology models used to predict the current International Committee on Harmonization (ICH) S2b battery are for bacterial mutagenicity, in vitro chromosome aberrations, mouse lymphoma Tk+/− mutation, and in vivo micronucleus.
A current focus of research at CDER is developing a paradigm for using the results of more than one computational toxicology program for any one endpoint [Contrera et al., 2007; Matthews et al., 2008]. CDER currently uses five in silico programs for estimating the toxic potential of diverse chemicals. Four make their predictions based on statistical correlations between chemical attributes and toxicity [MC4PC (www.multicase.com), SciQSAR (www.scimatics.com), BioEpisteme (www.prousresearch.com), and Model Applier (www.leadscope.com)]. The fifth predicts toxicity and adverse effects based on human expert rules [Derek for Windows (www.lhasalimited.org)]. CDER believes that using different prediction paradigms together and analyzing the results in a meta-analysis will give the best overall prediction. None of the currently commercially available computational toxicology programs has all necessary functionalities; none has 100% coverage, sensitivity, and specificity; and all of the programs have some unique features (i.e., they are not completely overlapping) in their strategies. Hence, combining the various models within consensus prediction strategies can achieve greater prediction accuracy.
A simple example of a decision support strategy is to utilize two or more comptox programs to independently make predictions and combine the predictions in different ways, depending on the needs of the regulatory application, to tune the outcome towards higher specificity or sensitivity. To attain high sensitivity, call the overall result positive if any one software program gives a positive prediction. To attain high specificity, call the overall result positive only if all software programs give a positive prediction. In either case, the types of chemicals that can be predicted (coverage) are improved because different programs cover different parts of the chemical universe.
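The two tuning rules described above reduce to simple boolean combination. The sketch below assumes each program's output has already been reduced to a single "predicted positive" call; the function name and example calls are illustrative, not part of any CDER tool.

```python
# Minimal sketch of the consensus strategy described in the text:
# combine independent model calls, tuned for sensitivity or specificity.

def consensus(predictions: list, mode: str = "sensitivity") -> bool:
    """Combine per-program boolean calls into one overall call.

    'sensitivity' : positive if ANY program is positive (fewer false negatives)
    'specificity' : positive only if ALL programs are positive (fewer false positives)
    """
    if mode == "sensitivity":
        return any(predictions)
    if mode == "specificity":
        return all(predictions)
    raise ValueError(f"unknown mode: {mode}")

# Three hypothetical QSAR programs disagree on a test chemical:
calls = [True, False, True]
high_sens = consensus(calls, "sensitivity")   # True: at least one positive
high_spec = consensus(calls, "specificity")   # False: not unanimous
```

Because each program covers a different region of chemical space, the same any/all logic also improves coverage: a chemical outside one program's applicability domain can simply be dropped from its prediction list while the remaining programs still vote.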
The near-term effects of the use of computational toxicology on regulatory review are that fewer problematical chemicals are submitted to regulatory agencies, decision support information is available, and concerns can be prioritized. However, there will be a continuing, though more delimited, need for “classical” testing and review because computational toxicology is of limited use with truly novel molecular entities (lack of coverage): the software cannot make a prediction about something it has never seen before.
Using computational toxicology effectively can lead to greater efficiencies in the review process. When model predictions have a high degree of confidence, interpretability, and transparency, laboratory testing (and lengthy reviews) may not be needed. Similarly, if a submission reporting equivocal test results can be augmented by high-confidence model predictions, then additional testing potentially could be avoided. Comptox also can be used to rapidly determine if postmarket toxicity is likely, thus triggering additional targeted studies, although further development of this capability is needed. Ultimately, QSAR computational toxicology methods, as well as other newly developed techniques, will be applied as a means of reduction, replacement, and refinement for longer, more expensive testing.
In their 2007 report entitled Toxicity Testing in the 21st Century, the National Academy of Sciences stressed the importance of in silico toxicity prediction in the future state of safety assessment of drugs and chemicals [NRC, 2007]. Some of the major drivers for adoption of in silico approaches include the ethical issues surrounding widespread use of mammals, especially nonhuman primates, for toxicity testing; the high cost and long timelines associated with traditional toxicology testing paradigms; and the pressure to increase productivity in the pharmaceutical industry without compromising patient safety. For these and other reasons, in silico approaches have become a mainstay of modern drug discovery and development. In this respect, no other toxicological discipline has been more profoundly influenced by the growth and development of in silico approaches than genetic toxicology. In many cases, drug-induced DNA damage is governed by well understood principles of chemical reactivity and physicochemical properties, and this has greatly facilitated development of structure-based in silico prediction models. Such models have been used in the pharmaceutical industry for many years, but the approaches to and applications of this technology have evolved alongside advances in modeling technique, the state of the knowledge base, and the regulatory environment. This section reviews current approaches to in silico genotoxicity modeling in the pharmaceutical industry and looks forward to the future evolution of this technology. Three main issues are addressed: (1) the current state of in silico genotoxicity prediction in the pharmaceutical industry; (2) the ability of current approaches to meet the needs of industry in the 21st century; and (3) opportunities for improvement of in silico genotoxicity prediction.
To assess the current state of in silico genotoxicity approaches, a web-based survey of genetic and computational toxicologists in the pharmaceutical industry was conducted. The survey was composed of 10 questions dealing with how and when in silico approaches are used in decision making, the types of models used, and perceived shortcomings and challenges for improvement of in silico genotoxicity models. In all, 15 companies participated in the survey, representing both large pharmaceutical (80%) and smaller biotechnology companies (20%).
Almost all of the companies surveyed use in silico modeling of genetic toxicity endpoints as a part of their decision-making process. The most common application is to inform and guide testing strategies for novel compounds and to prioritize compound testing based on level of concern. Approximately half of the responding companies indicated that in silico modeling is also used as a stand-alone approach for decision making. A good example of this approach is the use of negative in silico predictions to qualify process impurities in clinical and production batches of drugs without the need for biological testing [European Medicines Agency, 2006; U.S. Department of Health and Human Services, FDA, CDER, 2008].
Most companies make significant use of in silico genotoxicity prediction in both the discovery and development phases of the pharmaceutical pipeline. In silico modeling capabilities are most frequently deployed in toxicology and safety assessment groups where the focus is largely on later stage compounds. However, approximately half the responding companies also leverage in silico modeling capabilities within medicinal chemistry to support early discovery efforts. Primary applications of in silico techniques in early drug discovery include virtual screening of planned compounds and prioritization of compounds for biological testing, conserving synthetic and testing resources for those compounds with the best chance of advancing into development. In the development phase, in silico approaches can be used prospectively to help direct synthetic routes, and retrospectively to characterize impurities and unique/disproportionate human metabolites, contributing to prioritization of testing and development of testing strategies for these entities.
Almost all companies use at least one commercially available in silico model, and approximately half of the respondents also use proprietary in-house models to support genotoxicity screening. Among commercially available models, Derek for Windows (Lhasa) and MC4PC (MultiCASE) are the most frequently used (93 and 60% of respondents, respectively). Derek for Windows is a rule-based expert system that uses a library or knowledgebase of structural alerts derived from the open literature and from user-contributed data to predict activity of unknown compounds. Derek for Windows also allows users to implement custom alerts based on in-house data. MC4PC evaluates structure–activity data in a training set of molecules and identifies structural features that correlate with higher or lower mutagenic activity (biophores and biophobes, respectively). It then uses this information to derive ad hoc local QSAR models to predict activity of unknowns. The most common approach to in-house model development is customization of existing commercial models (e.g., development of proprietary Derek for Windows alerts based on proprietary data). Some companies also develop models from scratch using modeling packages such as MDL-QSAR (MDL Information Systems) or the LeadScope Predictive Data Modeler (Leadscope). In-house model development encompasses global models to broadly predict chemical space, and local models focused on specific chemical classes. About 40% of the responding companies rely on a single model for prediction of genotoxicity. Other companies use multiple prediction models, resolving conflicting predictions with either a weight-of-evidence approach or a more conservative single-hit approach.
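To make the rule-based idea concrete, the toy screen below matches a list of hypothetical structural alerts against SMILES strings by naive substring search. Real expert systems such as Derek for Windows use curated knowledgebases and proper substructure (e.g., SMARTS-based) matching; this sketch only illustrates the control flow of an alert-based screen.

```python
# Toy illustration of a rule-based structural-alert screen.
# NOTE: naive SMILES substring matching is NOT chemically reliable; real
# systems perform graph-based substructure searches. Alerts are hypothetical.
ALERTS = {
    "aromatic nitro": "c1ccccc1[N+](=O)[O-]",  # simplistic pattern
    "N-nitroso": "N(N=O)",
}

def screen(smiles):
    """Return the names of all alerts whose pattern appears in the SMILES string."""
    return [name for name, pattern in ALERTS.items() if pattern in smiles]

hits = screen("Cc1ccccc1[N+](=O)[O-]")  # hypothetical nitroarene-like SMILES
print(hits)
```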
Given the ever increasing role in silico models play in decision support, it is appropriate to periodically challenge the adequacy and robustness of this approach. Not surprisingly, model accuracy stands out as the single biggest perceived challenge for in silico genotoxicity prediction, with ~40% of the survey respondents identifying high false positive and false negative rates as the highest priorities for improvement. Model transparency and interpretability were also viewed as areas for improvement. Transparency is particularly important from a medicinal chemistry standpoint because understanding the structural basis for predictions is essential for hypothesis-based optimization of chemistry away from positive genotoxicity results. In addition, adequate model transparency is important for assessing the reliability of model predictions. Poor extrapolation power was also viewed as a significant shortcoming of current prediction models; only one respondent felt that the ability to extrapolate beyond known chemical space was currently acceptable.
Each in silico genotoxicity prediction model has strengths and weaknesses that are largely determined by both the nature of the training data set and the prediction challenge to which the model is applied. Clearly, accuracy, interpretability, and extrapolation are considered major weaknesses of current prediction models in large part because of the inability of available models to adequately address specific decision support needs within industry. Hence, it is appropriate to ask: what are the underlying challenges to improvement in these areas? The two most commonly cited roadblocks to model improvement are the adequacy of existing training sets and insufficient mechanistic understanding. Available public databases are based largely on environmental and commodity chemicals, and drug-like compounds are poorly represented. Despite the relatively high degree of mechanistic understanding in the area of genotoxicity, many of the subtle structural features that modulate mutagenic activity are poorly understood, and the chemical rationale for differentiating active and inactive members of a given chemical series is often obscure. In addition to these two major challenges to model improvement, a shortage of qualified people to develop novel models, as well as to run and interpret them, was also recognized as a potential barrier.
The results of the industry survey paint a picture of the current state of in silico genotoxicity prediction characterized by high importance and value accompanied by suboptimal performance and significant barriers to improvement. These same issues have plagued computational toxicology for many years, yet little progress has been made in overcoming these barriers, and substantive improvements in model performance have not been forthcoming. So, the question is how can we get to the next level in genotoxicity prediction from where we stand today?
Expanding and diversifying model training sets to cover larger areas of chemical space is certainly needed, but achieving this in a publicly accessible manner has been problematic. Developing and testing novel chemistry are expensive and time consuming, and pharmaceutical companies are understandably reluctant to give away the competitive advantage derived from such proprietary knowledge. One possibility for data sharing that has received some attention is precompetitive collaborative testing agreements focused on commonly used building blocks. This would provide a means for companies to share both the testing burden and the acquired knowledge without giving away proprietary information. Another possibility is the sharing of derived knowledge without sharing actual compound structures or data. As discussed previously, some member companies of the Lhasa consortium contribute knowledge in this form to improve existing Derek for Windows alerts or to create new ones. Within individual companies, opportunities certainly exist to expand training sets for proprietary use. One of the interesting results of the survey was that over 50% of the responding companies do not engage in any form of proprietary model development. This suggests that many companies are sitting on potentially large sets of structure–activity data that could be used to improve their internal predictive capabilities.
Toxicophore-based approaches to in silico prediction are generally only applicable within the structural domain of the training sets on which they are based. Therefore, even with expanded and diversified training sets, the ability of these models to predict genotoxicity of novel chemotypes will be somewhat limited.
Factors such as steric hindrance, electronic effects, stereochemistry, and planarity (to name a few) may, to some degree, get incorporated into conventional SAR prediction models based on sufficiently large datasets; however, these models will still be bounded by the chemical space of the training sets. One approach to circumvent this barrier is the development of models based on first principles of chemical behavior. Ab initio computations provide access to numerous parameters reflecting potential for chemical reactivity and, hence, DNA damage, that may be difficult to build into toxicophore-based models. These include factors such as localization of electron density, orbital energy levels, optimized geometries, and thermodynamic stability that can be used to develop generalized models of DNA reactivity. Two examples of the use of quantum mechanical parameters in genotoxicity modeling are given as follows.
Aromatic amines are key structural elements in medicinal chemistry and constitute a pharmaceutically relevant class of potential genotoxicants and carcinogens. The mutagenic activity of primary aromatic amines has been attributed to transient formation of reactive nitrenium ions resulting from decomposition of a hydroxylamine metabolite. Ford and Griffin built a predictive model for aromatic amine mutagenicity based on semiempirical calculation of the thermodynamic stability of the putative nitrenium intermediate from a small series of aromatic amines encountered in cooked meat. We have recently extended this model to higher levels of theory (Hartree–Fock and density functional theory) and used it to evaluate a test set of 257 primary aromatic amines [Bentzien et al., 2010]. The nitrenium stability model exhibits good sensitivity and specificity and is not dependent on a predefined training set. The model provides a continuous spectrum of activity values that allows medicinal chemists to predict subtle trends in SAR during compound optimization.
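As a sketch of how a continuous, training-set-free reactivity parameter can be used in practice, the snippet below ranks a few hypothetical amines by an assumed computed nitrenium stabilization energy. The amine names, energies, sign convention, and decision threshold are all illustrative and are not taken from the published model.

```python
# Sketch of ranking aromatic amines by computed nitrenium ion stability.
# The published models correlate mutagenic potency with the thermodynamic
# stability of the nitrenium intermediate; the energies (kcal/mol, relative
# to a reference amine) and the cutoff below are entirely hypothetical.
relative_nitrenium_stability = {   # more negative = more stable nitrenium
    "amine_A": -12.4,
    "amine_B":  -3.1,
    "amine_C":   4.8,
}
THRESHOLD = -5.0  # hypothetical cutoff separating higher- from lower-concern amines

ranked = sorted(relative_nitrenium_stability.items(), key=lambda kv: kv[1])
for name, energy in ranked:
    concern = "higher" if energy < THRESHOLD else "lower"
    print(f"{name}: {energy:+.1f} kcal/mol -> {concern} mutagenicity concern")
```

Because the score is continuous, medicinal chemists can track incremental SAR trends during optimization rather than relying on a binary alert.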
Computational prediction of genotoxicity related to noncovalent binding modes such as intercalation has been a challenging issue, because this class of compounds is generally devoid of alerting structural moieties. Even more problematic than traditional planar fused ring compounds are the so-called atypical intercalators, which are characterized by two to three unfused rings and the presence of one or more cationic centers—typically dialkylamines [Snyder, 2007]. To address this problem, Snyder et al. adapted a computational DNA-docking model that was originally developed to facilitate discovery and optimization of antiestrogenic compounds [Hendry et al., 1994]. In this model, energy-minimized structures of the test compounds were generated and computationally docked into unwound but structurally intact DNA. Using this approach, a high degree of concordance between docking energies and in vitro DNA intercalation potency was demonstrated, and it was shown that structural features differentiating genotoxic versus nongenotoxic intercalators could be rationalized based on the binding energies.
The examples described above indicate the potential gains in predictive power from the application of advanced computational modeling based on first principles. Clearly, the initial development of such models requires significant knowledge of, and capabilities in, computational chemistry, and close collaboration between genetic toxicology and computational chemistry groups is essential for success. However, once developed, complex models such as these can often be made accessible to nonspecialist users via a user-friendly interface such as Pipeline Pilot (Accelrys). Although such models require more effort to develop and implement, the advanced predictive capabilities they offer should lead to the design of molecules with better genotoxicity profiles, resulting in fewer failures and overall lower resource costs in later stages of drug development. These types of advanced modeling approaches may well be a critical contributor to the future state of in silico genotoxicity prediction.
Potentially important guidance for the selection of biomarkers of genotoxicity that may be valuable for HTS as well as mechanistic evaluations is available from gene expression profiling. Over the past 6 years, a series of toxicogenomics investigations have shed new light on the potential use of gene expression analysis to more properly classify putative carcinogens and to predict carcinogenicity [Waters et al., in press]. Toxicogenomics (mRNA transcript profiling) to identify genes that respond to groups or categories of chemicals, together with statistical classification methods and machine learning [Auerbach et al., 2010], has clearly progressed in its ability to predict carcinogenicity. Recent studies reviewed in Waters et al. [in press] by Van Delft et al. [2004, 2005], Ellinger-Ziegelbauer et al. [2005, 2008, 2009a], Nakayama et al., Nie et al., Tsujimura et al., Fielden et al. [2007, 2008], and Thomas et al. suggest that it is possible to screen for carcinogenicity and to discern the potential mode of action (MOA) of a chemical based on analysis of gene expression using toxicogenomics methods, with the potential for screening of putative MOA by quantitative RT-PCR [Ellinger-Ziegelbauer et al., 2009b]. The strategy used involves selecting a training set of known carcinogenic (genotoxic or nongenotoxic) and non-carcinogenic compounds to which to expose mice or rats in vivo, or cultured cells in vitro, for periods of 1 to 90 days. Statistical classification techniques [see Van Delft et al., 2005] are used to identify expressed genes that discriminate the possible outcome categories. The training set genes are then used in classifying untested chemicals. In other words, the genes whose expression is altered by the compounds in the “training sets” are used to reveal the potential MOA of unknown chemicals and to classify them as DNA reactive or not, and as carcinogens or non-carcinogens.
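The training-set classification strategy described above can be sketched with a minimal nearest-centroid classifier: profiles of training compounds with known class define per-class centroids, and an untested compound is assigned to the nearest one. The gene set, expression values (e.g., log2 fold changes), compound names, and classes below are hypothetical; the published studies use far larger gene sets and more sophisticated machine-learning classifiers.

```python
# Minimal nearest-centroid sketch of toxicogenomics-based classification.
# All compounds, classes, and expression values are hypothetical.
import math

training = {  # compound -> (class, profile over a small discriminating gene set)
    "cmpd1": ("genotoxic",     [2.1, 1.8, -0.2]),
    "cmpd2": ("genotoxic",     [1.7, 2.2,  0.1]),
    "cmpd3": ("nongenotoxic",  [-0.3, 0.2, 1.9]),
    "cmpd4": ("noncarcinogen", [0.0, -0.1, 0.1]),
}

def centroids(data):
    """Average the training profiles within each class."""
    sums, counts = {}, {}
    for cls, profile in data.values():
        acc = sums.setdefault(cls, [0.0] * len(profile))
        for i, v in enumerate(profile):
            acc[i] += v
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: [v / counts[cls] for v in acc] for cls, acc in sums.items()}

def classify(profile, cents):
    """Assign an untested compound's profile to the nearest class centroid."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(cents, key=lambda cls: dist(profile, cents[cls]))

cents = centroids(training)
print(classify([1.9, 2.0, 0.0], cents))  # lands near the genotoxic centroid
```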
Toxicogenomics methods using rodent and human whole genome chips provide a means to identify potentially useful biomarkers and can also serve to confirm the results obtained using HTS. Performing toxicogenomics studies on chemicals that are both rodent and human carcinogens could thus identify biomarkers with more direct relevance to human health [Thomas et al., 2007], which would increase the predictivity of HTS. Compounds that do not produce positive test results in the conventional genotoxicity assays and that do not exhibit biomarkers of genotoxicity in toxicogenomics methods are very unlikely to pose a genotoxic carcinogenic risk to humans. The same cannot be said for putative nongenotoxic carcinogens that are identified through the use of conventional as well as toxicogenomics methods. However, it should be possible in such cases to use toxicogenomics methods to characterize their likely MOA by comparison to previously well-studied chemicals, as demonstrated by Fielden et al., and, with more experience, to predict relevance to humans [Waters et al., in press].
Several types of medium- to high-throughput assays have been developed to assess bacterial mutagenesis (e.g., SOS chromotest, Vitotox, Ames II) or clastogenesis (e.g., Radarscreen, GreenScreen, Yeast Deletion analysis), and a few (e.g., Ames II) have successfully incorporated metabolic activation through use of rodent liver metabolizing S9 fractions. In some cases, these assays are considered robust enough to provide a sufficient degree of confidence that a chemical is or is not genotoxic. In other cases, and more in line with the consensus use of these models in industry, the assays simply provide a means of reducing the numbers of compounds under consideration for development. With large numbers of candidates arising from combinatorial chemistry and library screening, the risk of throwing out a good chemical lead because of a bad response in an HTS assay is acceptably low because there are usually many more equally efficacious molecules that are not alerting for genotoxicity. All else being equal, one would always opt for compounds having no predicted or measured genotoxic activity for further evaluation. Compounds eliciting positive genotoxicity responses most often contain chemical moieties to which this DNA reactivity can be attributed, but this is not always the case. Although the above-mentioned tests offer promising alternatives to the more resource- and time-intensive standard genotoxicity testing battery, the cost and throughput remain limiting for use in HTS testing programs currently underway in the public sector. In these cases, biological activity profiles derived from a panel of HTS assays probing multiple targets and cell-based activities may offer a broader context and alternative means for inferring common DNA reactivity or potential carcinogenicity of dissimilar chemicals.
The number of chemicals to which humans are potentially exposed, for which no or only limited toxicological data are available, has increased dramatically over the last century, posing a public health challenge [Claxton et al., 2010]. The U.S. NTP, a partner in the U.S. Tox21 collaborative effort along with the U.S. EPA’s National Center for Computational Toxicology (NCCT) and the National Institutes of Health (NIH) Chemical Genomics Center (NCGC) [Collins et al., 2008; Kavlock et al., 2009], has been evaluating HTS technologies to prioritize compounds for more extensive toxicological testing, to identify mechanisms of action and, ultimately, to predict adverse health effects in humans (http://ntp.niehs.nih.gov/go/32132). The NCGC was established to apply the tools of small molecule screening and discovery to the development of chemical probe research tools for use in the study of protein and cell functions, as well as biological processes relevant to physiology and disease. To meet this goal, the NCGC performs quantitative HTS (qHTS) in 1536-well format using robotics. In this format, compounds are tested for activity in cell- or biochemical-based assays over a 15-point concentration-response curve at concentrations typically ranging from 5 nM to 92 μM. Assay volumes are ~5 μL with ~1000 cells/well. Among other constraints, assays conducted under these conditions are limited to those that are homogeneous (i.e., all of the components of the assay are present during measurement), and the assay must be completed within ~48 h of adding cells to the wells (see http://www.ncgc.nih.gov/guidance/HTS_Assay_Guidance_Criteria.html). In the initial evaluation of this qHTS screening approach at the NCGC, the NTP provided 1408 compounds (1353 unique and 55 in duplicate) dissolved in dimethylsulfoxide (DMSO) at 10 mM; molecular weights ranged from 32 to 1168 and the calculated log P from −3 to 13.2.
Almost all of the compounds in this library were associated with one or more sets of in vitro and/or in vivo toxicological data generated by the NTP.
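As a back-of-the-envelope check on the titration format described above, spacing 15 concentrations geometrically between 5 nM and 92 μM implies roughly 2-fold dilutions between adjacent points:

```python
# Geometric spacing of a 15-point qHTS titration between 5 nM and 92 uM.
# This only illustrates the arithmetic; the actual NCGC dilution protocol
# may differ in detail.
low, high, points = 5e-9, 92e-6, 15

factor = (high / low) ** (1 / (points - 1))          # ~2-fold per step
series = [low * factor ** i for i in range(points)]  # molar concentrations

print(f"dilution factor per step: {factor:.2f}")  # dilution factor per step: 2.02
print(f"top concentration: {series[-1] * 1e6:.0f} uM")
```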
Among the toxicological endpoints of interest is the detection of compounds with genotoxic activity. However, the standard in vitro genotoxicity assays (mutational assays in prokaryotes and mammalian cells, chromosomal aberrations, or micronuclei in mammalian cells) are not currently amenable to qHTS using 1536-well plates and robotics. Thus, the NTP in collaboration with the NCGC conducted an evaluation of a novel qHTS strategy for genotoxicity based on the detection of differential cytotoxicity in a set of isogenic DT40 mutant clones deficient in different DNA repair pathways compared to the parental wild-type cell line [Ji et al., 2009]. The DNA-repair-deficient clones used included those deficient in ATM (ATM arrests cell cycle when chromosomal breaks are present, making cells tolerant to reactive O2 species), FANCc (required for eliminating interstrand crosslinks), ku70/rad54 (ku70 is required for repairing chromosomal breaks, rad54 is involved in repairing chromosomal breaks that occur during DNA replication), pol β (DNA polymerase β repairs base damage and single-strand breaks), and rev3 or ubc13 (both have multiple functions in cellular tolerance to a variety of DNA damage). The premise behind this testing strategy is that induced DNA damage will have an adverse effect on the growth characteristics of a DNA repair deficient clone, when compared to the parental wild-type cell line. Furthermore, a panel of DNA repair mutants would allow for the potential characterization of the nature of the DNA lesions caused by a genotoxic chemical. DNA repair deficient clones derived from a chicken DT40 cell line are used because gene disruption occurs very efficiently in this cell type [Buerstedde and Takeda, 1991]; thus, mutants of every DNA repair gene are available. Moreover, DT40 cells possess a number of other advantages compared with mammalian cells. 
First, these cells are able to proliferate more rapidly (cycle time is ~8 h) than most mammalian cell lines [Zhao et al., 2007; Hori et al., 2008] and, accordingly, contain a relatively high proportion of cells in S phase (70%). Second, DT40 cells are defective in the damage checkpoint (i.e., impaired transient cell cycle arrest at the G1/S and G2/M boundary in the presence of DNA damage). This increases the likelihood that cells with DNA damage will complete DNA synthesis, express chromosomal damage during mitosis, and die. To evaluate effects on cell growth, levels of adenosine triphosphate (ATP) were measured 24 h after the addition of the compounds [Xia et al., 2008].
Based on the characteristics of the negative and positive control data collected from the multiple experiments, the DT40 cell lines represented a robust assay for qHTS. In an initial assessment using the NCGC’s curve class approach for identifying active compounds [Inglese et al., 2006], the number of such compounds ranged from ~5% in the wild-type to ~15% in the clones deficient in rev3 or ubc13, which have multiple functions in cellular tolerance to a variety of DNA damage [Zhao et al., 2007; Ji et al., 2009]. Compounds well-known for their direct-acting genotoxicity [e.g., adriamycin (e.g., http://monographs.iarc.fr/ENG/Monographs/suppl7/Suppl7-10.pdf) and melphalan (http://monographs.iarc.fr/ENG/Monographs/suppl7/Suppl7-100.pdf)] exhibited the expected differential cytotoxicity in the DNA-repair-deficient clones, and follow-up chromosomal aberration studies conducted on a small number of compounds demonstrated that the differential cytotoxicity is associated with an increased level of chromosomal aberrations in the DNA-repair-deficient clones (K. Yamamoto, manuscript in preparation). To potentially increase the detection of direct-acting genotoxic compounds using this approach, data collected at 48 h after the start of treatment are being compared with the original data collected at 24 h, with the expectation that the greater culture time will increase the extent of the differential growth response between the wild-type and the DNA-repair-deficient clones.
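The differential-cytotoxicity readout underlying this screen can be sketched simply: a compound is flagged as potentially genotoxic when it is markedly more cytotoxic (here, lower ATP-based viability) in a DNA-repair-deficient clone than in the isogenic wild-type. The viability values (fractions of vehicle control at a single test concentration) and the 2-fold flagging ratio below are hypothetical, not the screen's actual decision rule.

```python
# Sketch of the differential-cytotoxicity comparison in the DT40 screen.
# Viability values and the flagging ratio are hypothetical.

def differential_cytotoxicity(wt_viability, mutant_viability, ratio=2.0):
    """Flag if the wild type survives at least `ratio` times better than the mutant."""
    return wt_viability >= ratio * mutant_viability

# Hypothetical single-concentration viabilities for two compounds
print(differential_cytotoxicity(0.85, 0.20))  # True: repair mutant selectively killed
print(differential_cytotoxicity(0.40, 0.35))  # False: similar cytotoxicity in both
```

Running the same comparison across a panel of repair-deficient clones (ATM, FANCc, ku70/rad54, pol β, rev3, ubc13) is what allows the nature of the DNA lesion to be inferred from which mutants are hypersensitive.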
In addition to this approach, the utility of other HTS assays for detecting direct-acting genotoxic compounds is being evaluated at the NCGC. For example, data on the NTP 1408 compound library from an HTS assay to detect compounds that stimulate TP53, a gene upregulated in response to the presence of increased levels of DNA damage, are being compared with data from the DT40 clones. Also, assays suitable for HTS that would detect the upregulation of genes more specifically involved in the recognition of DNA damage are of interest (note: individuals wishing to nominate an assay for potential use at the NCGC should complete the assay nomination form located at http://ntp.niehs.nih.gov/go/27911). Clearly, a major limitation of genotoxicity-related assays at the present time is lack of metabolic activation capability. The typical approach used in classical genetic toxicology of adding S9 mix prepared from the liver of a male rat is not practical given the need to use homogeneous assays and exposure periods that last up to 48 h, so new approaches must be developed. The incorporation of metabolic activation capabilities in HTS is currently a primary assay development goal within the NCGC.
The next phase of qHTS at the NCGC will include the screening of a library of ~10,000 compounds with known structures as part of the Tox21 project. This library is being developed in collaboration with the EPA NCCT and the NCGC and will include the full EPA ToxCast™ chemical library (~1,000 chemicals), for which a much broader spectrum of commercial and research qHTS assay technologies are being used through the affiliated ToxCast™ project, discussed in the next section. In a departure from typical HTS applications in industry, Tox21 will incorporate procedures for determining the identity, purity, and stability of each compound. About one-third of the library will consist of approved drugs; the remainder are being selected by the NTP and the NCCT. Ultimately, all of the HTS data generated will be made publicly available through the National Library of Medicine’s PubChem project (http://pubchem.ncbi.nlm.nih.gov/), NTP’s CEBS (Chemical Effects in Biological Systems; http://www.niehs.nih.gov/research/resources/databases/cebs/index.cfm), and EPA’s ACToR (Aggregated Computational Toxicology Resource; http://actor.epa.gov/actor/faces/ACToRHome.jsp). These very large and comprehensive data sets will be used to build models to prioritize compounds for more extensive testing, to provide mechanistic information on the relation between HTS data and adverse health outcomes in humans and laboratory animals and, ultimately, to predict adverse health effects in humans.
The need to evaluate thousands of environmental chemicals posing exposure concerns yet lacking toxicity data has provided major impetus for development of computational toxicology programs within the U.S. EPA. To address this challenge, EPA’s ToxCast™ program [U.S. EPA, 2011a] is generating a broad spectrum of HTS in vitro assay results for a large number of environmental chemicals [Dix et al., 2007]. The computational objective of this research is to use informatics, mining, and modeling approaches to relate profiles of information in chemical and in vitro data domains with profiles of in vivo effects that constitute the various toxicity “endpoints” of concern. An important measure of the ultimate success and utility of this approach, however, will be the degree to which such associations can be anchored to biological pathway and mechanism concepts for grouping and rationalizing in vitro assay results, as well as to known chemistry and chemical reactivity concepts.
In Phase I of ToxCast™, HTS data were generated for 309 unique chemicals in nearly 500 HTS in vitro assays, the latter spanning several currently available commercial assay technology platforms and including a broad spectrum of biochemical (cell-free) and cell-based assays. ToxCast™ assays, similar to the NCGC-run assays discussed in the previous section, do not incorporate metabolic capability at this time, a well-acknowledged limitation of current HTS technologies. The Phase I chemical library consists mostly of pesticide actives spanning a diversity of pesticidal modes of action and chemical structure feature space. For the large majority of these chemicals, detailed, treatment-level, multispecies, in vivo reference toxicity data were collected from EPA pesticide registration data, and are available in standardized, computable form within the EPA ToxRefDB database [U.S. EPA, 2011b]. Thus, the full Phase I dataset consists of high dimensionality chemical property and structure data, in vitro assay results, and reference in vivo data. Recent ToxCast™ publications have reported various explorations of the in vivo database to propose candidate profiles [Martin et al., 2009a, b; Knudsen et al., 2009] as well as preliminary associations and putative signatures relating in vitro results to in vivo effects [Judson et al., 2009, 2010; Martin et al., 2010]. ToxCast™ Phase I in vitro and in vivo data and publications are available through the ToxCast website [U.S. EPA, 2011a], and ToxCast™ Phase I chemical structure files are publicly available through the EPA DSSTox website, which offers standardized structure-data files pertaining to toxicology for use in SAR modeling [U.S. EPA, 2011c].
Although Phase I did not explicitly include commercial higher throughput versions of the standard battery of genotoxicity test systems, largely due to cost and lower throughput considerations, the ToxCast™ Phase I dataset was examined from the vantage point of genotoxicity relevance [Knight et al., 2009]. The current version of the EPA ToxRefDB database, which has significant overlap with the chemicals in ToxCast™ Phase I, does not contain genetic toxicity study data, but rather is focused primarily on chronic and developmental in vivo endpoints. In addition, of the known tumorigens in this largely pesticidal database, most are considered to be carcinogenic through nongenotoxic mechanisms. Thus, the ToxCast™ Phase I dataset is a somewhat atypical representation of larger rodent carcinogenicity databases in the public domain (e.g., NTP, CPDB) consisting primarily of industrial chemicals. Within the ToxCast™ Phase I in vitro assays, however, there are a handful of assay systems that are believed to relate to genotoxicity mechanisms and that could, therefore, be examined for associations with the rodent carcinogenicity data within ToxRefDB. The three HTS assays considered represented two gene targets, TP53 and GADD45α (the latter in a TP53-competent cell line); TP53 is known to act as a “gatekeeper” to ensure genetic and cellular integrity during the cell cycle, and GADD45α (growth arrest and DNA damage) mediates the cell’s response to genotoxic stress. The three assay systems gave largely nonoverlapping results, which suggests that different aspects of the biology are being probed. In addition, the GADD45α assay (GreenScreen) showed low overall sensitivity, but high specificity, in predicting rodent tumorigens and, in comparison with Ames Salmonella summary data collected for 108 ToxCast™ chemicals from public sources, was able to detect cases of Salmonella-negative tumorigens.
Hence, although genetic toxicity was not explicitly addressed in these early phases of ToxCast™, as with the NTP efforts described previously, some of the included assay systems offer promise for providing useful genetic toxicity mechanism information for purposes of screening and prioritization.
ToxCast™ is moving beyond Phase I and into Phase II, expanding and diversifying the chemical library with an additional 700 chemicals that either are data-rich nonpesticides or are of high concern or interest to various EPA research, regulatory, and program offices due to known mechanisms or high potential for exposure. The library includes 100 failed drugs donated by major pharmaceutical company partners, along with preclinical and, in some cases, clinical data. The Phase II chemicals are being run through most of the same in vitro assays as the Phase I chemicals, thus expanding the biological depth and chemical breadth of in vitro data for use in modeling in vivo endpoints. The affiliated Tox21 program, of which EPA’s NCCT is a partner, will include the full ToxCast™ Phase I and II chemical inventories (~960 unique chemicals [U.S. EPA, 2011a]), as well as several thousand additional industrial environmental chemicals sponsored by the EPA. The full Tox21 library of ~10,000 chemicals, mentioned previously, will be tested in a smaller, more targeted set of HTS assays being codeveloped with the NTP and NCGC, and implemented in the robotic 1536-well plate format at the NCGC testing facility. Across these programs, the DSSTox project framework is applying strict quality standards for chemical structures and information, pertaining to both generic (ToxRefDB) and actual test substances (ToxCast™, Tox21), and will facilitate public data release of all ToxCast™ and Tox21 structures and chemical-sample-related information through the DSSTox website [U.S. EPA, 2011c]. Both the ToxCast™ and Tox21 efforts are committed to full public release of all chemical, in vivo, and in vitro data, along with efforts to create publicly available data formats and tools to facilitate data analysis.
It should be reiterated that in these initial phases, ToxCast™ and Tox21 program objectives do not extend to replacing guideline toxicity testing practice, but rather to developing new methods for screening and prioritizing chemicals with currently available and affordable HTS in vitro technologies, on a much larger scale than has previously been possible within the constraints of traditional toxicology methods. Along with this comes the need to build the necessary informatics infrastructure for representing the various domains of high-dimensional biological and chemical information in computable form, as well as learning how best to mine and model such data [Richard et al., 2008]. From a testing standpoint, the known limitations of current in vitro technologies (reproducibility, sensitivity, and biological relevance), chemical handling (purity, stability, and solubility), and limitations of in vitro in relation to in vivo endpoints (including ADME—absorption, distribution, metabolism, and elimination) must be considered, communicated, and factored into program implementation [Claxton et al., 2010]. To this end, both the ToxCast™ and Tox21 programs will include significant considerations for plate and assay replicates, extensive dose range coverage, positive controls, and strict standards for chemical annotation, tracking, and analytical chemistry QC.
It is well established that metabolism plays an important role in the activation of many classes of genotoxic and carcinogenic chemicals (e.g., aromatic amines, PAHs) and, thus, presents a particular challenge to current HTS-based programs. Based on historical data on approximately 1900 NTP compounds evaluated in Salmonella TA100 with and without rat S9, ~27% are TA100 positive, and slightly more than half of these (14% of the total tested) require metabolic activation to express genotoxic activity. Similarly, ~12% of the TA100 actives (or ~2% of the total tested) are inactivated by metabolic activation [Zeiger, unpublished results]. These and other data can be used to identify the areas of chemical space most impacted by the lack of explicit metabolic capability in the assay systems, and metabolic predictions and SAR modeling can be used to anticipate outcomes in these cases. More generally, building on existing databases, SAR, and cheminformatics approaches will play an essential role in helping to anticipate and account for the role of metabolism and ADME in bridging the gap between in vitro and in vivo outcomes. Moving forward, it will be important to consider what role genetic toxicology can play in a new HTS in vitro toxicity testing paradigm, and what the implications of the new HTS testing programs, such as ToxCast™, are for the genetic toxicology community.
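The proportion arithmetic behind the TA100/S9 figures quoted above can be sketched as follows; the percentages are the approximate values stated in the text, so the derived counts are rough estimates rather than exact tallies.

```python
# Back-of-the-envelope check of the Salmonella TA100 / S9 figures quoted
# above (~1900 NTP compounds; percentages are approximate values from the text).

n_tested = 1900                    # approximate number of NTP compounds evaluated
frac_positive = 0.27               # ~27% positive in TA100 (with and/or without S9)
frac_s9_required = 0.14            # ~14% of ALL tested require S9 to show activity

n_positive = frac_positive * n_tested          # ~513 TA100-positive compounds
n_s9_required = frac_s9_required * n_tested    # ~266 requiring metabolic activation

# Share of TA100 positives that require metabolic activation:
share = n_s9_required / n_positive
print(f"~{share:.0%} of TA100 positives require S9")  # ~52%, "slightly more than half"
```

The ~52% figure recovers the text's statement that slightly more than half of the TA100 positives depend on metabolic activation, which is the core argument for why HTS assays lacking metabolic capability will miss a substantial fraction of genotoxicants.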
Advances in computational toxicology, SAR prediction models, databases, toxicogenomics, and HTS testing programs are poised to significantly alter the current paradigm of toxicity screening within industry and regulatory agencies. Despite the widespread application and use of genotoxicity screening assays, particularly the Ames Salmonella test, in drug development and industrial chemical safety assessment, both the current in vitro testing battery and widely used SAR computational approaches, such as Derek for Windows and MC4PC, have well-known and acknowledged limitations. SAR approaches have proven value for prediction of genotoxicity when the chemical space is well covered. SAR approaches also can indirectly account for metabolic and absorption/elimination factors, given sufficient training data on which to base predictions of in vivo activity. SAR models for genotoxicity prediction are not only in routine use within industry and government agencies, such as the U.S. FDA, but also have well-acknowledged limitations, particularly when applied to safety screening of novel structures, as in drug development. SAR also is capable of probing fundamental chemical–DNA interactions with ab initio techniques and is the only technique available, now and for the foreseeable future, for assessing the potential hazard of chemicals not yet synthesized. In contrast, newer technologies, such as HTS and toxicogenomics, offer the ability to circumvent some of the most serious limitations of SAR, because novel chemical structures are likely to perturb common biological pathways and processes contributing to toxicological outcomes. Toxicogenomics and HTS testing strategies are enabling, for the first time, the ability to broadly probe biological targets, pathways, and mechanisms in relation to toxicity endpoints, including genetic toxicity, for large numbers of chemicals.
The infusion of significant amounts of new data into the public realm pertaining to the biological profiles of thousands of chemicals is opening significant new opportunities for addressing fundamental problems in relation to toxicity screening and human health safety assessment. Given the central importance of genotoxicity screening as a front-line safety assessment tool, and the depth of experience and insight into genotoxicity mechanisms in relation to carcinogenicity, genotoxicity should serve as both a model and a target of new approaches to utilize these data.
In summary, a major challenge for the field of toxicology in general, and genetic toxicology in particular, is to embrace computational toxicology, structure-based and in silico prediction approaches, and new assay technologies that are able to efficiently screen thousands of chemicals, but to do this in a way that builds on the significant foundation of past genetic toxicology accomplishments. Given its unique placement in the discipline of toxicology, genotoxicity can serve to anchor and guide these new technologies, helping to facilitate and lead the transition to 21st century toxicology.
The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency, the U.S. Food and Drug Administration, or the NIEHS National Toxicology Program.