Quantitative structure-activity relationship (QSAR) models are widely used for in silico prediction of in vivo toxicity of drug candidates or environmental chemicals, adding value to candidate selection in drug development or in a search for less hazardous and more sustainable alternatives for chemicals in commerce. The development of traditional QSAR models is enabled by numerical descriptors representing the inherent chemical properties that can be easily defined for any number of molecules; however, traditional QSAR models often have limited predictive power due to the lack of data and the complexity of in vivo endpoints. Although it has indeed been difficult to obtain experimentally derived toxicity data on a large number of chemicals in the past, the results of quantitative in vitro screening of thousands of environmental chemicals in hundreds of experimental systems are now available and continue to accumulate. In addition, publicly accessible toxicogenomics data collected on hundreds of chemicals provide another dimension of molecular information that is potentially useful for predictive toxicity modeling. These new characteristics of molecular bioactivity arising from short-term biological assays, i.e., in vitro screening and/or in vivo toxicogenomics data, can now be exploited in combination with chemical structural information to generate hybrid QSAR–like quantitative models to predict human toxicity and carcinogenicity. Using several case studies, we illustrate the benefits of a hybrid modeling approach, namely improvements in the accuracy of models, enhanced interpretation of the most predictive features, and expanded applicability domain for wider chemical space coverage.
Computational toxicology is a rapidly growing field that combines methodologies from computer science, bio- and cheminformatics, chemistry, and molecular biology (reviewed by Kavlock et al., 2008; Nigsch et al., 2009; Rusyn and Daston, 2010). Due to advances in biological screening technologies, multiple streams of novel toxicological data, ranging from short-term in vitro assays to various in vivo endpoints, are available for hundreds of chemicals (Martin et al., 2009; Shukla et al., 2010). The Tox21 consortium of the U.S. Environmental Protection Agency (EPA), National Toxicology Program (NTP), National Institutes of Health Chemical Genomics Center (NCGC), and U.S. Food and Drug Administration (FDA) is generating extensive quantitative in vitro data by screening hundreds to thousands of environmental chemicals in hundreds of experimental systems with the goal of re-establishing the field of predictive chemical toxicology under the paradigm of in vitro-in vivo extrapolation (Collins et al., 2008). Many chemicals have been screened for toxicity phenotypes in cells from multiple individuals (Choy et al., 2008; Lock et al., 2012; O'Shea et al., 2011). In addition, toxicogenomics data collected for hundreds of chemicals provide another dimension of experimental knowledge that is potentially useful for predictive chemical toxicity modeling (Fielden et al., 2007; Uehara et al., 2011). Innovative frameworks are required to integrate these rich and diverse new data for systematic investigation of the determinants of endpoint toxicity, including underlying chemical, biological, and genetic factors.
The explosive accumulation of biomolecular screening data that may help explain and predict toxicity mechanisms has led to the development of novel computational tools and databases (Barros and Martin, 2008; Blomme et al., 2009; Fielden et al., 2007; Waters and Fostel, 2004). The ultimate goal of computational modeling is fast and accurate estimation of environmental hazards and human health risks with minimal to no dependence on animal testing (National Research Council, 2007).
Cheminformatics approaches, such as quantitative structure-activity relationship (QSAR) modeling, have been traditionally used to rationalize biological screening data and employ resulting models, or predictors, as an initial virtual screen for efficacy and/or safety of candidate chemicals. The availability of predictive multidimensional in vitro and/or in vivo molecular data on a particular compound greatly facilitates decision making regarding its potential health hazard and mechanisms thereof (Roth et al., 2011). However, new regulations in Europe and initiatives in the United States (National Research Council, 2007) are applying pressure on the scientific and risk assessment communities to develop improved methods for evaluating thousands of chemicals (Rusyn and Daston, 2010; Schwarzman and Wilson, 2009).
Adverse outcomes in vivo depend both on the chemical’s structure and the underlying toxicity mechanisms. In tune with the proliferation of transdisciplinary computational biology approaches to unravel chemical toxicity mechanisms, this review highlights several novel integrative strategies for prediction of in vivo chemical toxicity by concordant exploitation of both a chemical’s structure and its short-term biological effects. Several recent studies demonstrate that statistically significant and externally predictive hybrid models can be developed. Hybrid modeling also affords a possibility of mechanistic interpretation both in terms of underlying chemical features and mechanisms of toxicity. Herein, we describe a general computational framework for modeling chemical toxicity using cheminformatics approaches, summarize recent hybrid modeling methodologies for in vitro-in vivo extrapolation paradigms, and comment on the outlook for the future use of these tools in computational toxicology.
Chemical structure–based predictors generally fall into two types: QSAR and expert systems (Valerio, 2009). QSAR are statistical models linking molecular structures (represented by chemical descriptors) to an activity such as an adverse health outcome (e.g., toxicity). QSAR embodies the principle of similarity, assuming that structurally similar chemicals may also have closely aligned activities. For example, chemicals with ≥ 0.85 similarity (based on the Tanimoto coefficient) to known actives were 30 times more likely to be confirmed as active than those picked randomly (Martin et al., 2002). Expert systems, on the other hand, are models based on rules determined by the scientific consensus of experts. For instance, the Ashby-Tennant structural alerts for carcinogenicity (Ashby and Tennant, 1994) have been incorporated into many software tools (Marchant et al., 2008). A number of public and commercial stand-alone or web-based modeling systems have been developed for prediction of a large number of toxicity-relevant endpoints (Tables 1 and 2). Several recent publications provide an excellent overview of the computational tools employed in toxicology (Nigsch et al., 2009; Valerio, 2009).
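To make the similarity principle concrete, the Tanimoto coefficient between two binary fingerprints is the number of shared "on" bits divided by the number of bits set in either fingerprint. Below is a minimal sketch, assuming RDKit and Morgan fingerprints; these are illustrative choices only, as Martin et al. used their own fingerprint scheme.

```python
# Minimal sketch of Tanimoto similarity between two compounds,
# assuming RDKit and Morgan fingerprints (illustrative choices only).
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a, smiles_b, radius=2, n_bits=2048):
    """Tanimoto coefficient between Morgan fingerprints of two molecules."""
    fps = []
    for smi in (smiles_a, smiles_b):
        mol = Chem.MolFromSmiles(smi)
        fps.append(AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits))
    # Tanimoto = |A & B| / |A | B| over the fingerprint bits
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])

# Under the similarity principle, compounds with a coefficient >= 0.85
# to a known active would be flagged as likely actives.
print(tanimoto("CCO", "CCCO"))
```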
Although QSAR modeling techniques are under continuous development, most predictors are not considered to be accurate enough for estimating complex biological phenotypes (Rusyn and Daston, 2010). Low quality of data, overextrapolation, and poor definition of the phenotypes to be predicted have been identified as factors limiting the accuracy of prediction of absorption, distribution, metabolism, excretion, and toxicity endpoints by QSAR (Penzotti et al., 2004; Stouch et al., 2003). In addition, the inherent limitations of QSAR lie in the general complexity of factors that impact the ultimate adverse health effect of a chemical, including pharmacokinetics, temporality, or the fact that multiple mechanisms and interconnected molecular signaling pathways may lead to the same toxicity phenotype. Thus, it is not surprising that the performance of QSAR models is inversely correlated with the complexity of the modeled endpoints (Hou and Wang, 2008; Penzotti et al., 2004), with higher accuracy expected for predicting in vitro results and lower accuracy observed for more complex in vivo endpoints, such as carcinogenicity (Benigni and Bossa, 2008). Given these limitations, it is unlikely that significant gains in prediction accuracy would be achieved by implementing alternative machine learning techniques or developing new chemical descriptors.
Alternative methods have been proposed to improve predictive accuracy and to take into account novel data streams that may help overcome some of the inherent limitations detailed above. Indeed, mechanistic toxicology research has taken advantage of technology developments in the biomedical sciences. Toxicogenomics, proteomics, and metabolomics provide experimental approaches for viewing the complete biological system that is modulated by a chemical (Ekins et al., 2005). These complex multidimensional data are now routinely used in drug and chemical safety evaluation, providing valuable mechanistic understanding of the molecular changes associated with disease or treatment (Cui and Paules, 2010). The utility of these data in predictive toxicology has also been explored. A number of studies reported the development of models that use omics data (most based on transcriptional profiling) to predict chronic toxicity phenotypes (e.g., carcinogenic potential) from acute or subchronic study–derived information (Fielden et al., 2007; Uehara et al., 2011) or to classify chemicals with respect to their potential mode of toxicity (Fielden et al., 2011; Uehara et al., 2010; Waters et al., 2010).
Recent advances in automated quantitative high-throughput screening (qHTS) have generated extensive biological data that can be modeled using statistical or machine learning techniques (Shukla et al., 2010). The Tox21 program (Collins et al., 2008), a partnership between EPA, NTP, NCGC, and FDA, is leading the field in use of a broad spectrum of in vitro assays, many in qHTS format, to screen thousands of environmental chemicals for their potential to disturb biological pathways that may result in human disease (Xia et al., 2008). Such data on toxicologically relevant in vitro endpoints can be utilized as hazard-based triggers to inform prioritization for additional testing (Reif et al., 2010), to predict in vivo toxicity (Martin et al., 2010), or to generate testable hypotheses concerning the underlying mechanisms of toxicity (Xia et al., 2009).
Statistical models employing biological data such as gene signatures or qHTS data as independent variables are in principle similar to QSAR models because both employ similar computational tools and focus on predicting similar toxicity phenotypes (Table 3). Importantly, the biological data–based models have been shown to be both predictive and interpretable (Coen, 2010; Van Hummelen and Sasaki, 2010; Wetmore and Merrick, 2004). Still, pure biological data–based predictive modeling approaches are not intended to explain the chemical determinants of toxicity but instead focus on the general biological processes related to toxicity. Furthermore, such models are inherently insensitive to explicitly defined chemical features of the tested compounds, and new biological data must be generated in order to predict the toxicity of novel compounds. For biology-based approaches, additional factors such as experimental variability, interpretability, and data acquisition costs also need to be considered.
To properly realize the joint benefits of bioinformatics- and cheminformatics-based approaches, several strategies can be envisioned (Fig. 1). The simplest approach is to utilize a "consensus" of QSAR and biological models that were derived independently to predict the same endpoint (Fig. 1A). Consensus modeling is an approach to developing an overall prediction by combining multiple classifiers, and it is widely used in traditional QSAR (reviewed in Dearden, 2003; Tong et al., 2006). Proponents of the consensus approach contend that combining multiple models, each of which encodes a different relationship, results in a more robust prediction (Tong et al., 2006). On the other hand, opponents question whether the marginal predictivity gains are worth the added complexity of consensus modeling (Hewitt et al., 2007). Success of consensus prediction depends on the relative performance, applicability domain, and the number of included individual models (Penzotti et al., 2004). Although there are no published examples of consensus between QSAR and biological data–based models, this approach is likely to yield models of predictive performance in between that of the contributing models as a consequence of statistical averaging. For example, in the simplest instance, predictions from a QSAR model and a biological model would be averaged into a final consensus score. Further improvements in consensus prediction may lie in adjusting the relative contributions of the individual models.
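In code, the simplest instance described above amounts to averaging two probability estimates. The sketch below is purely illustrative and assumes two scikit-learn classifiers trained separately on chemical (X_chem) and biological (X_bio) descriptor matrices for the same compounds; all names are hypothetical.

```python
# Hypothetical sketch of consensus prediction: combine the class
# probabilities of a QSAR model and a biological-data model that were
# trained independently to predict the same in vivo endpoint.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def consensus_predict(X_chem, X_bio, qsar_model, bio_model, w_qsar=0.5):
    """Weighted average of the two models' probabilities of toxicity."""
    p_qsar = qsar_model.predict_proba(X_chem)[:, 1]
    p_bio = bio_model.predict_proba(X_bio)[:, 1]
    # Equal weights give the plain average; tuning w_qsar adjusts the
    # relative contribution of each model.
    return w_qsar * p_qsar + (1.0 - w_qsar) * p_bio

# Training on a shared in vivo toxicity label y:
# qsar_model = RandomForestClassifier().fit(X_chem_train, y_train)
# bio_model  = RandomForestClassifier().fit(X_bio_train, y_train)
# scores = consensus_predict(X_chem_test, X_bio_test, qsar_model, bio_model)
```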
There are several examples of how the modeling routine may use a "hierarchical" approach (Fig. 1B). First, it was suggested by several groups that a hierarchy of chemical descriptors of increasing complexity may be used to improve a model's accuracy. For instance, Basak et al. (2003) developed models of cytotoxicity of halocarbons by utilizing a hierarchy of different types of computed descriptors of inherent chemical properties. In this method, model building begins with the descriptors that are easiest to compute; descriptors that demand more computational resources are added only if the easily calculable ones do not give satisfactory results. A similar approach was incorporated into hierarchical QSAR (HiT QSAR) software (Kuz'min et al., 2008). Both studies showed that the complexity of chemical descriptors has an impact on the accuracy of model predictions.
Second, a hierarchy of computational methods was used, whereby compounds are classified into subgroups with different levels of response using linear discriminant analysis followed by recursive partitioning for each subgroup (Manga et al., 2003). This study developed a model of drug biotransformation using physicochemical and structural descriptors to predict the percentage of unmetabolized drug excreted after an intravenous dose. The resultant hierarchical model for biotransformation was a three-level decision tree that incorporated various classification techniques and a series of arbitrary cutoffs.
Third, a hierarchical workflow was proposed to explore chemical structure/in vitro/in vivo relationships (Zhu et al., 2009). Under this approach, in vitro/in vivo correlation patterns for all compounds in the modeling set could be ascertained, and compounds may be clustered into several subsets (e.g., toxic both in vitro and in vivo; nontoxic in both cases; toxic in vitro but nontoxic in vivo) based on the discovered relationships. The modeling set compounds were partitioned into two or more subclasses, and a classification QSAR model was developed using chemical descriptors only. Then, subclass-specific QSAR models were developed. Thus, for any external compound, the classification model is used first to make assignment to one of the subclasses, and then a subclass-specific model is used to make a quantitative prediction of a compound’s toxicity.
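The classify-then-predict logic shared by the Manga et al. and Zhu et al. workflows can be sketched as follows. This is a hypothetical illustration assuming scikit-learn models and precomputed chemical descriptor and subclass-label arrays, not a reconstruction of either study's actual implementation.

```python
# Hypothetical sketch of hierarchical QSAR: a classifier first assigns
# each compound to a subclass, then a subclass-specific quantitative
# model makes the final prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

class HierarchicalQSAR:
    def fit(self, X, subclass_labels, y):
        X, subclass_labels, y = map(np.asarray, (X, subclass_labels, y))
        # Stage 1: learn subclass assignment from chemical descriptors.
        self.classifier = RandomForestClassifier().fit(X, subclass_labels)
        # Stage 2: one quantitative model per subclass.
        self.regressors = {
            label: RandomForestRegressor().fit(X[subclass_labels == label],
                                               y[subclass_labels == label])
            for label in np.unique(subclass_labels)
        }
        return self

    def predict(self, X):
        X = np.asarray(X)
        labels = self.classifier.predict(X)
        # Route each compound to its subclass-specific model.
        return np.array([self.regressors[lab].predict(row.reshape(1, -1))[0]
                         for lab, row in zip(labels, X)])
```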
An alternative strategy is a "hybrid" approach (Fig. 1C), in which biology-derived features and chemical structural properties are pooled into a joint descriptor matrix, which is then used for modeling. Although, in principle, such joint descriptors may have limitations (e.g., data quality and cost of data acquisition), recent studies suggest that hybrid descriptors do improve the accuracy of prediction of in vivo toxicity. Several recent publications (Low et al., 2011; Sedykh et al., 2011; Zhu et al., 2008) provide illustrative examples of hybrid modeling.
For example, Zhu et al. (2008) introduced the concept of chemical-biological descriptors, in which conventional chemical descriptors are augmented by binary qHTS results (an "active" response is encoded as "1," "inactive" as "0") from a variety of assays to create a single combined array of hybrid descriptors. Using chemical descriptors only, QSAR modeling achieved 62.3% prediction accuracy for rodent carcinogenicity on a data set of over 300 chemicals for which rodent 2-year cancer bioassay data were available. Importantly, the prediction accuracy of the model improved significantly (to 72.7%) when chemical descriptors were augmented by qHTS cytotoxicity data from six rodent and human cell lines, which were regarded as biological descriptors.
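In practice, building such a hybrid descriptor matrix reduces to a column-wise concatenation of the two data blocks. A minimal sketch follows, assuming the qHTS activity calls are already available as a 0/1 matrix with one column per cell line; the names and shapes are illustrative.

```python
# Hypothetical sketch of hybrid chemical-biological descriptors: append
# binary qHTS activity calls (1 = active, 0 = inactive) to the chemical
# descriptor matrix and model the combined array.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def hybrid_matrix(X_chem, X_qhts):
    """Concatenate chemical descriptors with 0/1 qHTS activity calls."""
    return np.hstack([X_chem, X_qhts.astype(float)])

# X_chem: (n_compounds, n_chemical_descriptors)
# X_qhts: (n_compounds, 6) binary cytotoxicity calls, one per cell line
# acc = cross_val_score(RandomForestClassifier(),
#                       hybrid_matrix(X_chem, X_qhts), y_carcinogenicity,
#                       cv=5, scoring="accuracy").mean()
```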
Sedykh et al. (2011) employed concentration-response qHTS data reported by Xia et al. (2008) by transforming them into quantitative biological descriptors of chemicals. The in vitro data, especially concentration-response qHTS profiles, were shown to improve the results of QSAR modeling of in vivo endpoints (i.e., rat LD50) as compared with conventional QSAR models that used only chemical structure descriptors. Furthermore, the qHTS-derived biological descriptors also enhanced the model's coverage (i.e., the number of compounds within the applicability domain of the model), which is essential for applying models to large and diverse chemical libraries of environmental concern.
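One plausible way to convert a concentration-response profile into quantitative descriptors, offered here only as an assumption since Sedykh et al. define their own transformations, is to fit a Hill curve and use its fitted parameters as compound-level features:

```python
# Hypothetical sketch: derive quantitative biological descriptors from a
# concentration-response curve by fitting a Hill equation; the fitted
# parameters (top, log AC50, slope) then serve as compound descriptors.
import numpy as np
from scipy.optimize import curve_fit

def hill(log_conc, top, log_ac50, slope):
    """Hill response as a function of log10 concentration."""
    return top / (1.0 + 10.0 ** (slope * (log_ac50 - log_conc)))

def qhts_descriptors(log_conc, response):
    """Return (top, log_ac50, slope) fitted to one compound's curve."""
    p0 = [response.max(), np.median(log_conc), 1.0]  # initial guess
    params, _ = curve_fit(hill, log_conc, response, p0=p0, maxfev=5000)
    return params

# Example: a compound tested at 8 concentrations (log10 molar).
log_c = np.linspace(-9, -4, 8)
resp = hill(log_c, top=95.0, log_ac50=-6.0, slope=1.2)
print(qhts_descriptors(log_c, resp))  # ~ [95.0, -6.0, 1.2]
```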
Toxicogenomic data provide another example of high-dimensional biological information that may be used for hybrid modeling. A comparative analysis of QSAR- and toxicogenomics data–based models was recently reported (Liu et al., 2011). The authors used gene expression profiles of liver tissue obtained from rats treated with 62 chemicals at different time points (1, 3, and 5 days) to predict rat liver carcinogenicity and concluded that the toxicogenomics data–based models outperformed QSAR. Low et al. (2011) reported a similar outcome when gene expression data–based models (24-h rat liver toxicogenomics profiles of 127 compounds) were compared with conventional QSAR in modeling 28-day hepatotoxicity in the rat. However, the latter study also attempted to combine toxicogenomics data and chemical descriptors for a hybrid approach. Although hybrid models did not afford prediction accuracy higher than that of toxicogenomics data–based models, they identified both chemical features and transcripts predictive of the phenotype, which provided additional insight regarding the mechanistic basis of subchronic liver injury.
Accurate and high-throughput predictive methods are needed to support efficient decision making regarding the efficacy and/or safety of candidate compounds and in tiered screening and assessment schemes. Chemical structure–based predictive methods have been widely applied in the screening and ranking of thousands of chemicals for bioactivity and have demonstrated the promise of in silico approaches for achieving these goals. However, predictive methods based on chemical structure alone have limitations, especially for accurately projecting complex in vivo outcomes. Integration of chemical features and biological screening and/or toxicogenomic data provides important advantages (i.e., improved prediction accuracy, greater chemical space coverage, and interpretability of predictive features) over traditional cheminformatic methods such as QSAR modeling. As shown in Figure 1, novel strategies for integrating chemical structural information with bioactivity data include consensus (Tong et al., 2006), hybrid (Sedykh et al., 2011; Zhu et al., 2008), and hierarchical approaches (Zhu et al., 2009).
Data limitations are currently the major obstacle to advancing these transdisciplinary integration approaches. In particular, the database of toxicity studies is limited to a small number of chemicals. These chemicals are both too few in number and too limited in structural diversity for reliable QSAR analysis. At present, there are only a few sufficiently large omics data sets (e.g., Open Toxicogenomics Project Genomics Assisted Toxicity Evaluation System [http://toxico.nibio.go.jp/], Chemical Effects in Biological Systems Database [http://www.niehs.nih.gov/research/resources/databases/cebs/], ToxExpress [http://www.genelogic.com/knowledge-suites/toxexpress-program]), with hundreds of compounds of largely disparate chemotypes selected for phenotypic diversity. As such, most omics data sets are poorly suited for machine learning by QSAR. This deficit supports the more general and recognized need for hazard characterization of a greater number of more varied chemicals, including a larger proportion of the tens of thousands of as yet untested chemicals in commerce and the environment. Other outstanding data needs concern the classification of chemicals according to a wider array of hazard traits and susceptibility factors (Guyton et al., 2009). There are ongoing efforts to address these significant data limitations by characterizing multiple in vitro and in vivo toxicological phenotypes (Martin et al., 2009; Padilla et al., 2012; Shukla et al., 2010), including in cells from genetically diverse individuals (Choy et al., 2008; Lock et al., 2012; O'Shea et al., 2011). The large-scale screening efforts of Tox21 (Huang et al., 2011) and other public-private partnerships (Cavero, 2011) hold particular promise for vastly expanding the database of chemicals and endpoints for which experimental data are available.
Additional types of hybrid/hierarchical modeling approaches can be envisioned to address the dependency of hybrid approaches on the availability of experimental data, a current limitation for the wide use of these models in predictive toxicology. In principle, QSAR models could be developed to predict the results of short-term toxicity assays (once enough data for a sufficiently large chemical library is available) because this task is inherently less challenging than modeling complex in vivo endpoints. This represents an intriguing possibility of applying QSAR methods to build predictive models of each of many individual molecular endpoints from which the resulting “predicted in vitro” data can then be used as inputs into models of in vivo toxicity (Martin et al., 2011; Sipes et al., 2011). The application of this strategy could potentially enable a predictive modeling workflow that does not require new experimental data and employs only compound descriptors that can be computed from chemical structure.
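A hypothetical sketch of such a fully in silico workflow follows: per-assay QSAR models are trained first, and their predictions form the "predicted in vitro" profile that feeds the in vivo model. All names and model choices are illustrative, not a reconstruction of the cited studies.

```python
# Hypothetical sketch of a fully in silico two-stage workflow:
# (1) per-assay QSAR models predict in vitro outcomes from structure;
# (2) the "predicted in vitro" profile feeds the in vivo toxicity model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_two_stage(X_chem, Y_invitro, y_invivo):
    # One QSAR model per in vitro assay column (binary assay outcomes).
    assay_models = [RandomForestClassifier().fit(X_chem, Y_invitro[:, j])
                    for j in range(Y_invitro.shape[1])]
    # Predicted in vitro profile (probability of activity per assay).
    X_pred = np.column_stack([m.predict_proba(X_chem)[:, 1]
                              for m in assay_models])
    # In vivo model trained on the predicted profile; chemical
    # descriptors could be appended here for a hybrid second stage.
    invivo_model = RandomForestClassifier().fit(X_pred, y_invivo)
    return assay_models, invivo_model

def predict_two_stage(X_chem_new, assay_models, invivo_model):
    X_pred = np.column_stack([m.predict_proba(X_chem_new)[:, 1]
                              for m in assay_models])
    return invivo_model.predict_proba(X_pred)[:, 1]
```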
Future computational methods should aim to optimize the use of both chemical- and biological-based data domains to achieve the most accurate predictions possible, because each one individually provides limited and complementary insights regarding toxicity. To this end, studies can be designed with both approaches in mind, so as to provide sufficient diversity from both chemical and biological data domains. The goal should be to generate data matrices with broad and dense coverage of chemical structure and bioactivities for hybrid data analysis, i.e., combining chemical and biological data for machine learning. Additional improvements can be achieved by using mechanistically relevant short-term toxicity assays. The resulting integrative approaches have the potential to become a powerful tool for elucidating both relevant biological interactions and structural motifs that together better represent the underlying complex mechanisms by which toxic effects of chemicals develop. Systematic investigation of genetic and other determinants of chemical toxicity can also be envisioned. These approaches can, in turn, support applications in the design of new products and chemical processes as well as in the evaluation of in-use chemicals and environmental contaminants, based on comprehensive and integrative characterization by both chemical structural features and the results of multiple and diverse short-term biological assays and/or omics studies.
This work was supported, in part, by grants from the National Institutes of Health (NIH) (R01 ES015241) and U.S. Environmental Protection Agency (EPA) (RD83382501).
The research described in this article has not been subjected to each funding agency's peer review and policy review and therefore does not necessarily reflect their views, and no official endorsement should be inferred. The authors declare no competing financial interests.