Quantitative high-throughput screening (qHTS) assays are increasingly being used to inform chemical hazard identification. Hundreds of chemicals have been tested in dozens of cell lines across extensive concentration ranges by the National Toxicology Program in collaboration with the National Institutes of Health Chemical Genomics Center.
Our goal was to test a hypothesis that dose–response data points of the qHTS assays can serve as biological descriptors of assayed chemicals and, when combined with conventional chemical descriptors, improve the accuracy of quantitative structure–activity relationship (QSAR) models applied to prediction of in vivo toxicity end points.
We obtained cell viability qHTS concentration–response data for 1,408 substances assayed in 13 cell lines from PubChem; for a subset of these compounds, rodent acute toxicity half-maximal lethal dose (LD50) data were also available. We used the k nearest neighbor classification and random forest QSAR methods to model LD50 data using chemical descriptors either alone (conventional models) or combined with biological descriptors derived from the concentration–response qHTS data (hybrid models). Critical to our approach was the use of a novel noise-filtering algorithm to treat qHTS data.
Both the external classification accuracy and coverage (i.e., fraction of compounds in the external set that fall within the applicability domain) of the hybrid QSAR models were superior to conventional models.
Concentration–response qHTS data may serve as informative biological descriptors of molecules that, when combined with conventional chemical descriptors, may considerably improve the accuracy and utility of computational approaches for predicting in vivo animal toxicity end points.