In this study we used data from the ToxCast Phase I chemical library, which contains data for 309 unique chemical structures (U.S. EPA 2012f). Most of these chemicals are either current or former active ingredients in food-use pesticides that were designed to be bioactive, or they are environmentally relevant industrial chemicals. Details of the chemical library were reported by Judson et al. (2009). Data on an additional 23 reference chemicals that were tested in a separate study (Judson et al. 2010) were also included; 17 of these were not in the ToxCast Phase I library. CAS registry numbers (CASRN) for the ToxCast Phase I chemicals and the additional 17 chemicals are available online in Supplemental_File_1.csv (Rotroff et al. 2012).
Guideline and non-guideline endocrine assays.
Data from guideline endocrine-related in vitro and in vivo studies were extracted from EDSP Tier 1 validation reports from the U.S. EPA EDSP web site (U.S. EPA 2012a). Non-guideline studies were obtained from the open literature by querying PubMed (http://www.ncbi.nlm.nih.gov/pubmed) and Google Scholar (http://scholar.google.com/) using the following terms: (any chemical name or CASRN in the 309) AND (“in vitro” OR “in vivo”) AND (“estrogen” OR “androgen” OR “uterotrophic” OR “Hershberger” OR “steroidogenesis” OR “thyroid hormone”). The automated search returned 2,113 individual studies of many kinds. The list was manually curated to remove studies that did not contain data usable for the current analysis (e.g., studies of mixtures that did not test compounds individually, studies that mentioned the chemical but did not test it in a bioassay, studies measuring bioaccumulation), leaving 248 unique studies. Studies that identified their methods as following the Organisation for Economic Co-operation and Development (OECD) guidelines (Kanno et al. 2001; OECD 1999) or EDSP protocols were grouped together with EDSP T1S data for the guideline analysis. When available, PubMed identifiers (PMID) were used as unique annotations for each report. For the few instances when no PMID was available, and for each EDSP T1S validation report, a unique identifying number was generated. The citation information for all documents used in the analysis is available online in Supplemental_File_2.txt (Rotroff et al. 2012).
Guideline endocrine-related assays gathered from EDSP validation reports and OECD guideline studies were categorized according to whether they tested estrogen-, androgen-, steroidogenesis-, or thyroid-related MOAs (guideline-E, guideline-A, guideline-S, guideline-T, respectively). Additional information captured included study type (e.g., amphibian metamorphosis, reporter gene), assay type (e.g., serum levels, organ weight), species, strain, cell type, target, and whether or not it was an EDSP/OECD guideline study. Chemical potency [e.g., concentration at half-maximum activity (AC50), lowest effective concentration] for a given end point was captured as it was represented in the study report along with the maximum concentration/dose tested. In addition, agonist or antagonist responses were noted when applicable. Data from guideline and non-guideline studies were dichotomized as either active if a response was observed, or inactive if no response was observed. If a study investigated multiple end points for a given endocrine MOA and produced at least one statistically significant end point, then that study–chemical–MOA combination was considered active. Activity/inactivity was determined based on the presence of a statistically significant response or was based on the study author’s conclusion. Data were further annotated as having a hit value of either 1 or 0 for active and inactive, respectively. We combined all guideline and non-guideline literature studies to have a single hit value for each study–chemical–MOA combination. Data that were conflicting or otherwise unclear were included in the data table but annotated as such, and removed from analyses. The data obtained from guideline endocrine-related studies and other non-guideline literature reports are available online in Supplemental_File_3.csv (Rotroff et al. 2012).
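The dichotomization rule above (one statistically significant end point is enough to call a study–chemical–MOA combination active) can be sketched as follows. This is a minimal illustration, not the authors' code; the record layout, study identifiers, and chemical names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (study_id, chemical, moa, end_point, significant).
# The identifiers and end points are illustrative, not taken from the data set.
records = [
    ("PMID1", "chemA", "E", "uterine weight", True),
    ("PMID1", "chemA", "E", "serum estradiol", False),
    ("PMID2", "chemA", "A", "prostate weight", False),
    ("PMID3", "chemB", "E", "ER binding", False),
]

def hit_values(records):
    """Collapse end points to one hit value (1 or 0) per study-chemical-MOA."""
    hits = defaultdict(int)
    for study, chemical, moa, _end_point, significant in records:
        key = (study, chemical, moa)
        # A single significant end point makes the whole combination active.
        hits[key] = max(hits[key], int(significant))
    return dict(hits)

calls = hit_values(records)
```

Here `calls` holds a single hit value for each study–chemical–MOA key. Conflicting or unclear records would need to be flagged and excluded before this step, as the text describes.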
ToxCast in vitro assays. HTS competitive binding, enzyme inhibition, and reporter gene assays representing estrogen-, androgen-, steroidogenesis-, or thyroid-related end points (HTS-E, HTS-A, HTS-S, HTS-T, respectively) were selected as a subset of the > 500 HTS assays generated by the ToxCast program (ToxCastDB v.17; U.S. EPA 2012e) [see Supplemental_File_1.csv (Rotroff et al. 2012)]. The details and a description of each assay are reported in the table summarizing the endocrine-related HTS assays.
Summary of endocrine-related HTS assays.
For chemicals that produced a statistically significant and concentration-dependent response in a given assay, the AC50 was recorded. The criteria for determining the activity of a compound are assay platform dependent [see Supplemental Material, Appendix A, for further details (http://dx.doi.org/10.1289/ehp.1205065)]. The data were then dichotomized: if an AC50 was present for a given chemical–end point combination, a 1 was reported; if no response was observed, a 0 was reported. Chemicals tested in triplicate for quality control purposes were designated 1 or 0 on a majority basis. Chemicals that were run in duplicate were designated 1 if at least one sample produced an AC50. Experimental methods for each assay used are provided in Supplemental Material, Appendix A (http://dx.doi.org/10.1289/ehp.1205065).
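The replicate rules described above can be written as a small function. This is only a sketch: it assumes each replicate is represented by its AC50, with `None` standing for "no AC50 was obtained", and the function name is hypothetical.

```python
def replicate_call(ac50_values):
    """Return 1 (active) or 0 (inactive) for one chemical in one assay.

    ac50_values holds one entry per replicate; None means no AC50 was
    obtained for that replicate (an assumed representation).
    """
    n = len(ac50_values)
    n_active = sum(value is not None for value in ac50_values)
    if n == 1:
        return int(n_active == 1)   # single run: active if an AC50 is present
    if n == 2:
        return int(n_active >= 1)   # duplicate: at least one sample with an AC50
    if n == 3:
        return int(n_active >= 2)   # triplicate: majority basis
    raise ValueError("expected one, two, or three replicates")

single = replicate_call([12.5])
duplicate = replicate_call([None, 3.1])
triplicate = replicate_call([None, 3.1, None])
```

Note that the duplicate rule is deliberately more permissive than the triplicate rule, which follows the text: one positive sample out of two gives a 1, whereas one out of three gives a 0.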
We performed an iterative, balanced optimization analysis to determine how well ToxCast HTS assays classify the results of guideline endocrine-related assays while maintaining balance between sensitivity and specificity. The process for this analysis is illustrated in Figure 2. Because each HTS endocrine MOA may be covered by multiple ToxCast HTS assays, we used disjunctive logic with varied weight-of-evidence thresholds to determine optimal predictive performance. The model tested variable thresholds for the ToxCast HTS assay results, represented as unweighted binary data, while the guideline or non-guideline endocrine-related assay results remained static. Initially, the threshold criterion was one positive ToxCast HTS assay, out of the total number of ToxCast HTS assays, for a chemical to be considered to perturb a given MOA. The model was then re-run with the threshold increased in increments of one assay until all ToxCast HTS assays for a given endocrine MOA were required to be positive for a chemical to be considered to perturb that MOA. As the threshold for a positive call increased, a larger weight of evidence was required for a chemical to be considered a “hit” for perturbing the given endocrine MOA.
Exceptions were made for guideline pubertal studies and the ToxCast NVS_NR_hAR assay. Guideline pubertal studies test for effects that can arise through multiple endocrine-related pathways. For this reason, if a chemical was considered positive in the pubertal assay and the result conflicted with other guideline studies (e.g., receptor binding, reporter gene), the pubertal assay was not included in the weight of evidence. The ToxCast NVS_NR_hAR assay is a human androgen receptor binding assay in the LNCaP prostatic cell line. The androgen receptor in this cell line is known to bind steroid hormones other than androgens (Veldscholte et al. 1992). For this reason, if a compound was negative in all other HTS-A assays, the result for the NVS_NR_hAR assay was not included in the weight of evidence.
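The threshold sweep and the NVS_NR_hAR exception can be illustrated with a short sketch. It is an interpretation of the description above rather than the authors' implementation: the data layout is assumed, and the chemical names and the assay names other than NVS_NR_hAR are hypothetical.

```python
PROMISCUOUS_ASSAY = "NVS_NR_hAR"  # assay name taken from the text

def evidence(assay_hits):
    """Return the binary hits that count toward the weight of evidence.

    assay_hits maps assay name -> 0/1 for one chemical and one MOA.
    The NVS_NR_hAR result is dropped when every other assay is negative.
    """
    others = [hit for name, hit in assay_hits.items() if name != PROMISCUOUS_ASSAY]
    if PROMISCUOUS_ASSAY in assay_hits and others and not any(others):
        return others
    return list(assay_hits.values())

def threshold_sweep(hts):
    """Call each chemical at every threshold from 1 up to the number of assays."""
    n_assays = len(next(iter(hts.values())))
    return {
        k: {chem: int(sum(evidence(hits)) >= k) for chem, hits in hts.items()}
        for k in range(1, n_assays + 1)
    }

# Hypothetical HTS-A results for three chemicals across three assays.
hts_a = {
    "chemX": {"NVS_NR_hAR": 1, "assay_b": 0, "assay_c": 0},
    "chemY": {"NVS_NR_hAR": 1, "assay_b": 1, "assay_c": 0},
    "chemZ": {"NVS_NR_hAR": 1, "assay_b": 1, "assay_c": 1},
}
calls_by_threshold = threshold_sweep(hts_a)
```

In this toy example, chemX is never called positive because its only hit is the NVS_NR_hAR assay, chemY is positive at thresholds 1 and 2, and chemZ is positive at every threshold. The pubertal-study exception applies to the guideline side and is not modeled here.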
Figure 2 Illustration of the balanced optimization model used to analyze predictive capacity of endocrine-related ToxCast assays. Multiple assays and study reports were available for each chemical–MOA combination. (A) Snapshot of a step in this modeling/optimization ...
For a specific set of criteria across all overlapping chemicals, we calculated sensitivity, specificity, and balanced accuracy (BA) as measures of model performance. The guideline analysis compared ToxCast HTS assays with guideline endocrine assays gathered from EDSP validation reports and OECD guideline studies. We also conducted a separate non-guideline analysis comparing ToxCast HTS assays with assays from non-guideline studies. For many chemical–MOA combinations, multiple studies or assays were available from the EDSP/OECD guideline studies and the non-guideline literature. Because separate studies do not always agree on whether a chemical perturbs a given MOA, the model was run using two scenarios: a) any positive report for a chemical resulted in a positive call for the chemical–MOA combination; or b) > 50% (threshold > 0.50) of guideline or non-guideline endocrine-related studies or assays must report the chemical to be active for a given endocrine MOA.
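The two scenarios for the reference call can be sketched as below. The function name and the scenario labels `"any"` and `"majority"` are hypothetical; the input is assumed to be the list of 0/1 hit values from all studies of one chemical–MOA combination.

```python
def reference_call(study_hits, scenario):
    """Combine several study-level hit values (0/1) into one reference call."""
    if not study_hits:
        raise ValueError("at least one study is required")
    if scenario == "any":
        # Scenario a: any positive report gives a positive call.
        return int(any(study_hits))
    if scenario == "majority":
        # Scenario b: strictly more than 50% of studies must be active.
        return int(sum(study_hits) / len(study_hits) > 0.5)
    raise ValueError("scenario must be 'any' or 'majority'")

one_of_three_any = reference_call([1, 0, 0], "any")
one_of_three_majority = reference_call([1, 0, 0], "majority")
tie_majority = reference_call([1, 0], "majority")
```

Because the threshold is strictly greater than 0.50, an even split between active and inactive studies yields a negative call under scenario b.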
For each threshold criterion, the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) were calculated. A TP was any chemical that was positive in the ToxCast HTS assays and also positive in the guideline endocrine reports. An FP was positive in ToxCast but reported as negative in the guideline endocrine reports. A chemical that was negative in the ToxCast HTS assays and positive in the guideline endocrine reports was recorded as an FN. Last, a TN was any chemical negative in the ToxCast HTS assays and negative in the guideline endocrine reports. At each threshold combination, all of the available chemicals were classified as TP, FP, TN, or FN and were used to calculate sensitivity, specificity, and BA as measures of model performance.
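As a worked illustration of these definitions, the sketch below counts TP, FP, TN, and FN over the chemicals present in both data sets and derives the three performance measures. It assumes the usual definition of BA as the mean of sensitivity and specificity, which the text does not spell out; the chemical names and calls are hypothetical.

```python
def confusion(hts_calls, reference_calls):
    """Count TP, FP, TN, FN over chemicals present in both dictionaries."""
    tp = fp = tn = fn = 0
    for chem in hts_calls.keys() & reference_calls.keys():
        predicted, observed = hts_calls[chem], reference_calls[chem]
        if predicted and observed:
            tp += 1
        elif predicted and not observed:
            fp += 1
        elif not predicted and observed:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def performance(tp, fp, tn, fn):
    """Sensitivity, specificity, and BA (assumed: mean of the other two)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, (sensitivity + specificity) / 2

# Hypothetical calls for five chemicals.
hts = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 1}
ref = {"a": 1, "b": 0, "c": 0, "d": 1, "e": 1}
counts = confusion(hts, ref)
sens, spec, ba = performance(*counts)
```

With these toy calls there are two TPs, one FP, one TN, and one FN, so sensitivity is 2/3, specificity is 1/2, and BA is 7/12. The functions assume at least one reference-positive and one reference-negative chemical; otherwise the ratios are undefined.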
Statistical analysis. To identify statistically significant BA values, we performed a permutation test. The test randomized which ToxCast assays were associated with guideline endocrine studies or biomedical literature for each endocrine MOA, in order to determine whether a randomly chosen set of assays from the > 500 ToxCast end points would be likely to produce a similar association. The BA calculation based on random assay associations was performed using the same number of ToxCast assays as the model and the same threshold criteria. Assays were permuted 10,000 times to build the random BA population distribution, and the percentile at which the model BA fell within this distribution was calculated to provide a p-value. A p-value of < 0.01 was considered statistically significant. The distributions developed from the permutation tests were used to define the confidence intervals in Figures 3 and 4.
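One way to read the permutation test is sketched below. It is an assumption-laden illustration, not the authors' procedure: the pool is synthetic random data, the p-value is taken as the fraction of random BAs at or above the model BA, BA is assumed to be the mean of sensitivity and specificity, and the number of permutations is reduced for speed. All function and variable names are hypothetical.

```python
import random

def balanced_accuracy(calls, ref):
    """Mean of sensitivity and specificity for two equal-length 0/1 lists."""
    tp = sum(c == 1 and r == 1 for c, r in zip(calls, ref))
    tn = sum(c == 0 and r == 0 for c, r in zip(calls, ref))
    n_pos = sum(ref)
    n_neg = len(ref) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

def assay_set_calls(pool, names, threshold):
    """Call a chemical positive when at least `threshold` chosen assays hit."""
    n_chem = len(next(iter(pool.values())))
    return [int(sum(pool[a][i] for a in names) >= threshold) for i in range(n_chem)]

def permutation_p(model_ba, pool, n_assays, threshold, ref, n_perm, rng):
    """Fraction of randomly drawn assay sets whose BA reaches the model BA."""
    names = sorted(pool)
    null = []
    for _ in range(n_perm):
        drawn = rng.sample(names, n_assays)  # same number of assays as the model
        null.append(balanced_accuracy(assay_set_calls(pool, drawn, threshold), ref))
    return sum(ba >= model_ba for ba in null) / n_perm

rng = random.Random(0)
ref = [i % 2 for i in range(40)]  # synthetic reference calls for 40 chemicals
pool = {f"assay_{j}": [rng.randint(0, 1) for _ in ref] for j in range(50)}
model_ba = 1.0  # pretend the selected assay set classified every chemical correctly
p_value = permutation_p(model_ba, pool, n_assays=2, threshold=1,
                        ref=ref, n_perm=2000, rng=rng)
```

The same null distribution could also supply percentile-based intervals, in line with the statement that the permutation distributions were used to define confidence intervals, although the exact construction is not given in the text.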
Figure 3 Forest plot illustrating the performance (as measured by sensitivity, specificity, and BA) of ToxCast endocrine-related assays for predicting outcomes captured in EDSP/OECD guideline studies. Symbols represent the optimal BA obtained across ...
Figure 4 Forest plot illustrating the performance (as measured by sensitivity, specificity, and BA) of ToxCast endocrine-related assays for predicting outcomes captured in non-guideline endocrine studies. Symbols represent the optimal BA obtained ...