Data

Data were obtained on the extent of cytotoxicity induced in 13 different cell types by 1353 compounds ^{11} as part of the joint HTS screening effort of the NCGC and the NTP. Assays were conducted using1536-well plates, with 18 plates per assay. The first two and last two plates served as vehicle control plates (i.e., plates inoculated with cells where dimethyl sulfoxide (DMSO), the vehicle used to solubilize the test compounds, was administered to each well). In the 1^{st} and 17^{th} plates, the wells used for test compounds contained DMSO at the concentration used for the compound samples in other plates, while those wells in the 2^{nd} and 18^{th} plates contained twice that concentration of DMSO (i.e., they were “double-dip” control plates). The 3^{rd} through 16^{th} plates contain increasing concentrations of the test compounds (concentration approximately doubling at each step), obtained through successive dilutions of the maximum concentration of 46 μM. The 16^{th} plate received two dispenses from the highest concentration stock compound plate, thus doubling the DMSO concentration in that well and producing a final compound concentration of 92 μM. The compound concentrations on plates 3–16 were, therefore, 0.59 nM, 2.95 nM, 14.8 nM, 33 nM, 74 nM, 0.165 μM, 0.369 μM, 0.824 μM, 1.84 μM, 4.12 μM, 9.22 μM, 20.6 μM, 46μM, and 92 μM.

Compounds for screening were placed in the 5^{th} through 48^{th} columns of the 32 row by 48 column plate allowing for 1408 total wells (1353 individual compounds where 1298 are tested using a single well and 55 are tested using replicate wells). Three plate formats were used for the first four columns. For most of the assays, the wells of the first two columns contain concentrations of doxorubicin and tamoxifen, respectively, while the wells of the 3^{rd} column contain the vehicle control (DMSO or DMSO double dip as appropriate). The wells of the 4^{th} column contain 100 μM tamoxifen as a positive control. This will be referred to as “Format 1” below. For the HepG2 assay (done in triplicate to test replicability of the assay), the wells of the 3^{rd} column is the positive control and the wells of the 4^{th} contain DMSO at the appropriate concentration. This will be referred to as “Format 2.” In the Jurkat cell assay, doxorubicin is in the 1^{st} and 2^{nd} columns and tamoxifen is in the 3^{rd} and 4^{th} columns of the first 24 rows of the plate. On those plates, vehicle controls are in the 1^{st} through 4^{th} columns of row 25 through 32 and there is no tamoxifen 100 μM positive control. This will be referred to as “Format 3.”

The CellTiter-Glo® luminescent cell viability assay (Promega Corporation, Madison, WI, USA) is a homogeneous method to measure the number of viable cells in culture and was used to screen 13 cell lines exposed to the test substances. The readout from this assay is based on quantitation of intracellular ATP, an indicator of metabolic activity, using the luciferase reaction. Luciferase catalyzes the oxidation of beetle luciferin to oxyluciferin and light in the presence of ATP. The luminescent signal is proportional to the amount of ATP present. The 13 cell lines tested include human cells (HEK 293, HepG2, SH-SY5Y, Jurkat, BJ, HUV-EC-C, MRC-5, SK-N-SH, mesenchymal cells), rat cells (primary renal proximal tubule cells, H-4-II-E) and murine cells (N2a, NIH 3T3). Details described in

^{11} and the normalized data are available for download from PubChem (

http://pubchem.ncbi.nlm.nih.gov).

It is possible that double-dipping of the vehicle control may have an effect on the measured response. Also, there may be contamination of some amount of the study substances in the final single-dip control plate. For these reasons, this concentration-response analysis is carried out using only the first control (plate 1) and the dosed plates (including the double-dip highest concentration, plate 16). This yields a total of 15 plates in the analysis, a control plate followed by 14 plates with different concentrations of the test substances. Each plate has its own control wells and these will be used to define control response for each plate, hence the loss of the other control plates should not significantly alter the findings.

Hill function

The response for each compound is the measured luminescence, which is an indicator of ATP levels, which decreases from control if the chemical reduces cell viability and/or the rate at which cells proliferate. Many authors analyze data of this type using rescaled values (e.g., control response is 0, maximum reduction is −100%), however, we prefer an approach using the original data directly without controlling for minor variances in assay protocol like plate incubation times and detector exposure. Mean concentration-response curves for data relating to complex cellular response following chemical exposure usually follows a sigmoidal form and has been routinely analyzed using the Hill model

^{4,7,9}. Several authors have also incorporated them in the context of HTS

^{2,3}. Hill functions can take on a number of different forms; for these analyses, we used the form:

where

*i* and

*j* refer to the row and column of a plate and define the unique compound in that position,

*l* refers to one plate,

*r*_{0ijl} is the control response for the chemical at (

*i*,

*j*),

*r*_{pijl} is the lowest possible activity at (

*i*,

*j*,

*l*) corresponding to the high-concentration positive control,

*v*_{ij} is the maximum fractional reduction caused by the test compound (restricted so that

*f* is between 0 and −1 for decreasing effects, 0 and 0.1 for increasing effects, which were not as strong as the decreasing effects due to cytotoxicity of the study substances),

*n*_{ij} is the Hill coefficient and governs the shape of the concentration response curve (e.g. a curve with a large value for

*n* is highly sigmoidal), and

*k*_{ij} is the concentration at which 50% of the maximum reduction

occurs (the AC

_{50}). Because of the experimental design, each concentration level corresponds to a plate

*l* but a separate symbol is used for plate vs. concentration to distinguish effects based on concentration-response from those based on plate location.

Examination of the results from control columns (columns 3 and 4) of plates in formats 1 and 2 suggests a row-dependent relationship between the values for the vehicle controls (column 3 in format 1, column 4 in format 2) and the positive controls (the other control column). This occurs in all assays, although the exact relationship varies and in some cases the difference between rows is small. In this analysis, it is assumed that the row-dependent ratio between vehicle controls and positive controls is consistent across all columns of the plate, i.e. that there is no column-dependent effect on the ratio of neutral to positive controls. If that is the case, then the positive control value

*r*_{p} in

Eq.(1) can be treated as a function of the vehicle control value

*r*_{0} and it will not need to be an optimized parameter itself. The pattern observed, with the ratio lowest in the middle, implies that the measured activity for positive controls is highest near the middle of the plate (). In the model, this behavior was approximated with a V-shaped function, giving the following relationship for the ratio

*μ*_{i}=

*r*_{0ijl}/r_{pijl} over all

*j* and

*l*:

where

*m* is a vector of length 32 of the form

*m*=[16,15,14, …, 1,1,2,3, …, 16],

*θ*_{1} and

*θ*_{2} are parameters estimated via simple least squares and

*μ* is a vector of length 32 containing the adjusted ratios [

*μ*_{1},

*μ*_{2},…,

*μ*_{32}]. See the

supplemental material for more information on the calculation of

*μ*. For most of the assays, the ratio between the highest and lowest values of

*μ* is below 1.3, although it does go as high as 2.4 for one assay. With this modification, the Hill function becomes:

If there is no row-dependent relationship in the ratio of the vehicle control to the positive control ratio, *θ*_{2} will be estimated as 0.

In HTS, there is a tendency to have location effects on the plates

^{5}. shows the location effects on one control plate. The observed data values are not randomly distributed around a constant value, but show consistent variation by row and column. Several analytical methods have been proposed to adjust for this effect

^{6,10}, but these are generally done prior to the analysis rather than as part of the formal analysis. In some cases, also, the adjustment does not completely remove location effects. Here, we adjust for plate-location effects by modeling the value for

*r*_{0ijl}. Preliminary analyses suggested that the control response,

*r*_{0ijl}, can be represented as a product of a row effect and a column effect:

where

*α*_{il} is the row effect for row

*i* in plate

*l* and

*γ*_{jl} is the column effect for column

*j* in plate

*l*. The row-dependent effect on the control response represented by

*α*_{il} is distinct from the row-dependent relationship between the vehicle and positive control activity levels represented by

*μ*_{i}. An additive rather than a multiplicative model can also fit the data on the control plates fairly well. The choice between the additive and multiplicative models is somewhat arbitrary. The use of

*r*_{0ijl} in this general form (

4) for the test substances results in a linear dependence amongst the model parameters. Without loss of generality, we force the adjustment to be relative to appropriate control responses by setting

*γ*_{jl} =1 for the column(s) containing vehicle controls. Given this model, a data value x

_{ijl} can be converted to a normalized value y

_{ijl} by the following formula:

A normalized value of 0 corresponds to no response; a value of −1 corresponds to full suppression of activity. Because of noise and measurement error, normalized values can be greater than 0 or less than −1.

Estimation of Model Parameters

The observations are assumed to have normal error:

where

*ε* ~

*N* (0,

*σ*^{2}) (A model using lognormal error instead of normal was also tested but gave significantly worse results). The parameters are constrained as follows: −0.1≤

*v*_{ij}≤1, 0≤

*k*_{ij}≤0.368 mM (4 times the maximum experimental concentration), 0≤

*n*_{ij}≤5, and 0.9≤

*γ*_{jl}≤1.1 for all i, j, and l. The limits on

*α*_{ik} are determined by observations of the data; the lower limit for

*α*_{il} is 0.9 times the lowest control response on plate l and the upper limit is 1.1 times the highest control response. The limits on

*γ*_{jl} are kept narrow so that most of the variation in the control response is contained in the

*α*_{il} term. The upper limit on

*v* is 1 because the maximum possible effect is full suppression of activity (corresponding to

*v*=1). The lower limit on v is somewhat arbitrary, but a limit of −0.1 seems in practice to be enough to account for increasing effects. See below for discussion of how often the optimized parameter values are at the limits of their ranges.

Each analysis involves 15 plates with 14 treatment levels for 1408 chemical wells and 32 untreated control wells yielding a total of 21,600 data values for each run (for the assay with format 3 plates, there are 21,360 data values). There are 32×15=480 *α*_{il} parameters which will be collected into a matrix denoted by *α*=[*α*_{il}]. Using similar notation, there are 44×15=660 *γ*_{jl}’s in *γ*, 32×44=1408 *n*_{ij}’s in *n*, 1408 *v*_{ij}’s and *k*_{ij}’s, and a single value for *σ*. The values of the adjustable parameters (*α, γ, v, k, n*) are found using a MATLAB^{@}(The Mathworks, Inc., Natick, MA, USA 2007) least squares minimization routine (lsqnonlin) to maximize the log-likelihood^{1} of the data. While simultaneous optimization of the entire set of 5365 parameters is possible, it is not practical and a stratified estimation procedure is used, with repetition of the following steps:

- Optimization by individual substance:
*α*, *γ*, and *σ* are kept constant, while v, *k,* and *n* are optimized. With *α* and *γ* constant, the maximum of the likelihood can be obtained through independent estimation of *v*_{ij}, *k*_{ij}, and *n*_{ij} for each test substance. - Optimization by plate, control-response parameters only: parameters other than
*α* and *γ* are kept constant. For each plate *l*, the values of *α*_{il} and γ_{jl} are estimated for all i, j. - Optimization by row, control-response parameters only: parameters other than
*α* are kept constant. For each row i, the values of *α*_{il} are estimated for all *l*. - Optimization by column, control-response parameters only: parameters other than
*γ* are kept constant. For each column j, the values of γ_{jl} are estimated for all *l*. - Optimization by row:
*γ* and *σ* are held constant, while *α*, *v*, *k*, and *n* are optimized. In this case, the maximum of the likelihood can be obtained through independent estimation of *v*_{ij}, *k*_{ij}, *n*_{ij}, and *α*_{il} for each row (i) of test substances for all j and k. - Optimization by column:
*α* and *σ* are held constant, while *γ*, *v*, *k*, and *n* are optimized. In this case, the maximum of the likelihood can be obtained through independent estimation of *v*_{ij}, *k*_{ij}, *n*_{ij}, and *γ*_{jl} for each column (j) of test substances for all i and l. - Reduction of parameter space: The significance of the concentration-response for each compound is tested (see below). If the p value for compound
*ij* is greater than a preset cutoff value (in the results below, the cutoff is 0.05/1408), then *v*_{ij} is set to 0.

The steps are carried out in a specified order with repetition as follows:

- Perform step 5 and step 6 to find a set of starting values for the parameters.
- Perform steps 1–4; if the change in the log-likelihood after all 4 steps is less than a prespecified small value, δ, then go on to the third step; if not, repeat this step.
- Perform steps 5 and 6; if the change in the log-likelihood is less than a prespecified δ, then go on to the next step; if not, return to the second step.
- Do a final optimization for each substance individually (step 1).
- Do step 7. If no new compounds are found to have nonsignificant concentration-response, stop here; otherwise, return to 2).

The order of steps is partly arbitrary, and partly chosen to improve optimization time by having more repetitions of the steps 1–4 (which are the quickest, since they use the fewest optimizable parameters) than of steps 5 and 6. Including step 7 helps to reduce the number of compounds that have significant but weak response. Those compounds may be false positives. Consequences of changing the cutoff value in step 7 are discussed in the

supplemental material. When

*v*_{ij} is fixed at 0, then none of the Hill function parameters (v, k, or n) for (i,j) needs to be optimized. In a least squares problem such as this, the standard deviation

*σ* does not need to be computed during the optimization; instead, it is computed from the sum of squared errors using the final model parameter estimates.

Outliers

An initial optimization revealed some data points that seemed to be outliers; for example, there were cases where all of the observed activity levels were within 10% of the control level, except for one data point at a low concentration with almost no measured luciferase activity. Identifying and removing outliers is complex and does not readily lend itself to simple statistical rules. For this reason, we propose a combination of three activities to identify outliers: a statistical/quantitative rule, visual inspection of model fits, and examination of the normalized values directly.

Computer code was written to identify outliers based on the following criteria.

- The difference between the data point x
_{ijk} and the model fit must be >3.5 σ AND - The absolute value of the normalized value of the data must be greater than 0.3 AND
- If a downward trend is present in the data for y
_{ijk}(the normalized value of x_{ijk}) (i.e., y_{ijk−1}>y_{ijk}>y_{ijk+1}), then: x_{ijk} is not a low outlier unless at least one its neighbors (x_{ijk−1} and x_{ijk+1}) is also a low outlier; it is not a high outlier unless one of its neighbors is also a high outlier.

These criteria were intended to be conservative and not identify too many outliers. A list of possible outliers was produced by the code and these data points were inspected visually in plots of normalized data vs. model fit. Points that seemed to satisfy the outlier criterion because of overall poor quality of fit rather than because they were isolated points of abnormally low or high response were excluded from the list of possible outliers.

Finally, plots of data values for each plate (in the same format as ) were also examined to find possible outliers not revealed by the above procedure. Several outliers were also identified in this way, all of them located in the corners of plates (at or near row 1, column 48, or row 32, column 48). These points had very low measured activity compared to the nearby wells on their plates, and their normalized data values represented very low responses in relation to the responses for the next higher and next lower doses. It is possible that functional form for the plate-location effects more complicated than that given in

equation (4) would be able to normalize these points without treating them as outliers. However, because the measured responses at those points are so low, the error in the normalized data would probably be large. Overall, 126 data points (of 323,760; ~0.06%) were identified as outliers. The estimation procedure above was then redone with the reduced data.

Statistical Testing

A likelihood ratio test ^{1} is used to test each compound for a significant concentration-response trend. For the compound in position *ij*, the null hypothesis is that there is no concentration-response (v_{ij}=0); the alternative hypothesis is that v_{ij}≠0. The likelihoods are calculated using only the data for substance *ij*, using the standard deviation *σ* from the overall fit. The likelihoods for the unrestricted (alternative hypothesis) and restricted (null hypothesis) models are compared using a likelihood ratio test (chi squared with 3 degrees of freedom, corresponding to the 3 removed parameters *v*_{ij}, *k*_{ij} and *n*_{ij}) to produce a p-value. This calculation is also carried out using only the first 14 data points (i.e., excluding the data for the highest concentration) to determine whether the significance of the response is sensitive to the high concentration data point. This procedure for testing trend is an approximation. Since all model parameters for all substances are actually related in the optimization algorithm, an exact calculation of the p-value would require a full optimization of all model parameters after setting *v*_{ij}=0 for a particular substance. Therefore, the calculated p-value is an underestimate of the true p-value and an overestimate of the significance of the trend.

It is important to know whether replication in this HTS assay yields reproducible results. There are two types of replication in the data; 55 test substances are replicated within plates and the HepG2 assay was conducted in triplicate. A likelihood ratio test was used to compare the triplicate data. Under the null hypothesis, replicates were assumed to share equal values for common parameters (v, k, and n). The alternative hypothesis was that the parameter values are different among replicates. For duplicates, the values of *k*, the AC_{50} parameter, were compared.

When using p-values to evaluate results in a large group of tests such as this set of high throughput screening data, the issue of multiple comparisons becomes important. In most of the results discussed below, a “significant” p-value will be taken to be one with p<0.05/1408 (3.5511 × 10^{−5}).

A simulation study was performed to test the performance of the algorithm on simulated data with known parameters. Details of the simulation are given in the

supplemental material. Results from the simulation will be compared to those from the actual data when relevant.