The percentage of substances labeled as ocular irritants based on three different classification strategies was compared. The current sequential testing strategy used to assign an FHSA classification is denoted as Strategy 1. Strategy 2 represents a minimum threshold of ≥1/3 (33%) positive animals. Strategy 3 represents a minimum threshold of ≥2/3 (67%) positive animals.

To compare the frequency with which each strategy would identify substances as ocular irritants, several underlying population positive response rates were examined. This population positive response rate, denoted by p, is the overall likelihood that an animal will show a positive response for a given substance. Importantly, it is a “population” response rate, not the response rate observed in a given sample of 3 to 6 animals. However, for a specified value of p, the likelihood of observing various responses in a given sample can be computed using binomial probabilities. This is illustrated in Table 4 for a general p, and for p=20% and p=60% as specific examples. For example, for a substance with an underlying positive response rate of p=60%, the likelihood is 0.311 (31.1%) that there will be exactly 4 positive animals in a sample of 6 animals.

| **Table 4** Probability of observing 0 to 6 positive animals in a sample of n=3 or n=6 for various population positive response rates (p) assuming a binomial model |
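The binomial probabilities described above can be reproduced with a short calculation. The following is a minimal sketch, not the authors' code; the helper name `binom_pmf` is hypothetical and introduced here only for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k positive animals among n, given response rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example from the text: p = 60%, exactly 4 positives among 6 animals.
print(round(binom_pmf(4, 6, 0.60), 3))  # 0.311

# Full distributions for both sample sizes and both example rates.
for p in (0.20, 0.60):
    for n in (3, 6):
        row = [round(binom_pmf(k, n, p), 3) for k in range(n + 1)]
        print(f"p={p:.0%}, n={n}: {row}")
```

Each printed row lists the probabilities of 0, 1, ..., n positive animals and sums to 1 up to rounding.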

Table 5 presents the likelihood of classifying a substance as an ocular irritant for various underlying values of p. However, it does not show whether or not this classification is “correct” because this would require knowledge of the underlying positive response rate that differentiates irritants from nonirritants. As indicated in , the weakest possible response that is considered positive by the current sequential testing strategy is 22% (4/18), while a response of 17% (1/6 or 3/18) is considered negative. Therefore, it could be argued that the threshold positive response rate for considering a substance as an irritant under the current FHSA requirements should logically lie between 17% and 22%, perhaps 20%. However, this conclusion is complicated by the fact that an observed response rate of 28% (5/18) may occur and still result in a chemical not being labeled as an irritant (see ). Because the underlying positive response rates in a population that are characteristic of an irritant or a nonirritant are not definitively known, a range of underlying positive response rates was compared (Table 5) and is presented graphically in .

| **Table 5** Percentage of substances labeled as ocular irritants based on various population positive response rates (p) for the three strategies |

For purposes of illustration, consider p=20%. Table 6 summarizes all the possible ways in which Strategy 1 could lead to a negative classification for a substance with a 20% population positive response rate. The probabilities in Table 6 are derived from Table 4. Thus, by subtraction from 1.0, the likelihood of a positive classification for Strategy 1 for p=20% is 1 – 0.796 = 0.204, or 20.4% (see Table 5).

| **Table 6** Probability that Strategy 1 will result in a negative classification for p=20% |
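The Strategy 1 calculation can be sketched in code. This section does not spell out the sequential decision rule, so the sketch assumes one: a first test of 6 animals is negative with 0 to 1 positives, positive with 4 or more, and repeated with 2 to 3; a second test is negative with 0 positives, positive with 3 or more, and repeated with 1 to 2; a third test is negative only with 0 positives. This assumed rule is consistent with the 4/18, 3/18, and 5/18 cases cited in the text, and under it the calculation reproduces the 0.796 figure. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k positive animals among n, given response rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def strategy1_negative(p: float, n: int = 6) -> float:
    """Probability of a negative classification under the ASSUMED sequential rule."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    neg_first = pmf[0] + pmf[1]   # test 1: 0 or 1 positive -> negative
    retest = pmf[2] + pmf[3]      # test 1: 2 or 3 positive -> run a second test
    neg_second = pmf[0]           # test 2: 0 positive -> negative
    third = pmf[1] + pmf[2]       # test 2: 1 or 2 positive -> run a third test
    neg_third = pmf[0]            # test 3: 0 positive -> negative
    return neg_first + retest * (neg_second + third * neg_third)


neg = strategy1_negative(0.20)
print(round(neg, 3), round(1 - neg, 3))  # 0.796 0.204
```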

These calculations are much simpler for Strategies 2 and 3. The likelihood of a positive classification using Strategy 2, assuming p=20%, is just the likelihood of observing 1/3, 2/3, or 3/3 positive responses, which, using the probabilities in Table 4, is 0.384 + 0.096 + 0.008 = 0.488 or 48.8% (see Table 5). For Strategy 3 and p=20%, the likelihood of a positive classification is the sum of the likelihoods of observing 2/3 or 3/3 positive responses, which is 0.096 + 0.008 = 0.104 or 10.4% (see Table 5).
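The two sums above follow directly from the binomial tail probability for a 3-animal test. A minimal sketch, with the hypothetical helper `prob_at_least`:

```python
from math import comb


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """Probability of k_min or more positive animals among n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p = 0.20
strategy2 = prob_at_least(1, 3, p)  # positive if at least 1 of 3 animals responds
strategy3 = prob_at_least(2, 3, p)  # positive if at least 2 of 3 animals respond
print(round(strategy2, 3), round(strategy3, 3))  # 0.488 0.104
```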

Even though it uses fewer animals, Strategy 2 is more powerful than current FHSA requirements for detecting positive response rates of up to 40% and has approximately the same power for response rates of 50% and greater (). Strategy 3 will identify far fewer irritants than Strategy 2 for underlying positive response rates of 80% and lower. Because Strategy 3 does not treat a single positive response (1/3) as indicating an irritant, it has lower power than current FHSA requirements for underlying positive response rates of 20% to 80%.
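The power comparison across response rates can be explored with the sketch below. It is illustrative only: Strategy 1 is modeled with an assumed sequential rule (first test of 6: positive at 4 or more, retest at 2 to 3; second test: positive at 3 or more, retest at 1 to 2; third test: positive at 1 or more), which is not stated in this section, and all function names are hypothetical.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability of exactly k positives among n animals."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def strategy1(p):
    """Positive-classification probability under the ASSUMED sequential rule."""
    q = [pmf(k, 6, p) for k in range(7)]
    pos_first = sum(q[4:])      # test 1: 4 or more of 6 positive
    retest = q[2] + q[3]        # test 1: 2 or 3 positive -> second test
    pos_second = sum(q[3:])     # test 2: 3 or more of 6 positive
    third = q[1] + q[2]         # test 2: 1 or 2 positive -> third test
    pos_third = 1 - q[0]        # test 3: any positive animal
    return pos_first + retest * (pos_second + third * pos_third)


def strategy2(p):
    """At least 1 of 3 animals positive."""
    return 1 - (1 - p) ** 3


def strategy3(p):
    """At least 2 of 3 animals positive."""
    return pmf(2, 3, p) + pmf(3, 3, p)


for pct in range(10, 100, 10):
    p = pct / 100
    print(f"p={pct:>2}%  S1={strategy1(p):6.1%}  S2={strategy2(p):6.1%}  S3={strategy3(p):6.1%}")
```

Under these assumptions, the printed rows show the pattern described in the text: Strategy 2 exceeds Strategy 1 at low response rates, and Strategy 3 falls below Strategy 1 over the middle of the range.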

The previous calculations were based on a variety of underlying positive response rates without consideration of whether or not they reflect the positive response rates seen in practice. Rather than assuming that each irritant and nonirritant has its own unique (and unknown) underlying positive response rate, a potentially useful approach is to derive a mathematical model that accurately describes the observed distribution of positive responses seen for a large database of test substances. If a definitive structure can be imposed upon the data (and if the model fits the data), then the model parameters can be used to estimate over- and underprediction rates. With this in mind, a NICEATM database of 481 rabbit eye test studies using 6 animals each was analyzed. This database includes a wide range of chemical and product classes and represents the types of test substances typically evaluated in ocular safety testing (see Tables 7 and 8). Chemical classes were assigned to each substance using a standard classification scheme based on the National Library of Medicine Medical Subject Headings (MeSH®) classification system. If not assigned in the study report, the product class was sought from other sources, including the National Library of Medicine’s ChemIDplus® database.

| **Table 7** Chemical classes in the NICEATM database |

| **Table 8** Product classes in the NICEATM database |

To calculate the estimated over- and underprediction rates for the three strategies using the NICEATM database, the first step was to find a model that fits the observed outcomes (see Table 9), some of which are from irritants and some from nonirritants. We used a model that assumed a mixture of three binomial distributions (one for nonirritants and two for irritants), because it is unlikely that every irritant has exactly the same likelihood of producing a positive response in an animal. We assumed that the irritants could be categorized into two groups: Type I irritants (high underlying positive response rate) and Type II irritants (lower underlying positive response rate).

| **Table 9** Goodness of fit for a database of 481 test results using a mixture of three binomial distributions |

From the observed distribution of positive animals in a 6-animal test, five key parameters were estimated: the underlying positive response rates for nonirritants and Type I and Type II irritants, and the percentage of Type I and Type II irritants in the database (the percentage of nonirritants in the database can then be calculated by subtraction from 100%). The following parameter estimates provided the best fit to our database:

Model parameter estimates for the NICEATM database:

- Type I irritants: Underlying positive response rate = 97.8%
- Type II irritants: Underlying positive response rate = 50.0%
- Nonirritants: Underlying positive response rate = 1.7%
- Percentage of Type I irritants in the sample: 54% or 260 substances
- Percentage of Type II irritants in the sample: 12.9% or 62 substances
- Percentage of nonirritants in the sample: 33.1% or 159 substances
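To illustrate what these estimates imply, the sketch below evaluates the three-component binomial mixture at the parameter values listed above and prints the expected number of studies with 0 to 6 positive animals. It does not refit the model or use the observed counts, which are not reproduced here; the names `COMPONENTS` and `mixture_pmf` are hypothetical.

```python
from math import comb

# Parameter estimates quoted in the list above: (mixing weight, response rate).
COMPONENTS = {
    "Type I irritants": (0.540, 0.978),
    "Type II irritants": (0.129, 0.500),
    "Nonirritants": (0.331, 0.017),
}
N_STUDIES = 481
N_ANIMALS = 6


def mixture_pmf(k: int) -> float:
    """Model probability of k positive animals in a 6-animal test."""
    return sum(
        w * comb(N_ANIMALS, k) * p**k * (1 - p) ** (N_ANIMALS - k)
        for w, p in COMPONENTS.values()
    )


for k in range(N_ANIMALS + 1):
    print(f"{k} positive: expected {N_STUDIES * mixture_pmf(k):6.1f} studies")
```

Comparing such expected counts with the observed counts is the basis of a goodness-of-fit assessment.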

Given this excellent fit to the data, as indicated in Table 9, we calculated the percentage of substances that would be labeled as ocular irritants using each of the three strategies (see Table 10). The likelihood that a Type I irritant would be labeled as an ocular irritant is close to 100% for all three strategies. The likelihood that a Type II irritant would be labeled as an ocular irritant is approximately 88% for Strategies 1 and 2 but 50% for Strategy 3. The likelihood of labeling a nonirritant as an ocular irritant is 0% for Strategy 1, 5.0% for Strategy 2, and 0.1% for Strategy 3 (Table 10).

| **Table 10** Percentage of substances labeled as ocular irritants based on estimated underlying positive response rates for three strategies: three binomial distributions |

Based on these outcomes, the underlying over- and underprediction rates associated with this model were then calculated. All three strategies have a very low underprediction rate for Type I irritants. However, for Type II irritants, Strategies 1 and 2 have underprediction rates of approximately 12%, while Strategy 3 has a 50% underprediction rate. For nonirritants, Strategies 1 and 3 have very low overprediction rates, while the overprediction rate for Strategy 2 is 5% (see Table 11).

| **Table 11** Percentage of substances that would be over- and underpredicted for the three strategies |
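The labeling percentages and the over- and underprediction rates discussed above can be derived from the fitted response rates as sketched below. Strategy 1 again uses an assumed sequential rule (positive at 4 or more of 6; retest at 2 to 3; second test positive at 3 or more, retest at 1 to 2; third test positive at 1 or more), which is not stated in this section. An irritant that is not labeled counts as underpredicted, and a nonirritant that is labeled counts as overpredicted. All names are hypothetical.

```python
from math import comb


def pmf(k, n, p):
    """Binomial probability of exactly k positives among n animals."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def strategy1(p):
    """Positive-classification probability under the ASSUMED sequential rule."""
    q = [pmf(k, 6, p) for k in range(7)]
    return sum(q[4:]) + (q[2] + q[3]) * (sum(q[3:]) + (q[1] + q[2]) * (1 - q[0]))


def strategy2(p):
    """At least 1 of 3 animals positive."""
    return 1 - (1 - p) ** 3


def strategy3(p):
    """At least 2 of 3 animals positive."""
    return pmf(2, 3, p) + pmf(3, 3, p)


# Fitted underlying positive response rates from the parameter list.
RATES = {"Type I": 0.978, "Type II": 0.500, "Nonirritant": 0.017}
STRATEGIES = {"Strategy 1": strategy1, "Strategy 2": strategy2, "Strategy 3": strategy3}

for group, p in RATES.items():
    for name, label_prob in STRATEGIES.items():
        labeled = label_prob(p)
        is_nonirritant = group == "Nonirritant"
        error = labeled if is_nonirritant else 1 - labeled
        kind = "over" if is_nonirritant else "under"
        print(f"{group:<11} {name}: labeled {labeled:6.1%}, {kind}predicted {error:6.1%}")
```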

It is important to note that this approach is similar to the approach used by Springer et al. (1993), except that we assumed two different underlying positive response rates for irritants, whereas Springer et al. used only one (i.e., they assumed that every irritant has exactly the same likelihood of producing a positive response in an animal). Based on the distribution of positive animals in a 6-animal test in the NICEATM database, the use of two different underlying positive response rates for irritants provided a much better fit to the data.