U.S. Federal Hazardous Substances Act (FHSA) regulations specify eye safety testing procedures and hazard classification criteria for substances regulated by the U.S. Consumer Product Safety Commission (CPSC). Current regulations require up to three sequential 6-animal tests. Testing consistent with the Organisation for Economic Co-operation and Development (OECD) test guideline for eye irritation/corrosion, which specifies 3 animals, can also be submitted to U.S. agencies. However, current FHSA regulations do not provide criteria to classify results from 3-animal tests. An analysis was conducted to determine criteria for 3-animal test results that would provide labeling equivalent to that required under FHSA regulations. The frequency with which FHSA requirements identify substances as ocular irritants was compared with the frequency with which a criterion of either ≥1/3 or ≥2/3 positive animals would identify these substances. A database of rabbit eye tests was also used to estimate over- and underprediction rates for each criterion. In each instance, a criterion of ≥1/3 positive animals more closely matched the expected outcome based on FHSA requirements, while a criterion of ≥2/3 positive animals identified far fewer irritants. Using a classification criterion of ≥1/3 positive animals provided eye hazard labeling equivalent to or greater than that of current FHSA requirements, while using 50–83% fewer animals.
Each year, approximately 2 million eye injuries occur in the U.S. (McGwin et al. 2006a). Of these, more than 40,000 result in permanent visual impairment. Household cleaning chemicals and other chemical products are the leading cause of consumer product-related eye injuries in children under age 10 (McGwin et al. 2006b). In order to provide warnings to consumers and workers of the potential for chemicals and products to cause eye injuries, regulatory authorities require ocular safety testing to determine if substances may cause temporary or permanent eye damage. Testing results are then used for hazard classification and labeling of eye injury potential as required by appropriate national and/or international hazard classification systems. If classified as an eye hazard, hazard labeling of the chemical or product is required to warn users of the potential to cause temporary or permanent eye injuries, to provide the safety precautions necessary to avoid injuries, to provide the immediate first-aid procedures that should be followed in case of an accidental exposure, and to provide guidance on whether medical care should be sought.
The U.S. Federal Hazardous Substances Act (FHSA¹) “requires that certain hazardous household products (‘hazardous substances’) bear cautionary labeling to alert consumers to the potential hazards that those products present and to inform them of the measures they need to protect themselves from those hazards” (see http://www.cpsc.gov/businfo/fhsa.html). The U.S. Consumer Product Safety Commission (CPSC) issues regulations implementing the FHSA. The regulations for hazardous substances under the FHSA are found in Title 16 Part 1500 of the U.S. Code of Federal Regulations (16 CFR 1500 [CPSC 2010]). Current U.S. ocular hazard classification regulations to implement FHSA labeling requirements for these products are provided in the Test for Eye Irritants (16 CFR 1500.42 [CPSC 2010]). This test provides criteria and procedures for identifying ocular hazards based on rabbit eye test results.
Current FHSA regulations require 6 animals per test and may require up to three sequential tests for each substance, thereby requiring 6, 12, or 18 animals to reach a hazard decision. The requirement for second and third sequential tests is based on the number of positive responses in the previous test. In 2002, the Organisation for Economic Co-operation and Development (OECD) adopted U.S. proposed revisions to Test Guideline 405: Acute Eye Irritation/Corrosion (OECD 2002) to reduce the maximum number of required animals from 6 to 3. Testing conducted in accordance with the OECD test guideline can be used to meet CPSC labeling requirements. However, current FHSA regulations do not provide criteria to classify results from a 3-animal test. Therefore, an analysis was conducted to determine classification criteria based on results from a 3-animal test that would provide hazard classification equivalent to that provided by current FHSA regulations, which require the use of 6 to 18 animals.
The National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM) and the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) used the results from this analysis to develop recommendations for updating the CPSC Test for Eye Irritants (CPSC 2010) to require a maximum of 3 animals per test substance, which would be consistent with current ocular safety testing guidelines for the U.S. Environmental Protection Agency (EPA 1998) and the OECD (OECD 2002). ICCVAM recommendations are provided to Federal agencies to assist them in meeting Federal laws that require agencies, before adopting new alternative test methods, to determine that the test method will generate data in an amount and of a scientific value that is at least equivalent to the data generated from existing tests for hazard identification or risk assessment purposes (PHS 2000).
The testing requirements necessary to determine the ocular hazard potential for substances regulated under the FHSA (FHSA 2008) are provided in 16 CFR 1500.42 (CPSC 2010) (see Table 1). Testing is conducted using an initial group of 6 albino rabbits, and 0.1 mL or 0.1 grams of the test substance is placed in the conjunctival sac of one eye with the contralateral eye serving as a negative or solvent control. Observations and severity scores are recorded at 24, 48, and 72 hours after test substance administration for four types of ocular injuries: corneal ulceration/opacity, conjunctival redness, conjunctival swelling, and iritis (see Table 2). Positive responses for individual animals are based on meeting or exceeding the minimum severity criteria for any one of the four types of eye injuries at any of the three time points. Criteria based on the number of positive animals are provided for each sequential test as to whether the hazard test result is positive, negative, or if a second or third test is required (Table 1).
The U.S. proposed revisions to OECD Test Guideline 405: Acute Eye Irritation/Corrosion (OECD 1987) to reduce the maximum number of required animals from 6 to 3 (deSilva et al. 1997; OECD 1999; Springer et al. 1993). The revised Test Guideline 405 was adopted in 2002 (OECD 2002). In accordance with the OECD Mutual Acceptance of Data Treaty (OECD 1981), U.S. agencies accept test data for review generated in accordance with OECD test guidelines.
The Animal Welfare Act (2010) requires that only the minimum number of animals necessary to obtain scientifically valid results be used, and the Public Health Service requires that a rationale for the appropriateness of the number of animals used be provided to and approved by the Institutional Animal Care and Use Committee (PHS 2002). In light of these policies and regulations, it is expected that most in vivo ocular safety testing would adhere to the 3-animal procedure described in the OECD and EPA test guidelines (OECD 2002; EPA 1998). However, current FHSA regulations do not provide criteria to classify results from a 3-animal test. Therefore, an analysis was conducted to determine classification criteria based on results from a 3-animal test that would provide hazard classification equivalent to that provided by current FHSA regulations that require the use of 6 to 18 animals.
The minimum number of animals that would be required under the FHSA sequential testing strategy to assign a definitive test classification as positive or negative was evaluated for each of the possible test outcomes (Table 3). The minimum percentage of positive animal responses that can result in a positive FHSA hazard classification is 22% (2/6+1/6+1/6 or 4/18). The maximum percentage of positive animal responses that can result in a negative FHSA hazard classification ranges from 17% (1/6) when a single test is sufficient to 28% (3/6+2/6+0/6 or 5/18) when all three tests are needed (Table 3). Ideally, a classification system should not produce internal inconsistencies, in which the percentages of positive animal responses that can lead to an irritant classification overlap with those that can lead to a not labeled classification.
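The overlap described above can be checked by enumerating every possible sequence of test results. The sketch below is illustrative only. The decision thresholds are an assumption based on a reading of 16 CFR 1500.42 (Table 1 is not reproduced here), chosen because they agree with the two examples quoted in the text; the function name `classify` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# ASSUMED sequential rule (a reading of 16 CFR 1500.42, verify against Table 1):
#   test 1: >=4/6 positive -> irritant, <=1/6 -> not labeled, 2-3 -> run test 2
#   test 2: >=3/6 positive -> irritant,   0/6 -> not labeled, 1-2 -> run test 3
#   test 3: >=1/6 positive -> irritant,   0/6 -> not labeled
RULES = [(4, 1), (3, 0), (1, 0)]  # (min positives for irritant, max for not labeled)

def classify(counts):
    """Apply the sequential rule to per-test positive counts (each 0..6).

    Returns (label, animals_used, total_positives), stopping at the first
    test that gives a definitive answer.
    """
    positives = 0
    for test_number, (count, (pos_min, neg_max)) in enumerate(zip(counts, RULES), start=1):
        positives += count
        if count >= pos_min:
            return "irritant", 6 * test_number, positives
        if count <= neg_max:
            return "not labeled", 6 * test_number, positives
    raise ValueError("the third test always gives a definitive answer")

# Distinct outcomes over all possible result sequences.
outcomes = {classify(c) for c in product(range(7), repeat=3)}

min_positive = min(Fraction(pos, n) for label, n, pos in outcomes if label == "irritant")
max_negative = max(Fraction(pos, n) for label, n, pos in outcomes if label == "not labeled")

print(f"weakest irritant result:      {float(min_positive):.0%}")
print(f"strongest not labeled result: {float(max_negative):.0%}")
```

Under these assumed thresholds the weakest irritant result (4/18) sits below the strongest not labeled result (5/18), which is the internal inconsistency noted in the text.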
The percentage of substances labeled as ocular irritants based on three different classification strategies was compared. The current sequential testing strategy used to assign an FHSA classification is denoted as Strategy 1. Strategy 2 represents a minimum threshold of ≥1/3 (33%) positive animals. Strategy 3 represents a minimum threshold of ≥2/3 (67%) positive animals.
In order to compare the frequency with which each strategy would identify substances as ocular irritants, a number of different underlying population positive response rates were examined. This population positive response rate, denoted by p, is the overall likelihood that an animal will show a positive response for a given substance. Importantly, it is a “population” response rate, not the response rate observed in a given sample of 3 to 6 animals. However, for a specified value of p, it is possible to compute the likelihood of observing various responses in a given sample using binomial probabilities. This is illustrated in Table 4 for a general p, and for p=20% and p=60% to provide specific examples. For example, for a substance with an underlying positive response rate of p=60%, the likelihood is 0.311 (31.1%) that there will be exactly 4 positive animals in a sample of 6 animals.
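As a small illustration of the binomial calculation described above, the following sketch computes the likelihood of observing exactly k positive animals in a sample of n for a given population rate p. The helper name `binom_pmf` is chosen here for illustration and does not come from the original analysis.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k positive animals out of n when each animal
    responds positively, independently, with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example from the text: p = 60%, exactly 4 positive animals out of 6.
print(round(binom_pmf(4, 6, 0.60), 3))  # 0.311

# Full distribution for a 3-animal test at p = 20%.
three_animal = [round(binom_pmf(k, 3, 0.20), 3) for k in range(4)]
print(three_animal)  # [0.512, 0.384, 0.096, 0.008]
```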
Table 5 presents the likelihood of classifying a substance as an ocular irritant for various underlying values of p. However, it does not show whether or not this classification is “correct” because this would require knowledge of the underlying positive response rate that differentiates irritants from nonirritants. As indicated in Table 3, the weakest possible response that is considered positive by the current sequential testing strategy is 22% (4/18), while a response of 17% (1/6 or 3/18) is considered negative. Therefore, it could be argued that the threshold positive response rate for considering a substance as an irritant for the current FHSA requirements should logically lie between 17% and 22%, perhaps 20%. However, this conclusion is complicated by the fact that an observed response rate of 28% (5/18) may occur and result in a chemical not being labeled as an irritant (see Table 3). Because the underlying positive response rates in a population that are characteristic of an irritant or a nonirritant are not definitively known, a range of different underlying positive response rates was compared (Table 5) and presented graphically in Figure 1.
For purposes of illustration, consider p=20%. Table 6 summarizes all the possible ways in which Strategy 1 could lead to a negative classification for a substance with a 20% population positive response rate. The probabilities in Table 6 are derived from Table 4. Thus, by subtraction from 1.0, the likelihood of a positive classification for Strategy 1 for p=20% is 1 − 0.796 = 0.204, or 20.4% (see Table 5).
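The Strategy 1 calculation can be sketched by summing the probabilities of every path that ends in a negative classification. This is a minimal sketch, not the original computation: the per-test thresholds are an assumption based on a reading of 16 CFR 1500.42, and the helper names are hypothetical. With these assumptions the sketch reproduces the 0.796 figure quoted above.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_between(lo, hi, p, n=6):
    """Probability that the number of positive animals in one test is in [lo, hi]."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

def strategy1_negative(p):
    """Likelihood that the sequential FHSA strategy ends in a negative result.

    ASSUMED thresholds (verify against Table 1):
      test 1: 0-1 positive -> negative, 2-3 -> retest
      test 2: 0 positive   -> negative, 1-2 -> retest
      test 3: 0 positive   -> negative
    """
    negative_1 = prob_between(0, 1, p)
    retest_1 = prob_between(2, 3, p)
    negative_2 = prob_between(0, 0, p)
    retest_2 = prob_between(1, 2, p)
    negative_3 = prob_between(0, 0, p)
    return negative_1 + retest_1 * (negative_2 + retest_2 * negative_3)

p = 0.20
negative = strategy1_negative(p)
print(round(negative, 3), round(1 - negative, 3))  # 0.796 0.204
```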
These calculations are much simpler for Strategies 2 and 3. The likelihood of a positive classification using Strategy 2, assuming p=20%, is just the likelihood of observing 1/3, 2/3, or 3/3 positive responses, which using the probabilities in Table 4, are 0.384 + 0.096 + 0.008 = 0.488 or 48.8% (see Table 5). For Strategy 3 and p=20%, the likelihood of a positive classification is the sum of the likelihood of observing 2/3 or 3/3 positive responses, which is 0.096 + 0.008 = 0.104 or 10.4% (see Table 5).
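For the two 3-animal strategies, the same sums can be written as a short function. This is an illustrative sketch; the name `threshold_positive` is an assumption introduced here.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def threshold_positive(p, min_positive, n=3):
    """Likelihood of a positive classification when at least `min_positive`
    of `n` animals must respond positively."""
    return sum(binom_pmf(k, n, p) for k in range(min_positive, n + 1))

p = 0.20
strategy2 = threshold_positive(p, min_positive=1)  # >=1/3 positive animals
strategy3 = threshold_positive(p, min_positive=2)  # >=2/3 positive animals
print(round(strategy2, 3), round(strategy3, 3))  # 0.488 0.104
```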
Even though it uses fewer animals, Strategy 2 is more powerful than current FHSA requirements for detecting positive response rates of up to 40% and has approximately the same power for response rates of 50% and greater (Figure 1). Strategy 3 will identify far fewer irritants than Strategy 2 for underlying positive response rates of 80% or lower. Strategy 3 does not treat a single positive response (1/3) as indicating an irritant, and it has lower power than current FHSA requirements for underlying positive response rates of 20% to 80%.
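The power comparison can be explored numerically by evaluating all three strategies over a grid of underlying response rates. The sketch below is a rough reconstruction, not the code behind Table 5 or Figure 1; the Strategy 1 thresholds are an assumption based on a reading of 16 CFR 1500.42, and all function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def tail(lo, hi, p, n):
    """Probability of between lo and hi positive animals (inclusive) out of n."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))

def strategy1(p):
    """Sequential 6-animal tests. ASSUMED thresholds: test 1 positive at >=4,
    retest at 2-3; test 2 positive at >=3, retest at 1-2; test 3 positive at >=1."""
    positive_1, retest_1 = tail(4, 6, p, 6), tail(2, 3, p, 6)
    positive_2, retest_2 = tail(3, 6, p, 6), tail(1, 2, p, 6)
    positive_3 = tail(1, 6, p, 6)
    return positive_1 + retest_1 * (positive_2 + retest_2 * positive_3)

def strategy2(p):
    return tail(1, 3, p, 3)  # >=1/3 positive animals

def strategy3(p):
    return tail(2, 3, p, 3)  # >=2/3 positive animals

rates = [i / 10 for i in range(1, 10)]
for p in rates:
    print(f"p={p:.0%}  S1={strategy1(p):6.1%}  S2={strategy2(p):6.1%}  S3={strategy3(p):6.1%}")
```

Under these assumptions, Strategy 2 exceeds Strategy 1 at p=40% and the two are within about half a percentage point at p=50%, in line with the pattern described above.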
The previous calculations were based on a variety of underlying positive response rates without consideration of whether or not they reflect the positive response rates seen in practice. Rather than assuming that each irritant and nonirritant has its own unique (and unknown) underlying positive response rate, a potentially useful approach is to derive a mathematical model that accurately describes the observed distribution of positive responses seen for a large database of test substances. If a definitive structure can be imposed upon the data (and if the model fits the data), then the model parameters can be used to estimate over- and underprediction rates. With this in mind, a NICEATM database of 481 rabbit eye test studies using 6 animals each was analyzed. This database includes a wide range of chemical and product classes and represents the types of test substances typically evaluated in ocular safety testing (see Tables 7 and 8). Chemical classes were assigned to each substance using a standard classification scheme based on the National Library of Medicine Medical Subject Headings (MeSH®) classification system. If not assigned in the study report, the product class was sought from other sources, including the National Library of Medicine’s ChemIDplus® database.
To calculate the estimated over- and underprediction rates for the three strategies using the NICEATM database, the first step was to find a model that fits the observed outcomes (see Table 9), some of which are irritants and some of which are nonirritants. We used a model that assumed a mixture of three binomial distributions, because it is unlikely that every irritant has exactly the same likelihood of producing a positive response in an animal. We assumed that the irritants could be categorized into two groups: Type I irritants (high underlying positive response rate) and Type II irritants (smaller underlying positive response rate).
From the observed distribution of positive animals in a 6-animal test, five key parameters were estimated: the underlying positive response rates for nonirritants and Type I and Type II irritants, and the percentage of Type I and Type II irritants in the database (the percentage of nonirritants in the database can then be calculated by subtraction from 100%). The following parameter estimates provided the best fit to our database:
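To make the structure of the three-component mixture concrete, the sketch below computes the expected number of studies with k positive animals out of 6. The parameter values are placeholders, not the fitted estimates: the nonirritant share (33.1%), nonirritant rate (1.7%), and approximate Type II rate (50%) are taken from figures mentioned elsewhere in the text, while the Type I rate and the split between Type I and Type II irritants are purely hypothetical.

```python
from math import comb

N_STUDIES = 481  # size of the NICEATM database
ANIMALS = 6

# PLACEHOLDER parameters for illustration only (see lead-in).
components = {
    "nonirritant":      {"weight": 0.331, "rate": 0.017},
    "Type II irritant": {"weight": 0.200, "rate": 0.50},  # weight is hypothetical
    "Type I irritant":  {"weight": 0.469, "rate": 0.95},  # weight and rate are hypothetical
}

def mixture_pmf(k, n=ANIMALS):
    """Probability of k positive animals out of n under the binomial mixture."""
    return sum(
        c["weight"] * comb(n, k) * c["rate"] ** k * (1 - c["rate"]) ** (n - k)
        for c in components.values()
    )

# Expected number of studies with 0, 1, ..., 6 positive animals.
expected = [N_STUDIES * mixture_pmf(k) for k in range(ANIMALS + 1)]
for k, count in enumerate(expected):
    print(f"{k}/6 positive: {count:6.1f} studies expected")
```

Fitting the model amounts to choosing the five free parameters so that these expected counts match the observed counts as closely as possible.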
Model parameter estimates for the NICEATM database:
Given this excellent fit to the data as indicated in Table 9, we calculated the percentage of substances that would be labeled as ocular irritants using each of the three strategies (see Table 10). The likelihood that a Type I irritant would be labeled as an ocular irritant is close to 100% for all three strategies. The likelihood that a Type II irritant would be labeled as an ocular irritant is approximately 88% for Strategies 1 and 2 but 50% for Strategy 3. The likelihood of labeling a nonirritant as an ocular irritant is 0% for Strategy 1, 5.0% for Strategy 2, and 0.1% for Strategy 3 (Table 10).
Based on these outcomes, the underlying over- and underprediction rates associated with this model were then calculated. All three strategies have a very low underprediction rate for Type I irritants. However, for Type II irritants, Strategies 1 and 2 have underprediction rates of approximately 12%, while Strategy 3 has a 50% underprediction rate. For nonirritants, Strategies 1 and 3 have very low overprediction rates, while the overprediction rate for Strategy 2 is 5% (see Table 11).
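The labeling likelihoods and the corresponding error rates can be approximated directly from the underlying response rates. The sketch below uses the approximate Type II irritant rate (50%) and the nonirritant rate (1.7%) mentioned elsewhere in the text; Type I irritants are omitted because their fitted rate is not given here. The Strategy 1 thresholds are an assumption based on a reading of 16 CFR 1500.42, and the function names are hypothetical.

```python
from math import comb

def tail(lo, hi, p, n):
    """Probability of between lo and hi positive animals (inclusive) out of n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

def strategy1(p):
    # ASSUMED sequential thresholds: >=4 then >=3 then >=1 positive of 6,
    # with a retest after 2-3 positives (test 1) or 1-2 positives (test 2).
    return tail(4, 6, p, 6) + tail(2, 3, p, 6) * (
        tail(3, 6, p, 6) + tail(1, 2, p, 6) * tail(1, 6, p, 6)
    )

def strategy2(p):
    return tail(1, 3, p, 3)  # >=1/3 positive animals

def strategy3(p):
    return tail(2, 3, p, 3)  # >=2/3 positive animals

strategies = {"Strategy 1": strategy1, "Strategy 2": strategy2, "Strategy 3": strategy3}

type2_rate = 0.50        # approximate Type II irritant response rate
nonirritant_rate = 0.017  # nonirritant response rate

for name, labeled in strategies.items():
    underprediction = 1 - labeled(type2_rate)   # Type II irritant not labeled
    overprediction = labeled(nonirritant_rate)  # nonirritant labeled as irritant
    print(f"{name}: Type II underprediction {underprediction:5.1%}, "
          f"nonirritant overprediction {overprediction:5.1%}")
```

With these inputs the sketch gives roughly 12% underprediction of Type II irritants for Strategies 1 and 2 and 50% for Strategy 3, and about 5% overprediction of nonirritants for Strategy 2.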
It is important to note that this approach is similar to the approach used by Springer et al. (1993) except for the fact that we assumed two different underlying positive response rates for irritants, whereas Springer et al. used only one (i.e., they assumed that every irritant has exactly the same likelihood of producing a positive response in an animal). Based on the distribution of positive animals in a 6-animal test in the NICEATM database, the use of two different underlying positive response rates for irritants provided a much better fit to the data.
Results from DeSousa et al. (1984) and Talsma et al. (1988) showed that using 3 rabbits per test provided accuracy of up to 94% in predicting a 6-animal test (using subsets of 3 animals). Springer et al. (1993) also conducted analyses to determine if the standard group size of 6 rabbits for ocular safety testing could be reduced in order to use fewer animals and concluded that a 3-animal test and a decision rule requiring at least 2 positive animals to classify a substance as an irritant yielded accuracy of 98%.
As indicated above, the model used by Springer et al. (1993) assumed two mutually exclusive populations: irritants and nonirritants, each population having a single underlying positive response rate estimated from the data. They fit a mixture of two binomial models to each of four different databases, but the only database with a distribution of outcomes that closely matched the NICEATM database of 481 rabbit eye test studies was an EPA database of 48 substances. Springer et al. (1993) reported the following parameter estimates for the EPA database:
Note that the estimated percentage of nonirritants in the EPA database (35%) is very similar to our own estimate (33.1%) for the much larger NICEATM database, but the Springer et al. model does not differentiate between Type I and Type II irritants. As a result, their parameter estimates provided a poor fit to the NICEATM database of 481 studies (Table 12). In fact, we found that their model did not provide a good fit to the EPA data upon which their parameter estimates were based (e.g., predicting only 0.2 3/6 outcomes compared with 3 actually observed, a 15-fold underprediction). This lack of model fit was more apparent using the NICEATM database of 481 substances, which was approximately 10-fold larger than the Springer et al. (1993) EPA database.
The largest database used by Springer et al. (1993) was the 139-substance Marzulli and Ruggles database, but the pattern of response seen in these studies was quite different from that seen in the NICEATM database of 481 studies. Even so, the best-fitting Springer et al. (1993) model showed the same lack of fit problem. For example, ten 3/6 positive responses were observed compared with only 3.1 predicted by the best-fitting Springer et al. (1993) model.
It is important to understand the factors that led to different conclusions in our evaluation, which favored Strategy 2, and that of Springer et al. (1993), which favored Strategy 3. For example, Table 1 in Springer et al. (1993) suggests that Strategy 2 may have an unacceptably high overprediction rate.
The primary reason for the different conclusions is that the EPA 48-substance database was of insufficient size to detect the Type II irritants that were producing positive response rates of approximately 50%. By not taking these irritants into account, the Springer et al. (1993) model underestimated the underprediction rate for Strategy 3, because this strategy does not perform well for detecting positive response rates of approximately 50% (see Table 5).
Another consequence of Springer et al. (1993) ignoring the Type II irritants was a 5-fold overestimation of the positive response rate of nonirritants (8.6% vs. 1.7%). This difference is important because the overprediction rate of Strategy 2 increases substantially as the assumed positive response rate for nonirritants increases (see Table 5). It is the Springer et al. (1993) overestimation of the positive response rate for nonirritants that produced the artificially high overprediction rate for Strategy 2 shown in their Table 1.
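The sensitivity of Strategy 2 to the assumed nonirritant response rate can be seen with a short calculation. This is an illustrative sketch: it uses the fact that the chance of at least one positive animal out of three is one minus the chance of none, and the function name is hypothetical.

```python
def strategy2_overprediction(nonirritant_rate, animals=3):
    """Chance that a nonirritant is labeled as an irritant under the >=1/3
    criterion, i.e. that at least one of `animals` responds positively."""
    return 1 - (1 - nonirritant_rate) ** animals

for label, rate in [("1.7% (this analysis)", 0.017), ("8.6% (Springer et al. 1993)", 0.086)]:
    print(f"nonirritant rate {label}: overprediction {strategy2_overprediction(rate):.1%}")
```

Under this calculation, raising the nonirritant response rate from 1.7% to 8.6% raises the Strategy 2 overprediction rate from about 5% to about 24%.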
The results indicate that using a classification criterion of at least one out of three positive animals in a 3-animal test for the identification of eye hazards will provide the same or greater level of eye hazard labeling as current FHSA requirements, while using 50% to 83% fewer animals. A criterion of at least two out of three positive animals in a 3-animal test will identify far fewer irritants, especially those irritants with a smaller underlying positive response rate. Accordingly, this analysis should facilitate regulatory decisions on classification criteria that will support the adoption of test methods using fewer animals. The analysis is also expected to assist agencies in complying with U.S. laws requiring that, before adopting alternative methods, they determine that the test method will generate data in an amount and of a scientific value that is at least equivalent to the data generated from existing tests for hazard identification or risk assessment purposes.
This work was supported by the Intramural Research Program of the National Institutes of Health, National Institute of Environmental Health Sciences. ILS staff are supported by NIEHS contract N01-ES 35504. The views expressed in this manuscript do not necessarily represent the official position of any U.S. Federal agency.
¹Abbreviations used: CFR, U.S. Code of Federal Regulations; CPSC, U.S. Consumer Product Safety Commission; EPA, U.S. Environmental Protection Agency; FHSA, U.S. Federal Hazardous Substances Act; ICCVAM, Interagency Coordinating Committee on the Validation of Alternative Methods; NICEATM, National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods; OECD, Organisation for Economic Co-operation and Development; PHS, Public Health Service.
Conflict of Interest Statement
The authors declare that there are no conflicts of interest.