Deriving a No Observed Adverse Effect Level (NOAEL) or benchmark dose is important for risk assessment and can be influenced by study design considerations. In order to define the di-(2-ethylhexyl) phthalate (DEHP) dose-response curve for reproductive malformations, we retained more offspring to adulthood to improve detection of these malformations in the reproductive assessment by continuous breeding study design. Sprague-Dawley rats were given dietary DEHP at 1.5 (control), 10, 30, 100, 300, 1000, 7500, and 10,000 ppm. Male pups were evaluated for gross reproductive tract malformations (RTMs) associated with the “phthalate syndrome.” DEHP treatment had minimal effects on P0 males. There was a statistically significant increase in F1 and F2 total RTMs (testis, epididymides, seminal vesicle, and prostate) in the 7500-ppm dose group and in the F1 10,000-ppm dose group. The 10,000-ppm exposed F1 males did not produce an F2 generation. The NOAEL for the combined F1 and F2 RTM data (combined because in utero exposures were similar) was 100 ppm (4.8 mg/kg/day), which was close to the 5% response benchmark dose lower confidence limit of 142 ppm. The utility of evaluating more pups per litter was examined by generating power curves from a Monte Carlo simulation. These curves indicate a substantial increase in detection rate when three males are evaluated per litter rather than one. A 10% effect across male pups would be detected 5% of the time if one pup per litter was evaluated, but these effects would be detected 66% of the time if three pups per litter were evaluated. Taken together, this study provides a well-defined dose response of DEHP-induced RTMs and demonstrates that retention of more adult F1 and F2 males per litter, animals that were already produced, increases the ability to detect RTMs and presumably other low-incidence phenomena.
In guideline toxicology studies, the design of the study must incorporate adequate power to detect potential adverse effects induced by the test article. Due to concerns about cost and animal use, studies are often limited in the number of animals used, so calculating the power of detection based upon litter size and variation of the end point in reproduction studies is warranted. In studies that assess reproductive toxicity, which incorporate the production of rodent litters, the evaluation of postweaning end points in the offspring involves the selection of only one male and one female from each litter for mating, according to Organisation for Economic Co-operation and Development (OECD; OECD 416) and Environmental Protection Agency (EPA) guidelines (OPPTS 870.3800). The rest of the litter is removed during standardization (usually at postnatal day [PND] 4) or weaning. Histopathology is performed on animals selected for mating (OECD) or 10 randomly chosen males and females selected for mating (EPA). Because the litter is the statistical unit of analysis for these studies, the number of animals selected for evaluation in adulthood will influence the detection of adverse effects of low incidence (Hotchkiss et al., 2008). In the present study, a reproductive assessment of di-(2-ethylhexyl) phthalate (DEHP) was conducted with seven dose levels, and extra animals per litter, which would otherwise have been culled, were retained to adulthood in order to better define the dose-response curve.
DEHP is a member of a group of phthalate esters that induce male reproductive tract malformations (RTMs) via a disturbance in androgen signaling at critical periods in in utero sexual differentiation (Parks et al., 2000). These male RTMs include malformations of the testis, epididymis, prostate, seminal vesicles, and external genitalia with common RTMs being epididymal agenesis, undescended testes, small fluid-filled testis, hypoplastic accessory sex organs, and hypospadias. This pattern of malformations in organs that require androgen for their normal development has been described as the phthalate syndrome (Gray and Foster, 2003). These malformations were not observed with DEHP or other phthalates in standard rat developmental toxicity (teratology) study designs (exposure from gestational day [GD] 6 to 15 in the rat) due to lack of exposure throughout the period of sexual differentiation, and offspring evaluations conducted prior to expected delivery of the offspring before the reproductive organs have fully differentiated or developed.
A reproductive assessment by continuous breeding (RACB), a multigenerational study design, was conducted to evaluate the DEHP dose response of male RTMs. The Endocrine Disruptor Screening and Testing Advisory Committee (EDSTAC) and EPA recognized the deficiency in standard teratology study design and proposed the multigeneration reproduction study as the “definitive” Tier II test to evaluate endocrine active chemicals (EACs) (EDSTAC, 1998; EPA, 1998). This Tier II test would confirm or refute data obtained in the Tier I screens and provide essential dose-response information for risk assessment. However, it has been argued that the single male and female F1 animals evaluated per litter at adulthood, when RTMs are the most apparent, in the conventional, EPA regulatory multigeneration study design unnecessarily restricts the power to detect and characterize effects produced by EACs (Foster, 2002; Foster and McIntyre, 2002; McIntyre et al., 2000). Low-powered studies hinder determination of an accurate no observed adverse effect level (NOAEL) and dose-response curve for chemicals (Hotchkiss et al., 2008), and poor public health decisions can result.
The present study was designed to evaluate the postnatal effects of DEHP on reproductive development and define the dose-response relationship for the induction of RTMs. This study used more than the traditional three dose groups plus control and retained a greater number of F1 and F2 male rats until adulthood to better assess the shape of the dose-response curve and define the DEHP NOAEL for male RTMs. In conducting this study under Good Laboratory Practices (GLP), similar to guideline studies, we hope to demonstrate that it is feasible to retain extra animals per litter in a complex multigenerational design. This approach did not constitute the generation of any extra animals because the animals normally produced were simply not culled at PND 4 or weaning but were maintained alongside the animals selected for breeding. Furthermore, we generated power curves to quantify detection rates based upon the effect size within the litter and the number of pups examined (one up to six per sex per litter), which further illustrates the importance of retaining extra animals to detect effects of low incidence within a litter.
The Continuous Breeding Protocol (Chapin and Sloane, 1997), a modified multigeneration reproduction study, was used to evaluate the reproductive toxicity of DEHP. Sprague-Dawley rats (Charles River Laboratories, Portage, MI) were delivered to TherImmune Research Corporation (Gaithersburg, MD) and 7 days after receipt were weighed and randomly assigned to treatment groups. Treatment groups were fed a diet with DEHP (99.8% pure by gas chromatography; Aldrich Chemical Company, Milwaukee, WI) concentrations of 1.5 (control), 10, 30, 100, 300, 1000, and 7500 ppm (anticipated to be equivalent to ~0.1, 0.5, 1.5, 5, 15, 50, and 400 mg DEHP/kg/day). DEHP is a ubiquitous environmental contaminant and was present at trace levels even in our control NTP-07 diet. Additional groups, 1.5 ppm (control) and 10,000 ppm (anticipated to be > 500 mg DEHP/kg/day), were added after the start of the study, and these groups followed the same design.
Exposure started with the P0 generation and was continuous through the F1 and F2 generations. P0 males and females, ~5 weeks of age, were given the DEHP diet for 6 weeks prior to mating (premating exposure) and were then cohabited (17 pairs per group) for 9 weeks with the same diet in order to produce three litters (F1a, F1b, and F1c). The first two litters produced (F1a and F1b) during the cohabitation period were counted and weighed at PND 1. These litters were euthanized by sodium pentobarbital overdose and discarded without necropsy on PND 1. The third litter born (F1c) was reared (without culling) by the dam until weaning on PND 21. On PND 16, up to six males and two females were randomly selected from each litter to be maintained to adulthood for histological evaluations. One to two males in the litter were used to breed for the production of the F2 generation while avoiding sibling matings. The additional nonbreeding males were maintained until necropsy as sexually mature adults. The non-mated males were necropsied ~2 weeks prior to the necropsy of their sibling mated male. The same methods for producing and evaluating the F1 generation were used for producing and evaluating the F2 generation such that three litters were produced in each generation and the third litter was evaluated postweaning. The F2c litters were selected for breeding and necropsy (same as the F1c), whereas one to two males of the F3c litter were terminated at PND 63/64. The F3c males were not evaluated for RTMs.
Animals from all control litters (n = 14 litters for F1 and n = 10 for F2) and 8–17 litters per dose level were necropsied. The F1c non-mated males were necropsied on PND 194 ± 12 or PND 249 (the added 1.5-ppm and 10,000-ppm groups) and breeding males and females on PND 215–217 ± 12 or PND 263 (the added 1.5-ppm and 10,000-ppm groups). The F2c non-mated males were necropsied on PND 241–243 ± 10 and the mated males and females were necropsied on PND 248–250 ± 10. During necropsy, RTMs observed by gross evaluation were recorded for testes, epididymides, prostate, and seminal vesicles.
Pregnancy and litter end points (e.g., litter size, pup weights) were compared between the 1.5-ppm (control) and dose groups. Quantitative data were analyzed using the nonparametric multiple comparisons procedure of Dunn (1964) or Shirley (1977), as modified by Williams (1986). Shirley’s test is designed to detect treatment-related differences when the response to treatment consistently increased (or decreased) with increasing dose, which was verified by Jonckheere's test (Jonckheere, 1954). If the p value from Jonckheere’s trend test was < 0.01, Shirley’s test was used; otherwise, Dunn’s test was used.
Male RTMs were recorded as present or absent for each F1c and F2c litter according to whether any of the malformations were observed in the litter. RTM incidences, with the litter as the unit of analysis, were analyzed using Fisher’s exact test to determine statistically significant differences between control and each treated group. Evaluations were conducted separately on the F1c and F2c litters, as well as on the pooled litters from the F1c and F2c groups (since their in utero exposure period was identical). One-tailed p values < 0.05 were considered significant. Benchmark dose (BMD) and benchmark dose lower confidence limit (BMDL) were calculated using EPA’s BMDS 2.1.1 software (Build 11-6-09), after evaluating the fit of the various models to the observed data. BMDs were calculated for a 5% response using a dichotomous Weibull model with extra risk of RTM incidence in the F1c, F2c, and combined F1c + F2c litters.
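For orientation, the 5% extra-risk BMD described above can be written out explicitly. The form below is the dichotomous Weibull model as it is commonly parameterized in BMDS; the symbols (background γ, slope β, power α, dose d in ppm) are labels chosen for this sketch and are not fitted values from the study.

```latex
% Dichotomous Weibull model: probability of an affected litter at dose d
P(d) = \gamma + (1 - \gamma)\left[1 - \exp\!\left(-\beta d^{\alpha}\right)\right]

% Extra risk relative to background
\mathrm{ER}(d) = \frac{P(d) - P(0)}{1 - P(0)} = 1 - \exp\!\left(-\beta d^{\alpha}\right)

% Benchmark dose for a benchmark response of 5% extra risk
\mathrm{ER}(\mathrm{BMD}) = 0.05
\quad\Longrightarrow\quad
\mathrm{BMD} = \left(\frac{-\ln(1 - 0.05)}{\beta}\right)^{1/\alpha}
```

Under this form the background term γ cancels out of the extra risk, so the BMD depends only on the fitted slope and power; the BMDL is the lower confidence limit on that dose.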
Power curves at α = 0.05 were generated for Fisher's exact test via Monte Carlo simulations. Computer-generated samples of 1, 2, 3, 4, 5, or 6 pups per litter for 20 “treated” litters were compared with a “control” group having no abnormalities in 20 litters, using a one-sided Fisher's exact test. Each treated pup had the same probability of having an abnormality, with probabilities ranging between 0.01 and 0.90. A litter was positive if at least one pup had the abnormality and negative if none of the pups had the abnormality. A total of 5000 samples of 20 treated and 20 control litters were generated for each combination of number of pups per litter and probability of an abnormality. Power was estimated as the proportion of samples for which the one-sided Fisher's exact test p value was ≤ 0.05. Each power curve shows the probability of getting a significant result at the 0.05 level by retaining and examining the specified number of pups per litter, assuming that the chemical produces in each pup an abnormality with the probability plotted on the x-axis.
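The simulation described above can be sketched in a few lines. This is a minimal illustration under the stated design (20 treated litters, 20 control litters with no abnormalities, one-sided Fisher's exact test, 5000 samples), not the code used in the study; the function names, the random seed, and the use of Python's standard library are assumptions made for this example.

```python
import random
from math import comb


def fisher_one_sided(pos_treated, n_treated, pos_control, n_control):
    """One-sided Fisher's exact p value.

    Probability, with all table margins fixed, of seeing at least
    `pos_treated` positive litters in the treated group.
    """
    total_pos = pos_treated + pos_control
    denom = comb(n_treated + n_control, total_pos)
    upper = min(total_pos, n_treated)
    numer = sum(
        comb(n_treated, k) * comb(n_control, total_pos - k)
        for k in range(pos_treated, upper + 1)
    )
    return numer / denom


def simulate_power(pups_per_litter, p_abnormal, n_litters=20,
                   n_samples=5000, alpha=0.05, seed=0):
    """Estimate power as the share of simulated samples with p <= alpha."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    significant = 0
    for _ in range(n_samples):
        positive_litters = 0
        for _litter in range(n_litters):
            # A litter is positive if at least one examined pup is abnormal.
            if any(rng.random() < p_abnormal for _pup in range(pups_per_litter)):
                positive_litters += 1
        # The control group has no abnormalities by construction.
        p_value = fisher_one_sided(positive_litters, n_litters, 0, n_litters)
        if p_value <= alpha:
            significant += 1
    return significant / n_samples


if __name__ == "__main__":
    for pups in (1, 3):
        print(pups, "pup(s) per litter:", simulate_power(pups, 0.10))
```

Repeating the call over the full grid of pup counts (1 to 6) and abnormality probabilities (0.01 to 0.90) would trace out one power curve per pup count.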
Based upon feed consumption of the 1.5, 10, 30, 100, 300, 1000, 7500, and 10,000 ppm DEHP diet, average consumed doses were 0.12, 0.78, 2.4, 7.9, 23, 77, 592, and 775 mg/kg/day for the P0 animals; 0.09, 0.48, 1.4, 4.9, 14, 48, 391, and 543 mg/kg/day for the F1 animals; and 0.1, 0.47, 1.4, 4.8, 14, 46, and 359 mg/kg/day for the F2 animals (there were no F2 animals at 10,000 ppm). P0 food consumption was increased and decreased at certain time points during exposure, but there was no dose- or time-related pattern (i.e., no consistent effect across time and/or dose level; data not shown). In the P0 generation, male body weights were significantly decreased by 5–6% at 10,000 ppm during weeks 21 and 23, and dam body weights at 10,000 ppm were significantly reduced at delivery of each of the three litters. In the F1 generation, body weights of males and females were significantly reduced throughout exposure in the 10,000-ppm dose group. Body weights were also reduced in 7500-ppm adult F1 males and F2 males and females. Food consumption (grams per kilogram of body weight) was generally increased throughout exposure in the F1 7500- and 10,000-ppm males and F2 7500-ppm males and females.
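The relationship between dietary concentration and consumed dose can be illustrated with a short calculation. Because 1 ppm in feed is 1 mg of DEHP per kg of feed, the consumed dose is the dietary concentration multiplied by relative feed intake. The sketch below back-calculates the feed intake implied by the reported F2 doses; the function names are hypothetical, and the implied intakes are derived here for illustration rather than reported by the study.

```python
def dose_mg_per_kg_day(diet_ppm, feed_g_per_kg_bw_day):
    """Consumed dose from dietary concentration and relative feed intake.

    diet_ppm is mg DEHP per kg feed; feed intake is g feed per kg body weight per day.
    """
    return diet_ppm * feed_g_per_kg_bw_day / 1000.0


def implied_feed_intake_g_per_kg_day(dose_mg_kg_day, diet_ppm):
    """Relative feed intake implied by a reported dose at a given dietary level."""
    return dose_mg_kg_day / diet_ppm * 1000.0


# Dietary concentrations and average consumed doses reported for the F2 animals.
F2_DIET_PPM = [1.5, 10, 30, 100, 300, 1000, 7500]
F2_DOSE_MG_KG_DAY = [0.1, 0.47, 1.4, 4.8, 14, 46, 359]

if __name__ == "__main__":
    for ppm, dose in zip(F2_DIET_PPM, F2_DOSE_MG_KG_DAY):
        intake = implied_feed_intake_g_per_kg_day(dose, ppm)
        print(f"{ppm:>7} ppm: {dose} mg/kg/day implies {intake:.1f} g feed/kg/day")
```

The reported doses are rounded, so the implied intakes are approximate, particularly at the lowest dietary level.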
In the mated cohorts, the pregnancy index (number of females delivering/number of cohabiting pairs) was unaffected by treatment in the P0 generation, but a clear effect was evident at the 10,000-ppm dose level in the F1 generation (Table 1). No litters were produced by the F1 10,000-ppm dose group, so exposure continued only for the 1.5-, 10-, 30-, 100-, 300-, 1000-, and 7500-ppm dose groups. In the production of the F2 generation, the pregnancy index of F1 dams was significantly decreased at the 7500-ppm dose. The litter size (number of live pups) produced by the P0 generation was significantly reduced in the 7500-ppm dose group (Table 2), but this effect was not consistent across generations or dose dependent. Of the three litters produced by each generation, average pup weights on PND 1 (combined male and female) were generally unaffected by treatment below 7500 ppm. Body weights at PND 1 were reduced in the F1 pups at 10,000 ppm and in the F2 pups at 7500 ppm (Table 2).
Two P0 10,000-ppm males had a mild testicular lesion with no consequent effects on fertility in this group, and one 7500-ppm animal had a prostate lesion. In stark contrast, all the F1 10,000-ppm males had an RTM (Table 3), and no F2 litters were produced (Table 2). At 7500 ppm, there were significant increases compared with controls in the incidence of RTMs in the F1 and F2 males. RTMs were also observed in the 1000- and 300-ppm groups of the F1 and F2 generations. A single incidence of a seminal vesicle and prostate malformation was observed in the F1 10-ppm and 30-ppm groups. A single testis malformation (tunica albuginea aplasia) was noted in a breeding male of the F2 generation in the control (1.5 ppm) group but was not considered for the analysis of malformations because it was not typical of the phthalate syndrome. There was a significant increase in total RTMs, consistent with the phthalate syndrome, at and above 300 ppm when malformations over the F1 and F2 generations were combined (Table 3). The dose-response curves were similar for the F1 and F2 generations and when the RTMs of the two generations were combined (Fig. 1). A Hill model (Graphpad Prism 5.01, San Diego, CA) of the 1.5–10,000 ppm RTM data, which was constrained between 0 and 100% incidence, calculated a half maximal concentration (EC50) of 2771 ppm (R2 = 0.9139) for the F1 generation, an EC50 of 1480 ppm (R2 = 0.9894) for the F2 generation, and an EC50 of 2094 ppm (R2 = 0.9564) for the combined F1 + F2 generations. The roughly twofold difference in EC50 between the F1 and F2 generations is due to the low incidence observed in the F1 1000-ppm group; when that group is removed, the F1 EC50 is 1406 ppm (R2 = 0.9564), similar to the F2 generation EC50. After evaluating various types of BMD models, the Weibull model provided the best fit across the F1, F2, and F1 + F2 data sets (see Supplementary data).
The BMD and BMDL values for the F1, F2, and combined F1 + F2 data were 257 and 169 ppm, 233 and 77 ppm, and 198 and 142 ppm, respectively.
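For reference, a Hill model constrained between 0 and 100% incidence is typically written in the form below. This is the general equation only; the Hill slope h is a symbol introduced for this sketch, and its fitted value is not reported in the text.

```latex
% Hill model with bottom fixed at 0% and top fixed at 100% incidence
I(d) = \frac{100\, d^{h}}{\mathrm{EC}_{50}^{\,h} + d^{h}}

% At d = EC50 the predicted litter incidence is half maximal
I(\mathrm{EC}_{50}) = 50\%
```

With the top and bottom fixed, only the EC50 and the slope are estimated, which is why a single low-incidence dose group near the middle of the curve can shift the EC50 noticeably.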
The Monte Carlo simulations demonstrated that when more males per litter were examined using the non-mated cohort, the ability to detect an effect (i.e., statistical power) increased (Fig. 3). Detection was defined as getting a p value of ≤ 0.05 using Fisher’s exact test with the litter as the unit of analysis. If the probability of an abnormality is 20% in each treated pup and 0% in each control pup, this effect can be detected 99.5% of the time when 3 male pups are examined from each of 20 litters per group versus only 37.9% of the time when 1 pup per litter was examined. A lower probability of an abnormality of 10% would be detected only 4.7% of the time when one male pup per litter is examined but detected 66.4 and 86.5% of the time when three or four pups are examined, respectively.
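Because the control group is fixed at zero affected litters, these detection rates can also be cross-checked without simulation. The sketch below is an analytical check under that assumption, not part of the study's methods, and its function names are hypothetical. A litter with k examined pups is positive with probability 1 - (1 - p)^k, the number of positive treated litters is binomial, and the one-sided Fisher p value for x positive treated litters against none in controls reduces to C(n, x) / C(2n, x).

```python
from math import comb


def litter_positive_prob(p_pup, pups_per_litter):
    """Probability that at least one examined pup in a litter is abnormal."""
    return 1.0 - (1.0 - p_pup) ** pups_per_litter


def min_significant_litters(n_litters=20, alpha=0.05):
    """Smallest count of positive treated litters giving p <= alpha.

    Assumes zero positive control litters, so the one-sided Fisher
    p value is C(n, x) / C(2n, x).
    """
    for x in range(n_litters + 1):
        if comb(n_litters, x) / comb(2 * n_litters, x) <= alpha:
            return x
    return None


def exact_power(p_pup, pups_per_litter, n_litters=20, alpha=0.05):
    """Binomial tail probability of reaching the significance threshold."""
    q = litter_positive_prob(p_pup, pups_per_litter)
    x_min = min_significant_litters(n_litters, alpha)
    if x_min is None:
        return 0.0
    return sum(
        comb(n_litters, x) * q ** x * (1.0 - q) ** (n_litters - x)
        for x in range(x_min, n_litters + 1)
    )


if __name__ == "__main__":
    for p_pup in (0.10, 0.20):
        for pups in (1, 3, 4):
            print(f"p={p_pup:.2f}, pups={pups}: power={exact_power(p_pup, pups):.3f}")
```

Under these assumptions, at least 5 of 20 treated litters must be positive for significance, and the exact values land close to the simulated detection rates, with small differences expected from Monte Carlo sampling error over 5000 samples.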
DEHP produced a dose-dependent increase in the incidence of male RTMs at 300 ppm (~15 mg/kg/day) and higher when F1 and F2 offspring were pooled together. Although not statistically significant, the presence of multiple malformations in multiple litters within the 300-ppm dose group of the F1 generation was considered to be an indication of an adverse effect. Based upon the combined F1 and F2 number of litters with malformations, the NOAEL was considered to be 100 ppm or 4.8 mg/kg/day and the lowest observed adverse effect level (LOAEL) was 300 ppm or 14 mg/kg/day. The single RTMs present in the 10- and 30-ppm groups of the F1 generation were considered to be equivocal responses because it was unclear whether these malformations, seen only in the F1 generation, were related to treatment. However, the specific malformations observed were part of the phthalate syndrome, and therefore their relationship to treatment with DEHP cannot be completely dismissed.
A more accurate dose response (Fig. 1) of DEHP-induced RTMs was generated by examining more pups per litter. The LOAEL from this study is consistent with a recently published LOAEL of 11 mg/kg/day after a GD 8 to PND 17 exposure of Sprague-Dawley rat dams in which all the male pups within a litter were evaluated in adulthood (Gray et al., 2009). Although the length of exposure differed between the current study and that of Gray et al. (2009), both studies exposed animals to DEHP during the important in utero window of male rat sexual differentiation. In the current study, the lack of fertility in the F1 10,000-ppm DEHP group is likely due to the widespread incidence of RTMs within males. The BMD values for a 5% response of male RTM incidence by litter did not differ greatly (198–257 ppm) when using data from the individual generations or when the data were combined, and they were lower than the identified LOAEL. There was more variation in the BMDL values (77–169 ppm) across the F1, F2, and F1 + F2 data sets due to larger variation in the F2 data set. However, these BMDL values were close to the NOAEL of 100 ppm, which may be due to the use of seven dose groups and the spacing of doses within this design. Some limitations of the NOAEL approach are its dependency on the number of dose groups, spacing of doses, level of detection, and limitation to one of the experimental doses (Barnes et al., 1995; Sand et al., 2002), such that a design with few, widely spaced dose groups can lead to an artificially low NOAEL, and an insufficiently powered study can lead to an inappropriately high NOAEL. The current study reduced some of these limitations by making use of more pups per litter (i.e., increased detection) and by including additional dose groups.
The power calculations demonstrate that the sampling of additional males per litter can add substantial value to detecting low-incidence effects. When only one male offspring is examined, the dose-response curve of a test article could appear artificially steep, which would affect the determination of the NOAEL and BMD, whereas sampling more males per litter may smooth out the dose-response curve. As mentioned previously, examination of these extra animals makes use of pups already produced that would normally be removed during culling. The utility of adding extra dose groups in order to define the dose response could be hampered by low power when only one male per litter is examined, leading to wastage of animals in the extra dose groups. In the current study, the extra dose groups in combination with evaluating more pups per litter contributed to a well-defined dose response for DEHP. However, if only one pup had been evaluated per litter, the DEHP-induced effects in the lower dose litters would likely have been missed, so the utility of those extra dose groups would have been little to none.
Power calculations demonstrate that increasing the number of animals sampled per litter increases the probability of detecting low-incidence malformations and suggest that if more animals are sampled per litter, fewer litters would be needed, thus reducing overall animal use and production (Gray et al., 2009; Hotchkiss et al., 2008). Although this study focused on power to detect noncontinuous outcomes (i.e., specific androgen-dependent malformations), the ability to detect effects in continuous outcomes (e.g., organ weights) is also influenced by the number of pups sampled per litter (Gray et al., 2009; Hotchkiss et al., 2008).
These data illustrate the increased ability to detect RTMs at lower dose levels by the retention of more adult F1 and F2 males per litter following in utero exposure to the endocrine disruptor, DEHP. This study was a feed exposure multigenerational reproductive toxicity assessment conducted under GLP conditions, which incorporates the technical and logistical necessities that are similar to guideline studies. We demonstrate here that it is possible to examine more than one pup per sex per litter in adulthood under these conditions. Retaining these extra littermates makes better use of the animals produced and strengthens the power of the study. While additional pups are examined grossly at weaning under the current EPA and OECD guidelines, malformations in the reproductive tract may not be apparent at this young age. The current sampling of one adult male per litter under the current OECD guidelines would have a 5% power of detecting an effect in males at low litter effect sizes of 10%.
In order to improve the dose-response information from RACB studies, the National Toxicology Program will retain more adult offspring in all its future studies. Regulatory entities should consider incorporating in their respective guidelines the evaluation of an increased number of male and female offspring (e.g., n = 3 per sex per litter) in adulthood. This change in study design will improve the dose-response information on postnatal development and provide more informative reproductive toxicity data for use in human health risk assessment.
Intramural Research Program of the National Institutes of Health; National Institute of Environmental Health Sciences under Research Project Number 1 Z01 ESO45004-11 BB.