Simulation study. We conducted an extensive simulation study to compare the performance of the proposed trend test, *T*, with the NTP trend test. A total of 750 nonnull configurations and 150 null configurations, similar to those commonly encountered in the NTP rodent bioassays, were simulated. All simulation results reported in this article are based on 10,000 simulation runs, and the nominal level of significance is α = 0.05.

Simulation parameters were patterned after the NTP rodent cancer bioassays. We considered a total of three dose groups (low, medium, and high) and a control group, with 50 animals assigned to each group. As described by other authors (e.g., Dinse 1991; Peddada et al. 2001, 2005), for each animal in the *i*th dose group, *i* = 1, 2, 3, 4, we generated realizations of two independent Weibull random variables, *Y*_{i1} and *Y*_{i2}, where *Y*_{i1} represented the time to tumor onset and *Y*_{i2} represented the time to death from natural causes. The survival function of *Y*_{ij}, *i* = 1, 2, 3, 4 and *j* = 1, 2, is given by

$$P(Y_{ij} > t \mid d_i) = \exp\left(-\psi_j \, \theta_{ij}^{d_i} \, t^{\gamma_j}\right),$$

where ψ_{j} is the baseline scale parameter, γ_{j} is the shape parameter, and θ_{ij} is the effect of dose. We simulated these random variables such that the duration of the study was 24 months, which is typical of the NTP rodent bioassays. We simulated two dose patterns, 2-fold dose spacing and 5-fold dose spacing, namely, (*d*_{1}, *d*_{2}, *d*_{3}, *d*_{4}) = (0, 0.5, 1, 2) and (0, 0.1, 0.5, 2.5).
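The data-generating step described above can be sketched with inverse-transform sampling: setting the survival function equal to a uniform draw and solving for *t*. This is a minimal illustration, not the authors' code. The name `theta` for the dose-effect multiplier, the tumor onset scale `1.0e-5`, the onset dose effect `1.5`, and the rule that a tumor counts as observed when onset precedes removal from the study are all assumptions made for the example.

```python
import math
import random

STUDY_MONTHS = 24.0  # study duration stated in the text

def weibull_time(psi, theta, dose, gamma, rng):
    """Draw T with P(T > t | dose) = exp(-psi * theta**dose * t**gamma)."""
    u = 1.0 - rng.random()           # uniform on (0, 1], so log(u) is finite
    rate = psi * theta ** dose       # dose-scaled Weibull scale parameter
    return (-math.log(u) / rate) ** (1.0 / gamma)

def simulate_group(dose, n, onset, death, rng):
    """Simulate n animals; onset and death are (psi, theta, gamma) triples."""
    animals = []
    for _ in range(n):
        t_onset = weibull_time(onset[0], onset[1], dose, onset[2], rng)
        t_death = weibull_time(death[0], death[1], dose, death[2], rng)
        time_on_study = min(t_death, STUDY_MONTHS)  # terminal sacrifice at 24 months
        has_tumor = t_onset <= time_on_study        # assumption: onset before removal is observed
        animals.append((time_on_study, has_tumor))
    return animals

rng = random.Random(2024)
ONSET = (1.0e-5, 1.5, 3.0)     # hypothetical psi_1 and theta_1; gamma_1 = 3 is one of the studied shapes
DEATH = (4.479e-8, 1.0, 5.0)   # psi_2 and gamma_2 from the text; theta_2 = 1 means no dose effect
for dose in (0.0, 0.5, 1.0, 2.0):  # the 2-fold dose spacing
    group = simulate_group(dose, 50, ONSET, DEATH, rng)
    print(dose, sum(has_tumor for _, has_tumor in group))
```

Each call to `simulate_group` yields one dose group of one simulated bioassay; a full run would repeat this 10,000 times per configuration.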

As previously described (Peddada et al. 2005), we considered constant dose effect on mortality; that is, θ_{i2} = θ_{2}, with patterns of θ_{2} = 1 (no effect), 1.5, 2, 2.5, and 3 (severe effect). We set the mortality shape parameter at γ_{2} = 5 and the baseline mortality scale parameter at ψ_{2} = 4.479 × 10^{−8} so that 70% of the animals in the control group survived to the end of the 2-year study, a rate often observed.
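The stated 70% control survival can be checked directly from the Weibull survival function. The sketch below assumes time is measured in months (so the study ends at *t* = 24) and uses `theta2` as an illustrative name for the mortality dose effect; it also shows how survival at the top dose of the 2-fold spacing (*d* = 2) would fall as that effect grows.

```python
import math

STUDY_MONTHS = 24.0   # assumption: time is measured in months
GAMMA2 = 5.0          # mortality shape parameter from the text
PSI2 = 4.479e-8       # baseline mortality scale parameter from the text

def survival_to_end(theta2, dose, t=STUDY_MONTHS):
    """P(death time > t | dose) = exp(-psi_2 * theta2**dose * t**gamma_2)."""
    return math.exp(-PSI2 * theta2 ** dose * t ** GAMMA2)

control = survival_to_end(theta2=1.0, dose=0.0)
print(f"control survival at 24 months: {control:.3f}")  # 0.700

for theta2 in (1.0, 1.5, 2.0, 2.5, 3.0):
    high = survival_to_end(theta2, dose=2.0)
    print(f"theta2 = {theta2}: survival at d = 2 is {high:.3f}")
```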

The three tumor onset shape parameter (γ_{1}) values considered in this study were 1.5, 3, and 6. Poly-3 survival adjustments are based on the assumption that the true tumor onset is Weibull with shape parameter γ_{1} = 3 (Portier and Bailer 1989). Thus, the ideal situation for the poly-3 survival correction is γ_{1} = 3. We considered five different background tumor rates, π_{1}, ranging from rare (0.001, 0.01, 0.05) to common (0.15, 0.30). Values of the baseline tumor onset scale parameter, ψ_{1}, corresponding to each π_{1} are given in Table 1. Finally, we chose six different sets of the effect of dose on tumor onset, θ_{i1}, for each of the five background tumor rates; values of θ_{i1} are given in . In each case, the null hypothesis corresponds to the case when the incidence rates are all equal; that is, the ratios are (1:1:1:1). Thus, a total of 375 nonnull and 75 null configurations were considered for each of the two dose spacings.
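The configuration counts follow from crossing the design factors. The sketch below enumerates the grid under the assumption, implied by the stated totals, that exactly one of the six tumor onset dose-effect sets is the null pattern; the labels `null` and `nonnull_1` to `nonnull_5` are placeholders, since the actual values are not reproduced here.

```python
from itertools import product

shapes = (1.5, 3, 6)                                # tumor onset shape, gamma_1
background_rates = (0.001, 0.01, 0.05, 0.15, 0.30)  # background tumor rate, pi_1
mortality_effects = (1, 1.5, 2, 2.5, 3)             # dose effect on mortality
# Placeholder labels: one null set plus five nonnull sets (an assumption).
onset_patterns = ("null",) + tuple(f"nonnull_{k}" for k in range(1, 6))
spacings = {"2-fold": (0, 0.5, 1, 2), "5-fold": (0, 0.1, 0.5, 2.5)}

configs = list(product(spacings, shapes, background_rates,
                       mortality_effects, onset_patterns))
null_count = sum(1 for c in configs if c[-1] == "null")
nonnull_count = len(configs) - null_count
print(null_count, nonnull_count)  # 150 750
```

Per dose spacing this gives 3 × 5 × 5 = 75 null and 75 × 5 = 375 nonnull configurations.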

| **Table 1.** Patterns of tumor onset shape parameter (γ_{1}) and tumor onset scale parameter (ψ_{1}) by background tumor rate (π_{1}). |

Results of the simulation study are represented by scatter plots of false-positive error rates (or power) with the NTP procedure on the horizontal axis and the proposed procedure on the vertical axis. For the trend tests, false-positive error rates are summarized in and powers in . For the pairwise comparison procedure, false-positive error rates are summarized in . In each case, the diagonal line represents the line of equality between the two tests. The horizontal and vertical lines in and are drawn at a distance of from the origin. In and , points falling to the right of the vertical line indicate instances in which the NTP procedure exceeds the nominal level of 0.05, and points falling above the horizontal line correspond to instances in which the proposed test exceeds the nominal level of 0.05. In , points falling below and to the right of the diagonal line correspond to instances in which the NTP trend test has more power than the proposed trend test, whereas points falling above and to the left of the diagonal line correspond to instances in which the proposed trend test has more power than the NTP trend test. To reduce clutter in the plots, we tested equality of the false-positive error rates (or power) of the NTP procedure and the proposed procedure using a two-sample *z*-test for proportions, and we plotted only those points for which there was a significant difference between the NTP test and the proposed test at the 5% level of significance.
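The screening rule used to decide which points to plot can be sketched as a standard two-sample *z*-test for proportions with a pooled variance estimate. The pooled form and the two-sided alternative are assumptions, since the text does not specify them. The sample sizes default to the 10,000 simulation runs behind each estimated rate, and the pair of error rates 0.050 and 0.052 is hypothetical.

```python
import math

def two_proportion_z_test(p1, p2, n1=10_000, n2=10_000):
    """Two-sided pooled z-test for equality of two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:                 # both rates are 0 or both are 1
        return 0.0, 1.0
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def is_plotted(p_ntp, p_proposed, alpha=0.05):
    """A configuration is plotted only if the two rates differ significantly."""
    _, p_value = two_proportion_z_test(p_ntp, p_proposed)
    return p_value < alpha

print(is_plotted(0.415, 0.690))  # powers reported in the text for one pattern
print(is_plotted(0.050, 0.052))  # hypothetical pair of error rates
```

With 10,000 runs per rate, even differences of about one percentage point near 0.05 are flagged as significant, which is why many configurations remain on the plots after screening.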

For the 75 null patterns considered in this simulation study, there were 23 patterns in which the two tests had significantly different false-positive error rates (). This result was observed for 2-fold spacing as well as 5-fold spacing. The proposed test was never more liberal than the NTP trend test when both tests exceeded the nominal level; that is, the false-positive rate of the proposed test never exceeded that of the NTP trend test. Furthermore, the NTP trend test was more liberal than the proposed test for common tumors (π_{1} ≥ 0.15) considered in this study. Although we only plotted the cases for which the false-positive error rates of the two tests differed significantly, the false-positive error rate of the proposed trend test never exceeded 0.087, and that of the NTP trend test never exceeded 0.099.

The power of the two tests differed significantly in 270 of the 375 nonnull dose patterns for 2-fold spacing (). In approximately 70% of these 270 patterns, the proposed trend test had higher power than did the NTP trend test. Thus, a large number of points in are above the diagonal line. Further, in 15 of the 270 patterns (about 6%), the false-positive error rate of the NTP trend test exceeded the nominal 0.05 significance level and was significantly higher than that of the proposed test. These cases are denoted by a “+” in . The gain in power for the proposed test was as high as 0.275 (0.69 for the proposed test vs. 0.415 for the NTP trend test), a relative gain of 66%. In contrast, the best gain observed for the NTP trend test was 0.048 (0.502 for the NTP trend test vs. 0.454 for the proposed test), a modest relative gain of < 10%.

The power gains made by the proposed test were even more substantial for 5-fold dose spacing (). Power of the two tests differed significantly in 264 of the 375 nonnull patterns, and the proposed test had higher power in almost 85% of these patterns. Thus, most points in are above the diagonal. Further, as we observed with the 2-fold dose spacing, in 13 of the 264 patterns (about 5%) the false-positive error rate of the NTP trend test exceeded the nominal 0.05 significance level and was significantly higher than that of the proposed test. As in , these cases are denoted by a “+.” The gain in power for the proposed test was as high as 0.460 (0.671 for the proposed test vs. 0.211 for the NTP trend test), a relative gain of > 200%. In contrast, the best gain observed for the NTP trend test was 0.038 (0.331 for the NTP trend test vs. 0.293 for the proposed test), a modest relative gain of < 12%.

In cases for which tumor incidence rates increased monotonically, but not linearly, with dose, the proposed trend test performed better than the NTP trend test in terms of both power and false-positive error rate. As expected, the NTP trend test performed better than the proposed test in cases for which tumor incidence rates increased linearly with dose. But even in such cases, the gains made by the NTP trend test were modest. Furthermore, the false-positive error rate of the NTP trend test often exceeded the nominal 0.05 significance level.

For the null configurations described above, we also compared false-positive error rates of the proposed pairwise comparisons procedure with the NTP procedure for pairwise comparisons between the medium- and high-dose groups with the control group. shows that the proposed method maintained false-positive error rates at or below the nominal 0.05 level, whereas the NTP procedure was often liberal, exceeding the nominal level of 0.05. Although we plotted only the cases in which the false-positive error rates of the two tests differed significantly, the proposed pairwise test never exceeded 0.05, whereas the NTP pairwise test had false-positive error rates as high as 0.11.

An NTP example. As part of an NTP bioassay on isoprene, female F344/N rats were exposed to isoprene for 2 years through inhalation (NTP 1999). Isoprene is a naturally occurring compound in plants, as well as a byproduct of ethylene production. It is similar in structure to 1,3-butadiene, a potent rodent carcinogen.

Groups of 50 female rats were exposed to 0, 220, 700, or 7,000 ppm isoprene; 19, 35, 32, and 32 animals, respectively, developed mammary gland fibroadenomas. Survival-adjusted tumor proportions showed a plateau-shaped response, with 44%, 74%, 74%, and 73%, respectively, of the animals developing fibroadenomas. The NTP trend test gave a *p*-value of 0.105, whereas each dosed group differed from the control group at *p* < 0.002. Because of the wide dose spacing and the plateau-shaped response beginning at the low dose of 220 ppm, the NTP trend test was not sensitive enough to detect the dose-related response.
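The gap between the raw control proportion (19 of 50, or 38%) and the survival-adjusted 44% reflects down-weighting of tumor-free animals that died early. The sketch below illustrates the poly-3 weighting as it is commonly described: an animal with a tumor, or one that survives to the end of the study, counts as one animal, and a tumor-free animal dying at time *t* counts as (*t*/*T*)^3 of an animal. The ten animal records are hypothetical, because individual records from the isoprene study are not given here, and the sketch omits the variance adjustments used in the actual NTP test.

```python
def poly_k_adjusted_rate(animals, study_end, k=3):
    """Return (adjusted tumor rate, adjusted group size) under poly-k weighting.

    animals: iterable of (time_on_study, has_tumor) pairs.
    """
    tumors = 0
    adjusted_n = 0.0
    for time_on_study, has_tumor in animals:
        if has_tumor:
            tumors += 1
            adjusted_n += 1.0                 # tumor-bearing animals get full weight
        elif time_on_study >= study_end:
            adjusted_n += 1.0                 # tumor-free survivors get full weight
        else:
            adjusted_n += (time_on_study / study_end) ** k  # early tumor-free death
    return tumors / adjusted_n, adjusted_n

# Hypothetical group of 10 animals over a 24-month study.
group = (
    [(24.0, True)] * 3 + [(20.0, True)]   # 4 animals with tumors
    + [(24.0, False)] * 4                 # 4 tumor-free survivors
    + [(12.0, False)] * 2                 # 2 tumor-free deaths at 12 months
)
rate, adjusted_n = poly_k_adjusted_rate(group, study_end=24.0)
print(f"raw rate = {4 / 10:.2f}, adjusted n = {adjusted_n:.2f}, poly-3 rate = {rate:.3f}")
```

In this toy group the two early deaths contribute only 0.125 each, so the denominator shrinks from 10 to 8.25 and the adjusted rate exceeds the raw rate.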

The proposed trend test provided a significant dose-related trend in mammary gland fibroadenomas with a *p*-value of 0.0014. As indicated in our simulation study discussed above, this statistic is capable of detecting monotonic nonlinear trends with dose and is not affected by wide dose spacing. Furthermore, using the proposed method for pairwise comparisons, each dose group differs from the control group at *p* < 0.005. From our simulations, we can be confident that, among all of the pairwise comparisons with the control group, the overall false-positive rate of 0.05 is not exceeded.