2.1 Notation
Let
T be the time to the first of two events, either tumor onset or death, and let
Y (
t) be an indicator that is 1 if the tumor is present at time
t and 0 otherwise. The tumor incidence rate and the tumor-free death rate correspond to the event-specific hazard functions
and
respectively. Let π be the expected proportion of animals that develop a tumor during the study (i.e. the lifetime tumor rate), which can be expressed in terms of λ(
t) and
β(
t):
where
tTS denotes the terminal sacrifice time (i.e. the length of the study).
Suppose there are
k treatment groups in the current experiment, with associated dose levels
d1 <
d2 <

<
dk, where animals in the first group are unexposed controls (
d1 = 0). More generally,
di is a dose score assigned to the
ith group (
i = 1, 2, … ,
k). Let π
i, λ
i(
t), and
βi(
t) denote the lifetime tumor rate, the tumor incidence rate, and the tumor-free death rate in the
ith group. Tests for a positive trend in tumor incidence rates involve the following null and ordered alternative hypotheses:
and
respectively, where at least one inequality in
Ha is strict for at least one t.
Equation (1) shows that tests oriented toward the {π
i} are not necessarily valid for comparing the {λ
i(
t)}, as such tests can be confounded by differences among the {
βi(
t)}.
Suppose
nci animals in the current study are randomized to the
ith group and receive dose
di of the treatment (
i = 1, 2, … ,
k). We use
c to index variables and parameters related to the current study, and later we use
h to index similar quantities related to the historical data. Let
ycij and
tcij denote the tumor status and death time, respectively, for the
jth animal in the
ith group of the current study (
j = 1, 2, … ,
nci;
i = 1, 2, … ,
k). Thus,
ycij is 1 if a tumor is present at necropsy and 0 otherwise. Furthermore,

is the total number of animals in the current study, and

and

are the numbers of tumor-bearing animals in the
ith group and in the entire study, respectively.
2.2 Background
Many early analyses used the linear trend test of
Cochran (1954) and
Armitage (1955) to compare lifetime tumor rates. Inferences based on lifetime rates can be misleading, though, if survival differs across treatment groups. For example, unless all animals live to the end of the study, the sample proportion
yci+/
nci tends to underestimate π
ci because it does not account for early deaths reducing the time at risk for developing tumors. Bias arises when the amount of reduction in the time at risk varies with dose (i.e. differential mortality).
Bailer and Portier (1988) addressed the survival issue by defining a modified sample size

, where
δcij is a weight for animal
j in group
i. Clearly an animal that developed a tumor was at risk of tumor onset and thus receives a weight of 1. An animal that dies without a tumor, however, is assigned a weight linked to the fraction of the study it survived.
Bailer and Portier (1988) suggested using (
tcij/
tTS)
3 for this fractional weight, which is known as the Poly-3 weight; then they applied the Cochran-Armitage test after substituting

for
nci. The resulting survival-adjusted estimate of π
ci is

.
Bieler and Williams (1993) noted that

is a random variable and proposed a corrected variance estimator, which under the assumption that π
c1 = π
c2 =

= π
ck is given by
where

, and the plus sign indicates summation over the missing index.
The usual Poly-3 trend test, with the Bieler-Williams variance adjustment, is
The NTP currently uses the following (conservative) variation of
WBW:
which shrinks
WBW toward 0. The continuity correction factor in the above formula is
where

is a weighted average of the dose scores. In the absence of survival differences,
CF reduces to half the largest difference between successive dose scores. Both
WNTP and
WBW are approximately normal when π
c1 = π
c2 =

= π
ck. Bailer and Portier showed that while the Poly-3 test is oriented toward survival-adjusted comparisons of the {π
ci}, it also provides an approximately valid comparison of the age-specific tumor incidence rates when each λ
ci(
t) follows a Weibull model with a shape parameter of 3. However, simulation studies demonstrate that the Poly-3 test is sensitive to certain departures from the underlying Weibull model (
Mancuso et al., 2002;
Peddada et al., 2005).
Peddada et al. (2005) suggested an alternative trend test which uses the Poly-3 weights, but does not assume the tumor rates are linear in dose, does not depend on the numerical scores assigned to the dose groups, and does not incorporate a continuity correction. Let

denote the survival-adjusted isotonic regression estimate of π
ci (
i = 1, 2, … ,
k) under the constraint that π
c1 ≤ π
c2 ≤

≤ π
ck. These estimates can be obtained, for example, by applying the pool-adjacent-violators algorithm (PAVA) (
Barlow et al., 1972) to the {

} with weights {

}. As a simple illustration of PAVA, suppose there are
k = 4 groups, all weights are equal, the true (unknown) proportions are constrained by π
1 ≤ π
2 ≤ π
3 ≤ π
4, and the sample proportions are

and

, respectively. Clearly,

and

violate the constraint. The isotonic solution can be obtained by retaining those estimates which satisfy the inequality constraints while averaging the estimates which are in violation. Thus, in this illustration, the “isotonized” estimates are

, and

. Mathematically, in the general case, the “isotonized” values

(
i = 1, 2, …,
k) under the above inequality constraint can be computed using the following min-max formula:
The isotonic test statistic proposed by
Peddada et al. (2005) can be written
which was motivated by a procedure that
Williams (1977) derived to test the equality of normal means. Define
k standard normal random variables:
Zi ~
N(0, 1) for
i = 1, … ,
k. Given equal {π
ci}, the distribution of
WISO can be approximated by the distribution of
where

are the isotonized values of the {
Zi} using unit weights.
Peddada et al. (2005) conducted extensive simulations which showed that their test generally was more powerful than the Bailer-Portier/Bieler-Williams test when comparing age-specific incidence rates, even though both tests are more naturally oriented toward the{π
ci} than the {λ
ci(
t)}.
2.3 Proposed Test
Suppose there are
p relevant studies in the historical control database, where this number depends on several factors. Generally, we would prefer to include only those historical studies for which the route of exposure (e.g. feed, gavage, inhalation, skin painting) is the same as in the current study. Also, we typically restrict our comparison to studies which were performed relatively recently, since background tumor rates are known to “drift” over time (
Haseman, 1992). Thus, in practice, we anticipate incorporating control data from approximately 5 to 10 historical studies.
Let
nhm be the number of animals in the
mth historical control group, and let
yhmj and
thmj denote the tumor status and death time, respectively, for animal
j in group
m (
j = 1, 2, … ,
nhm;
m = 1, 2, … ,
p). In addition, set

and

. Finally, let π
hm denote the proportion of animals [summationtext] in the
mth historical control group that are expected to develop a tumor if surviving the entire study (i.e. the lifetime tumor rate).
The problem of detecting trends in tumor rates can be viewed as a special case of order restricted inference and can be solved by maximizing certain distances in a directed graph. shows a directed graph appropriate for the usual NTP situation, where there is a high-dose (HD) group, a mid-dose (MD) group, a lowdose (LD) group, a current control (CC) group, and a collection of historical control (HC) groups. The arrows indicate the restrictions that the tumor rates in the control groups do not exceed the rate in the low-dose group, which does not exceed the rate in the mid-dose group, which does not exceed the rate in the high-dose group. The control animals, both current and historical, are assumed to come from a common population, and thus there are no arrows connecting CC and HC.
In the spirit of Kolmogorov distance measures, we follow
Peddada et al. (2001) and define a test statistic for the graph in as the maximum distance between connected points. Specifically, our test is
max(
D1,
D2), where
D1 is the distance between the current control group and current dose groups, and
D2 is the distance between the historical control groups and current dose groups. Each of these two distances,
D1 and
D2, is itself the maximum of two quantities. Distance
D1 is the maximum of the Bieler-Williams version of the Poly-3 statistic and the order restricted inference statistic of
Peddada et al. (2005) based on the current control group. Our choice for
D1 was motivated by a test proposed by
Peddada and Kissling (2006), who selected this pair of statistics because the former performs well against linear trends, while the latter performs well when trends are not linear. By taking the maximum of the two, we obtain a test with reasonably good power for detecting linear or nonlinear (but still non-decreasing) trends. Distance
D2 has exactly the same form as
D1, except that it is based on the historical control groups rather than the current control group. Thus, the proposed test statistic is the maximum of four components. Two of the components are
WBW and
WISO, as described in the previous subsection, and now we derive the other two analogous components, say

and

.
Our approach views the lifetime tumor rates in the current and historical control groups (π
c1, π
h1, … , π
hp) as random realizations from a common control population with mean π. If no animals die before the end of the study, the sample proportion
yhm+/
nhm has expected value π (
m = 1, 2, … ,
p) and we assume the variance can be expressed as
where σ
2 is an unknown parameter that allows for extra-binomial variation.
Now suppose that some animals die before
tTS and we want to account for this by using Poly-3 adjusted sample sizes. Allowing for the random nature of the sample size in variance formula (2), we propose separate estimators for σ
2 and π(1 — π), where a Bieler-Williams type adjustment is made to the latter. Under the assumption that the lifetime tumor rates are equal in all control and treated groups, we estimate σ
2 by

, where
and
The Bieler-Williams estimate of π(1 — π), say

, is obtained by applying the formula for

to the combined set of
p historical control groups and
k current groups. We estimate the variance of the Poly-3 adjusted tumor proportions in the current high-dose group,

, and in the pooled historical control groups,

, by
and
respectively, where

. Note that although the extra binomial variation among the historical control groups should not extend to the current dose groups, we purposely inflate the estimated variance of

by

in the above formula to avoid the Behrens-Fisher problem. This modification leads to a slightly conservative test (described below), but we view it as a small price to pay for the resulting simplification.
Thus, we define a new pair of test statistics, say

and

, which are analogous to
WBW and
WISO, except that they are oriented toward detecting a trend in the current dose groups relative to the historical control groups rather than relative to the current control group. Specifically, define

exactly the same as
WBW , except for replacing the current control group quantities
wc1 and

with the historical control group summaries
wh+ and

, as well as replacing
S1 with

. Similarly, let

be the following analog of
WISO:
where

are the isotonized values of {

} under the constraint that π ≤ π
c2 ≤ π
c3 
π
ck, with weights

.
Finally, the proposed survival-adjusted test statistic for detecting a dose-related trend in the tumor rates, using both current and historical controls, is given by
Deriving the exact null distribution of
W is not trivial. Hence we use critical values based on the approximations described below. In the next section, simulation studies reveal that our approximations control the Type I error rate very well.
Initially, assume that all animals survive to the end of the study and the lifetime tumor rates in all control and treated groups are equal to π. Also, let
vg denote the estimated variance of

for
g =
h, c1,
c2, … ,
ck. These assumptions, together with the consistency of the {
vg}, suggest the following approximations:
Denote the above limiting standard normal random variables by {
Z0,
Z1, … ,
Zk}, where the first term (
Z0) corresponds to the pooled set of
p historical control groups and the remaining terms correspond to the
k groups in the current study. Next, define

and
Ui =
Zi for
i = 1, … ,
k. In our application, the observations in all control and treated groups are independent; thus, the {
Zi} are independently distributed, as are the {
Ui}. Note that all 4 components of
W involve

, but in addition to these terms
WBW and
WISO are also functions of

, whereas

and

are also functions of

. Thus, we approximate the null distribution of
WBW and
WISO by using the random variables
Z1, Z2, … ,
Zk, and we approximate the null distribution of

and

by using the random variables,
U0,
U2,
U3, … ,
Uk. As
WBW can be obtained by regressing the lifetime tumor rates on dose, it should be distributed approximately the same as the regression of
Z1,
Z2, … ,
Zk on dose:
We approximate the distribution of

using the analogous function of
U0,
U2, … ,
Uk:
Both
WISO and

resemble the statistic proposed by
Williams (1977) to test for a monotone trend in dose. Hence, using the asymptotic approximations given above, the null distributions for these statistics can be approximated by
and
respectively, where

are the isotonized values of {
Z1,
Z2, … ,
Zk} with unit weights and

are the isotonized values of {
U0,
U2, … ,
Uk} with a weight of
p for
U0 and unit weights for the others.
Therefore, in the absence of deaths prior to the end of the experiment, the null distribution of our test statistic
W can be approximated by the distribution of
As
V is a function of independent N(0, 1) random variables, its distribution can be simulated by repeatedly generating
k + 1 independent standard normals. The level
α critical value for
W is the 100(1 —
α)
th percentile of the simulated distribution of
V. In practice, we take a million random samples to approximate the distribution of
W.
More generally, when some animals die early, the components of W are functions of the Poly-3 weights, which provide a simple survival adjustment. However, despite possible survival effects, we approximate the distribution of W by the same simulated distribution of V described above. We demonstrate that the proposed test performs well in the more general case by simulating a variety of realistic data sets with many deaths before tTS. The simulation results show that our test operates at (or below) the nominal level in a wide range of situations typically observed in practice.