Pharm Stat. Author manuscript; available in PMC 2010 October 27.
Published in final edited form as:
PMCID: PMC2964842

Interval Approach to Assessing Antitumor Activity for Tumor Xenograft Studies


In preclinical cancer drug screening tumor xenograft experiments, the tumor growth inhibition ratio (T/C) is commonly used to assess the antitumor activity of the agents. Unfortunately, this measurement can discard useful data and result in a high false-negative rate. Furthermore, the degree of antitumor activity based on the T/C ratio is assessed on the basis of an arbitrary cutoff point that does not reflect variations among different tumor lines. To overcome these drawbacks, we propose an adjusted area-under-the-curve (aAUC) ratio to quantify tumor growth inhibition. A nonparametric bootstrap t-interval of the aAUC ratio is also proposed for assessing the significance of the antitumor activity of the agents. The proposed method is then applied to a real tumor xenograft study.

Keywords: area-under-the-curve, antitumor activity, bootstrap, confidence interval, xenografts

1. Introduction

In preclinical cancer drug screening tumor xenograft experiments, human cancer cells are engrafted into mice to produce xenograft models. Tumor-bearing mice are randomized into control (C) and treatment (T) groups, and the maximum tolerated dose of each drug is administered. The volume of each tumor (one tumor per mouse) is measured at the initiation of the study and periodically throughout the study. Mice are euthanized when their tumor volume reaches four times its initial volume, which results in incomplete longitudinal tumor volume data. Because of this incompleteness, the T/C ratio (Corbett et al., 2003; Houghton et al., 2007), calculated at a given time with complete observations, is commonly used to quantify tumor growth inhibition, where T and C represent the means of the relative tumor volumes of the treatment and control mice, respectively. Here, the relative tumor volume of each tumor is defined as the tumor volume divided by its initial volume. This approach can, however, discard some observed data and result in a high false-negative rate. Furthermore, the degree of antitumor activity is assessed on the basis of an arbitrary cutoff point (Corbett et al., 2003; Houghton et al., 2007).

Several new statistical methods have recently been developed for assessing antitumor activity in tumor xenograft models. Tan et al. (2002) proposed a small-sample t-test via the EM algorithm. They assumed a multivariate normal distribution for the repeated log tumor volumes with a Toeplitz covariance matrix. Because of this strong model assumption, their method has limited application to drug screening tumor xenograft data. Furthermore, the method yields only a p-value and does not quantify tumor growth inhibition. Vardi et al. (2001) proposed a nonparametric two-sample U-test. This fully nonparametric approach is generally applicable to drug screening tumor xenograft data. The method, however, assesses the treatment effect by the cross-treatment difference instead of the ratio and yields a p-value only; no confidence intervals are available. Liang (2007) proposed a nonparametric approach to compare antitumor effects in two treatment groups. The approach is, in essence, a comparison of two tumor volume curves and yields a p-value only. Some sophisticated models have been proposed to fit the tumor growth curves; for example, a biexponential model (Demidenko, 2004; Liang and Sha, 2004), a LINEXP model (Demidenko, 2006), and a nonparametric model (Liang, 2005). However, it is often difficult to model tumor growth curves because of the short study period, small sample size, and diverse tumor growth patterns in tumor xenograft experiments. Hothorn (2006) proposed an interval approach for the T/C ratio. However, Hothorn's interval is obtained on the basis of an assumed normal distribution of the tumor volume.

Recently, the Pediatric Preclinical Testing Program (PPTP) established panels of childhood cancer xenografts and tumor lines for in vivo screening (Houghton et al., 2007). The objective of the PPTP study was to identify novel agents with significant antitumor activity. The PPTP has produced a large amount of tumor xenograft screening data. Therefore, the development of an appropriate methodology for the analysis of tumor xenograft data is important for the evaluation of existing and new childhood anticancer agents, and it is the motivation of this paper. Here we propose an adjusted area-under-the-curve (aAUC) ratio to quantify tumor growth inhibition. The aAUC ratio, unlike the T/C ratio, uses all observed data and retains high efficiency for evaluating antitumor activity. Furthermore, a nonparametric bootstrap t-interval (Efron and Tibshirani, 1993) is developed for the aAUC ratio to assess the significance of the antitumor activity of the tested agent, which avoids specific assumptions about the distribution of tumor volume and the tumor growth curve.

2. Adjusted AUC Ratio and Interval Estimate

To make full use of the observed but incomplete tumor volume data, an aAUC is proposed. Tumor growth inhibition is then quantified by the ratio of the aAUCs of the treatment and control groups. To assess the significance of the treatment effect, two nonparametric bootstrap intervals are proposed in this section: a bootstrap bias-corrected (BC) interval and a bootstrap t-interval.

Consider a typical tumor growth experiment with two groups: control and treatment. Let $\xi(t)$ be the relative tumor volume growth curve at time $t \in T = [0, t^*]$ for a mouse, and let $\tau$ be the associated right-censoring time. Censoring occurs because a mouse dies of toxicity before the end of the study or is euthanized when its tumor volume reaches four times its initial volume. The area-under-the-curve (AUC) can then be calculated up to $\tau$ as

$$\mathrm{AUC} = \int_{T(\tau)} \xi(t)\, d\nu(t),$$

where $T(a) = \{t : t \in T,\ t \le a\}$ and $\nu$ is a suitable measure on $T$. The raw AUCs are not comparable because of uneven censoring. Therefore, the AUC is adjusted by the length of the interval between the initiation of the experiment and the last time point with existing tumor volume measurements, and the resulting aAUC is

$$\mathrm{aAUC} = \frac{\mathrm{AUC}}{\tau} = \frac{1}{\tau} \int_{T(\tau)} \xi(t)\, d\nu(t).$$


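It is often difficult to model the growth curve $\xi(t)$ explicitly because of the diverse tumor growth patterns in xenograft experiments. Therefore, an empirical approach is used to calculate the aAUC. Specifically, assuming that there are $k + 1$ measured relative tumor volumes $\xi_i$ for a tumor at times $t_i$ ($i = 0, \ldots, k$), the AUC is calculated up to the last time point with existing tumor volume measurements using a simple trapezoidal rule,

$$\mathrm{AUC} = \sum_{i=1}^{k} \frac{(\xi_{i-1} + \xi_i)(t_i - t_{i-1})}{2}.$$

Then the aAUC can be calculated as

$$\mathrm{aAUC} = \frac{\mathrm{AUC}}{t_k - t_0},$$

where $t_0 = 0$ is the initiation of the experiment, so the divisor is the time of the last measurement.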


The normality assumption of the aAUC is unreliable due to the skewness of the aAUC and small sample size. Therefore, a nonparametric approach is more appropriate and useful.
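
As a rough illustration of the trapezoidal calculation described above, the following Python sketch computes the aAUC for a single tumor. The function name `adjusted_auc` and the example measurements are hypothetical and are not taken from the study.

```python
from typing import Sequence


def adjusted_auc(times: Sequence[float], rel_volumes: Sequence[float]) -> float:
    """Trapezoidal AUC up to the last observed time, divided by the observed time span."""
    if len(times) != len(rel_volumes) or len(times) < 2:
        raise ValueError("need at least two (time, volume) pairs of equal length")
    auc = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        if width <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of the trapezoid between consecutive measurements.
        auc += 0.5 * (rel_volumes[i - 1] + rel_volumes[i]) * width
    return auc / (times[-1] - times[0])


# Hypothetical relative tumor volumes (initial volume = 1), measured weekly.
complete = adjusted_auc([0, 1, 2, 3], [1.0, 1.5, 2.0, 3.0])  # followed for 3 weeks
censored = adjusted_auc([0, 1, 2], [1.0, 2.0, 4.0])          # euthanized at week 2
print(complete, censored)
```

Dividing by the observed time span is what makes the two tumors comparable even though one of them was followed for a shorter period.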

To simplify the notation, let $X_C$ and $X_T$ be the aAUCs of the control and treatment groups, and let $\mu_C$ and $\mu_T$ be the corresponding means. We define the aAUC ratio, $\gamma$, as the ratio of the means of the aAUCs of the treatment and control groups; that is,

$$\gamma = \frac{\mu_T}{\mu_C},$$

which quantifies the tumor growth inhibition of the agent and can be estimated by

$$\hat{\gamma} = \frac{\bar{X}_T}{\bar{X}_C},$$

where $\bar{X}_C$ and $\bar{X}_T$ are the sample means of the control and treatment groups, respectively. To construct the interval of $\gamma$, we take a log-transformation of $\gamma$ as

$$\theta = \log(\gamma) = \log(\mu_T) - \log(\mu_C).$$

An estimate of the standard error of $\hat{\theta} = \log(\hat{\gamma})$ can be obtained by the Delta method as

$$\widehat{\mathrm{se}}(\hat{\theta}) = \sqrt{\frac{\hat{\sigma}_C^2}{n_C \bar{X}_C^2} + \frac{\hat{\sigma}_T^2}{n_T \bar{X}_T^2}}, \tag{1}$$

where $\hat{\sigma}_C^2$, $n_C$ and $\hat{\sigma}_T^2$, $n_T$ are the sample variances and sample sizes of the control and treatment groups, respectively. Because the only change of interest is the tumor volume reduction after treatment, a one-sided confidence interval will be constructed.
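
The point estimate and its Delta-method standard error can be sketched in a few lines of Python. The function name `log_ratio_and_se` and the aAUC values below are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import math
from statistics import mean, variance


def log_ratio_and_se(x_treat, x_ctrl):
    """Return (gamma_hat, theta_hat, se) for the ratio of mean aAUCs, treatment over control."""
    m_t, m_c = mean(x_treat), mean(x_ctrl)
    gamma_hat = m_t / m_c
    theta_hat = math.log(gamma_hat)
    # Delta method: Var(log of a sample mean) is approximately sigma^2 / (n * mean^2).
    se = math.sqrt(
        variance(x_ctrl) / (len(x_ctrl) * m_c ** 2)
        + variance(x_treat) / (len(x_treat) * m_t ** 2)
    )
    return gamma_hat, theta_hat, se


# Hypothetical aAUC values for five control and five treated mice.
control = [2.0, 2.2, 1.8, 2.4, 1.6]
treated = [1.0, 1.2, 0.8, 1.1, 0.9]
gamma_hat, theta_hat, se = log_ratio_and_se(treated, control)
print(gamma_hat, theta_hat, se)
```

For these numbers the sample means are 1.0 and 2.0, so the estimated ratio is 0.5, and the two variance terms each contribute 0.005, giving a standard error of about 0.1 on the log scale.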

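The $100(1-\alpha)\%$ one-sided upper confidence limits of the bootstrap intervals can be estimated by the following bootstrap procedure.

  1. Generate $B$ independent bootstrap samples of aAUC from each group, $X_C^{*b} = \{X_{1C}^{*b}, \ldots, X_{n_C C}^{*b}\}$ and $X_T^{*b} = \{X_{1T}^{*b}, \ldots, X_{n_T T}^{*b}\}$, $b = 1, \ldots, B$.
  2. Compute the bootstrap replications $\hat{\gamma}^{*b} = \bar{X}_T^{*b} / \bar{X}_C^{*b}$ for $b = 1, \ldots, B$.
  3. A $100(1-\alpha)\%$ bootstrap bias-corrected (BC) percentile interval upper limit is obtained directly from the bootstrap distribution $\hat{G}(s) = \#\{\hat{\gamma}^{*b} < s\}/B$ of $\{\hat{\gamma}^{*b}, b = 1, \ldots, B\}$ as
    $$\hat{\gamma}_{BC} = \hat{G}^{-1}(\alpha_1),$$
    where $\alpha_1 = \Phi(2z_0 + z_{1-\alpha})$, $z_0 = \Phi^{-1}(\hat{G}(\hat{\gamma}))$, $\Phi(\cdot)$ is the standard normal distribution function, and $z_{1-\alpha}$ is its $100(1-\alpha)$th percentile.
  4. A $100(1-\alpha)\%$ bootstrap t-interval upper limit is obtained from the bootstrap statistics
    $$t^{*b} = \frac{\hat{\theta}^{*b} - \hat{\theta}}{\widehat{\mathrm{se}}^{*b}}, \quad b = 1, \ldots, B,$$
    where $\hat{\theta}^{*b} = \log(\hat{\gamma}^{*b})$ and $\widehat{\mathrm{se}}^{*b}$ is calculated using (1) for the bootstrap sample $\{X_T^{*b}, X_C^{*b}\}$, $b = 1, \ldots, B$. Let the $\alpha$th percentile of $\{t^{*b}, b = 1, \ldots, B\}$ be estimated by the value $t_\alpha$ such that $\#\{t^{*b} < t_\alpha\}/B = \alpha$; then the upper limit of the bootstrap t-interval is given by
    $$\hat{\gamma}_{t} = \exp\{\hat{\theta} - t_\alpha\, \widehat{\mathrm{se}}(\hat{\theta})\}.$$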
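
The four steps above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the nearest-rank quantile rule, the clamping of the bias-correction term, and the example data are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist, mean, variance


def log_ratio_and_se(x_t, x_c):
    """theta = log(mean_T / mean_C) and its Delta-method standard error."""
    m_t, m_c = mean(x_t), mean(x_c)
    se = math.sqrt(variance(x_c) / (len(x_c) * m_c ** 2)
                   + variance(x_t) / (len(x_t) * m_t ** 2))
    return math.log(m_t / m_c), se


def empirical_quantile(sorted_values, p):
    """Nearest-rank quantile (an assumed convention): smallest value with rank >= p * n."""
    index = max(0, min(len(sorted_values) - 1, math.ceil(p * len(sorted_values)) - 1))
    return sorted_values[index]


def bootstrap_upper_limits(x_t, x_c, alpha=0.05, n_boot=2000, seed=0):
    """One-sided upper limits for gamma from the BC percentile and bootstrap t methods."""
    rng = random.Random(seed)
    normal = NormalDist()
    theta_hat, se_hat = log_ratio_and_se(x_t, x_c)
    gamma_hat = mean(x_t) / mean(x_c)
    gammas, t_stats = [], []
    for _ in range(n_boot):
        # Step 1: resample each group independently, with replacement.
        boot_t = rng.choices(x_t, k=len(x_t))
        boot_c = rng.choices(x_c, k=len(x_c))
        # Step 2: bootstrap replication of the ratio (and its studentized version).
        theta_b, se_b = log_ratio_and_se(boot_t, boot_c)
        gammas.append(math.exp(theta_b))
        if se_b > 0:  # skip degenerate resamples with zero variance in both groups
            t_stats.append((theta_b - theta_hat) / se_b)
    gammas.sort()
    t_stats.sort()
    # Step 3: bias-corrected percentile upper limit.
    g_at_hat = sum(g < gamma_hat for g in gammas) / n_boot
    g_at_hat = min(max(g_at_hat, 1 / n_boot), 1 - 1 / n_boot)  # keep z0 finite
    z0 = normal.inv_cdf(g_at_hat)
    alpha1 = normal.cdf(2 * z0 + normal.inv_cdf(1 - alpha))
    bc_upper = empirical_quantile(gammas, alpha1)
    # Step 4: bootstrap t upper limit, back-transformed to the ratio scale.
    t_alpha = empirical_quantile(t_stats, alpha)
    t_upper = math.exp(theta_hat - t_alpha * se_hat)
    return {"gamma_hat": gamma_hat, "bc_upper": bc_upper, "t_upper": t_upper}


# Hypothetical aAUC values for ten control and ten treated mice.
control = [2.0, 2.2, 1.8, 2.4, 1.6, 2.1, 1.9, 2.3, 1.7, 2.0]
treated = [1.0, 1.2, 0.8, 1.1, 0.9, 1.05, 0.95, 1.15, 0.85, 1.0]
result = bootstrap_upper_limits(treated, control)
print(result)
```

With these well-separated hypothetical groups, both upper limits fall below 1, which under the interval approach would be read as significant antitumor activity.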

3. Simulation Studies

In this section, we designed simulation studies to investigate the power to distinguish treatments by using three endpoints: a) the relative tumor volume at the last time point with complete data from the two groups (RTVC), b) the aAUC, and c) the cross-treatment difference (CTD) defined by Vardi et al. (2001). We also conducted a simulation to study the coverage probability of the proposed nonparametric bootstrap intervals for the aAUC ratio.

3.1 Comparison of Power

Two simulation scenarios are considered to assess the power of the three endpoints mentioned above. Scenario I has equal censorship, meaning that the censoring times for the two groups have the same distribution. However, this homogeneity assumption may be unrealistic in some circumstances. For example, treatment group mice could have a slower growth rate and a higher mortality rate than control group mice. Thus, it is desirable to also consider heterogeneous censorship (scenario II), with different censoring distributions for the two groups. To yield simulation data similar to real tumor xenograft data, tumor volumes for both scenarios were generated from exponential growth curves, which are typical for control mice. The error terms reflect measurement errors and result in correlated longitudinal tumor volumes. The parameter σ measures the departure from the exponential form of the growth curve. We generated 10 tumor growth curves for each group, with a tumor volume at each week of a 6-week study period. Tumor volumes were treated as missing when the tumor volume exceeded four times its initial volume.

  1. Scenario I: Group 1 censored growth curves are generated from
    and Group 2 censored growth curves are generated from
  2. Scenario II: the same growth curves and censoring times are generated according to Scenario I, with the exception of τ, which is replaced by
    where C is a random time independent of {ξ(t), t ≤ 6} and distributed as the integer part of $6U^{1/2} + 1$, with U a uniform random variable on (0, 1).

Both scenarios assume that $\{e_t, \varepsilon_t, t \ge 1\}$ are independent N(0, 1) variables, with ξ(0) = ζ(0) = 1, σ = 0.3 or 0.5, δ = 0, and λ as specified in Table 1.

Table 1
Power evaluation of three endpoints based on 10,000 simulations for two scenarios
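
To make the censoring rule concrete, the sketch below simulates weekly relative tumor volumes and marks them as missing once the volume exceeds four times the initial volume. The generating equations are not reproduced in the text above, so the recursion used here (multiplying the previous volume by exp(rate + σ·e_t)) is purely a hypothetical stand-in for an exponential growth curve with noise, and every name and parameter value is an assumption. Treating all measurements after the first exceedance as missing is likewise one reading of the rule.

```python
import math
import random


def simulate_censored_curve(rate, sigma, weeks=6, threshold=4.0, rng=None):
    """Weekly relative volumes starting at 1.0; None marks a missing (censored) value."""
    rng = rng or random.Random()
    observed = [1.0]
    latent = 1.0
    censored = False
    for _ in range(weeks):
        # ASSUMED growth model: noisy exponential growth on the log scale.
        latent *= math.exp(rate + sigma * rng.gauss(0.0, 1.0))
        if latent > threshold:
            censored = True  # this and all later measurements are treated as missing
        observed.append(None if censored else latent)
    return observed


def missing_proportion(curves):
    """Proportion of missing measurements over all mice and time points."""
    total = sum(len(curve) for curve in curves)
    return sum(value is None for curve in curves for value in curve) / total


def last_complete_time(curves):
    """Last week at which every mouse in the group still has a measurement."""
    last = 0
    for week in range(len(curves[0])):
        if all(curve[week] is not None for curve in curves):
            last = week
        else:
            break
    return last


rng = random.Random(2024)
curves = [simulate_censored_curve(rate=0.4, sigma=0.3, rng=rng) for _ in range(10)]
mp = missing_proportion(curves)
ltc = last_complete_time(curves)
print(mp, ltc)
```

The two summaries correspond to the MP and LTC quantities used to describe the simulated data sets, computed here for a single hypothetical group.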

To make the power comparisons meaningful, a randomization test was used for the three endpoints to test equal treatment effects, which yields type I errors at a fixed level, say 0.05. Let $X_C$ and $X_T$ be one of the three endpoints for the control and treatment groups, respectively, and let $S_{n_C,n_T}$ be the corresponding test statistic, where $n_C$ and $n_T$ are the sample sizes of the control and treatment groups, respectively. In the following simulations, a two-sample t-statistic,

$$S_{n_C,n_T}(X_C, X_T) = \frac{\bar{X}_C - \bar{X}_T}{\sqrt{\hat{\sigma}_C^2/n_C + \hat{\sigma}_T^2/n_T}},$$

was used for the RTVC and aAUC endpoints, and a two-sample U-statistic,

$$S_{n_C,n_T}(X_C, X_T) = \sum_{i=1}^{n_C} \sum_{j=1}^{n_T} \frac{D_{ij}}{n_C n_T},$$

was used for the cross-treatment difference endpoint, where $D_{ij}$ is the cross-treatment difference defined by Vardi et al. (2001). The algorithm to calculate a p-value using the randomization test is as follows:

  1. Choose M independent index vectors $g^{*(1)}, g^{*(2)}, \ldots, g^{*(M)}$, each of which contains the $n_C$ indexes of $X_C = \{X_{1C}, \ldots, X_{n_C C}\}$ and the $n_T$ indexes of $X_T = \{X_{1T}, \ldots, X_{n_T T}\}$, randomly selected from the set of all possible permuted vectors.
  2. Evaluate the test statistic $S_{n_C,n_T}$ for each randomization sample $(g^{*(b)}, X_C, X_T)$ obtained in step 1, such that
    $$S_{n_C,n_T}(b) = S_{n_C,n_T}(g^{*(b)}, X_C, X_T), \quad b = 1, \ldots, M,$$
    where $S_{n_C,n_T}(g^{*(b)}, X_C, X_T)$ is a realization of $S_{n_C,n_T}$ obtained by taking the first $n_C$ elements of the combined sample $(X_C, X_T)$, ordered according to the index $g^{*(b)}$, as $X_C$ and the rest as $X_T$.
  3. A two-sided p-value is given as
    where $S_0 > 0$ is the observed value of $S_{n_C,n_T}$ and $\bar{S}_{n_C,n_T}(\cdot) = \sum_{b=1}^{M} S_{n_C,n_T}(b)/M$.
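
A minimal Python sketch of this randomization test with the two-sample t-statistic follows. The p-value formula is not reproduced in the text above, so the version used here is an assumption: it is the proportion of randomization statistics lying at least as far from their mean as the observed statistic does. The function names and the example data are also hypothetical.

```python
import math
import random
from statistics import mean, variance


def t_statistic(x_c, x_t):
    """Two-sample t-statistic with unpooled variances, control minus treatment."""
    return (mean(x_c) - mean(x_t)) / math.sqrt(
        variance(x_c) / len(x_c) + variance(x_t) / len(x_t)
    )


def randomization_p_value(x_c, x_t, stat=t_statistic, n_perm=2000, seed=0):
    """Two-sided randomization p-value from n_perm random relabelings of the pooled data."""
    rng = random.Random(seed)
    pooled = list(x_c) + list(x_t)
    n_c = len(x_c)
    s0 = stat(x_c, x_t)
    perm_stats = []
    for _ in range(n_perm):
        # Step 1: a random index vector g*(b), i.e. a permutation of all indexes.
        index = rng.sample(range(len(pooled)), len(pooled))
        shuffled = [pooled[i] for i in index]
        # Step 2: the first n_C values play the role of control, the rest treatment.
        perm_stats.append(stat(shuffled[:n_c], shuffled[n_c:]))
    # Step 3 (ASSUMED form): compare distances from the mean randomization statistic.
    center = mean(perm_stats)
    extreme = sum(abs(s - center) >= abs(s0 - center) for s in perm_stats)
    return extreme / n_perm


# Hypothetical endpoint values (for example, aAUCs) for two groups of ten mice.
control = [2.0, 2.2, 1.8, 2.4, 1.6, 2.1, 1.9, 2.3, 1.7, 2.0]
treated = [1.0, 1.2, 0.8, 1.1, 0.9, 1.05, 0.95, 1.15, 0.85, 1.0]
p_separated = randomization_p_value(control, treated)

# Two hypothetical groups with nearly identical distributions.
p_similar = randomization_p_value([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 3.5, 4.5, 5.5])
print(p_separated, p_similar)
```

Passing a different `stat` function, such as a U-statistic built from cross-treatment differences, reuses the same relabeling loop.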

The proportion of rejections of the null hypothesis H0: δ = λ (equal treatment effect) represents the empirical power of the test. For a prespecified significance level α = 0.05 and a sample size of 10 in each group, Table 1 lists the results based on 10,000 simulation runs and 2,000 randomization samples. The last two columns of the table give the average proportion of missing measurements (MP) of the two groups and the average of the last time points with complete data (LTC) of the two groups. For both scenarios, Table 1 shows that the power of the RTVC endpoint is much lower than that of the aAUC or CTD endpoints when the LTC is less than 4 weeks if σ = 0.3 and less than 2 weeks if σ = 0.5. This shows that using the tumor volume at the last time point with complete data could be very inefficient if some mice died or were euthanized early in the experiment. The empirical power of the proposed aAUC is comparable to that of the CTD endpoint, which has been shown to be highly efficient by Vardi et al. (2001).

3.2 Comparison of Coverage Probability

The coverage probability is the probability that a confidence interval captures the true parameter, and it is estimated here as the proportion of cases in a simulation in which the calculated interval includes the true value. We chose parameter configurations that represent different types of antitumor activity, from low to high. We calculated one-sided upper confidence limits. The sample size considered in the simulation was n = 10 for each group, which is a typical sample size for tumor xenograft experiments. The standard deviations $(\sigma_C, \sigma_T)$ of the normal or log-normal distributions were (0.05, 0.05), (0.1, 0.1), (0.5, 0.5), (0.1, 0.2), and (0.2, 0.1). The mean parameters of the normal or log-normal distributions and the shape (α) and rate (λ) parameters of the gamma distribution are listed in Table 2. For each parameter configuration, we generated 10,000 random samples from a normal distribution $N(\mu_g, \sigma_g^2)$, a log-normal distribution $LN(\mu_g, \sigma_g^2)$, or a gamma$(\alpha_g, \lambda_g)$ distribution, where g = C, T, and used 2,000 bootstrap samples. The simulated coverage probabilities for the nominal 95% level are shown in Table 3. The results showed that the bootstrap BC interval had a low coverage probability, whereas the coverage probabilities of the bootstrap t-interval were close to the nominal level (0.95), regardless of whether the underlying distribution was normal or a skewed log-normal or gamma distribution.
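
The coverage calculation can be illustrated with a scaled-down Python sketch for log-normal data and the bootstrap t upper limit. The mean parameters (1.0 and 0.5), the reduced numbers of simulations and bootstrap samples, the nearest-rank percentile rule, and all function names are assumptions made to keep the example fast; they are not the configurations of Table 2.

```python
import math
import random


def log_ratio_and_se(x_t, x_c):
    """theta = log(mean_T / mean_C) and its Delta-method standard error."""
    n_t, n_c = len(x_t), len(x_c)
    m_t, m_c = sum(x_t) / n_t, sum(x_c) / n_c
    v_t = sum((x - m_t) ** 2 for x in x_t) / (n_t - 1)
    v_c = sum((x - m_c) ** 2 for x in x_c) / (n_c - 1)
    return math.log(m_t / m_c), math.sqrt(v_c / (n_c * m_c ** 2) + v_t / (n_t * m_t ** 2))


def bootstrap_t_upper(x_t, x_c, alpha, n_boot, rng):
    """One-sided upper limit for gamma from the bootstrap t method."""
    theta_hat, se_hat = log_ratio_and_se(x_t, x_c)
    t_stats = []
    for _ in range(n_boot):
        theta_b, se_b = log_ratio_and_se(rng.choices(x_t, k=len(x_t)),
                                         rng.choices(x_c, k=len(x_c)))
        if se_b > 0:
            t_stats.append((theta_b - theta_hat) / se_b)
    t_stats.sort()
    # Nearest-rank estimate of the alpha-th percentile (an assumed convention).
    t_alpha = t_stats[max(0, math.ceil(alpha * len(t_stats)) - 1)]
    return math.exp(theta_hat - t_alpha * se_hat)


def estimate_coverage(mu_c, mu_t, sigma_c, sigma_t, n=10, n_sim=200,
                      n_boot=200, alpha=0.05, seed=1):
    """Proportion of simulated data sets whose upper limit is at or above the true ratio."""
    rng = random.Random(seed)
    # The mean of LN(mu, sigma^2) is exp(mu + sigma^2 / 2).
    true_gamma = math.exp(mu_t + sigma_t ** 2 / 2) / math.exp(mu_c + sigma_c ** 2 / 2)
    hits = 0
    for _ in range(n_sim):
        x_c = [rng.lognormvariate(mu_c, sigma_c) for _ in range(n)]
        x_t = [rng.lognormvariate(mu_t, sigma_t) for _ in range(n)]
        if true_gamma <= bootstrap_t_upper(x_t, x_c, alpha, n_boot, rng):
            hits += 1
    return hits / n_sim


coverage = estimate_coverage(mu_c=1.0, mu_t=0.5, sigma_c=0.1, sigma_t=0.1)
print(coverage)
```

With only 200 simulated data sets the estimate is noisy, so it should be read as a rough check of the procedure, not as a reproduction of Table 3.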

Table 2
Parameter configurations for the simulation study
Table 3
95% Coverage probability based on 10,000 simulations for normal, log-normal and gamma data and 2,000 bootstrap samples used

4. A Real Tumor Xenograft Model

For a real example, we used data from a published single-agent tumor xenograft study conducted by the PPTP (Houghton et al., 2007). In this study, vincristine was administered intraperitoneally at a dosage of 1 mg/kg every 7 days for 6 weeks and evaluated as a single agent in 38 tumor cell lines. For each tumor cell line, nearly 20 mice were equally randomized into control and treatment groups. Table 4 shows 22 solid tumor cell lines together with the T/C ratios and antitumor activity ratings published by Houghton et al. (2007). The T/C ratio was calculated on day 21, or at the last time point at which all mice in both groups still had measurable tumor volumes if that was earlier than day 21. Table 4 also shows the estimated aAUC ratios, standard errors, and 95% upper limits of the bootstrap t-intervals. The PPTP study used arbitrary cutoff points for the T/C ratio antitumor activity rating. Agents producing a T/C ≤ 0.15 were considered highly active; those with T/C > 0.15 but ≤ 0.45 were considered to have intermediate activity; and those with T/C > 0.45 were considered to have low activity. The drawback of such arbitrary cutoff points is obvious: they fail to consider the variations among different cell lines. Assessment of antitumor activity based on the interval approach is more intuitive. Whenever the upper confidence limit is lower than 1, significant antitumor activity is observed; otherwise, no significant antitumor activity can be claimed. From Table 4, we see that the two methods agreed for 16 tumor lines but disagreed for four cell lines, Rh10, Rh30, Rh30R, and NB-1771, in which the T/C ratio indicated low activity but the aAUC ratio led to an active evaluation. In one cell line, D212, the T/C ratio indicated intermediate activity but the aAUC ratio led to an inactive evaluation.
The discrepancy between the T/C ratio and the aAUC ratio for cell lines Rh10, Rh30, and Rh30R arose because the T/C ratio was calculated early in the study and thus failed to capture the antitumor activity seen later (Figure 1). In contrast, the discrepancy for cell lines NB-1771 and D212 arose because the T/C ratio did not account for the variability of the different tumor lines, leading to false-negative (NB-1771) and false-positive (D212) antitumor activity evaluations.
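
The two decision rules compared in this section can be written down directly. The sketch below encodes the T/C cutoffs quoted above and the upper-limit rule of the interval approach; the function names and the example values are hypothetical and are not taken from Table 4.

```python
def tc_activity_rating(tc_ratio: float) -> str:
    """Activity rating from the fixed T/C cutoffs (0.15 and 0.45)."""
    if tc_ratio <= 0.15:
        return "high"
    if tc_ratio <= 0.45:
        return "intermediate"
    return "low"


def interval_assessment(upper_limit: float) -> str:
    """Interval approach: activity is significant when the upper limit is below 1."""
    return "significant" if upper_limit < 1.0 else "not significant"


# Hypothetical values illustrating how the two rules can disagree.
print(tc_activity_rating(0.60), interval_assessment(0.85))  # low, yet significant
print(tc_activity_rating(0.40), interval_assessment(1.10))  # intermediate, yet not significant
```

The first rule depends only on the point estimate, whereas the second also reflects the variability of the tumor line through the width of the interval.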

Figure 1
Solid lines are tumor volume profiles of control mice, and dotted lines are tumor volume profiles of treated mice.
Table 4
Antitumor activity evaluation for PPTP vincristine xenograft study

5. Discussion and Conclusion

The analysis of tumor xenograft experimental data presents several statistical challenges, such as incomplete longitudinal observations, small samples, and diverse tumor growth patterns. Existing methods either are based on rather restrictive normality assumptions or discard useful data and thus lose efficiency. The proposed aAUC ratio fully uses the experimental data and is therefore more efficient for evaluating antitumor activity. Furthermore, the proposed bootstrap t-interval is a nonparametric approach, which avoids assumptions about the underlying distribution and has good coverage probability even for sample sizes as small as 10 per group. The upper limit of the bootstrap t-interval reflects both the effect size and its variability and therefore avoids the use of an arbitrary cutoff point. In conclusion, the proposed aAUC ratio not only efficiently quantifies tumor growth inhibition but also gives a simple assessment of antitumor activity for tumor xenograft experiments.


The authors are thankful to the editor and anonymous referees whose careful reading and constructive comments improved this article. This work was supported in part by National Cancer Institute (NCI) support grants CA21765 and NO1-CM-42216 and by the American Lebanese Syrian Associated Charities (ALSAC).


  • Corbett TH, White K, et al. Discovery and preclinical antitumor efficacy evaluations of LY32262 and LY33169. Investigational New Drugs. 2003;21:33–45.
  • Demidenko E. Mixed Models: Theory and Applications. Wiley; New York: 2004.
  • Demidenko E. The assessment of tumor response to treatment. Appl Statist. 2006;55:365–377.
  • Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman and Hall; New York: 1993.
  • Hothorn L. Statistical analysis of in vivo anticancer experiments: tumor growth inhibition. Drug Information Journal. 2006;40:229–238.
  • Houghton PJ, Morton CL, Gorlick R, et al. The Pediatric Preclinical Testing Program: description of models and early testing results. Pediatr Blood Cancer. 2007;49:928–940.
  • Liang H. Modeling antitumor activity in xenograft tumor treatment. Biometrical Journal. 2005;47:1–11.
  • Liang H. Comparison of antitumor activities in tumor xenograft treatment. Contemporary Clinical Trials. 2007;28:115–119.
  • Liang H, Sha NJ. Modeling antitumor activity by using a nonlinear mixed-effects model. Mathematical Biosciences. 2004;189:61–73.
  • Tan M, Fang HB, Tian GL, Houghton PJ. Small-sample inference for incomplete longitudinal data with truncation and censoring in tumor xenograft models. Biometrics. 2002;58:612–620.
  • Vardi Y, Ying ZL, Zhang CH. Two-sample tests for growth curves under dependent right censoring. Biometrika. 2001;88:949–960.