Article sections

- Abstract
- 1. Introduction
- 2. Adjusted AUC Ratio and Interval Estimate
- 3. Simulation Studies
- 4. A Real Tumor Xenograft Model
- 5. Discussion and Conclusion
- References

Pharm Stat. Author manuscript; available in PMC 2010 October 27.

PMCID: PMC2964842

NIHMSID: NIHMS241365

The publisher's final edited version of this article is available at Pharm Stat

In preclinical cancer drug screening tumor xenograft experiments, the tumor growth inhibition ratio (*T/C*) is commonly used to assess the antitumor activity of the agents. Unfortunately, this measurement can discard useful data and result in a high false-negative rate. Furthermore, the degree of antitumor activity based on the *T/C* ratio is assessed on the basis of an arbitrary cutoff point that does not reflect variations in different tumor lines. To overcome these drawbacks, we propose an adjusted area-under-the-curve (aAUC) ratio to quantify tumor growth inhibition. A nonparametric bootstrap *t*-interval of the aAUC ratio is also proposed for assessing the significance of the anti-tumor activity of the agents. The proposed method is then applied to a real tumor xenograft study.

In preclinical cancer drug screening tumor xenograft experiments, human cancer cells are engrafted into mice to produce xenograft models. Tumor-bearing mice are randomized into control (*C*) and treatment (*T*) groups, and the maximum tolerated dose of each drug is administered. The volume of each tumor (one tumor per mouse) is measured at the initiation of the study and periodically throughout the study. Mice are euthanized when their tumor volume reaches four times its initial volume, which results in incomplete longitudinal tumor volume data. Because of the incompleteness of the experimental data, the *T/C* ratio (Corbett, et al., 2003, Houghton, et al., 2007) calculated at a given time with complete observations is commonly used to quantify tumor growth inhibition, where *T* and *C* represent the means of the relative tumor volumes of the treatment and control mice, respectively. Here, the relative tumor volume of each tumor is defined as the tumor volume divided by its initial volume. This approach can, however, discard some observed data and result in a high false-negative rate. Furthermore, the degree of antitumor activity is assessed on the basis of an arbitrary cutoff point (Corbett, et al., 2003, Houghton, et al., 2007). Several new statistical methods have recently been developed for assessing antitumor activity in tumor xenograft models. Tan et al. (2002) proposed a small-sample *t*-test via the EM algorithm. They assumed a multivariate normal distribution for the repeated log tumor volumes with a Toeplitz covariance matrix. Because of this strong model assumption, their method has limited application to drug screening tumor xenograft data. Furthermore, the method yields only a *p*-value and does not quantify tumor growth inhibition. Vardi et al. (2001) proposed a nonparametric two-sample U-test. Their methodology is a fully nonparametric approach, which is generally applicable to drug screening tumor xenograft data. 
The method, however, assesses the treatment effect by the cross-treatment difference instead of the ratio and yields a *p*-value only; no confidence intervals are available. Liang (2006) proposed a non-parametric approach to compare antitumor effects in two treatment groups. The approach is, in essence, a comparison of two tumor volume curves and yields a *p*-value only. Some sophisticated models have been proposed to fit the tumor growth curves; for example, a biexponential model (Demidenko, 2004, Liang and Sha, 2004), a LINEXP model (Demidenko, 2006), and a nonparametric model (Liang, 2005). However, it is often difficult to model tumor growth curves due to a short study period, small sample size, and diverse tumor growth patterns in tumor xenograft experiments. Hothorn (2006) proposed an interval approach for the *T/C* ratio. However, Hothorn’s interval is obtained on the basis of an assumed normal distribution of the tumor volume. Recently, the Pediatric Preclinical Testing Program (PPTP) established panels of childhood cancer xenografts and tumor lines for *in vivo* screening (Houghton, et al, 2007). The objective of the PPTP study was to identify novel agents with significant antitumor activity. The PPTP has produced a large amount of tumor xenograft screening data. Therefore, the development of an appropriate methodology for analysis of tumor xenograft data is important for the evaluation of existing and new childhood anticancer agents. It is also the motivation of this paper. Here we propose an adjusted area-under-the-curve (aAUC) ratio to quantify tumor growth inhibition. The aAUC ratio, unlike the *T/C* ratio, uses all observed data and retains high efficiency for evaluating antitumor activity. 
Furthermore, a nonparametric bootstrap *t*-interval (Efron and Tibshirani, 1993) is developed for the aAUC ratio to assess the significance of antitumor activity of the tested agent, which avoids the specific assumptions about the distribution of tumor volume and the tumor growth curve.

To make full use of the observed but incomplete tumor volume data, an aAUC is proposed. Tumor growth inhibition is then quantified by the ratio of the aAUCs of the treatment and control groups. To assess the significance of the treatment effect, two nonparametric bootstrap intervals are proposed in this section: a bootstrap bias-corrected (BC) interval and a bootstrap *t*-interval.

Consider a typical tumor growth experiment with two groups: control and treatment. Let *ξ*(*t*) be the relative tumor volume growth curve of a mouse at time *t* ∈ *T* = [0, *t*^{*}], and let *τ* be the associated right-censoring time. The censoring occurs because a mouse dies due to toxicity before the end of the study or is euthanized when its tumor volume reaches four times its initial volume. The area-under-the-curve (AUC) can then be calculated up to *τ* as

$$\text{AUC}={\int}_{T(\tau )}\xi (t)\nu (dt),$$

where *T*(*a*) = {*t*: *t* ∈ *T*, *t* ≤ *a*} and *ν* is a suitable measure on *T*. The raw AUCs are not comparable due to uneven censoring. Therefore, the AUC is adjusted by the length of the interval between the initiation of the experiment and the last time point with existing tumor volume measurements, and the resulting aAUC is

$$\text{aAUC}=\frac{1}{\tau}{\int}_{T(\tau )}\xi (t)\nu (dt).$$

It is often difficult to model the growth curve *ξ*(*t*) explicitly due to diverse tumor growth patterns in xenograft experiments. Therefore, an empirical approach is used to calculate the aAUC. Specifically, assuming that there are *k* + 1 measured relative tumor volumes ${\xi}_{i}$ for a tumor at times ${t}_{i}$, *i* = 0, 1, ···, *k*, the AUC is approximated by the trapezoidal rule as

$$\text{AUC}={\int}_{{t}_{0}}^{{t}_{k}}\xi (t)dt\simeq \frac{1}{2}\sum _{i=0}^{k-1}({\xi}_{i}+{\xi}_{i+1})({t}_{i+1}-{t}_{i}).$$

Then the aAUC can be calculated as

$$\text{aAUC}=\frac{1}{{t}_{k}-{t}_{0}}\text{AUC}.$$
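The trapezoidal calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `adjusted_auc` and the example measurements are hypothetical, and the inputs are assumed to be the observed (uncensored) time points and relative tumor volumes of a single mouse.

```python
from typing import Sequence


def adjusted_auc(times: Sequence[float], rel_volumes: Sequence[float]) -> float:
    """Trapezoidal AUC of the relative tumor volume, divided by t_k - t_0."""
    if len(times) != len(rel_volumes) or len(times) < 2:
        raise ValueError("need at least two matched (time, volume) pairs")
    auc = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of one trapezoid between consecutive measurements.
        auc += 0.5 * (rel_volumes[i] + rel_volumes[i + 1]) * dt
    # Adjust by the length of the observed follow-up interval.
    return auc / (times[-1] - times[0])


# Hypothetical mouse measured weekly and censored after week 3.
example = adjusted_auc([0, 1, 2, 3], [1.0, 1.5, 2.5, 4.0])  # 6.5 / 3, about 2.17
```

Dividing by the follow-up length makes the value an average relative volume over the observed period, which is what allows mice with different censoring times to be compared.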

The normality assumption of the aAUC is unreliable due to the skewness of the aAUC and small sample size. Therefore, a nonparametric approach is more appropriate and useful.

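To simplify the notation, let ${X}_{C}$ and ${X}_{T}$ denote the aAUCs of the control and treatment groups, with means ${\mu}_{C}$ and ${\mu}_{T}$, respectively. The aAUC ratio is defined as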

$$\gamma =\frac{{\mu}_{T}}{{\mu}_{C}},$$

which quantifies the tumor growth inhibition of the agent and can be estimated by

$$\widehat{\gamma}=\frac{{\overline{X}}_{T}}{{\overline{X}}_{C}},$$

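where ${\overline{X}}_{C}$ and ${\overline{X}}_{T}$ are the sample means of the aAUCs in the control and treatment groups, respectively. On the log scale, let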

$$\theta =\mathit{log}(\gamma )=\mathit{log}({\mu}_{T})-\mathit{log}({\mu}_{C}).$$

An estimate of the standard error of $\widehat{\theta}=\mathit{log}(\widehat{\gamma})$ can be obtained by the Delta method as

$$\widehat{se}(\widehat{\theta})\simeq {\left(\frac{1}{{\overline{X}}_{T}^{2}}\frac{{\widehat{\sigma}}_{T}^{2}}{{n}_{T}}+\frac{1}{{\overline{X}}_{C}^{2}}\frac{{\widehat{\sigma}}_{C}^{2}}{{n}_{C}}\right)}^{1/2},\qquad (1)$$

where ${\widehat{\sigma}}_{C}^{2}$, ${n}_{C}$ and ${\widehat{\sigma}}_{T}^{2}$, ${n}_{T}$ are the sample variances and sample sizes of the control and treatment groups, respectively.
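As a rough illustration of the point estimate and the Delta-method standard error (1), the following Python sketch computes both from two lists of aAUC values. The function name `aauc_ratio_and_se` is hypothetical, and the use of the unbiased sample variance (divisor *n* − 1) is an assumption, since the text does not state which variance estimator is used.

```python
import math
from statistics import mean, variance


def aauc_ratio_and_se(x_t, x_c):
    """Return (gamma_hat, theta_hat, se_hat) from treatment and control aAUCs."""
    m_t, m_c = mean(x_t), mean(x_c)
    gamma_hat = m_t / m_c            # estimated aAUC ratio
    theta_hat = math.log(gamma_hat)  # log ratio
    # Delta-method standard error of theta_hat, as in equation (1).
    # statistics.variance uses the n - 1 divisor (an assumption here).
    se_hat = math.sqrt(
        variance(x_t) / (len(x_t) * m_t ** 2)
        + variance(x_c) / (len(x_c) * m_c ** 2)
    )
    return gamma_hat, theta_hat, se_hat


# Hypothetical aAUC values, for illustration only.
gamma_hat, theta_hat, se_hat = aauc_ratio_and_se([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Each term under the square root is the squared coefficient of variation of a group mean, which is why the standard error is unchanged when all aAUCs in a group are rescaled by a constant.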

The 100(1−*α*)% one-sided upper confidence limits of the bootstrap intervals can be estimated by the following bootstrap procedure.

- Generate *B* independent bootstrap samples of aAUC from each group, ${X}_{C}^{\ast b}=\{{X}_{1C}^{\ast b},\cdots ,{X}_{{n}_{C}C}^{\ast b}\}$ and ${X}_{T}^{\ast b}=\{{X}_{1T}^{\ast b},\cdots ,{X}_{{n}_{T}T}^{\ast b}\}$, *b* = 1, ···, *B*.
- Compute the bootstrap replications ${\widehat{\gamma}}^{\ast b}={\overline{X}}_{T}^{\ast b}/{\overline{X}}_{C}^{\ast b}$ for *b* = 1, ···, *B*.
- A 100(1−*α*)% bootstrap bias-corrected (BC) percentile interval upper limit is obtained directly from the bootstrap distribution $\widehat{G}(s)=\#\{{\widehat{\gamma}}^{\ast b}<s\}/B$ of $\{{\widehat{\gamma}}^{\ast b},b=1,\cdots ,B\}$ as
  $${\widehat{\gamma}}_{\mathit{upper}}={\widehat{G}}^{-1}({\alpha}_{1}),$$
  where ${\alpha}_{1}=\Phi (2{z}_{0}+{z}_{1-\alpha})$ and ${z}_{0}={\Phi}^{-1}(\widehat{G}(\widehat{\gamma}))$, and Φ(·) is the standard normal distribution function.
- A 100(1−*α*)% bootstrap *t*-interval upper limit is obtained from the bootstrap sample
  $${t}^{\ast b}=\frac{{\widehat{\theta}}^{\ast b}-\widehat{\theta}}{{\widehat{se}}^{\ast b}},\quad b=1,\cdots ,B,$$
  where ${\widehat{\theta}}^{\ast b}=\mathit{log}({\widehat{\gamma}}^{\ast b})$ and ${\widehat{se}}^{\ast b}$ is calculated using (1) for the bootstrap sample $\{{X}_{T}^{\ast b},{X}_{C}^{\ast b}\}$, *b* = 1, ···, *B*. Let the *α*th percentile of $\{{t}^{\ast b},b=1,\cdots ,B\}$ be estimated by the value ${\widehat{t}}_{\alpha}$ such that $\#\{{t}^{\ast b}<{\widehat{t}}_{\alpha}\}/B=\alpha$; then the upper limit of the bootstrap *t*-interval is given by
  $${\widehat{\gamma}}_{\mathit{upper}}=\widehat{\gamma}\,\mathit{exp}(-{\widehat{t}}_{\alpha}\widehat{se}).$$
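The bootstrap procedure can be sketched as follows, using only the Python standard library. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the order-statistic convention used for the percentiles is one of several reasonable choices, *z*_{1−*α*} is taken to be the (1−*α*) quantile of the standard normal distribution, degenerate resamples with zero standard error are skipped, and the empirical distribution value at the point estimate is clamped away from 0 and 1 so that the normal quantile exists.

```python
import math
import random
from statistics import NormalDist, mean, variance


def log_ratio_and_se(x_t, x_c):
    """Return (theta_hat, se_hat): log mean ratio and its Delta-method SE."""
    m_t, m_c = mean(x_t), mean(x_c)
    se = math.sqrt(variance(x_t) / (len(x_t) * m_t ** 2)
                   + variance(x_c) / (len(x_c) * m_c ** 2))
    return math.log(m_t / m_c), se


def order_stat(sorted_values, prob):
    """Value with roughly a fraction `prob` of the sample strictly below it."""
    k = min(len(sorted_values) - 1, max(0, int(prob * len(sorted_values))))
    return sorted_values[k]


def bootstrap_upper_limits(x_t, x_c, alpha=0.05, n_boot=2000, seed=0):
    """Return (gamma_hat, bc_upper, t_upper) for the aAUC ratio."""
    rng = random.Random(seed)
    normal = NormalDist()
    theta_hat, se_hat = log_ratio_and_se(x_t, x_c)
    gamma_hat = math.exp(theta_hat)

    gammas, t_stats = [], []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        boot_t = rng.choices(x_t, k=len(x_t))
        boot_c = rng.choices(x_c, k=len(x_c))
        theta_b, se_b = log_ratio_and_se(boot_t, boot_c)
        gammas.append(math.exp(theta_b))
        if se_b > 0:  # assumption: drop resamples with zero standard error
            t_stats.append((theta_b - theta_hat) / se_b)
    gammas.sort()
    t_stats.sort()

    # Bias-corrected percentile upper limit.
    g_at_hat = sum(g < gamma_hat for g in gammas) / n_boot
    g_at_hat = min(max(g_at_hat, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = normal.inv_cdf(g_at_hat)
    alpha1 = normal.cdf(2 * z0 + normal.inv_cdf(1 - alpha))
    bc_upper = order_stat(gammas, alpha1)

    # Bootstrap t upper limit: gamma_hat * exp(-t_alpha * se_hat).
    t_alpha = order_stat(t_stats, alpha)
    t_upper = gamma_hat * math.exp(-t_alpha * se_hat)
    return gamma_hat, bc_upper, t_upper
```

Because the lower percentile of the studentized statistics is typically negative, the *t*-interval upper limit lies above the point estimate, and the agent is judged active when that limit falls below 1.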

In this section, we designed simulation studies to investigate the power to distinguish treatments by using three endpoints: a) the relative tumor volume at the last time point with complete data from the two groups (RTVC), b) the aAUC, and c) the cross-treatment difference (CTD) defined by Vardi et al. (2001). We also conducted a simulation to study the coverage probability of the proposed nonparametric bootstrap intervals for the aAUC ratio.

Two simulation scenarios are considered to assess the power of the three endpoints mentioned above. Scenario I has equal censorship, meaning that the censoring times for the two groups have the same distribution. However, this kind of homogeneity assumption may be unrealistic in some circumstances. For example, treatment group mice could have a slower growth rate and a higher mortality rate than control group mice. Thus, it is desirable to also consider heterogeneous censorship (scenario II), with different censoring distributions for the two groups. To yield simulation data similar to real tumor xenograft data, tumor volumes for both scenarios were generated by following exponential growth curves, which are typical for control mice. The error terms reflect measurement errors and result in correlated longitudinal tumor volumes. The parameter *σ* measures the departure from the exponential form of the growth curve. We generated 10 tumor growth curves for each group, with tumor volumes measured weekly over a 6-week study period. Tumor volumes were treated as missing when the tumor volume exceeded four times its initial volume.

*Scenario I*: Group 1 censored growth curves are generated from

$$\xi (t)=exp\left(\delta t+\sigma \sum _{s=1}^{t}{e}_{s}\right),\quad \tau ={\mathit{inf}}_{t\in \{1,2,3,4,5\}}\{t:\xi (t)\ge 4\ \text{or}\ t=6\},$$

and Group 2 censored growth curves are generated from

$$\zeta (t)=exp\left(\lambda t+\sigma \sum _{s=1}^{t}{\epsilon}_{s}\right),\quad \eta ={\mathit{inf}}_{t\in \{1,2,3,4,5\}}\{t:\zeta (t)\ge 4\ \text{or}\ t=6\}.$$

*Scenario II*: the same growth curves and censoring times are generated according to Scenario I, with the exception of *τ*, which is replaced by

$$\tau ={\mathit{inf}}_{t\in \{1,2,3,4,5\}}\{t:\xi (t)\ge 4\ \text{or}\ t\le C\},$$

where *C* is a random time independent of {*ξ*(*t*), *t* ≤ 6}, and distributed as the integer part of 6*U*^{1/2} + 1 with a uniform random variable *U* in (0,1).

Both scenarios assume that {*e_t*, *ε_t*, *t* ≥ 1} are independent *N*(0, 1) variables, with *ξ*(0) = *ζ*(0) = 1, *σ* = 0.3, 0.5, and *δ* = 0, and *λ* is specified in Table 1.
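The data-generating mechanism can be sketched in Python as below. This is a hedged illustration: the function names are hypothetical, the measurement at which the volume first reaches four times baseline is assumed to be retained as the last observation, and the Scenario II censoring condition is read here as stopping follow-up at the random time *C* if the volume threshold has not been reached earlier. Both choices are interpretations, not statements from the text.

```python
import math
import random


def random_censor_time(rng):
    """Scenario II censoring time C: integer part of 6 * sqrt(U) + 1."""
    return int(6 * math.sqrt(rng.random()) + 1)


def simulate_curve(rate, sigma, rng, censor_time=None, weeks=6, threshold=4.0):
    """Return (times, rel_volumes) for one mouse, observed up to censoring.

    rate is the growth rate (delta or lambda); sigma scales the cumulative
    N(0, 1) errors. The curve starts at relative volume 1 at week 0.
    """
    times, volumes = [0], [1.0]
    cumulative_noise = 0.0
    # Assumed reading: follow-up ends at week 6 or at the random time C.
    last_week = weeks if censor_time is None else min(weeks, censor_time)
    for t in range(1, last_week + 1):
        cumulative_noise += rng.gauss(0.0, 1.0)
        volume = math.exp(rate * t + sigma * cumulative_noise)
        times.append(t)
        volumes.append(volume)
        if volume >= threshold:  # mouse euthanized; later volumes are missing
            break
    return times, volumes


rng = random.Random(2024)
# Scenario I: ten mice per group with a common censoring mechanism.
group1 = [simulate_curve(0.0, 0.3, rng) for _ in range(10)]
group2 = [simulate_curve(0.3, 0.3, rng) for _ in range(10)]
# Scenario II: Group 1 additionally censored at the random time C.
group1_het = [simulate_curve(0.0, 0.3, rng, censor_time=random_censor_time(rng))
              for _ in range(10)]
```

The growth rates 0.0 and 0.3 in the usage lines are placeholders; the values of *λ* used in the study are those listed in Table 1.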

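To make the power comparisons meaningful, a randomization test was used for the three endpoints to test equal treatment effects, which yields type I errors at a fixed level, say 0.05. Let ${X}_{C}$ and ${X}_{T}$ denote the observed samples of the control and treatment groups, respectively. The randomization test proceeds as follows.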

- Choose *M* independent index vectors *g*^{*}(1), *g*^{*}(2), ···, *g*^{*}(*M*), each of which contains ${n}_{C}$ of ${X}_{C}=\{{X}_{1C},\cdots ,{X}_{{n}_{C}C}\}$ indexes and ${n}_{T}$ of ${X}_{T}=\{{X}_{1T},\cdots ,{X}_{{n}_{T}T}\}$ indexes, which are randomly selected from the set of all possible permuted vectors.
- Evaluate the test statistic ${S}_{{n}_{C},{n}_{T}}$ for each randomization sample of (*g*^{*}(*b*), ${X}_{C}$, ${X}_{T}$) obtained in step 1, such that
  $${S}_{{n}_{C},{n}_{T}}^{\ast}(b)={S}_{{n}_{C},{n}_{T}}({g}^{\ast}(b),{X}_{C},{X}_{T}),\quad b=1,\dots ,M,$$
  where ${S}_{{n}_{C},{n}_{T}}({g}^{\ast}(b),{X}_{C},{X}_{T})$ is a realization of ${S}_{{n}_{C},{n}_{T}}$ obtained by taking the first ${n}_{C}$ vectors of the combined sample (${X}_{C}$, ${X}_{T}$) according to the index *g*^{*}(*b*) as ${X}_{C}$ and the rest as ${X}_{T}$.
- A two-sided *p*-value is given as
  $$\widehat{p}=\frac{{\sum}_{b=1}^{M}I\left(\mid {S}_{{n}_{C},{n}_{T}}^{\ast}(b)-{S}_{{n}_{C},{n}_{T}}^{(\cdot )}\mid \ \ge \ \mid {S}^{0}-{S}_{{n}_{C},{n}_{T}}^{(\cdot )}\mid \right)}{M},$$
  where *S*^{0} > 0 is the observed value of ${S}_{{n}_{C},{n}_{T}}$ and ${S}_{{n}_{C},{n}_{T}}^{(\cdot )}={\sum}_{b=1}^{M}{S}_{{n}_{C},{n}_{T}}^{\ast}(b)/M$.
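The randomization steps above can be sketched for a generic test statistic as follows. The function names are hypothetical, and the difference in group means used in the usage lines is only a stand-in statistic chosen for illustration; it is not one of the three endpoints' exact test statistics.

```python
import random
from statistics import mean


def randomization_p_value(x_c, x_t, statistic, n_perm=2000, seed=0):
    """Two-sided randomization p-value, centered at the mean permuted statistic."""
    rng = random.Random(seed)
    n_c = len(x_c)
    pooled = list(x_c) + list(x_t)
    observed = statistic(x_c, x_t)

    perm_stats = []
    for _ in range(n_perm):
        # Randomly permute the pooled sample; the first n_c units play the
        # role of the control group and the rest the treatment group.
        shuffled = rng.sample(pooled, len(pooled))
        perm_stats.append(statistic(shuffled[:n_c], shuffled[n_c:]))

    center = mean(perm_stats)
    extreme = sum(abs(s - center) >= abs(observed - center) for s in perm_stats)
    return extreme / n_perm


def mean_difference(x_c, x_t):
    """Illustrative statistic: treatment mean minus control mean."""
    return mean(x_t) - mean(x_c)


# Hypothetical per-mouse summaries (for example, aAUC values).
control = [2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.5, 2.3, 2.1, 2.4]
treated = [1.2, 1.0, 1.4, 1.1, 1.3, 0.9, 1.2, 1.5, 1.0, 1.1]
p_hat = randomization_p_value(control, treated, mean_difference)
```

Empirical power in a simulation is then the fraction of simulated data sets for which this *p*-value falls below the chosen significance level.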

The proportion of rejections of the null hypothesis *H*_{0}: *δ* = *λ*, or equal treatment effect, represents the empirical power of the test. For a prespecified significance level *α* = 0.05 and sample size of 10 in each group, Table 1 lists the results based on 10,000 simulation runs and 2,000 randomization samples. The last two columns of the table represent the average proportions of missing measurements (MP) of the two groups and the average of the last time points with complete data (LTC) of the two groups. For both scenarios, Table 1 shows that the powers of the RTVC endpoint are much lower than those of the aAUC or CTD endpoints when the LTC is less than 4 weeks if *σ* = 0.3 and 2 weeks if *σ* = 0.5. That shows that using the tumor volume at the last time point with complete data could be very inefficient if some mice died or were euthanized early in the experiment. The empirical power of the proposed aAUC is comparable to that of the cross-treatment difference endpoint, which has been shown to be highly efficient by Vardi et al. (2001).

The coverage probability is the probability that a confidence interval captures the true parameter, and it is estimated here as the proportion of cases in a simulation in which the calculated interval includes the true value. We chose to use the parameter configurations to represent different types of antitumor activity, from low to high activity. We calculated one-sided upper confidence limits. The sample size considered in the simulation was *n*=10 for each group, which is a typical sample size for tumor xenograft experiments. The standard deviations (*σ _{C}*,

For a real example, we used data from a published single-agent tumor xenograft study conducted by the PPTP (Houghton, et al, 2007). In this study, vincristine was administered intraperitoneally at a dosage of 1 mg/kg every 7 days for 6 weeks and evaluated in 38 tumor cell lines as a single agent. For each tumor cell line, nearly 20 mice were equally randomized into control and treatment groups. Table 4 shows 22 solid tumor cell lines together with the *T/C* ratios and antitumor activity ratings published by Houghton et al. The *T/C* ratio was calculated on day 21, or on the last day on which all mice in both groups still had measurable tumor volumes if that day came before day 21. Table 4 also shows the estimated aAUC ratios, standard errors, and 95% upper limits of the bootstrap *t*-intervals. The PPTP study used arbitrary cutoff points for the *T/C* ratio antitumor activity rating. Agents producing a *T/C* ≤ 0.15 were considered highly active; those with *T/C* > 0.15 but ≤ 0.45 were considered to have intermediate activity; and those with *T/C* > 0.45 were considered to have low activity. The drawback of such arbitrary cutoff points is that they fail to consider the variation among different cell lines. Assessment of antitumor activity based on the interval approach is more intuitive. Whenever the upper confidence limit is lower than 1, significant antitumor activity is observed; otherwise, no significant antitumor activity can be claimed. From Table 4, we see that the antitumor activity assessments of the two methods agreed for 16 tumor lines, but not for four cell lines, Rh10, Rh30, Rh30R, and NB-1771, in which the *T/C* ratio showed low activity but the aAUC ratio led to an active evaluation. In one cell line, D212, the *T/C* ratio showed intermediate activity but the aAUC ratio led to an inactive evaluation. 
The discrepancy between the *T/C* ratio and the aAUC ratio for cell lines Rh10, Rh30, and Rh30R was because the *T/C* ratio was calculated early in the study and thus failed to capture the antitumor activity seen later (Figure 1). In contrast, the discrepancy between the *T/C* ratio and the aAUC ratio for cell lines NB-1771 and D212 was because the *T/C* ratio did not account for the variability of the different tumor lines, therefore leading to false-negative (NB-1771) and false-positive (D212) antitumor activity evaluations.
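The two decision rules compared in this section can be written side by side as a small Python sketch. The function names and the numeric inputs in the usage lines are hypothetical and are not values from Table 4; the cutoffs are those quoted above for the PPTP rating.

```python
def tc_rating(tc_ratio: float) -> str:
    """Activity rating from fixed T/C cutoffs (0.15 and 0.45)."""
    if tc_ratio <= 0.15:
        return "high"
    if tc_ratio <= 0.45:
        return "intermediate"
    return "low"


def interval_assessment(upper_limit: float) -> str:
    """Activity call from the upper confidence limit of the aAUC ratio."""
    if upper_limit < 1.0:
        return "significant antitumor activity"
    return "no significant antitumor activity"


# Hypothetical inputs for illustration only.
print(tc_rating(0.52), "|", interval_assessment(0.81))
print(tc_rating(0.40), "|", interval_assessment(1.07))
```

The first rule depends only on the point estimate, whereas the second also reflects the variability of the tumor line through the width of the interval, which is how the two can disagree.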

Figure 1. Solid lines are tumor volume profiles of control mice, and dotted lines are tumor volume profiles of treated mice.

The analysis of tumor xenograft experimental data presents several statistical challenges, such as incomplete longitudinal observations, small samples, and diverse tumor growth patterns. Existing methods either are based on rather restrictive normality assumptions or discard useful data and thus lose efficiency. The proposed aAUC ratio uses all of the observed experimental data and is therefore more efficient for evaluating antitumor activity. Furthermore, the proposed bootstrap *t*-interval is a nonparametric approach, which avoids assumptions about the underlying distribution and has good coverage probability even for sample sizes as small as 10 per group. The upper limit of the bootstrap *t*-interval reflects both the effect size and its variability and therefore avoids the use of an arbitrary cutoff point. In conclusion, the proposed aAUC ratio not only efficiently quantifies tumor growth inhibition but also gives a simple assessment of antitumor activity for tumor xenograft experiments.

The authors are thankful to the editor and anonymous referees whose careful reading and constructive comments improved this article. This work was supported in part by National Cancer Institute (NCI) support grants CA21765 and NO1-CM-42216 and by the American Lebanese Syrian Associated Charities (ALSAC).

- Corbett TH, White K, et al. Discovery and preclinical antitumor efficacy evaluations of LY32262 and LY33169. Investigational New Drugs. 2003;21:33–45.
- Demidenko E. Mixed Models: Theory and Applications. Wiley; New York: 2004.
- Demidenko E. The assessment of tumor response to treatment. Appl Statist. 2006;55:365–377.
- Efron B, Tibshirani RJ. An introduction to the bootstrap. Chapman and Hall; New York: 1993.
- Hothorn L. Statistical analysis of in vivo anticancer experiments: tumor growth inhibition. Drug Information Journal. 2006;40:229–238.
- Houghton PJ, Morton CL, Gorlick R, et al. The Pediatric Preclinical Testing Program: description of models and early testing results. Pediatr Blood Cancer. 2007;49:928–940.
- Liang H. Modeling antitumor activity in xenograft tumor treatment. Biometrical Journal. 2005;47:1–11.
- Liang H. Comparison of antitumor activities in tumor xenograft treatment. Contemporary Clinical Trials. 2007;28:115–119.
- Liang H, Sha NJ. Modeling antitumor activity by using a nonlinear mixed-effects model. Mathematical Biosciences. 2004;189:61–73.
- Tan M, Fang HB, Tian GL, Houghton PJ. Small-sample inference for incomplete longitudinal data with truncation and censoring in tumor xenograft models. Biometrics. 2002;58:612–620.
- Vardi Y, Ying ZL, Zhang CH. Two-sample tests for growth curves under dependent right censoring. Biometrika. 2001;88:949–960.
