Lifetime Data Anal. Author manuscript; available in PMC 2010 November 26.
Published in final edited form as:
Lifetime Data Anal. 2004 June; 10(2): 103–120.
PMCID: PMC2992553
NIHMSID: NIHMS252528

Two-Sample Statistics for Testing the Equality of Survival Functions Against Improper Semi-parametric Accelerated Failure Time Alternatives: An Application to the Analysis of a Breast Cancer Clinical Trial

Abstract

This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.

Keywords: accelerated failure time models, cure rate model, improper model, semi-parametric model

1. Introduction

In recent years, there has been renewed interest in methods for analyzing survival data with a fraction of long-term survivors, or ‘cure fraction’ (for a review, see Maller and Zhou, 1996). Most of these methods attempt to distinguish between the different mechanisms by which a prognostic factor may act on the occurrence of the event. Indeed, a prognostic factor may affect the probability of never experiencing the event of interest (termed the ‘long-term effect’ in the following text), the time to occurrence of the event (termed the ‘short-term effect’ in the following text), or both.

Testing procedures for these effects were proposed in a recent paper where we assumed proportional hazards for the short-term effect (Broët et al., 2001). In the present paper, we extend this procedure to a short-term effect with non-proportional hazards. This work was motivated by the analysis of a randomized breast cancer trial comparing the distribution of the disease-free interval in two treatment groups whose Kaplan–Meier curves (Kaplan and Meier, 1958) cross during follow-up (see Figure 1). The two groups are defined by the way the same chemotherapy is administered: it is scheduled either to follow (adjuvant) or to precede (primary or neo-adjuvant) the local regional treatment. The question addressed in Section 5 is whether primary chemotherapy, which shrinks the tumor before local treatment, modifies the timing of recurrence as compared to adjuvant chemotherapy, taking a long-term recurrence-free rate into account. This situation was quite puzzling and prompted us to derive test statistics suited to this case. As will be seen, the proposed tests provide interesting results that would have been overlooked if only results from classical tests were considered.

Figure 1
Kaplan–Meier estimates of the recurrence-free interval according to the group of treatment.

In the literature, the most common approach for modeling failure-time data with a cure fraction relies on the assumption that the overall distribution of the survival times is a mixture of two components: one corresponding to the subjects who are not susceptible (‘cured’ subjects) and the other to the subjects who are susceptible to experiencing the event (‘uncured’ subjects). In this setting, most of the published non-parametric procedures that are convenient for the two-sample comparison do not allow testing for both effects (short-term and long-term) (Gray and Tsiatis, 1989; Laska and Meisner, 1992; Sposto et al., 1992; Lee, 1995). Others require computations that are too heavy for routine practical use (Kuk and Chen, 1992; Taylor, 1995).

A different approach which overcomes these drawbacks relies on models that define the cumulative hazard as a bounded increasing positive function in a parametric (Aalen, 1992; Cantor and Shuster, 1992; Yakovlev and Tsodikov, 1996) or semi-parametric way (Tsodikov, 1998; Shen and Sinha, 2002; Ibrahim, 1999, 2001; Broët et al., 2001). Such semi-parametric modeling was used in our previous work to test for no short-term or no long-term effect against improper short-term proportional hazard alternatives (Broët et al., 2001). In this paper, the restrictive proportional hazard short-term effect assumption is relaxed and statistics are proposed for improper short-term accelerated failure time alternatives.

In Section 2, a semi-parametric improper accelerated failure time model is described. In Section 3, the proposed score statistics are derived from a Cox model with a time-dependent covariate. In Section 4, we present the results of simulation experiments. In Section 5, the clinical relevance of these tests is demonstrated by the analysis of a breast cancer clinical trial with long-term follow-up. Section 6 contains a discussion and guidelines for the use of the tests.

2. The Semi-Parametric Improper Accelerated Failure Time Model

Let i = 0, 1 denote the two groups to be compared, with ni subjects in group i (n = n0 + n1). For each patient j, let the random variables Tj and Cj be the survival and censoring times, which are assumed to satisfy the condition of independent censoring (Fleming and Harrington, 1991, pp. 26–27). We denote by Xj = min(Tj, Cj) the observed follow-up time, δj = 1{Xj = Tj} the indicator of death, Yj(t) = 1{t ≤ Xj} the indicator of being at risk at time t, and Zj the indicator variable of group 1. For subject j, the data consist of Xj, δj and Zj. The hazard function of Tj for every subject j belonging to group i is denoted by λi(t) = fi(t)/Si(t), where fi(t) and Si(t) are the probability density function and the survival function, respectively. The corresponding cumulative hazard function is denoted by Λi(t) = −log[Si(t)].

A semi-parametric improper model is defined by the following general survival function in group i:

$$S_i(t) = \exp\left\{-\theta e^{\beta_1 i}\left[1 - A(t, \beta_2 i)\right]\right\} \tag{1}$$

where $A(t, \beta_2 i)$ is a function decreasing with time from one to zero, which is similar to a survival function, and where $\theta$ is a positive parameter. The function $S_i(t)$ is improper: its limiting value $\exp(-\theta e^{\beta_1 i})$ is called the tail defect and represents the probability of not experiencing the event of interest in group i. The cumulative hazard $\Lambda_i(t) = \theta e^{\beta_1 i}[1 - A(t, \beta_2 i)]$ is less than or equal to $\theta e^{\beta_1 i}$.

Model (1) has two components: the first term, containing β1, quantifies the long-term effect, and the function A(t, β2i) expresses the short-term effect. More precisely, if β1 = 0, the two groups have the same cure fraction (no long-term effect), and if β2 = 0, the model reduces to a proportional hazards model. In the latter case, the relative risk is constant over time, which implies no short-term effect.
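As a small worked illustration of the tail defect, take exp(−θ) = 0.5 and e^{β1} = 3/2. Both values appear in the simulation settings of Section 4, but pairing them here is only an illustrative choice:

$$\exp(-\theta) = 0.5 \;\Rightarrow\; \theta = \log 2 \approx 0.693, \qquad \exp(-\theta e^{\beta_1}) = 0.5^{3/2} \approx 0.354 .$$

Under these assumed values, about 50% of group 0 and about 35% of group 1 would never experience the event, so e^{β1} > 1 lowers the cure fraction in group 1.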

A particular case of (1) was considered in previous work (Broët et al., 2001), where we assumed a proportional hazards model for the short-term effect, so that in the general formulation (1) given here, $A(t, \beta_2 i) = A(t)^{\exp(\beta_2 i)}$. Here, we consider another particular case of (1), in which a non-proportional hazards model is assumed for the short-term effect by letting $A(t, \beta_2 i) = \exp[-K(t)^{\exp(\beta_2 i)}]$, where $K(t)$ is a positive function increasing with time from zero to infinity. This is a natural semi-parametric generalization of the case of two Weibull distributions differing in their shape parameters. The resulting model

$$S_i(t) = \exp\left\{-\theta e^{\beta_1 i}\left[1 - \exp\left(-K(t)^{e^{\beta_2 i}}\right)\right]\right\} \tag{2}$$

has the following property. When there is no long-term effect (β1 = 0) but there is a short-term effect such that β2 < 0, the survival functions S0(t) and S1(t) cross before converging to the same long-term survivor fraction.
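The crossing property can be checked numerically. The following is a minimal sketch, assuming the illustrative baseline K(t) = t and the parameter values θ = −log 0.5, β1 = 0, β2 = log 0.5. None of these choices comes from the trial data, and the function name is hypothetical.

```python
import math

def improper_aft_survival(t, group, theta, beta1, beta2, K=lambda s: s):
    """Survival function of model (2) for group 0 or 1.

    K is the baseline function K(t); K(t) = t is only an illustrative choice.
    """
    scale = theta * math.exp(beta1 * group)      # theta * e^{beta1 * i}
    shape = math.exp(beta2 * group)              # e^{beta2 * i}
    A = math.exp(-(K(t) ** shape))               # A(t, beta2 * i)
    return math.exp(-scale * (1.0 - A))

theta = -math.log(0.5)              # common plateau exp(-theta) = 0.5
beta1, beta2 = 0.0, math.log(0.5)   # no long-term effect, short-term effect only

for t in (0.25, 1.0, 4.0, 50.0):
    s0 = improper_aft_survival(t, 0, theta, beta1, beta2)
    s1 = improper_aft_survival(t, 1, theta, beta1, beta2)
    print(f"t={t:5.2f}  S0={s0:.3f}  S1={s1:.3f}")
```

With this choice of K, the two curves meet where K(t) = 1: group 1 lies below group 0 before that point and above it afterwards, and both tend to the plateau exp(−θ).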

3. Proposed Test Statistics

In this section, statistics are derived for testing (β1 = 0) and/or (β2 = 0) in model (2). The derivation uses a proportional hazards model with a time-dependent covariate, which approximates (2) about β2 = 0 and serves as a basis for computing the desired statistics. These are easily computed as score statistics from the usual partial likelihood. The null hypotheses to be tested are: H0: (β1 = β2 = 0); H00: (β2 = 0) and H000: (β1 = 0).

3.1. Method for Deriving the Test Statistics

We now define the quantity $D(t, \beta_2 i) = -\frac{\partial}{\partial t} A(t, \beta_2 i)$, which is the density function associated with A(t, β2i). The general model (1) can be written in terms of the hazard functions λ0(t) and λ1(t):

$$\log\left[\frac{\lambda_1(t)}{\lambda_0(t)}\right] = \beta_1 + \log\left[\frac{D(t, \beta_2)}{D(t, 0)}\right] \tag{3}$$

Expanding log(D(t, β2)) about β2 = 0 in (3) gives the following first-order approximation:

$$\log\left[\frac{\lambda_1(t)}{\lambda_0(t)}\right] = \beta_1 + \beta_2 w(t) \tag{4}$$

with

$$w(t) = \left[\frac{\partial}{\partial \beta_2} \log D(t, \beta_2)\right]_{\beta_2 = 0}$$

Under the improper short-term accelerated failure time model (2), w(t) is equal to

$$w(t) = 1 + \log[-\log A(t)] + [\log A(t)] \log[-\log A(t)]$$

where

$$A(t) = \left[1 - \frac{\Lambda_0(t)}{\theta}\right]. \tag{5}$$

Under the improper short-term proportional hazards model, w(t) = 1 + [log A(t)]. Substituting efficient estimators under the null hypothesis to be tested for Λ0(t) and θ in (5) provides estimates ŵ(t) of w(t). These estimators are presented for each null hypothesis H0, H00 and H000 in the next subsection. Replacing w(t) by ŵ(t) in (4) defines a time-dependent proportional hazards model with the internal time-dependent covariate ŵ(t) (Kalbfleisch and Prentice, 1980). The proposed statistics for testing (β1 = 0) and/or (β2 = 0) can be easily derived as the score statistics from this time-dependent proportional hazards model through the corresponding partial likelihood. It can be easily shown that the resulting score statistics for testing the lack of short-term effect, with or without a long-term effect, are the same as in model (2), which would not be the case for the likelihood ratio or Wald tests. Moreover, the proposed score statistic for testing for no long-term effect can be easily derived, whereas a similar derivation from model (2) would be burdensome at best.
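The two weight functions can be written down directly and the first-order expansion checked numerically. This is a minimal sketch, assuming the illustrative baseline K(t) = t so that log D(t, β2) has a closed form; the function names are hypothetical.

```python
import math

def w_aft(A):
    """w(t) for the improper short-term AFT model, for A(t) strictly in (0, 1)."""
    k = -math.log(A)                       # K(t) = -log A(t) at beta2 = 0
    return 1.0 + math.log(k) + math.log(A) * math.log(k)

def w_ph(A):
    """w(t) for the improper short-term proportional hazards model."""
    return 1.0 + math.log(A)

def A_from_hazard(cum_hazard, theta):
    """Equation (5): A(t) = 1 - Lambda_0(t) / theta."""
    return 1.0 - cum_hazard / theta

def log_D(t, beta2):
    """log D(t, beta2) when K(t) = t (illustrative assumption only)."""
    g = math.exp(beta2)
    return beta2 + (g - 1.0) * math.log(t) - t ** g

def w_numeric(t, h=1e-5):
    """Central-difference derivative of log D with respect to beta2 at beta2 = 0."""
    return (log_D(t, h) - log_D(t, -h)) / (2.0 * h)

for t in (0.2, 0.5, 2.0):
    # With K(t) = t, A(t) = exp(-t); the two columns should agree closely.
    print(t, w_aft(math.exp(-t)), w_numeric(t))
```

Note that w_aft is unbounded as A(t) approaches 0 or 1, so any implementation has to decide how to treat time points where the estimate of A(t) sits exactly on one of those bounds.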

The resulting score statistics depend on the unknown parameters Λ0(t) and θ. Replacing Λ0(t) and θ by efficient estimators and applying the results of Pierce (Pierce, 1982) to our setting as presented in an earlier work (Broët et al., 2001) for improper short-term proportional hazards model allows us to obtain the asymptotic distributions of the proposed statistics.

Score statistics for testing H0 and H00 are asymptotically distributed as chi-squares with two degrees and one degree of freedom, respectively. Concerning H000, it should be noted that the corresponding score statistic depends on an estimate of β2, as seen in the next section. Because this score statistic is derived from the working model (4), a first-order approximation that is valid near β2 = 0, it is approximately distributed as a chi-square with one degree of freedom for small values of β2. This restriction does not apply to the two other tests, which do not depend on β2 under their corresponding null hypotheses. For the validity of the results, the upper bound of the domain on which the survival distribution of the survival time variable is greater than zero must be less than the upper bound of the censoring distribution. In practice, this condition means that susceptible subjects should experience the event within the maximum length of follow-up. It should be stressed that the distribution of the score statistics for testing H0 and H00 is chi-square whether or not this sufficient follow-up condition holds. Indeed, the null hypotheses H0 and H00 do not involve A(t) and are identical under the two models.

3.2. Score Tests

3.2.1. Testing the Lack of Short and Long-Term effect

The components of the score vector for testing H0: β1 = β2 = 0 can be written as follows:

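$$\hat V_{H_0,1} = \frac{\partial \log L}{\partial \beta_1} = \sum_{j=1}^{n} \delta_j \left\{ Z_j - \frac{\sum_{k=1}^{n} Y_k(t_j) Z_k}{\sum_{k=1}^{n} Y_k(t_j)} \right\}$$

$$\hat V_{H_0,2} = \frac{\partial \log L}{\partial \beta_2} = \sum_{j=1}^{n} \delta_j \hat w(t_j) \left\{ Z_j - \frac{\sum_{k=1}^{n} Y_k(t_j) Z_k}{\sum_{k=1}^{n} Y_k(t_j)} \right\}$$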

In V̂H0,2, ŵ(t) is computed as indicated in Section 3.1, using the left-continuous version of the Nelson–Aalen estimator (Nelson, 1972; Aalen, 1978) for Λ̂0(t) and its value at the last observed failure time for θ̂. The corresponding observed information matrix ÎH0 under H0 is given in the Appendix.

Under H0, the statistic $S_{H_0} = [\hat V_{H_0,1}, \hat V_{H_0,2}]\, \hat I_{H_0}^{-1}\, [\hat V_{H_0,1}, \hat V_{H_0,2}]^{\top}$ is asymptotically distributed as a chi-square with two degrees of freedom.
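The computation of this statistic can be sketched in a few lines of plain Python. This is a sketch under stated assumptions, not the authors' implementation: groups are coded 0/1, the information entries use the fact that for a binary covariate S(2)/S(0) − (S(1)/S(0))² equals p(1 − p), and the estimate of A(t) is clipped away from 0 and 1 by a small constant eps. The text does not say how ŵ is handled when the left-continuous estimate of A(t) equals one (as it does at the first failure time), so the clipping is purely our assumption and the result is sensitive to it. All function names are hypothetical.

```python
import math

def nelson_aalen_left(times, events):
    """Left-continuous Nelson-Aalen values at each failure time, plus theta_hat.

    Returns ({t: Lambda_hat(t-)}, Lambda_hat at the last observed failure time).
    """
    fail_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum, left = 0.0, {}
    for t in fail_times:
        left[t] = cum                                   # value just before t
        deaths = sum(1 for x, d in zip(times, events) if x == t and d == 1)
        at_risk = sum(1 for x in times if x >= t)
        cum += deaths / at_risk
    return left, cum

def w_aft(A, eps=1e-8):
    """AFT weight w(t); clipping A away from 0 and 1 is OUR assumption."""
    A = min(max(A, eps), 1.0 - eps)
    k = -math.log(A)
    return 1.0 + math.log(k) + math.log(A) * math.log(k)

def slt_statistic(times, events, groups, weight=w_aft):
    """Score statistic for H0: beta1 = beta2 = 0 (two samples coded 0/1)."""
    left, theta = nelson_aalen_left(times, events)
    v1 = v2 = i11 = i12 = i22 = 0.0
    for tj, dj, zj in zip(times, events, groups):
        if dj != 1:
            continue
        risk_groups = [z for x, z in zip(times, groups) if x >= tj]
        p = sum(risk_groups) / len(risk_groups)         # share of group 1 at risk
        w = weight(1.0 - left[tj] / theta)              # w_hat(t_j) via equation (5)
        v1 += zj - p                                    # score for beta1
        v2 += w * (zj - p)                              # score for beta2
        var = p * (1.0 - p)
        i11 += var
        i12 += w * var
        i22 += w * w * var
    det = i11 * i22 - i12 * i12
    # Quadratic form [v1, v2] I^{-1} [v1, v2]' written out for a 2 x 2 matrix.
    return (i22 * v1 * v1 - 2.0 * i12 * v1 * v2 + i11 * v2 * v2) / det
```

The first score component is the usual logrank numerator and the second is a weighted version of it, which is why the statistic can also be obtained from standard Cox software with ŵ(t) entered as a time-dependent covariate. Passing a different `weight` function (for example 1 + log A) would give the corresponding statistic for proportional hazards alternatives.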

3.2.2. Testing the Lack of Short-Term Effect

The components of the score vector for testing H00: β2 = 0, for any β1, can be written as follows:

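$$\hat V_{H_{00},1} = \frac{\partial \log L}{\partial \beta_1} = 0$$

$$\hat V_{H_{00},2} = \frac{\partial \log L}{\partial \beta_2} = \sum_{j=1}^{n} \delta_j \hat w(t_j) \left\{ Z_j - \frac{\sum_{k=1}^{n} Y_k(t_j) e^{\hat\beta_1 Z_k} Z_k}{\sum_{k=1}^{n} Y_k(t_j) e^{\hat\beta_1 Z_k}} \right\}$$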

In V̂H00,2, β̂1 is the usual partial likelihood estimator of β1 under H00; ŵ(t) is computed by using the left-continuous version of Breslow's estimator (Breslow, 1972, 1974) for Λ̂0(t) under H00 and, for θ̂, its value at the last observed failure time. Breslow's estimator for Λ0(t) under H00 is given by

$$\sum_{k=1}^{n} \delta_k \left[\sum_{j=1}^{n} Y_j(t_k)\right]^{-1}.$$

The corresponding observed information matrix ÎH00 under H00 is given in the Appendix.

Under H00, the statistic $S_{H_{00}} = [0, \hat V_{H_{00},2}]\, \hat I_{H_{00}}^{-1}\, [0, \hat V_{H_{00},2}]^{\top}$ is asymptotically distributed as a chi-square with one degree of freedom.
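Because the first score component is zero, the quadratic form reduces to a scalar expression. Writing Î11, Î12 and Î22 for the entries of the 2 × 2 observed information matrix (these entry names are introduced here only for illustration), the inverse of a symmetric 2 × 2 matrix gives

$$S_{H_{00}} = \hat V_{H_{00},2}^{\,2}\, \frac{\hat I_{11}}{\hat I_{11}\hat I_{22} - \hat I_{12}^{\,2}} = \frac{\hat V_{H_{00},2}^{\,2}}{\hat I_{22} - \hat I_{12}^{\,2} / \hat I_{11}} .$$

The denominator is the information for β2 after adjusting for the estimation of β1, so the statistic is never larger than the unadjusted ratio of the squared score to Î22.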

3.2.3. Testing the Lack of Long-Term Effect

The components of the score vector for testing H000: β1 = 0, for any β2, can be written as follows:

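$$\hat V_{H_{000},1} = \frac{\partial \log L}{\partial \beta_1} = \sum_{j=1}^{n} \delta_j \left\{ Z_j - \frac{\sum_{k=1}^{n} Y_k(t_j) e^{\hat\beta_2 \hat w(t_j) Z_k} Z_k}{\sum_{k=1}^{n} Y_k(t_j) e^{\hat\beta_2 \hat w(t_j) Z_k}} \right\}$$

$$\hat V_{H_{000},2} = \frac{\partial \log L}{\partial \beta_2} = 0$$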

where VH000,1 is obtained by using ŵ(t) as given in Section 3.2.

ŵ(t) and β̂2 could be derived through an iterative procedure. For computational simplicity, the first-step estimators are used in the proposed score statistic. The procedure is as follows. In the first step, ŵ(t) is taken under H0, where the cumulative baseline hazard is replaced by the Nelson–Aalen estimator, and β̂2 is taken as the partial likelihood estimator obtained in the corresponding time-dependent model. In the second step, the β̂2 thus obtained is used to update ŵ(t) using the left-continuous version of a Breslow-type estimator given by

$$\sum_{k=1}^{n} \delta_k \left[\sum_{j=1}^{n} Y_j(t_k)\, e^{Z_j \hat\beta_2 \hat w(t_k)}\right]^{-1}$$

for Λ̂0(t). As mentioned above, the proposed statistic is computed by using the first-step estimator β̂2 together with ŵ(t). This implies that V̂H000,2 does not vanish but is taken to be null in the computation. The corresponding observed information matrix ÎH000 is given in the Appendix.

For small values of β2, under H000, the statistic $S_{H_{000}} = [\hat V_{H_{000},1}, 0]\, \hat I_{H_{000}}^{-1}\, [\hat V_{H_{000},1}, 0]^{\top}$ is approximately distributed as a chi-square with one degree of freedom.

4. Simulation Study

4.1. Method

A simulation study was performed to investigate the power properties of the proposed tests in comparison with classical tests such as the logrank test (LR) (Peto and Peto, 1972) and the Peto–Prentice–Wilcoxon test (PPW) (Kalbfleisch and Prentice, 1980). The proposed tests of H0, H00 and H000 are denoted SLT, ST and LT, respectively. We also consider the test for no short- and no long-term effect (SLT-PH) designed for improper short-term proportional hazards alternatives (Broët et al., 2001). In addition, the product-limit test (PL), which is a non-parametric test of no difference in the cure fraction (Sposto, Sather and Baker, 1992), is also considered.

Data were generated to mimic a simple randomized clinical trial under two different models: (A) an improper short-term accelerated failure time model and (B) an improper short-term proportional hazards model. Survival times were generated according to model (1) with $A(t, \beta_2 i) = \exp(-t e^{\beta_2 i})$ and $A(t, \beta_2 i) = \exp(-t^{\exp(\beta_2 i)})$ for the proportional hazards and accelerated failure time models, respectively. Censoring times were independently generated from a uniform distribution over [0, u]. It is worth noting that in the uniform censoring case, a susceptible subject may not experience the event of interest within the follow-up time u. For each set of parameter values, u can be easily computed so as to ensure a given percentage of censoring. The percentage of censoring refers only to censored observations among susceptible subjects, that is, it excludes the cure fraction exp(−θ). The number of subjects per group was chosen to be 100. The following configurations were considered: exp(−θ) = 0.3, 0.5, 0.7; 0%, 20% and 40% censoring; $e^{\beta_1}$ = 2/3, 1, 3/2; and $e^{\beta_2}$ = 0.5, 1, 2. For 20% censoring as specified above, the actual rate of censoring was 44%, 60% and 76% for each plateau value. For 40% censoring, the actual rate of censoring was 58%, 70% and 82% for each plateau value. For each configuration, 1,000 replications were performed and the levels and powers of all tests were estimated at the nominal level of 0.05.
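One way to generate such data is by inverting the improper survival function at a uniform draw, treating draws that fall in the tail defect as cured subjects. The sketch below assumes this inversion approach, which the text does not describe. The function names, the seed handling and the use of math.inf for cured subjects are our own illustrative choices.

```python
import math
import random

def survival(t, group, theta, beta1, beta2, model="aft"):
    """S_i(t) under model (1) with A = exp(-t^g) (AFT) or exp(-t*g) (PH), g = e^{beta2*i}."""
    g = math.exp(beta2 * group)
    k = t ** g if model == "aft" else t * g
    return math.exp(-theta * math.exp(beta1 * group) * (1.0 - math.exp(-k)))

def draw_time(u, group, theta, beta1, beta2, model="aft"):
    """Invert S_i at a uniform draw u; returns math.inf for a cured subject."""
    scale = theta * math.exp(beta1 * group)
    if u <= math.exp(-scale):              # u falls in the tail defect: never fails
        return math.inf
    A = 1.0 + math.log(u) / scale          # solves exp(-scale * (1 - A)) = u
    if A <= 0.0:                           # guard against rounding at the boundary
        return math.inf
    k = -math.log(A)
    g = math.exp(beta2 * group)
    return k ** (1.0 / g) if model == "aft" else k / g

def simulate_trial(n_per_group, theta, beta1, beta2, u_max, model="aft", seed=0):
    """Return (times, events, groups) with uniform censoring on [0, u_max]."""
    rng = random.Random(seed)
    times, events, groups = [], [], []
    for group in (0, 1):
        for _ in range(n_per_group):
            t = draw_time(rng.random(), group, theta, beta1, beta2, model)
            c = rng.uniform(0.0, u_max)
            times.append(min(t, c))            # observed follow-up time X_j
            events.append(1 if t <= c else 0)  # delta_j
            groups.append(group)               # Z_j
    return times, events, groups
```

For example, `simulate_trial(100, -math.log(0.5), 0.0, math.log(2.0), u_max=5.0)` would produce one replicate with a 0.5 plateau in both groups; the value u_max = 5.0 is arbitrary here and would have to be tuned to reach a target censoring percentage.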

4.2. Results

Tables 1(a–c) display the results for model (A) whereas Tables 2(a–c) display those for model (B).

Table 1
Simulation results for the improper short-term accelerated failure time model with (a) no censoring, (b) 20% censoring and (c) 40% censoring.
Table 2
Simulation results for the improper short-term proportional hazard model with (a) no censoring, (b) 20% censoring and (c) 40% censoring.

Table 1(a) shows the results obtained in the uncensored case. Except for the LT test, the estimated level of each test under its proper null hypothesis is within the binomial range [0.036, 0.064]. In the presence of a short-term effect, the observed levels for the LT test increase up to 10%. The test of no short-term and long-term effect (SLT) shows strongly increased power relative to LR, PPW and SLT-PH in the presence of a short-term effect. The power gains are striking when differences in the long-term effects are absent or small. As compared to LR, the SLT test is in some cases 10 times more powerful. However, it is well known that the LR test is not suited to situations where survival curves cross. When there is no difference in short-term effects, the power of the SLT test is slightly decreased relative to that of the LR test; in any case, it is less than 12% lower. Power values of the ST test are very close to those of the SLT test. Regarding the long-term effect, the PL test is more powerful than LT and LR, whose powers are quite close to each other.
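The quoted binomial range is consistent with a 95% normal-approximation interval around the nominal level for 1,000 replications. We assume this is how it was obtained, since the text does not say:

$$0.05 \pm 1.96\sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.05 \pm 0.0135 = [0.0365,\ 0.0635],$$

which rounds to [0.036, 0.064].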

Table 1(b) shows the results obtained with a 20% censoring rate. The observed levels of the SLT and ST tests do not exceed the binomial bounds. This is not the case for LT and PL, for which the observed level increases up to 9% when a short-term effect exists. Concerning power, the trends observed in the uncensored case remain almost unchanged. Power gains for ST and SLT relative to LR are lower than in the uncensored case but remain impressive. For the LT test, power is lower than in the uncensored case and is always below that of the PL test.

The results obtained at a 40% censoring rate are shown in Table 1(c). Regarding the SLT and ST tests, empirical significance levels appear to be close to the nominal level, and power gains are less pronounced than at lower censoring rates. However, with a short-term effect, the magnitude of the power gain remains high. Concerning the LT test, the observed levels are appreciably higher than the nominal level in the presence of a short-term effect. For the PL test, the observed type I error rate increases slightly, up to 8%, when a non-null short-term effect exists, and power gains are smaller than those observed at lower censoring rates.

Tables 2(a–c) show the results for model (B). The estimated type I error was very close to the nominal significance level of 0.05 for the SLT and ST tests in every configuration. This is not the case for the LT test, which always yields observed levels higher than the nominal level, with markedly increased values in some situations. To a lesser extent, a similar trend is observed for the PL test, for which the observed level increases up to 18% when a non-null short-term effect exists and the censoring rate is high. It is worth noting that in this short-term proportional hazards situation, the loss of power of the SLT test remains small as compared to the LR test.

Concerning the type I error of the proposed tests, it should be stressed that the null hypotheses H0 and H00 involve neither θ^ nor ŵ(t). As a result, the SLT and ST tests maintain a correct type I error which is not the case for the LT test under the corresponding null hypothesis when β2 is not null and the model is not the correct one. In the case of no short-term effect where the estimated level is close to the nominal one it appears that the power of the LT test is not dramatically decreasing as compared to the other tests even if in this case the uniform censoring is likely to hinder the long-term effect.

We performed additional simulations with small sample sizes (not shown here); as expected, this leads to a decrease in power, which is more pronounced at high censoring rates.

5. Application

In this section, we consider a clinical trial on breast cancer disease.

5.1. Primary Chemotherapy Trial

The aim of the present analysis was to investigate short-term and long-term effects of primary chemotherapy on disease recurrence by the proposed tests in a mature trial with more than ten years of follow-up. The so-called ‘S6-trial’ (Scholl et al., 1994) was conducted to assess whether primary chemotherapy improved survival, as compared to the same chemotherapy scheduled to follow the local regional treatment (adjuvant chemotherapy). Premenopausal breast cancer patients were included between October 1986 and June 1990, and randomized to receive either primary or adjuvant chemotherapy. The criteria for inclusion were as follows: non-metastatic operable breast tumors, largest tumor diameter between 3 and 7 cm, axillary lymph nodes not involved clinically, or involved but not adherent, no prior cancer, no serious concomitant illness. Bilateral, inflammatory or locally advanced breast cancers were not eligible. Two hundred breast cancer patients received primary chemotherapy and 190 adjuvant chemotherapy. Chemotherapy was started either after completion of the initial assessment (primary) or within 2 weeks of ending the local regional therapy (adjuvant). It consisted of four monthly cycles of intravenous cyclophosphamide, doxorubicin and 5-fluorouracil. Following random assignment to primary or adjuvant chemotherapy, patients were reviewed every 3 months for a year, then every 6 months during the first 5 years following the treatment and at least annually thereafter.

In what follows, we focus on the recurrence-free interval and not on the overall survival which was considered in a previous paper (Scholl et al., 1994). The recurrence-free interval is defined as the time from randomization until progression on the first observation of tumor recurrence (local, regional, distant).

5.2. Results

The median follow-up was 105 months. The 5-year recurrence-free interval rates were 60% [53–67] for patients treated with primary chemotherapy and 55% [48–63] for those treated with adjuvant chemotherapy. The 10-year survival rates were 40% [32–51] for patients treated with primary chemotherapy and 42% [35–50] for those treated with adjuvant chemotherapy. At the end of follow-up and for the 390 patients under study, 208 patients experienced a recurrence of the disease.

Figure 1 displays the Kaplan–Meier estimates of the recurrence-free interval by the treatment group. It shows a plateau value (i.e. long-term fraction) in the survival curves after 10 years, which is not surprising since most of the local and distant recurrences occur in the first decade (Bland and Copeland, 1998). Thus, an improper model appears well suited for these data.

Figure 2 displays the estimated survival function Ai(t). The empirical estimate for Ai(t) = [1 – (Λi(t)/θi)] in each group is obtained by replacing Λi(t) and θi by the Nelson–Aalen estimator and its value at the last observed failure time, respectively. This plot provides an informal assessment of the proportional hazards hypothesis for the short-term effect. It appears that the two survival functions cross, clearly indicating a non-proportional short-term effect.
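The empirical estimate described above can be sketched as a step function evaluated at the distinct failure times of one group. This is a minimal illustration using the right-continuous Nelson–Aalen estimator; the function name and the returned list-of-pairs format are our own choices.

```python
def estimate_A(times, events):
    """Return [(t, A_hat(t))] at each distinct failure time of one group.

    A_hat(t) = 1 - Lambda_hat(t) / theta_hat, where Lambda_hat is the
    Nelson-Aalen estimator and theta_hat is its value at the last failure time.
    """
    fail_times = sorted({t for t, d in zip(times, events) if d == 1})
    cum, steps = 0.0, []
    for t in fail_times:
        deaths = sum(1 for x, d in zip(times, events) if x == t and d == 1)
        at_risk = sum(1 for x in times if x >= t)
        cum += deaths / at_risk
        steps.append((t, cum))
    theta_hat = cum
    return [(t, 1.0 - lam / theta_hat) for t, lam in steps]
```

Applying the function separately to each treatment group and plotting the two step functions together would give a display of the kind shown in Figure 2. By construction the estimate reaches zero at the last observed failure time of each group.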

Figure 2
Estimated survival function A(t) according to the group of treatment.

In what follows, we present the results of the proposed statistics together with those obtained with the classical logrank statistic and the Peto–Prentice–Wilcoxon. We also provide the results of the SLT-PH test.

When testing for differences in recurrence-free interval, the logrank test ($\chi^2_1 < 0.0001$, P = 0.99), the PPW test ($\chi^2_1 = 0.10$, P = 0.78) and the SLT-PH test ($\chi^2_2 = 0.59$, P = 0.75) are not significant. When testing for an overall effect with an accelerated failure time short-term effect, the SLT test is close to significance ($\chi^2_2 = 4.14$, P = 0.13). When testing for a short-term effect, the ST test is significant ($\chi^2_1 = 4.13$, P = 0.04). No long-term effect was detected when using the LT test ($\chi^2_1 < 0.01$, P = 0.94) or the plateau test (PL: $\chi^2_1 = 0.08$, P = 0.77). These latter results agree with Figure 2, which indicates a non-constant short-term effect. From these results, disease recurrence was significantly delayed by primary chemotherapy, but without a benefit on the long-term recurrence rate as compared to classical adjuvant chemotherapy.
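The chi-square statistics reported above can be converted to P-values without any statistical library, because the upper tail of the chi-square distribution has a closed form for one and two degrees of freedom. A minimal sketch follows; the function name is hypothetical.

```python
import math

def chi2_pvalue(stat, df):
    """Upper-tail P-value of a chi-square statistic for df = 1 or 2 (closed forms)."""
    if df == 1:
        # P(chi2_1 > x) = P(|N(0,1)| > sqrt(x)) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2.
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")

print(round(chi2_pvalue(4.13, 1), 3))   # ST statistic, one degree of freedom
print(round(chi2_pvalue(4.14, 2), 3))   # SLT statistic, two degrees of freedom
```

For the ST statistic (4.13 on one degree of freedom) this gives a P-value of roughly 0.04, and for the SLT statistic (4.14 on two degrees of freedom) roughly 0.13.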

6. Discussion

Survival data with long-term survivors require extensions of existing test statistics for analyzing the short- and long-term effects of a prognostic factor. In this paper, we propose new score tests well suited to different types of departures from equality of survival distributions with long-term survivors. These tests are related to improper short-term accelerated failure time alternatives and are obtained as score statistics from a time-dependent Cox model.

An interesting feature of these tests is that they are simple to use, since they can be obtained very easily from standard Cox model procedures implemented in most statistical software packages. The test of no long-term effect deserves separate comment, since its limiting distribution is obtained in the presence of a negligible short-term effect. This drawback would not exist if the test were derived from the original model, but this does not seem computationally realistic in usual practice. It must be kept in mind that using this test also requires that the maximum value of the cumulative hazard be estimated consistently. The theoretical condition underlying this assumption is that a sufficient number of patients should be followed up to a time after which the risk of developing the event of interest is negligible. Such drawbacks do not concern the two other tests.

Simulation results show that the SLT and ST tests maintain a correct type I error even with a high censoring rate and a misspecified model. In contrast, the proposed LT test is very sensitive to model misspecification and high censoring rates in the presence of a short-term effect. Regarding power, the SLT and ST tests perform well for assessing a short-term effect, with or without a long-term effect, as compared to classical tests such as the logrank test or the Peto–Prentice–Wilcoxon test. Power gains decrease with censoring, which could be explained by the fact that the cumulative hazard is not consistently estimated under the alternative hypothesis. Indeed, uniform censoring leads to a violation of the sufficient follow-up condition even if the model is correctly specified under the alternative hypothesis.

In practice, SLT and ST tests could be recommended for routine use when a non-constant short-term effect is expected. This could be the case when comparing treatments that modify the speed of progression of the disease in a population where a long-term survivor fraction is commonly encountered. As seen from the simulation results, it seems that with moderate censoring the product-limit test is a more reasonable alternative to the long-term effect tests when a short-term effect is expected.

The proposed score tests are particularly well suited to the study presented in this article, since a large proportion of the patients will never experience a recurrence of the disease and long-term follow-up is available. According to our analysis, the recurrence of the disease appeared to have been significantly delayed by primary chemotherapy, but without a benefit on the long-term recurrence rate as compared to post-operative chemotherapy. It is tempting to speculate that early and effective targeting of active micrometastatic disease may have delayed disease recurrence. Based on these results, we should emphasize that using the proposed score tests provides some interesting findings for primary chemotherapy that would have been overlooked by only considering results from the classical logrank test. Moreover, we are able to attribute this difference to the short-term effect. Finally, our approach offers new insight into the different aspects of treatment effects and may be recommended for widespread use in long-term survival studies.

In addition, it should be noted that these tests can be extended to take other factors into account by using a stratified time-dependent Cox model. Further work is ongoing to extend this family of tests to other complex time-varying short-term patterns.

In conclusion, the proposed tests, which are very simple to use, seem particularly appealing when testing time-related effects of new markers or therapies in censored data with long-term survivors.

Acknowledgments

The authors would like to acknowledge Prof. A. Yakovlev (University of Utah) and Dr. B. Asselain (Institut Curie) for stimulating discussions during the evolution of this article. We thank all the members of the Breast Cancer Group of the Curie Institute for their cooperation. We are indebted to Mrs. C. Gautier for her kind technical assistance in carrying out the data management of the S6-trial. This work was supported in part by the ‘Société Française du Cancer’, NIH/NCI U01 CA 97414 and DAMD 17-03-1-0034.

Appendix. Partial Second Derivatives Based Upon the Working Model

For the following derivation it is convenient to introduce the notations:

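$$S^{(0)}[\beta_1, \beta_2, \hat w(t), t] = \frac{1}{n}\sum_{k=1}^{n} Y_k(t)\, e^{\beta_1 Z_k} \exp\{Z_k \beta_2 \hat w(t)\}$$

$$S^{(1)}[\beta_1, \beta_2, \hat w(t), t] = \frac{1}{n}\sum_{k=1}^{n} Z_k Y_k(t)\, e^{\beta_1 Z_k} \exp\{Z_k \beta_2 \hat w(t)\}$$

$$S^{(2)}[\beta_1, \beta_2, \hat w(t), t] = \frac{1}{n}\sum_{k=1}^{n} Z_k^2 Y_k(t)\, e^{\beta_1 Z_k} \exp\{Z_k \beta_2 \hat w(t)\}$$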

It follows that the partial second derivatives are as follows:

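$$\frac{\partial^2 \log L}{\partial \beta_1 \partial \beta_1} = \sum_{j=1}^{n} \delta_j \left\{ \left\{\frac{S^{(1)}[\beta_1, \beta_2, \hat w(t_j), t_j]}{S^{(0)}[\beta_1, \beta_2, \hat w(t_j), t_j]}\right\}^2 - \frac{S^{(2)}[\beta_1, \beta_2, \hat w(t_j), t_j]}{S^{(0)}[\beta_1, \beta_2, \hat w(t_j), t_j]} \right\}$$

$$\frac{\partial^2 \log L}{\partial \beta_2 \partial \beta_2} = \sum_{j=1}^{n} \delta_j [\hat w(t_j)]^2 \left\{ \left\{\frac{S^{(1)}[\beta_1, \beta_2, \hat w(t_j), t_j]}{S^{(0)}[\beta_1, \beta_2, \hat w(t_j), t_j]}\right\}^2 - \frac{S^{(2)}[\beta_1, \beta_2, \hat w(t_j), t_j]}{S^{(0)}[\beta_1, \beta_2, \hat w(t_j), t_j]} \right\}$$

$$\frac{\partial^2 \log L}{\partial \beta_1 \partial \beta_2} = \sum_{j=1}^{n} \delta_j [\hat w(t_j)] \left\{ \left\{\frac{S^{(1)}[\beta_1, \beta_2, \hat w(t_j), t_j]}{S^{(0)}[\beta_1, \beta_2, \hat w(t_j), t_j]}\right\}^2 - \frac{S^{(2)}[\beta_1, \beta_2, \hat w(t_j), t_j]}{S^{(0)}[\beta_1, \beta_2, \hat w(t_j), t_j]} \right\}$$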

The elements of the information matrix are computed under the null hypothesis H0 by using ŵ(t) as given in Section 3.1 and replacing S(l)(β1, β2, ŵ(t), t) by S(l)(0, 0, ŵ(t), t).

Under the null hypothesis H00, the corresponding elements are computed by using ŵ(t) as given in Section 3.1 and replacing S(l)(β1, β2, ŵ(t), t) by S(l)(β̂1, 0, ŵ(t), t).

Finally, under the null hypothesis H000, the corresponding elements are computed by using ŵ(t) as given in Section 3.1 and replacing S(l)(β1, β2, ŵ(t), t) by S(l)(0, β̂2, ŵ(t), t).

Contributor Information

PHILIPPE BROËT, Department of Public Health and National Institute for Health and Medical Research (INSERM U472) Paul Brousse Hospital, Villejuif, France and Institut Curie, 26 rue d'Ulm, Paris, France; broet@vjf.inserm.fr.

ALEXANDER TSODIKOV, Huntsman Cancer Institute 2000 Circle of Hope, Salt Lake City, Utah 84112-5550, USA.

YANN DE RYCKE, Institut Curie, 26 rue d'Ulm, Paris, France.

THIERRY MOREAU, National Institute for Health and Medical Research (INSERM U472) Paul Brousse Hospital, Villejuif, France.

References

  • Aalen OO. Nonparametric inference for a family of counting processes. Ann. Stat. 1978;6:701–726.
  • Aalen OO. Modelling heterogeneity in survival analysis by the compound Poisson distribution. Ann. Appl. Prob. 1992;2:951–972.
  • Bland KI, Copeland EM. The Breast: Comprehensive Management of Benign and Malignant Diseases. 2nd edn Saunders; Philadelphia: 1998.
  • Breslow NE. Contribution to the discussion on the paper by D.R. Cox, ‘Regression and life tables’. J. Roy. Statist. Soc., B. 1972;34:216–217.
  • Breslow NE. Covariance analysis of censored survival data. Biometrics. 1974;30:89–99. [PubMed]
  • Broët P, De Rycke Y, Tubert-Bitter P, Lellouch J, Asselain B, Moreau T. A semi-parametric approach for the two-sample comparison of survival times with long-term survivors. Biometrics. 2001;57:844–852. [PubMed]
  • Cantor AB, Shuster JJ. Parametric versus non-parametric methods for estimating cure rates based on censored survival data. Stat. Med. 1992;11:931–937. [PubMed]
  • Fleming TR, Harrington DP. Counting Processes and Survival Analysis. John Wiley; New York: 1991.
  • Gray RJ, Tsiatis AA. A linear rank test for use when the main interest is in differences in cure rates. Biometrics. 1989;45:899–904. [PubMed]
  • Ibrahim JG, Chen MH, Sinha D. A new Bayesian model for survival data with a surviving fraction. J. Amer. Statist. Assoc. 1999;94:909–919.
  • Ibrahim JG, Chen MH, Sinha D. Bayesian semiparametric models for survival data with a cure fraction. Biometrics. 2001;57:383–388. [PubMed]
  • Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. Wiley; New York: 1980.
  • Kaplan E, Meier P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958;53:457–481.
  • Kuk AYC, Chen C-W. A mixture model combining logistic regression with proportional hazards regression. Biometrika. 1992;79:531–541.
  • Laska EM, Meisner MJ. Nonparametric estimation and testing in a cure model. Biometrics. 1992;48:1223–1234. [PubMed]
  • Lee JW. Two-sample rank tests for acceleration in cure models. Stat. Med. 1995;14:2111–2118. [PubMed]
  • Maller R, Zhou X. Survival Analysis with Long-Term Survivors. John Wiley; New York: 1996.
  • Nelson W. Theory and applications of hazard plotting for censored failure data. Technometrics. 1972;14:945–965.
  • Peto R, Peto J. Asymptotically efficient rank invariant test procedures (with discussion) J. Roy. Stat. Soc. Ser. A. 1972;135:185–198.
  • Pierce DA. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Ann. Stat. 1982;10:475–478.
  • Scholl SM, Fourquet A, Asselain B, Pierga J-Y, Vilcoq JR, Durand JC, Dorval T, Palangié T, Jouve M, Beuzeboc P, Garcio-Giralt E, Salmon RJ, De la Rochefordiére A, Campana F, Pouillart P. Neoadjuvant versus adjuvant chemotherapy in premenopausal patients with tumours considered as too large for breast conserving surgery: Preliminary results of a randomised trial: S6. Eur. J. Cancer. 1994;30A:645–652. [PubMed]
  • Sposto R, Sather HN, Baker SA. A comparison of tests of the difference in the proportion of patients who are cured. Biometrics. 1992;48:87–99. [PubMed]
  • Taylor J. Semi-parametric estimation in failure time mixture models. Biometrics. 1995;51:899–907. [PubMed]
  • Tsodikov A. A proportional hazards model taking account of long-term survivors. Biometrics. 1998;54:1508–1516. [PubMed]
  • Tsodikov A. Semi-parametric models of long- and short-term survival: an application to the analysis of breast cancer survival in Utah by age and stage. Stat. Med. 2002;30:895–920. [PubMed]
  • Yakovlev A, Tsodikov A. Stochastic Models of Tumor Latency and their Biostatistical Applications. World Scientific; Singapore: 1996.