Statistical models for survival data with a surviving or cure fraction, often called cure models, have received a great deal of attention in the last decade. There are a variety of cure models proposed in the literature based on different assumptions or different perspectives of the cure mechanism. In this paper, we focus on the popular mixture cure models where the population is considered as a mixture of cured patients and uncured patients. Let $Y$ be the indicator variable for an uncured patient, with $Y = 1$ if the patient is uncured and $Y = 0$ if cured, and let $T$ be the failure time of a patient. Define $\pi = P(Y = 1)$, $S(t) = P(T > t)$ and $S_u(t) = P(T > t \mid Y = 1)$. That is, $\pi$ is the probability of being uncured, and $S(t)$ and $S_u(t)$ are the survival functions of the failure time of a patient and the failure time of an uncured patient respectively. The mixture cure model is given by

$$S(t \mid x, z) = \pi(z) S_u(t \mid x) + 1 - \pi(z), \tag{1}$$

where $z$ and $x$ are two sets of covariates that have effects on $\pi$ and $S_u(t)$, respectively. The use of the mixture cure model dates back to Berkson and Gage [1]. The advantage of the mixture cure model is that the proportion of cured patients and the survival distribution of uncured patients are modeled separately and the interpretation of the parameters of $x$ and $z$ in the model is straightforward.
The most common method to specify the effects of $z$ on $\pi$ is via a logit link function:

$$\pi(z) = \frac{\exp(\gamma^T z)}{1 + \exp(\gamma^T z)}, \tag{2}$$

where $\gamma$ is a vector of unknown parameters. Other link functions may be considered, such as the complementary log-log and the probit link functions in the generalized linear models for binary data. In this paper, we use only the logit link function, because of its simplicity and popularity.
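As a minimal illustration of how the logit link feeds into the mixture cure model $S(t \mid x, z) = \pi(z) S_u(t \mid x) + 1 - \pi(z)$, the following Python sketch evaluates the population survival function at a few time points. The exponential form chosen for $S_u$, the coefficient values, and all function names (`uncured_probability`, `population_survival`, `exponential_survival`) are assumptions made for illustration only; they are not part of the model specification.

```python
import math

def uncured_probability(gamma, z):
    """Logit link: pi(z) = exp(gamma'z) / (1 + exp(gamma'z))."""
    eta = sum(g * zi for g, zi in zip(gamma, z))
    return 1.0 / (1.0 + math.exp(-eta))

def population_survival(t, pi, uncured_survival):
    """Mixture cure model: S(t) = pi * S_u(t) + (1 - pi)."""
    return pi * uncured_survival(t) + (1.0 - pi)

# Assumption for illustration only: uncured patients have exponential
# survival with a hypothetical rate of 0.5 per unit time.
def exponential_survival(t, rate=0.5):
    return math.exp(-rate * t)

gamma = [0.4, -1.0]   # hypothetical coefficients: intercept and one covariate
z = [1.0, 1.0]        # intercept term plus covariate value
pi = uncured_probability(gamma, z)

for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(population_survival(t, pi, exponential_survival), 4))
```

As $t$ grows, the printed values level off at $1 - \pi$, the cured fraction, rather than decaying to zero. This plateau is the defining feature of a mixture cure model.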
Similar to the classical survival models, there are a number of methods to specify the effects of $x$ on $S_u(t \mid x)$. Let $S_{u0}(t)$ be an arbitrary baseline survival function. Similar to the proportional hazards model in survival analysis, one can assume

$$h_u(t \mid x) = h_{u0}(t) \exp(\beta^T x), \tag{3}$$

where $\beta$ is a vector of unknown parameters, and $h_u(t \mid x)$ and $h_{u0}(t)$ are the corresponding hazard functions of $S_u(t \mid x)$ and $S_{u0}(t)$. This model is referred to as the proportional hazards mixture cure (PHMC) model. The model can be easily estimated if the baseline survival function $S_{u0}(t)$ is specified up to a few unknown parameters. However, verifying a parametric assumption for the baseline distribution can be a challenging task. A semiparametric estimation method based on the partial likelihood approach became well accepted after the work of Kuk and Chen [2], Peng and Dear [3], and Sy and Taylor [4]. Large sample properties of estimators from the semiparametric PH mixture cure model were investigated in Fang et al. [5].
An alternative to the proportional hazards assumption (3) is the accelerated failure time (AFT) assumption to model the effects of $x$ on $S_u(t \mid x)$. That is,

$$S_u(t \mid x) = S_{u0}(t e^{\beta^T x}), \tag{4}$$

or equivalently, in terms of hazard functions, $h_u(t \mid x) = e^{\beta^T x} h_{u0}(t e^{\beta^T x})$. This model is referred to as the accelerated failure time mixture cure (AFTMC) model. A parametric distribution with a few unknown parameters is often assumed for the baseline distribution, and the parameters in the model are estimated by the maximum likelihood approach ([6]). Recently, several authors have investigated semiparametric estimation methods. Li and Taylor [9] employed the M-estimation method [10] to estimate the unknown parameters in the AFTMC model. Zhang and Peng [11] further adapted a rank estimation method [12] to improve the semiparametric estimation method for the AFTMC model.
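For a parametric fit, the observed-data log-likelihood of a mixture cure model has a standard form: an observed failure contributes $\pi(z) f_u(t \mid x)$, and a censored observation contributes $1 - \pi(z) + \pi(z) S_u(t \mid x)$. The sketch below writes this out under the assumption of an exponential baseline, $S_{u0}(t) = \exp(-\lambda t)$, with one scalar covariate in each model part. The function name, parameter layout, and toy data are hypothetical, and this is not the estimation code of any of the cited papers.

```python
import math

def aftmc_loglik(params, data):
    """Log-likelihood of a parametric AFT mixture cure model.

    params: (gamma0, gamma1, beta, log_lam); data: iterable of
    (time, event, x, z) with event = 1 for failure, 0 for censoring.
    Assumes an exponential baseline S_u0(t) = exp(-lam * t).
    """
    gamma0, gamma1, beta, log_lam = params
    lam = math.exp(log_lam)          # keeps the rate positive
    total = 0.0
    for time, event, x, z in data:
        pi = 1.0 / (1.0 + math.exp(-(gamma0 + gamma1 * z)))  # logit link
        accel = math.exp(beta * x)                            # AFT time scaling
        surv_u = math.exp(-lam * time * accel)                # S_u0(t * e^{beta x})
        dens_u = lam * accel * surv_u                         # f_u = -dS_u/dt
        if event == 1:
            total += math.log(pi * dens_u)                    # uncured and failed at t
        else:
            total += math.log(1.0 - pi + pi * surv_u)         # cured, or uncured and still at risk
    return total

# Hypothetical toy data: (time, event, x, z)
toy = [(1.0, 1, 0, 0), (2.5, 0, 1, 1), (0.7, 1, 1, 1), (4.0, 0, 0, 0)]
print(aftmc_loglik((0.0, 0.5, -0.8, 0.0), toy))
```

In practice the negated function would be handed to a numerical optimizer. With an exponential baseline the AFT and PH formulations coincide, so a Weibull or log-normal baseline would be a more informative choice in a real analysis.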
An unstated assumption of the two models is that the covariate effects on the hazard rate of uncured patients are immediate. Consider a cancer study with a single covariate equal to 1 if a new treatment is used and 0 if a standard treatment is used, where the covariate is included in both $x$ and $z$ in the mixture cure model (1) and the hazard of patients in the standard treatment group satisfies $h_{u0}(0) > 0$. For uncured patients, it is easy to see that in the PHMC model (3) the hazard ratio of patients in the new treatment group versus those in the standard treatment group is $e^{\beta^T x}$ at $t = 0$ and remains the same for any $t > 0$. In the AFTMC model (4), even though the hazard ratio is no longer constant over time, it still starts at $e^{\beta^T x}$ at $t = 0$. This immediate-effect assumption may not be desirable in some studies where a treatment effect increases gradually over time from zero. For example, in testing antidepressant drugs, it is sometimes not practical to assume that the drug is effective at the early stage of the treatment; it is more reasonable to assume no effect at $t = 0$ and a gradual effect at the later stage of the treatment.
To model a gradual treatment effect for data without a cure fraction, Chen and Wang [13] and Chen [14] proposed an accelerated hazard (AH) model

$$h_u(t \mid x) = h_{u0}(t e^{\beta^T x}). \tag{5}$$

For the binary treatment covariate defined above, it is easy to see that the hazard functions of the new and the standard treatments are $h_{u0}(t e^{\beta})$ and $h_{u0}(t)$ respectively, and the difference of the two hazard functions starts at 0 when $t = 0$. Thus the AH model assumes that the hazard does not change at time 0 and then changes gradually with time. Unless $h_{u0}(t)$ is constant or $\lim_{t \to 0^+} h_{u0}(t) = 0$, the AH model provides a useful way to model the gradual effect of a treatment that other existing models cannot handle properly.
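To make the contrast at $t = 0$ explicit, the following worked comparison evaluates the new-versus-standard treatment hazard ratio for the binary covariate under the PH form $h_{u0}(t) e^{\beta x}$, the AFT form $e^{\beta x} h_{u0}(t e^{\beta x})$, and the AH form $h_{u0}(t e^{\beta x})$. It assumes $0 < h_{u0}(0) < \infty$ and that $h_{u0}$ is right-continuous at 0; the continuity condition is an assumption added here for the illustration.

```latex
\begin{align*}
\text{PH:}  \quad & \frac{h_u(t \mid 1)}{h_u(t \mid 0)} = e^{\beta}
                    \quad \text{for all } t \ge 0, \\
\text{AFT:} \quad & \frac{h_u(t \mid 1)}{h_u(t \mid 0)}
                    = \frac{e^{\beta}\, h_{u0}(t e^{\beta})}{h_{u0}(t)}
                    \longrightarrow e^{\beta} \quad \text{as } t \to 0^{+}, \\
\text{AH:}  \quad & \frac{h_u(t \mid 1)}{h_u(t \mid 0)}
                    = \frac{h_{u0}(t e^{\beta})}{h_{u0}(t)}
                    \longrightarrow 1 \quad \text{as } t \to 0^{+}.
\end{align*}
```

Only the AH ratio starts at 1, which is what "no effect at time 0" means on the hazard scale.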
To better demonstrate the differences, we plot the hazard curves of the three models in the accompanying figure. We consider two groups, with $x = 0$ for the control (baseline) group and $x = 1$ for the treatment group. The baseline hazard function is U-shaped, a form often employed in health research. The value of $\beta$ is set to $-0.8$. Comparing the hazard curves of the two groups, we can see that the PH model implies that the treatment multiplies the hazard rate by $e^{-0.8} \approx 0.45$ over the whole period. In the AFT model, the relationship between the hazard rates of the two groups is more complicated: the treatment group has a smaller hazard rate at the beginning, a larger hazard rate in the middle, and then a smaller hazard rate again. The AH model, on the other hand, provides a simple scenario: the treatment group starts at the same hazard rate as the control group and has a higher hazard rate in the early period due to, say, the toxicity of the treatment. After a certain time point, however, the positive effect of the treatment shows as a smaller hazard rate than in the control group.
Hazard curves from the PH model, AFT model, and AH model
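The qualitative comparison above can be reproduced numerically. The sketch below evaluates the treatment-group hazard under the three models with $\beta = -0.8$. The U-shaped baseline `baseline_hazard` is a hypothetical choice made for illustration (the baseline behind the plotted curves is not specified in the text), and the function names are likewise assumptions.

```python
import math

BETA = -0.8  # value used in the text

def baseline_hazard(t):
    """Hypothetical U-shaped baseline hazard (an assumption, not the paper's)."""
    return 0.1 + (t - 1.0) ** 2

def ph_hazard(t, x, beta=BETA):
    """PH: h(t | x) = h0(t) * exp(beta * x)."""
    return baseline_hazard(t) * math.exp(beta * x)

def aft_hazard(t, x, beta=BETA):
    """AFT: h(t | x) = exp(beta * x) * h0(t * exp(beta * x))."""
    scale = math.exp(beta * x)
    return scale * baseline_hazard(t * scale)

def ah_hazard(t, x, beta=BETA):
    """AH: h(t | x) = h0(t * exp(beta * x))."""
    return baseline_hazard(t * math.exp(beta * x))

# Tabulate control vs. treatment (x = 1) hazards at a few time points.
print("t     control  PH      AFT     AH")
for t in (0.0, 0.5, 1.0, 2.0, 3.0):
    row = [baseline_hazard(t), ph_hazard(t, 1), aft_hazard(t, 1), ah_hazard(t, 1)]
    print(f"{t:<5.1f} " + "  ".join(f"{v:6.3f}" for v in row))
```

With this particular baseline, the treatment-group AH hazard equals the control hazard at $t = 0$, exceeds it around $t = 1$, and falls below it by $t = 3$, while the PH hazard stays a constant multiple $e^{-0.8} \approx 0.45$ of the control hazard throughout. A different U-shaped baseline would shift the crossing points but not the behavior at $t = 0$.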
Chen and Wang [13] proposed estimating equations to estimate the parameters semiparametrically in the AH model (5). When there is a cure fraction in the data, the model (5) is clearly not appropriate. It is unclear whether the model and the semiparametric estimation method can be easily adapted to incorporate the cure fraction. This motivates the work in this paper on a cure model that allows a gradual effect of covariates on the hazard of uncured patients. In this paper, we propose a new mixture cure model that employs an AH model to model the effects of $x$ on $S_u(t \mid x)$ in the mixture cure model (1). A semiparametric method is proposed to estimate the parameters in the cure model. We demonstrate the performance of the proposed model and estimation method via simulation and apply them to a data set from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute [15].
The remainder of the paper is organized as follows. Section 2 presents the accelerated hazard mixture cure model, together with a semiparametric estimation method for it. Section 3 reports a simulation study investigating the performance of the proposed model and estimation method. Section 4 describes an application of the model to the breast cancer data set of Polk, Iowa from SEER. Finally, conclusions and some discussion are given in Section 5.