Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Biometrics. Author manuscript; available in PMC 2011 March 8.
Published in final edited form as:
PMCID: PMC3050018

Regression on Quantile Residual Life


A time-specific log-linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time-to-event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance-covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long-term follow-up to estimate the patients’ median residual lifetimes, adjusting for important prognostic factors.

Keywords: Breast cancer, Martingale, Minimum dispersion statistic

1. Introduction

In medical research, an investigator’s ultimate interest may be in inferring the remaining life years of a patient, given his/her genetic and/or environmental backgrounds. The remaining life years of a patient could be prolonged by treating or preventing a disease by a medical intervention. Existing statistical methods such as an adjusted version of the Kaplan-Meier method (Kaplan and Meier, 1958) or Cox proportional hazards model (Cox, 1972, 1975) may be adopted to indirectly infer the remaining lifetimes, but they are often cumbersome and not straightforward, especially when the remaining lifetimes need to be evaluated in the middle of an observation period. Also the remaining lifetimes at a specific time point estimated from the semiparametric models such as proportional hazards or accelerated failure time (AFT) (Cox and Oakes, 1984) may heavily depend on model assumptions that affect the entire observation period. Another advantage of using the concept of the remaining lifetime is that it may provide investigators and patients with more straightforward interpretations regarding the efficacy results from clinical studies, so that they can communicate more clearly about potential benefits of new drugs for ongoing or future trials.

Some recent phase III breast cancer clinical trials provide practical examples of studies that could take advantage of the proposed method (Goss et al., 2003; Coombes et al., 2004; Mamounas et al., 2008). In those studies, estrogen receptor-positive patients who had been treated with tamoxifen for up to 5 years without recurrence of the original disease were re-randomized to placebo or an aromatase inhibitor, either letrozole or exemestane. A patient who would be interested in participating in this type of study would want to know how much her lifetime could be extended by accepting the new therapy after she had already survived 5 years since her initial diagnosis and treatment. In this case, the important covariates to be included in a regression model would be the patient’s age at diagnosis, seriousness of nodal involvement, status of estrogen or progesterone receptor, pathological tumor size, and type of the initial therapy.

Quantile regression models were originally introduced by Koenker and Bassett (1978). Recent developments in this area for non-censored data include Jung (1996), Portnoy and Koenker (1997), Wei et al. (2006), Whang (2006) and Mu and He (2007). Jung (1996) implemented the quasi-likelihood approach to obtain the parameter estimates for median regression models, which allows for the observations to be dependent. Portnoy and Koenker (1997) introduced a new computational technique that permits a faster estimation process for general l1-type problems over the widely used simplex method for sufficiently large sample sizes. Wei et al. (2006) applied the quantile regression approach to analyze longitudinal growth charts data. Whang (2006) proposed to use the empirical likelihood approach for parameter estimation and construction of the confidence regions. Mu and He (2007) developed the power-transformed linear quantile regression model by using the equivariance property of the quantiles under monotone transformations.

For censored data, although much work has been done on the quantile regression model at the origin of the time axis, not much attention has been given to the regression model on quantile residual life function. Gelfand and Kottas (2003) proposed a Bayesian inference procedure for a median residual life regression model that was derived by the AFT regression model. Their work relies on correct specification of the AFT model for the underlying failure time distribution. In this paper, we develop a frequentist and time-specific regression method that can be used to directly infer the effect of covariates on the quantile residual lifetimes at any specific time point, and that does not require specifying a semiparametric model for the underlying failure time. The proposed method can be viewed as a generalization of the median regression model (Ying, Jung and Wei, 1995). The interpretation of the larger quantile residual lifetime in a treatment group at time t0 can be viewed as a positive effect of treatment on the quantile failure time among subjects who survived beyond t0. In general, evaluation of the asymptotic variance of the quantiles in survival data is not straightforward because it involves estimation of the probability density function of failure time distribution under censoring. Here we propose a method that deals directly with the estimating equation to avoid the problem.

In Section 2, some notations are introduced along with the definition of the quantile residual life function. In Section 3, a time-specific quantile residual life regression model is proposed. An estimating equation is formulated to estimate the regression parameters, and consistency and asymptotic normality of the regression estimator are established. An asymptotic χ2-statistic is also proposed to test significance of the regression parameters in this section. In Section 4, simulation studies are performed to assess the small sample properties of the regression method, such as bias of the estimator, and type I error and power of the test statistic. In Section 5, the proposed method is applied to a data set from one of the earliest clinical trials on breast cancer treatment performed by the National Surgical Adjuvant Breast and Bowel Project (NSABP). We conclude with discussions in Section 6.

2. Quantile Residual Life Function

Let Yi, i = 1, 2, …, n, denote the minimum of the failure time Ti and censoring time Ci. Let Δi be the censoring indicator, i.e., Δi = 1 if Yi = Ti and Δi = 0 if Yi = Ci. Suppose θξ|t defines the ξ –quantile residual life function at time t. Then, θξ|t = ξ-quantile(Tit|Tit) satisfies the relation P(Titθξ|t|Tit) = 1 − ξ, which is equivalent to


Note that θξ|t is the ξ–quantile of the remaining lifetimes among patients who survived beyond time t. The function θ ξ|t has been extensively studied by many authors, including Schmittlein and Morrison (1981), Arnold and Brockett (1983), Csörgö and Horvath (1983), Gupta and Langford (1984), Joe and Proschan (1984), Csörgö and Csörgö (1987), Alam and Kulasekera (1993), Song and Cho (1995), among others. Gupta and Langford (1984) noted that θξ|t does not uniquely determine S(t) = P(T > t), but in practice one can model S(t) first and then infer θξ|t (Gelfand and Kottas, 2003). Jeong, Jung, and Costantino (2008) developed a nonparametric two-sample statistic to compare the median residual lifetimes at any fixed time point during the follow-up period of a study. In this paper, we are particularly interested in associating the function θξ|t at a specific time t0, denoted as θξ|t0 in the sequel, with selected covariates.

Suppose that {(Yi, Δi), i = 1, …, n} are random samples from a single population. Let G(t) = P(Ct) denote the ‘survival function’ of censoring distribution, and I(·) the indicator function. Since E {I (Yt)} = S(t)G(t) under the independence of failure and censoring times, we can consider an unbiased estimating function for θξ|t0,


Here, G(t) is usually unknown, so that we replace it with its Kaplan-Meier estimator (Kaplan and Meier, 1958), Ĝ (t) by using the censoring indicator 1 − Δi, to obtain a consistent estimating function under censoring:


Consistency and asymptotic normality of the solution [theta w/ hat]ξ|t0 to the equation can be easily developed.

In the following section, we extend the estimating equation for the quantile residual life for a single sample to a regression setting.

3. Regression Model

We consider a linear regression model for a ξ –quantile of residual lifetimes at time t0, on a log-scale,


where βξ|t0 = (βξ|t0, βξ|t0,1, … βξ|t0,p)′ denotes a vector of the regression coefficients, and Zi = (1, X1i,…, Xpi)′ is a vector of covariates for a subject i. The model (3) specifies a linear relationship between the ξ –quantile residual lifetimes on a log-scale and the vector of covariates at a specific time t0, allowing for testing, say, a treatment effect adjusting for other covariates. Let βξt00 denotes the true value of βξ|t0. Without censoring the estimate of the true regression parameter can be obtained by minimizing i=1nlog(Tit0)βt0Zi=2i=1n{log(Tit0)βξt0Zi}[I{log(Tit0)βξt0Zi0}(1ξ)]. Therefore the estimating function for noncensored case is given by i=1nZi[I{Tit0+exp(βξt0Zi)}(1ξ)]. Assuming conditional independence between Ti and Ci given Zi and independence of Ci from Zi, the following is true for the censored data,


which is equivalent to


Under the model (3) and by definition of the ξ –quantile residual life function




and the following can be used as an estimating equation for the regression parameter βξ|t0,


Equation (4) mimics the least square estimating equations from the ordinary multiple linear regression model since


By the invariance property of the quantile with respect to monotone transformations, the estimating equation (4) can be evaluated on the original scale of the observed survival data, although the model (3) is based on a log-scale. This transformation invariance property of the quantile also allows for flexibility in data analysis. A solution [beta]ξ|t0 to (4) may be defined as a minimizer of the function ||Sξ|t0,n (βξ|t0)||, where || · || will be defined as the square root of sum of squares in our simulation study and real example. Under certain regularity conditions, it can be shown that [beta with macron above]ξ|t0 is a consistent estimator of its true value βξt00 (see APPENDIX A in Web Appendix). Note that in practice if G^{t0+exp(βξt0Zi)} and Ĝ (t0) are zeros in (4), I{Yit0+exp(βξt0Zi)}/G^{t0+exp(βξt0Zi)} and I (Yit0)/Ĝ (t0) are also set to be zeros.

Suppose that we are interested in testing the null hypothesis H0: βξ|t0 = β ξ|t0,0. Because the limiting variance-covariance matrix of the estimators βξ|t0 involves unknown conditional probability density functions of log(Tit0)βξt00Zi given Zi, we will use the estimating function Sξ|t0, n(βξ|t0) directly to test H0. In APPENDIX B in Web Appendix, we show that the distribution of n1/2Sξt0,n(βξt00) is approximately normal with mean zero and variance-covariance matrix Γξt0=limnn1i=1nτξt0,iτξt0,i, where


ΛG (·) is the cumulative hazard function for the censoring distribution, h(s)=limnn1i=1nI(Yis),q1(s)=limnn1i=1nZiI{t0+exp(βξt00Zi)min(s,Yi)}, and


A consistent estimator [Gamma]ξ|t0 for the limiting covariance matrix of n1/2Sξt0,n(βξt00) can be obtained as (see APPENDIX B in Web Appendix) Γ^ξt0=n1i=1nτ^ξt0,iτ^ξt0,i, where


A natural test statistic based on Sξ|t0,n (β ξ|t0,0) for testing H0 would be


which approximately follows a χ2-distribution with p + 1 degrees of freedom. A large observed value of this statistic suggests evidence against the null hypothesis H0. In APPENDIX C in Web Appendix, we show that the distribution of [beta]ξ|t0 is asymptotically normal for given Zi and fixed t0 based on the local linearity of Sξ|t0,n (βξ|t0).

Now we consider a partition of the regression coefficients, βξt0=(βξt0(1),βξt0(2)), where βξt0(1) is a r × 1 vector. Suppose that β^ξt0(1) and β^ξt0(2) are the corresponding estimates, and we are only interested in testing the hypothesis H0:βξt0(1)=βξt0,0(1), a specified vector, against a general alternative. For our test statistic, we will consider the minimum dispersion statistic (Basawa and Koul, 1988),


Note that evaluating this statistic does not require estimation of the probability density function of the survival distribution, which is needed for a Wald-type test statistic based on [beta]ξ|t0 By using similar arguments given in Wei, Ying and Lin (1990, Appendix 2) and Ying et al. (1995, APPENDIX C), it can be shown that (5) is approximately χ2-distributed with r degrees of freedom. We reject H0 for a large value of V(βξt0,0(1)). By inverting this test statistic, a 100 × (1 − α)% confidence region for βξt0(1) can be obtained as


where χr,1α2 is the 100 × (1 − α)th percentile of a χ2-distribution with r degrees of freedom.

By using this result we can also estimate the prediction interval. From (3), the ξ –quantile residual lifetime for a patient with a covariate vector z0 at time t0 can be directly predicted by η^(t0z0)=exp(β^ξt0z0). We estimate the associated 100(1− α)% confidence interval by { η:Q(η)<χ1,1α2}, where


4. Simulation Studies

To assess the finite sample properties of our estimator and test statistic, simulation studies were performed based on the median residual regression, i.e., ξ = 1/2. For an easy interpretation of the results, we considered a simple model


where x1i is a binary covariate, say, 0 for the control group and 1 for the treatment group. By the invariance property of the median with respect to monotone transformations, the model (8) is equivalent to


Here exp(βt0(0)) and exp(βt0(0)+βt0(1)) can be interpreted as the median residual lifetimes for the control and treatment groups at time t0, respectively, so that the difference in median residual lifetime between the two groups is given by


In this case, note that the slope parameter βt0(1) can be interpreted as the logarithm of the ratio of the two median residual lifetimes at time point t0.

Failure times Ti were assumed to follow a Weibull distribution with survival function S(t) = exp{ −(ρt)κ}. For this distribution, note that under H0:βt0(1)=0 the equations (1) and (9) give


Note that at the origin of time axis, i.e., t0 = 0, ρ0={log(2)}1/κ/exp(β0(0)), where exp(β0(0)) is the median of the failure distribution at t0 = 0. By using the probability integral transformation, failure times were generated for both groups from Ti = (1/ρ0) {− log(1 − ui)}1/κ, where ui is from a uniform random variable between 0 and 1. Censoring times Ci were generated from a uniform distribution between 0 and c, where constant c is for a certain censoring proportion. Finally the observed survival times Yi were determined as the minimum of Ti and Ci. We assumed that κ = 2 and exp(β0(0))=5 for convenience.

First, with a fixed total sample size of n = 200, 1000 simulations were performed for various combinations of censoring proportions and time points t0 to evaluate the empirical distribution of the regression parameters βt0(0) and βt0(1) via the mean and standard error (SE) of the parameter estimates. The grid search method was used to find the estimates that minimize the estimating equation (4). Table 1 summarizes the results. Although the proposed regression method needs to be used at one fixed time point, we have chosen various time points only for the purpose of comprehensive investigation in our simulation studies and real example. Under H0:βt0(1)=0, equation (11) gives the true values of βt0(0) as 1.61, 1.41, 1.22, and 1.04 at t0 = 0, 1, 2, 3, respectively. We assume that the shape parameter κ does not change in the surviving population. The true value of βt0(1) must be 0 for all t0 ≥ 0 because an identical survival distribution was assumed for both groups. In Table 1, we observe that the mean values of the estimates of βt0(0) and βt0(1) (no censoring) from the new method are very close to their true values for earlier fixed time points. Larger censoring proportions as well as later fixed time points tend to increase the SE of the parameter estimates.

Table 1
Mean (Standard Deviation) of the empirical estimates of the true regression parameters βt0(0)=1.61,1.41,1.22,1.04 at t0 = 0, 1, 2, 3 and β0(1)=0 when the total number of observations (n) is 200.

In order to investigate the performance of our semiparametric estimator relative to the maximum likelihood (ML) approach, we also conducted a simulation study for the ML approach. The median residual life function for a subject i with a covariate xi at time t0 is determined by θi,t0=(1/ρi,t0){log(2)+(ρi,t0t0)κ}1/κt0=exp(βt0Zi), where βt0=(βt0(0),βt0(1)) are the true values of the regression parameters at t0 and Zi=(1,xi). With a fixed value of κ, this implies


Denoting φ=(κ,βt0), the log-likelihood function to estimate the vector [var phi] is given by l(φ)=i=1n[Δilog{h(yi;φ,Zi)}+log{S(yi;φ,Zi)}], where log{h(yi;φ,Zi)}=log{log(2)}+log(κ)log[{exp(βt0Zi)+t0}κt0κ]+(κ1)log(yi) and log{S(yi;φ,Zi)}=log(2)[{exp(βt0Zi)+t0}κt0κ]1yiκ. We used the same simulation scenarios as before for a fair comparison. In Table 1, we observe that the bias of the regression estimators by ML method is negligible across all scenarios. Their corresponding standard errors of the estimates tend to increase in censoring proportion and t0. The bias and SE of the MLE are small overall. The bias of our semiparametric estimators is small with small t0 values, but slightly increases with t0 = 3. As expected, the MLE is more efficient than our estimator under the correct distributional assumption.

Next, with total sample sizes of n = 100, 200 or 1000, type I error probabilities for testing the null hypothesis H0:βt0(1)=0(t0=0,1,2,3) with α = 0.05 were compared between the proposed and ML methods at various time points with different average censoring proportions (Table 2). The results indicate that the test based on the minimum dispersion statistic tends to be slightly conservative, which is consistent with the observations by Su and Wei (1993), Ying et al. (1995), and Jeong et al. (2008).

Table 2
Type I error (1–95% coverage) probabilities for testing the null hypothesis H0:βt0(1)=0(t0=0,1,2,3) when the true parameter values are β0(1)=0 and the total sample sizes (n) are 100, 200, or 1000.

Similar settings were used for power analysis, except that the difference in median residual lifetimes between two groups were assumed to be 1, 3, and 5 at t0 = 0. Note that when β0(0)=1.61, equation (10) gives 1, 3, and 5 corresponding to β0(1)=0.18,0.47, and 0.69, respectively. From (12), recall that when t0 = 0 the scale parameter ρ from the Weibull distribution is given by


where β0(k)(k=0,1) is the true values of the regression parameters at t0 = 0. The covariate values of x1i being generated from a Bernoulli distribution with equal probability of success, failure times were simulated from Ti = (1/ρi,0){−log (1−ui)}1/κ. Note that the median of the failure distribution at t0 = 0 is determined by


Equation (14) gives 5 when x1i = 0, and it corresponds approximately to 6, 8, and 10, respectively, when x1i = 1 and β0(1)=0.18,0.47, and 0.69. Table 3 summarizes the probabilities of rejecting the null hypothesis of βt0(1)=0(t0=0,1,2,3) for each true value of β0(1). As expected, power tends to increase in n and β0(1), and to decrease in censoring proportion.

Table 3
Powers for testing the null hypothesis H0:βt0(1)=0(t0=0,1,2,3) when the true parameter values are β0(1)=.18,.47,.69 and the total sample sizes (n) are 100 or 200.

5. A Real Example

The NSABP B-04 data set has been chosen as a real data example because it has a long-term follow-up over 30 years, preserving important survival information among breast cancer patients without any chemo- or hormonal therapies after surgery. This information is not available from recent studies on breast cancer because a series of standard therapies have been established for the patients being randomized to the control group. In this section, we apply the proposed method to re-analyze the B-04 data to evaluate the effects of important prognostic factors in breast cancer on the median residual lifetimes.

In the B-04 data, there were 5 groups being compared, 3 groups in node-negative patients and 2 groups in node-positive patients. With more than a quarter century of follow-up, the proportion of patients still alive is very low as expected, i.e., less than 30% among 1,079 eligible node-negative patients with follow-up and less than 20% among 586 eligible node-positive patients with follow-up. Fisher et al. (2002) reported a 25-year follow-up update of the B-04 data. The original goal of the study was to prove that a less aggressive total mastectomy had an equivalent effect on patients’ survival to the traditional radical mastectomy. In this paper, the nodal status was considered as a covariate along with age at diagnosis of breast cancer and pathological tumor size to predict median residual lifetimes of breast cancer patients at any given time point.

First we consider a simple regression model that includes only one covariate of the nodal status, i.e., Zi = (1, Xnode,i)′, where Xnode,i was coded as 0 or 1 for node-negative and node-positive, respectively, in the regression model (3). Table 4(a) summarizes the estimates of βt0(intercept) and βt0(node)(t0=0,2,4,6,7,8,10) and the corresponding values of the minimum of the square root of sum of squares of two score functions, min S = min||Sξ|t0,n (βξ|t0)||. Even though the median follow-up time was close to 30 years in this data set, the results presented here are only through 10 years because the data get sparse later on. Figure 1 shows contour plots of the surface of minus the square root of sum of squares of two score functions at year t0 = 0, 4, and 8 with the intersection of the dotted horizontal and vertical lines being the regression parameter estimates found by the grid search method. Under this simple model, interpretation of θ^0,t0=exp(β^t0(intercept)) and θ^1,t0=exp(β^t0(intercept)+β^t0(node)) would be the estimated median residual lifetimes in node-negative and node-positive patients at time t0, respectively. Figure 2 shows a comparison of the estimated median residual lifetimes [theta w/ hat]0,t0 and [theta w/ hat]1,t0 from different approaches at various time points. Line type 1 represents the nonparametric estimates separately for each group by replacing the survival functions by their Kaplan-Meier estimators in (1). Line type 2 represents the estimates from the proposed regression model.

Figure 1
Contour plots of surfaces of minus the square root of sum of squares of two score equations for the median residual regression model applied to NSABP B-04 data set with the only covariate of nodal status; (a) When t0 = 0, (b) When t0 = 4, and (c) When ...
Figure 2
Comparison of the estimated median residual lifetimes by nodal status from various methods (NSABP B-04 data); upper 3 lines are for node-negative group and lower 3 lines are for node-positive group.
Table 4
Fitted simple and multiple regression models (NSABP B-04 data); (a), Regression parameter estimates from the median residual life regression model with a single covariate (simple model) and associated p-values and 95% confidence intervals from the proposed ...

A comparison may be also considered between the proposed and proportional hazards regression models (Cox, 1972, 1975). Suppose we have a proportional hazards model S(t; Zi) = S0(t)ηi, where ηi = exp(βZi) and Zi is an indicator for the nodal status. Under this model, the ξ–quantile residual lifetime at t0 can be evaluated by θξt0=S01{(1ξ)1/ηiS0(t0)}t0, by noting that S1(y)=S01(y1/ηi) and using the fact θξ|t0 = S−1 {(1 − ξ)S(t0)} − t0. Line type 3 in Figure 2 represents such estimates from the NSABP B-04 data. The comparison indicates that the estimates from the proposed model seems to follow the path of the nonparametric one-sample estimates closer in each group than the Cox model does. The indication of lack of fit in the Cox model may be due to violation of the proportional hazards assumption in the B-04 data (p = 0.0002 by Grambsch and Therneau (1994)).

In Figure 2, we observe that the median residual lifetime of the node-positive group is shorter than that of the node-negative group during the first 10 years. The median residual lifetime in the node-negative group is almost constant for the range of t0 values. For the node-positive group, however, the median residual lifetime slightly increases in t0, which may imply that the node-positive patients tend to survive longer once they pass the early high-risk period. P-values and 95% confidence intervals from the minimum dispersion statistic (5) are provided in Table 4(a) to compare median residual lifetimes between the two nodal groups at each fixed time point t0. The results indicate that the nodal status has a significant effect on the median residual lifetimes in early time points, but the effect diminishes once patients survive more than 7 years. This is similar to the findings in Jeong et al. (2008) by using a two-sample test statistic without adjusting for other prognostic factors.

Since there are no other existing methods that can be directly compared with the proposed method, first we have adopted the bootstrap resampling method (Efron, 1979) to assess our results. We have resampled 1000 times from the original sample with replacement and calculated standard errors of those 1000 estimates, which were 0.07, 0.10, 0.13, 0.14, 0.12, 0.11, and 0.16 for time points t = 0, 2, 4, 6, 7, 8, and 10, respectively. At each time point, these standard error estimates were used to construct 95% confidence intervals for βt0(node) as well as to obtain p-values via the Wald test. The proportional hazards model (Cox, 1972, 1975) was also chosen for a comparison of the observed significance of the slope parameter. In this case, the significance of the effect of nodal status was evaluated among surviving patients beyond each fixed time point. The results from the proportional hazards analysis are presented only through p-values for the comparison, because the quantities being evaluated are different under the proportional hazards and time-specific median residual life models. Note that the interpretation of the slope parameter is the logarithm of the hazard ratio under the proportional hazards model and the logarithm of the ratio of two median residual lifetimes at time t0 under the proposed model, so that confidence intervals for the slope parameter from the two models cannot be compared directly. From Table 4(a), we observe that the results from the three approaches are compatible in p-values, although ones from the proposed method tend to be slightly larger. We also see that the resampling and proposed methods give similar 95% confidence intervals.

Now we consider the multiple regression model, where Zi = (1, Xnode,i, Xage,i, Xtsize,i)′, Xnode,i = 0 (node-negative) or 1 (node-positive), and both age (Xage,i) and pathological tumor size (Xtsize,i) are continuous covariates. The age and tumor size values were rescaled simply by multiplying by 0.01, so that the regularity condition holds in the estimates Ĝ(·) in the estimating equation (4). Table 4(b) shows the estimated regression parameters and corresponding 95% confidence intervals (cov) from the minimum dispersion statistic for testing H0:βt0(cov)=0(t0=0,2,4,6,8,10,cov=node,age,ortsize). The results indicate that the negative effect of age at a diagnosis of cancer on the median residual lifetimes tends to become stronger for later time points, whereas the effects of nodal status and pathological tumor size are significant only up to first 4 or 5 years, when adjusting for other covariates.

These results can be used to predict a patient’s median residual lifetime at a given time t0. For example, for a patient with positive lymph nodes, age at diagnosis of 56 (median), and pathological tumor size of 30mm (median), the median residual lifetime (in years) 6 years after the original diagnosis of cancer can be predicted as exp {3.92 −1 × 0.28 − 2.11 × (0.01 × 56.0) − 0.48 × (0.01 × 30.0)} = 10.1. Similarly for node-negative patients, the predicted median residual lifetime is 13.4 years. From (7), the corresponding 95% prediction intervals are (7.3,12.9) and (11.2,16.4), respectively. These estimates may be used as baseline mortality information for a future clinical study to evaluate a new therapy in terms of prolonging the patients’ remaining lifetimes.

6. Discussion

In this paper, we proposed a regression model on quantile residual lifetimes, where the covariate effects can be directly associated with the quantile failure time among survivors beyond any specific time point, rather than inferring the quantile residual lifetime under some global regression model assumptions such as proportional hazards or accelerated failure time. An estimating equation for the regression coefficients has been formulated and justified under the proposed model. Asymptotic distributions of the proposed regression parameter estimator and test statistic to evaluate the covariate effects have been derived, and it was shown that they had reasonable finite sample properties through simulation studies. The grid search method that was used to find the roots of the estimating equation also provided reasonable solutions in our simulation studies and real example. The variance of the proposed estimators depends on the probability density function of the survival variables. However, the variance of the estimating function can be consistently estimated without specification of the survival distribution. Hence, we used the minimum dispersion statistic (Basawa and Koul, 1988) based on the distribution of the estimating function. The variance of the regression coefficient estimator may be estimated by using either the proposed confidence interval approach or the bootstrap resampling method.

Our theoretical results in Section 3 are based on the uniform consistency of the Kaplan-Meier estimator Ŝ(t) over 0 < t < ζ = sup{t: S(t)G(t) > 0}, which leads to a condition that t0+exp(βξt0Z)<ζ in our case. Here ζ may be considered as the maximum follow-up time, which has the lower bound t0+exp(βξt0Z).

As noted in Ying et al. (1995), to check the model assumption of (3) at a fixed time t0, a zero-mean Gaussian process of cumulative sums of median residuals can be defined as W(s)=n1/2i=1neiI(β^ξt0Zis), where


A plot of W(s) against the predicted ξ–quantile residual lifetimes may be used as a graphical tool to check the model assumption at time t0.

We have assumed that survival and censoring variables are conditionally independent given covariates. By using a censoring distribution model that is independent of covariates, however, the conditional independence assumption is reduced to an unconditional independence assumption, as a referee pointed out. The latter stronger assumption holds for most well conducted clinical trials, including the NSABP trial discussed in this paper, where the major cause of censoring is administrative. If the censoring distribution depends on a discrete covariate, we may partition the data into a number of strata defined by the covariate values so that the censoring variables have an identical distribution within each stratum. If the censoring distribution depends on a continuous covariate, we may consider modeling the dependency of censoring distribution on the covariate. For example, for a continuous covariate zi, suppose that Ci satisfies a proportional hazards model hi(t) = h0(t) exp(γzi), where hi(t) is the hazard function of Ci and h0(t) is the baseline hazard function. Then, Ĝ (t) in (4) will be replaced by Ĝ (t|zi) = exp{−ĥ0(t) exp([gamma with circumflex]zi)}, where [gamma with circumflex] is the partial MLE (Cox, 1972) for the censoring distribution and ĥ0(t) is the Breslow’s (1974) estimator of h0(t). If the assumed censoring model is correct, the consistency and asymptotic normality of [beta]ξ|t0 can be derived using the properties of [gamma with circumflex].

An associate editor raised an issue regarding how to choose quantiles and how to interpret the results if the covariate effects differ across quantiles. In theory, any quantile ξ such that ξ < {Ŝ (t0) − Ŝ(tmax)}/Ŝ (t0), where tmax = max(Xi) and Xi = min(Ti, Ci), can be chosen, but one closer to the middle is recommended due to less variability. In the same context, the interpretation with one toward the middle should be considered more reliable. In practice, however, determination of which quantiles to be investigated would also depend on the scientific objectives of the study. For example, if the investigator is interested in inferring a patient population with longer survival, high quantiles such as 80 or 90 percentile, rather than the median, would be more appropriate. We have reanalyzed the NSABP B-04 data across different quantiles at t0 = 0 in the simple regression model. For the 10th, 25th, 50th and 75th percentile models, the estimates of the slope parameter were −0.620, −0.623, −0.620, and −0.351, respectively, all of which were associated with the p-value smaller than 0.0001. Thus the interpretation of the effect of the nodal status on patients’ survival does not change for the use of different quantiles in this example.

Supplementary Material

Supp Data


We are grateful to an associate editor and a referee for their valuable comments. Dr. Jong-Hyeon Jeong’s work was supported in part by the Department of Defense (DOD) Grant W81XWH-04-1-0605. Dr. Jong-Hyeon Jeong and Dr. Hanna Bandos’s work was supported in part by the National Health Institute (NIH) Grants 5-U10-CA69974-09 and 5-U10-CA69651-11.


Supplementary Materials

The proofs of consistency and asymptotic normality of [beta]ξ|t0 and asymptotic normality of n1/2Sξt0,n(βξt00) in Section 3 are available under the Paper Information link at the Biometrics website


  • Alam K, Kulasekera KB. Estimation of the quantile function of residual life time distribution. Journal of Statistical Planning and Inference. 1993;37:327–337.
  • Arnold BC, Brockett PL. When does the βth percentile residual life function determine the distribution? Operations Research. 1983;31:391–396.
  • Basawa IV, Koul HL. Large-sample statistics based on quadratic dispersion. International Statistical Review. 1988;56:199–219.
  • Breslow NE. Covariate analysis of censored survival data. Biometrics. 1974;30:89–99. [PubMed]
  • Coombes RC, Hall E, Gibson LJ, et al. A randomized trial of exemestane after two to three years of tamoxifen therapy in postmenopausal women with primary breast cancer. The New England Journal Of Medicine. 2004;350:1081–1092. [PubMed]
  • Cox DR. Regression models and life-tables (with discussion) Journal of the Royal Statistical Society Series B. 1972;34:187–220.
  • Cox DR. Partial likelihood. Biometrika. 1975;62:269–276.
  • Cox DR, Oakes D. Analysis of Survival Data. London: Chapman & Hall; 1984.
  • Csörgö M, Csörgö S. Estimation of percentile residual life. Operations Research. 1987;35:598–606.
  • Csörgö S, Horvath L. The rate of strong uniform consistency for the product-limit estimator. Z Wahrscheinlichkeitstheorie verw Gebiete. 1983;62:411–426.
  • Efron B. Bootstrap methods: another look at the jackknife. Annals of Statistics. 1979;7:1–26.
  • Fisher B, Jeong J, Anderson S, et al. Twenty-five year findings from a randomized clinical trial comparing radical mastectomy with total mastectomy and with total mastectomy followed by radiation therapy. The New England Journal Of Medicine. 2002;347:567–575. [PubMed]
  • Fleming TR, Harrington DP. Counting Processes and Survival Analysis. New York: Wiley; 1991.
  • Gelfand AE, Kottas A. Bayesian semiparametric regression for median residual life. The Scandinavian Journal of Statistics. 2003;30:651–665.
  • Goss PE, Ingle JN, Martino S, et al. A randomized trial of letrozole in postmenopausal women after five years of tamoxifen therapy for early-stage breast cancer. The New England Journal Of Medicine. 2003;349:1793–1802. [PubMed]
  • Grambsch P, Therneau T. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81:515–26.
  • Gupta RC, Langford ES. On the determination of a distribution by its median residual life function: a functional equation. Journal of Applied Probability. 1984;21:120–128.
  • Jeong J-H, Jung S-H, Costantino JP. Nonparametric inference on median residual life function. Biometrics. 2008;64:157–163. [PubMed]
  • Joe H, Proschan F. Percentile residual life functions. Operations Research. 1984;32:668–678.
  • Jung S-H. Quasi-likelihood for median regression models. Journal of the American Statistical Association. 1996;91:251–257.
  • Kaplan EL, Meier P. Nonparametric estimator from incomplete observations. Journal of the American Statistical Association. 1958;53:457–481.
  • Koenker R, Bassett GJ. Regression quantiles. Econometrica. 1978;46:33–50.
  • Mamounas EP, Jeong J-H, Wickerham L, et al. Benefit from exemestane as extended adjuvant therapy after 5 years of adjuvant tamoxifen: Intention-to-treat analysis of the National Surgical Adjuvant Breast and Bowel Project B-33 trial. Journal of Clinical Oncology. 2008;26:1965–1971. [PubMed]
  • Mu Y, He X. Power transformation toward a linear regression quantile. Journal of the American Statistical Association. 2007;102:269–279.
  • Portnoy S, Koenker R. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science. 1997;12:279–296.
  • Schmittlein DC, Morrison DG. The median residual lifetime: a characterization theorem and an application. Operations Research. 1981;29:392–399.
  • Song J, Cho G. A note on percentile residual life. Sankhya. 1995;57:333–335.
  • Su JQ, Wei LJ. Nonparametric estimation for the difference or ratio of median failure times. Biometrics. 1993;49:603–607. [PubMed]
  • Ying Z, Jung S-H, Wei LJ. Survival analysis with median regression models. Journal of the American Statistical Association. 1995;90:178–184.
  • Wei LJ, Ying Z, Lin DY. Linear regression analysis of censored survival data based on rank tests. Biometrika. 1990;19:845–851.
  • Wei Y, Pere A, Koenker R, He X. Quantile regression methods for reference growth charts. Statistics in Medicine. 2006;25:1369–1382. [PubMed]
  • Whang YJ. Smoothed empirical likelihood methods for quantile regression models. Econometric Theory. 2006;22:173–205.