Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3050018

Formats

Article sections

- Summary
- 1. Introduction
- 2. Quantile Residual Life Function
- 3. Regression Model
- 4. Simulation Studies
- 5. A Real Example
- 6. Discussion
- Supplementary Material
- References

Authors

Related links

Biometrics. Author manuscript; available in PMC 2011 March 8.

Published in final edited form as:

PMCID: PMC3050018

NIHMSID: NIHMS88101

The publisher's final edited version of this article is available at Biometrics

See other articles in PMC that cite the published article.

A time-specific log-linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time-to-event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance-covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long-term follow-up to estimate the patients’ *median* residual lifetimes, adjusting for important prognostic factors.

In medical research, an investigator’s ultimate interest may be in inferring the remaining life years of a patient, given his/her genetic and/or environmental backgrounds. The remaining life years of a patient could be prolonged by treating or preventing a disease by a medical intervention. Existing statistical methods such as an adjusted version of the Kaplan-Meier method (Kaplan and Meier, 1958) or Cox proportional hazards model (Cox, 1972, 1975) may be adopted to indirectly infer the remaining lifetimes, but they are often cumbersome and not straightforward, especially when the remaining lifetimes need to be evaluated in the middle of an observation period. Also the remaining lifetimes at a *specific* time point estimated from the semiparametric models such as proportional hazards or accelerated failure time (AFT) (Cox and Oakes, 1984) may heavily depend on model assumptions that affect the entire observation period. Another advantage of using the concept of the remaining lifetime is that it may provide investigators and patients with more straightforward interpretations regarding the efficacy results from clinical studies, so that they can communicate more clearly about potential benefits of new drugs for ongoing or future trials.

Some recent phase III breast cancer clinical trials provide practical examples of studies that could take advantage of the proposed method (Goss *et al.*, 2003; Coombes *et al.*, 2004; Mamounas *et al.*, 2008). In those studies, estrogen receptor-positive patients who had been treated with tamoxifen for up to 5 years without recurrence of the original disease were re-randomized to placebo or an aromatase inhibitor, either letrozole or exemestane. A patient who would be interested in participating in this type of study would want to know how much her lifetime could be extended by accepting the new therapy after she had already survived 5 years since her initial diagnosis and treatment. In this case, the important covariates to be included in a regression model would be the patient’s age at diagnosis, seriousness of nodal involvement, status of estrogen or progesterone receptor, pathological tumor size, and type of the initial therapy.

Quantile regression models were originally introduced by Koenker and Bassett (1978). Recent developments in this area for *non-censored* data include Jung (1996), Portnoy and Koenker (1997), Wei *et al.* (2006), Whang (2006) and Mu and He (2007). Jung (1996) implemented the quasi-likelihood approach to obtain the parameter estimates for median regression models, which allows for the observations to be dependent. Portnoy and Koenker (1997) introduced a new computational technique that permits a faster estimation process for general *l*_{1}-type problems over the widely used simplex method for sufficiently large sample sizes. Wei *et al.* (2006) applied the quantile regression approach to analyze longitudinal growth charts data. Whang (2006) proposed to use the empirical likelihood approach for parameter estimation and construction of the confidence regions. Mu and He (2007) developed the power-transformed linear quantile regression model by using the equivariance property of the quantiles under monotone transformations.

For *censored data*, although much work has been done on the quantile regression model at the *origin* of the time axis, not much attention has been given to the regression model on quantile *residual* life function. Gelfand and Kottas (2003) proposed a Bayesian inference procedure for a *median* residual life regression model that was derived by the AFT regression model. Their work relies on correct specification of the AFT model for the underlying failure time distribution. In this paper, we develop a frequentist and time-specific regression method that can be used to directly infer the effect of covariates on the quantile residual lifetimes at any specific time point, and that does not require specifying a semiparametric model for the underlying failure time. The proposed method can be viewed as a generalization of the median regression model (Ying, Jung and Wei, 1995). The interpretation of the larger quantile residual lifetime in a treatment group at time *t*_{0} can be viewed as a positive effect of treatment on the quantile failure time among subjects who survived beyond *t*_{0}. In general, evaluation of the asymptotic variance of the quantiles in survival data is not straightforward because it involves estimation of the probability density function of failure time distribution under censoring. Here we propose a method that deals directly with the estimating equation to avoid the problem.

In Section 2, some notations are introduced along with the definition of the quantile residual life function. In Section 3, a time-specific quantile residual life regression model is proposed. An estimating equation is formulated to estimate the regression parameters, and consistency and asymptotic normality of the regression estimator are established. An asymptotic *χ*^{2}-statistic is also proposed to test significance of the regression parameters in this section. In Section 4, simulation studies are performed to assess the small sample properties of the regression method, such as bias of the estimator, and type I error and power of the test statistic. In Section 5, the proposed method is applied to a data set from one of the earliest clinical trials on breast cancer treatment performed by the National Surgical Adjuvant Breast and Bowel Project (NSABP). We conclude with discussions in Section 6.

Let *Y _{i}*,

$$P({T}_{i}\ge t+{\theta}_{\xi \mid t})=(1-\xi )P({T}_{i}\ge t).$$

(1)

Note that *θ _{ξ}*

Suppose that {(*Y _{i}*, Δ

$${s}_{n}({\theta}_{\xi \mid {t}_{0}})=\sum _{i=1}^{n}\left\{\frac{I({Y}_{i}\ge {t}_{0}+{\theta}_{\xi \mid {t}_{0}})}{G({t}_{0}+{\theta}_{\xi \mid {t}_{0}})}-(1-\xi )\frac{I({Y}_{i}\ge {t}_{0})}{G({t}_{0})}\right\}.$$

Here, *G*(*t*) is usually unknown, so that we replace it with its Kaplan-Meier estimator (Kaplan and Meier, 1958), *Ĝ* (*t*) by using the censoring indicator 1 − Δ* _{i}*, to obtain a consistent estimating function under censoring:

$${S}_{n}({\theta}_{\xi \mid {t}_{0}})=\sum _{i=1}^{n}\left\{\frac{I({Y}_{i}\ge {t}_{0}+{\theta}_{\xi \mid {t}_{0}})}{\widehat{G}({t}_{0}+{\theta}_{\xi \mid {t}_{0}})}-(1-\xi )\frac{I({Y}_{i}\ge {t}_{0})}{\widehat{G}({t}_{0})}\right\}.$$

(2)

Consistency and asymptotic normality of the solution _{ξ|t0} to the equation can be easily developed.

In the following section, we extend the estimating equation for the quantile residual life for a single sample to a regression setting.

We consider a linear regression model for a *ξ* –quantile of residual lifetimes at time *t*_{0}, on a log-scale,

$$\xi -\text{quantile}\{log({T}_{i}-{t}_{0})\mid {T}_{i}\ge {t}_{0},{\mathbf{Z}}_{i}\}={\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i},$$

(3)

where *β*_{ξ|t0} = (*β*_{ξ|t0}, *β*_{ξ|t0,1}, … *β*_{ξ|t0,p})′ denotes a vector of the regression coefficients, and **Z*** _{i}* = (1,

$$\begin{array}{l}E[I\{log({Y}_{i}-{t}_{0})\ge {\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i}\}\mid {\mathbf{Z}}_{i}]=P\{{Y}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\mid {\mathbf{Z}}_{i}\}\\ =P\{{T}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\mid {\mathbf{Z}}_{i}\}P\{{C}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\mid {\mathbf{Z}}_{i}\},\end{array}$$

which is equivalent to

$$\frac{P\{{T}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\mid {\mathbf{Z}}_{i}\}}{P({T}_{i}\ge {t}_{0}\mid {\mathbf{Z}}_{i})}P({T}_{i}\ge {t}_{0}\mid {\mathbf{Z}}_{i})P\{{C}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\}.$$

Under the model (3) and by definition of the *ξ* –quantile residual life function

$$\frac{P\{{T}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\mid {\mathbf{Z}}_{i}\}}{P({T}_{i}\ge {t}_{0}\mid {\mathbf{Z}}_{i})}=1-\xi .$$

Therefore,

$$\begin{array}{l}E[I\{log({Y}_{i}-{t}_{0})\ge {\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i}\}\mid {\mathbf{Z}}_{i}]=(1-\xi )\frac{G\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\}}{G({t}_{0})}P({Y}_{i}\ge {t}_{0}\mid {\mathbf{Z}}_{i})\\ =E\left[(1-\xi )\frac{G\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\}}{G({t}_{0})}I({Y}_{i}\ge {t}_{0})|{\mathbf{Z}}_{i}\right],\end{array}$$

and the following can be used as an estimating equation for the regression parameter *β*_{ξ|t0},

$${\mathbf{S}}_{\xi \mid {t}_{0},n}({\mathit{\beta}}_{\xi \mid {t}_{0}})=\sum _{i=1}^{n}{\mathbf{Z}}_{i}\left[\frac{I\{{Y}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}}{\widehat{G}\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}}-(1-\xi )\frac{I({Y}_{i}\ge {t}_{0})}{\widehat{G}({t}_{0})}\right]\approx 0.$$

(4)

Equation (4) mimics the least square estimating equations from the ordinary multiple linear regression model since

$$E\left[\frac{I\{{Y}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{0\prime}{\mathbf{Z}}_{i})\}}{G\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{0\prime}{\mathbf{Z}}_{i})\}}|{\mathbf{Z}}_{i}\right]=E\{(1-\xi )\frac{I({Y}_{i}\ge {t}_{0})}{G({t}_{0})}|{\mathbf{Z}}_{i}\}.$$

By the invariance property of the quantile with respect to monotone transformations, the estimating equation (4) can be evaluated on the original scale of the observed survival data, although the model (3) is based on a log-scale. This transformation invariance property of the quantile also allows for flexibility in data analysis. A solution _{ξ|t0} to (4) may be defined as a minimizer of the function ||**S**_{ξ|t0,n} (*β*_{ξ|t0})||, where || · || will be defined as the square root of sum of squares in our simulation study and real example. Under certain regularity conditions, it can be shown that _{ξ|t0} is a consistent estimator of its true value
${\mathit{\beta}}_{\xi \mid {t}_{0}}^{0}$ (see APPENDIX A in Web Appendix). Note that in practice if
$\widehat{G}\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}$ and *Ĝ* (*t*_{0}) are zeros in (4),
$I\{{Y}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}/\widehat{G}\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}$ and *I* (*Y _{i}* ≥

Suppose that we are interested in testing the null hypothesis *H*_{0}: *β*_{ξ|t0} = *β*_{ξ|t0,0}. Because the limiting variance-covariance matrix of the estimators *β*_{ξ|t0} involves unknown conditional probability density functions of
$log({T}_{i}-{t}_{0})-{\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i}$ given **Z*** _{i}*, we will use the estimating function

$${\mathit{\tau}}_{\xi \mid {t}_{0},i}=\left[\frac{I\{{Y}_{i}\ge {t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\}}{G\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\}}-(1-\xi )\frac{I({Y}_{i}\ge {t}_{0})}{G({t}_{0})}\right]{\mathbf{Z}}_{i}+{\int}_{-\infty}^{\infty}{G}^{-1}(s){\int}_{-\infty}^{s}{h}^{-1}(v)\{dI({Y}_{i}\le v,{\mathrm{\Delta}}_{i}=0)-I({Y}_{i}\ge v)d{\mathrm{\Lambda}}_{G}(v)\}d{\mathbf{q}}_{1}(s)-{\mathbf{q}}_{2}({t}_{0}){\int}_{-\infty}^{{t}_{0}}{h}^{-1}(s)\{dI({Y}_{i}\le s,{\mathrm{\Delta}}_{i}=0)-I({Y}_{i}\ge s)d{\mathrm{\Lambda}}_{G}(s)\},$$

Λ* _{G}* (·) is the cumulative hazard function for the censoring distribution,
$h(s)={lim}_{n\to \infty}{n}^{-1}{\sum}_{i=1}^{n}I({Y}_{i}\ge s),{\mathbf{q}}_{1}(s)={lim}_{n\to \infty}{n}^{-1}{\sum}_{i=1}^{n}{\mathbf{Z}}_{i}I\{{t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{{0}^{\prime}}{\mathbf{Z}}_{i})\le min(s,{Y}_{i})\}$, and

$${\mathbf{q}}_{2}({t}_{0})=\underset{n\to \infty}{lim}G{({t}_{0})}^{-1}\left(\frac{1-\xi}{n}\right)\sum _{i=1}^{n}I({Y}_{i}\ge {t}_{0}){\mathbf{Z}}_{i}.$$

A consistent estimator _{ξ|t0} for the limiting covariance matrix of
${n}^{-1/2}{\mathbf{S}}_{\xi \mid {t}_{0},n}({\mathit{\beta}}_{\xi \mid {t}_{0}}^{0})$ can be obtained as (see APPENDIX B in Web Appendix)
${\widehat{\mathbf{\Gamma}}}_{\xi \mid {t}_{0}}={n}^{-1}{\sum}_{i=1}^{n}{\widehat{\mathit{\tau}}}_{\xi \mid {t}_{0},i}{\widehat{\mathit{\tau}}}_{\xi \mid {t}_{0},i}^{\prime}$, where

$${\widehat{\mathit{\tau}}}_{\xi \mid {t}_{0},i}=\left[\frac{I\{{Y}_{i}\ge {t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}}{\widehat{G}\{{t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}}-(1-\xi )\frac{I({Y}_{i}\ge {t}_{0})}{\widehat{G}({t}_{0})}\right]{\mathbf{Z}}_{i}+\sum _{l=1}^{n}\left[{\mathbf{Z}}_{l}\frac{I\{{t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{l})\le {Y}_{l}\}}{\widehat{G}\{{t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{l})\}}\right]\phantom{\rule{0.16667em}{0ex}}[\frac{(1-{\mathrm{\Delta}}_{i})I\{{Y}_{i}\le {t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{l})\}}{{\sum}_{m=1}^{n}I({Y}_{m}\ge {Y}_{i})}-\sum _{j=1}^{n}\frac{(1-{\mathrm{\Delta}}_{j})I[{Y}_{j}\le min\{{t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{l}),{Y}_{i}\}]}{{\{{\sum}_{m=1}^{n}I({Y}_{m}\ge {Y}_{j})\}}^{2}}]-\sum _{l=1}^{n}\left\{{\mathbf{Z}}_{l}\frac{(1-\xi )I({Y}_{l}\ge {t}_{0})}{n\widehat{G}({t}_{0})}\right\}\phantom{\rule{0.16667em}{0ex}}\left[\frac{(1-{\mathrm{\Delta}}_{i})I({Y}_{i}\le {t}_{0})}{{\sum}_{m=1}^{n}I({Y}_{m}\ge {Y}_{i})}-\sum _{j=1}^{n}\frac{(1-{\mathrm{\Delta}}_{j})I\{{Y}_{j}\le min({t}_{0},{Y}_{i})\}}{{\{{\sum}_{m=1}^{n}I({Y}_{m}\ge {Y}_{j})\}}^{2}}\right].$$

A natural test statistic based on **S**_{ξ|t0,n} (*β*_{ξ|t0,0}) for testing *H*_{0} would be

$${n}^{-1}{\mathbf{S}}_{\xi \mid {t}_{0},n}^{\prime}({\mathit{\beta}}_{\xi \mid {t}_{0},0}){\widehat{\mathbf{\Gamma}}}_{\xi \mid {t}_{0}}^{-1}{\mathbf{S}}_{\xi \mid {t}_{0},n}({\mathit{\beta}}_{\xi \mid {t}_{0},0}),$$

which approximately follows a *χ*^{2}-distribution with *p* + 1 degrees of freedom. A large observed value of this statistic suggests evidence against the null hypothesis *H*_{0}. In APPENDIX C in Web Appendix, we show that the distribution of _{ξ|t0} is asymptotically normal for given **Z*** _{i}* and fixed

Now we consider a partition of the regression coefficients,
${\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}=({{\mathit{\beta}}_{\xi \mid {t}_{0}}^{(1)}}^{\prime},{{\mathit{\beta}}_{\xi \mid {t}_{0}}^{(2)}}^{\prime})$, where
${\mathit{\beta}}_{\xi \mid {t}_{0}}^{(1)}$ is a *r* × 1 vector. Suppose that
${\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{(1)}$ and
${\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{(2)}$ are the corresponding estimates, and we are only interested in testing the hypothesis
${\stackrel{\sim}{H}}_{0}:{\mathit{\beta}}_{\xi \mid {t}_{0}}^{(1)}={\mathit{\beta}}_{\xi \mid {t}_{0},0}^{(1)}$, a specified vector, against a general alternative. For our test statistic, we will consider the minimum dispersion statistic (Basawa and Koul, 1988),

$$V({\mathit{\beta}}_{\xi \mid {t}_{0},0}^{(1)})=\underset{{\mathit{\beta}}_{\xi \mid {t}_{0}}^{(2)}}{min}\{{n}^{-1}{\mathbf{S}}_{{t}_{0},n}^{\prime}(({{\mathit{\beta}}_{\xi \mid {t}_{0},0}^{(1)}}^{\prime},{{\mathit{\beta}}_{\xi \mid {t}_{0}}^{(2)}}^{\prime})){\widehat{\mathbf{\Gamma}}}_{\xi \mid {t}_{0}}^{-1}{\mathbf{S}}_{{t}_{0},n}(({{\mathit{\beta}}_{\xi \mid {t}_{0},0}^{(1)}}^{\prime},{{\mathit{\beta}}_{\xi \mid {t}_{0}}^{(2)}}^{\prime}))\}.$$

(5)

Note that evaluating this statistic does not require estimation of the probability density function of the survival distribution, which is needed for a Wald-type test statistic based on _{ξ|t0} By using similar arguments given in Wei, Ying and Lin (1990, Appendix 2) and Ying *et al.* (1995, APPENDIX C), it can be shown that (5) is approximately *χ*^{2}-distributed with *r* degrees of freedom. We reject _{0} for a large value of
$V({\mathit{\beta}}_{\xi \mid {t}_{0},0}^{(1)})$. By inverting this test statistic, a 100 × (1 − *α*)% confidence region for
${\mathit{\beta}}_{\xi \mid {t}_{0}}^{(1)}$ can be obtained as

$$\{{\mathit{\beta}}_{\xi \mid {t}_{0}}^{(1)}:V({\mathit{\beta}}_{\xi \mid {t}_{0}}^{(1)})<{\chi}_{r,1-\alpha}^{2}\},$$

(6)

where
${\chi}_{r,1-\alpha}^{2}$ is the 100 × (1 − *α*)* ^{th}* percentile of a

By using this result we can also estimate the prediction interval. From (3), the *ξ* –quantile residual lifetime for a patient with a covariate vector **z**_{0} at time *t*_{0} can be directly predicted by
$\widehat{\eta}({t}_{0}\mid {\mathbf{z}}_{0})=exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{z}}_{0})$. We estimate the associated 100(1− *α*)% confidence interval by {
$\eta :Q(\eta )<{\chi}_{1,1-\alpha}^{2}$}, where

$$Q(\eta )=\underset{\{\mathit{\beta}:{\mathit{\beta}}^{\prime}{\mathbf{z}}_{0}-log(\eta )\}}{min}{n}^{-1}{\mathbf{S}}_{\xi \mid {t}_{0},n}^{\prime}(\mathit{\beta}){\widehat{\mathbf{\Gamma}}}_{\xi \mid {t}_{0}}^{-1}{\mathbf{S}}_{\xi \mid {t}_{0},n}(\mathit{\beta}).$$

(7)

To assess the finite sample properties of our estimator and test statistic, simulation studies were performed based on the *median* residual regression, i.e., *ξ* = 1/2. For an easy interpretation of the results, we considered a simple model

$$\text{med}\{log({T}_{i}-{t}_{0})\mid {T}_{i}\ge {t}_{0},{x}_{1i}\}={\beta}_{{t}_{0}}^{(0)}+{\beta}_{{t}_{0}}^{(1)}{x}_{1i},$$

(8)

where *x*_{1}* _{i}* is a binary covariate, say, 0 for the control group and 1 for the treatment group. By the invariance property of the median with respect to monotone transformations, the model (8) is equivalent to

$$\text{med}({T}_{i}-{t}_{0}\mid {T}_{i}\ge {t}_{0},{x}_{1i})=exp({\beta}_{{t}_{0}}^{(0)}+{\beta}_{{t}_{0}}^{(1)}{x}_{1i}).$$

(9)

Here
$exp({\beta}_{{t}_{0}}^{(0)})$ and
$exp({\beta}_{{t}_{0}}^{(0)}+{\beta}_{{t}_{0}}^{(1)})$ can be interpreted as the median residual lifetimes for the control and treatment groups at time *t*_{0}, respectively, so that the difference in median residual lifetime between the two groups is given by

$$exp({\beta}_{{t}_{0}}^{(0)})\{exp({\beta}_{{t}_{0}}^{(1)})-1\}.$$

(10)

In this case, note that the slope parameter
${\beta}_{{t}_{0}}^{(1)}$ can be interpreted as the logarithm of the *ratio* of the two median residual lifetimes at time point *t*_{0}.

Failure times *T _{i}* were assumed to follow a Weibull distribution with survival function

$${\theta}_{{t}_{0}}\equiv exp({\beta}_{{t}_{0}}^{(0)})={S}^{-1}\{(1/2)S({t}_{0})\}-{t}_{0}=(1/{\rho}_{0}){\{log(2)+{({\rho}_{0}{t}_{0})}^{\kappa}\}}^{1/\kappa}-{t}_{0},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{t}_{0}\ge 0.$$

(11)

Note that at the origin of time axis, i.e., *t*_{0} = 0,
${\rho}_{0}={\{log(2)\}}^{1/\kappa}/exp({\beta}_{0}^{(0)})$, where
$exp({\beta}_{0}^{(0)})$ is the median of the failure distribution at *t*_{0} = 0. By using the probability integral transformation, failure times were generated for both groups from *T _{i}* = (1/

First, with a fixed total sample size of *n* = 200, 1000 simulations were performed for various combinations of censoring proportions and time points *t*_{0} to evaluate the empirical distribution of the regression parameters
${\beta}_{{t}_{0}}^{(0)}$ and
${\beta}_{{t}_{0}}^{(1)}$ via the mean and standard error (SE) of the parameter estimates. The grid search method was used to find the estimates that minimize the estimating equation (4). Table 1 summarizes the results. Although the proposed regression method needs to be used at one fixed time point, we have chosen various time points only for the purpose of comprehensive investigation in our simulation studies and real example. Under
${\stackrel{\sim}{H}}_{0}:{\beta}_{{t}_{0}}^{(1)}=0$, equation (11) gives the true values of
${\beta}_{{t}_{0}}^{(0)}$ as 1.61, 1.41, 1.22, and 1.04 at *t*_{0} = 0, 1, 2, 3, respectively. We assume that the shape parameter *κ* does not change in the surviving population. The true value of
${\beta}_{{t}_{0}}^{(1)}$ must be 0 for all *t*_{0} ≥ 0 because an identical survival distribution was assumed for both groups. In Table 1, we observe that the mean values of the estimates of
${\beta}_{{t}_{0}}^{(0)}$ and
${\beta}_{{t}_{0}}^{(1)}$ (no censoring) from the new method are very close to their true values for earlier fixed time points. Larger censoring proportions as well as later fixed time points tend to increase the SE of the parameter estimates.

Mean (Standard Deviation) of the empirical estimates of the true regression parameters
${\beta}_{{t}_{0}}^{(0)}=1.61,1.41,1.22,1.04$ at *t*_{0} = 0, 1, 2, 3 and
${\beta}_{0}^{(1)}=0$ when the total number of observations (*n*) is 200.

In order to investigate the performance of our semiparametric estimator relative to the maximum likelihood (ML) approach, we also conducted a simulation study for the ML approach. The median residual life function for a subject *i* with a covariate *x _{i}* at time

$${\rho}_{i,{t}_{0}}={\{log(2)\}}^{1/\kappa}{\left[{\{exp({\mathit{\beta}}_{{t}_{0}}^{\prime}{\mathbf{Z}}_{i})+{t}_{0}\}}^{\kappa}-{t}_{0}^{\kappa}\right]}^{-1/\kappa}.$$

(12)

Denoting
$\mathit{\phi}=(\kappa ,{\mathit{\beta}}_{{t}_{0}}^{\prime})$, the log-likelihood function to estimate the vector ** is given by
$l(\mathit{\phi})={\sum}_{i=1}^{n}\left[{\mathrm{\Delta}}_{i}log\{h({y}_{i};\mathit{\phi},{\mathbf{Z}}_{i})\}+log\{S({y}_{i};\mathit{\phi},{\mathbf{Z}}_{i})\}\right]$, where
$log\{h({y}_{i};\mathit{\phi},{\mathbf{Z}}_{i})\}=log\{log(2)\}+log(\kappa )-log\left[{\{exp({\mathit{\beta}}_{{t}_{0}}^{\prime}{\mathbf{Z}}_{i})+{t}_{0}\}}^{\kappa}-{t}_{0}^{\kappa}\right]+(\kappa -1)log({y}_{i})$ and
$log\{S({y}_{i};\mathit{\phi},{\mathbf{Z}}_{i})\}=-log(2){\left[{\{exp({\mathit{\beta}}_{{t}_{0}}^{\prime}{\mathbf{Z}}_{i})+{t}_{0}\}}^{\kappa}-{t}_{0}^{\kappa}\right]}^{-1}{y}_{i}^{\kappa}$. We used the same simulation scenarios as before for a fair comparison. In Table 1, we observe that the bias of the regression estimators by ML method is negligible across all scenarios. Their corresponding standard errors of the estimates tend to increase in censoring proportion and ***t*_{0}. The bias and SE of the MLE are small overall. The bias of our semiparametric estimators is small with small *t*_{0} values, but slightly increases with *t*_{0} = 3. As expected, the MLE is more efficient than our estimator under the correct distributional assumption.

Next, with total sample sizes of *n* = 100, 200 or 1000, type I error probabilities for testing the null hypothesis
${\stackrel{\sim}{H}}_{0}:{\beta}_{{t}_{0}}^{(1)}=0\phantom{\rule{0.16667em}{0ex}}({t}_{0}=0,1,2,3)$ with *α* = 0.05 were compared between the proposed and ML methods at various time points with different average censoring proportions (Table 2). The results indicate that the test based on the minimum dispersion statistic tends to be slightly conservative, which is consistent with the observations by Su and Wei (1993), Ying *et al.* (1995), and Jeong *et al.* (2008).

Type I error (1–95% coverage) probabilities for testing the null hypothesis
${H}_{0}:{\beta}_{{t}_{0}}^{(1)}=0\phantom{\rule{0.16667em}{0ex}}({t}_{0}=0,1,2,3)$ when the true parameter values are
${\beta}_{0}^{(1)}=0$ and the total sample sizes (*n*) are 100, 200, or 1000.

Similar settings were used for power analysis, except that the difference in median residual lifetimes between two groups were assumed to be 1, 3, and 5 at *t*_{0} = 0. Note that when
${\beta}_{0}^{(0)}=1.61$, equation (10) gives 1, 3, and 5 corresponding to
${\beta}_{0}^{(1)}=0.18,0.47$, and 0.69, respectively. From (12), recall that when *t*_{0} = 0 the scale parameter *ρ* from the Weibull distribution is given by

$${\rho}_{i,0}={\{log(2)\}}^{1/\kappa}exp(-{\beta}_{0}^{(0)}-{\beta}_{0}^{(1)}{x}_{1i}),$$

(13)

where
${\beta}_{0}^{(k)}\phantom{\rule{0.16667em}{0ex}}(k=0,1)$ is the true values of the regression parameters at *t*_{0} = 0. The covariate values of *x*_{1}* _{i}* being generated from a Bernoulli distribution with equal probability of success, failure times were simulated from

$$(1/{\rho}_{i,0}){\{log(2)\}}^{1/\kappa}=exp({\beta}_{0}^{(0)}+{\beta}_{0}^{(1)}{x}_{1i}).$$

(14)

Equation (14) gives 5 when *x*_{1}* _{i}* = 0, and it corresponds approximately to 6, 8, and 10, respectively, when

The NSABP B-04 data set has been chosen as a real data example because it has a long-term follow-up over 30 years, preserving important survival information among breast cancer patients *without* any chemo- or hormonal therapies after surgery. This information is not available from recent studies on breast cancer because a series of standard therapies have been established for the patients being randomized to the control group. In this section, we apply the proposed method to re-analyze the B-04 data to evaluate the effects of important prognostic factors in breast cancer on the *median* residual lifetimes.

In the B-04 data, there were 5 groups being compared, 3 groups in node-negative patients and 2 groups in node-positive patients. With more than a quarter century of follow-up, the proportion of patients still alive is very low as expected, i.e., less than 30% among 1,079 eligible node-negative patients with follow-up and less than 20% among 586 eligible node-positive patients with follow-up. Fisher *et al.* (2002) reported a 25-year follow-up update of the B-04 data. The original goal of the study was to prove that a less aggressive total mastectomy had an equivalent effect on patients’ survival to the traditional radical mastectomy. In this paper, the nodal status was considered as a covariate along with age at diagnosis of breast cancer and pathological tumor size to predict median residual lifetimes of breast cancer patients at any given time point.

First we consider a simple regression model that includes only one covariate of the nodal status, i.e., **Z*** _{i}* = (1,

Contour plots of surfaces of minus the square root of sum of squares of two score equations for the median residual regression model applied to NSABP B-04 data set with the only covariate of nodal status; (a) When *t*_{0} = 0, (b) When *t*_{0} = 4, and (c) When **...**

Comparison of the estimated median residual lifetimes by nodal status from various methods (NSABP B-04 data); upper 3 lines are for node-negative group and lower 3 lines are for node-positive group.

Fitted simple and multiple regression models (NSABP B-04 data); (a), Regression parameter estimates from the *median* residual life regression model with a single covariate (simple model) and associated *p*-values and 95% confidence intervals from the proposed **...**

A comparison may be also considered between the proposed and proportional hazards regression models (Cox, 1972, 1975). Suppose we have a proportional hazards model *S*(*t; Z _{i}*) =

In Figure 2, we observe that the median residual lifetime of the node-positive group is shorter than that of the node-negative group during the first 10 years. The median residual lifetime in the node-negative group is almost constant for the range of *t*_{0} values. For the node-positive group, however, the median residual lifetime slightly increases in *t*_{0}, which may imply that the node-positive patients tend to survive longer once they pass the early high-risk period. *P*-values and 95% confidence intervals from the minimum dispersion statistic (5) are provided in Table 4(a) to compare median residual lifetimes between the two nodal groups at each fixed time point *t*_{0}. The results indicate that the nodal status has a significant effect on the median residual lifetimes in early time points, but the effect diminishes once patients survive more than 7 years. This is similar to the findings in Jeong *et al.* (2008) by using a two-sample test statistic without adjusting for other prognostic factors.

Since there are no other existing methods that can be *directly* compared with the proposed method, first we have adopted the bootstrap resampling method (Efron, 1979) to assess our results. We have resampled 1000 times from the original sample with replacement and calculated standard errors of those 1000 estimates, which were 0.07, 0.10, 0.13, 0.14, 0.12, 0.11, and 0.16 for time points *t* = 0, 2, 4, 6, 7, 8, and 10, respectively. At each time point, these standard error estimates were used to construct 95% confidence intervals for
${\beta}_{{t}_{0}}^{(\mathit{node})}$ as well as to obtain *p*-values via the Wald test. The proportional hazards model (Cox, 1972, 1975) was also chosen for a comparison of the observed significance of the slope parameter. In this case, the significance of the effect of nodal status was evaluated among surviving patients beyond each fixed time point. The results from the proportional hazards analysis are presented only through *p*-values for the comparison, because the quantities being evaluated are different under the proportional hazards and time-specific median residual life models. Note that the interpretation of the slope parameter is the logarithm of the *hazard ratio* under the proportional hazards model and the logarithm of the *ratio of two median residual lifetimes* at time *t*_{0} under the proposed model, so that confidence intervals for the slope parameter from the two models cannot be compared directly. From Table 4(a), we observe that the results from the three approaches are compatible in *p*-values, although ones from the proposed method tend to be slightly larger. We also see that the resampling and proposed methods give similar 95% confidence intervals.

Now we consider the multiple regression model, where **Z*** _{i}* = (1,

These results can be used to predict a patient’s median residual lifetime at a given time *t*_{0}. For example, for a patient with positive lymph nodes, age at diagnosis of 56 (median), and pathological tumor size of 30*mm* (median), the median residual lifetime (in years) 6 years after the original diagnosis of cancer can be predicted as exp {3.92 −1 × 0.28 − 2.11 × (0.01 × 56.0) − 0.48 × (0.01 × 30.0)} = 10.1. Similarly for node-negative patients, the predicted median residual lifetime is 13.4 years. From (7), the corresponding 95% prediction intervals are (7.3,12.9) and (11.2,16.4), respectively. These estimates may be used as baseline mortality information for a future clinical study to evaluate a new therapy in terms of prolonging the patients’ remaining lifetimes.

In this paper, we proposed a regression model on quantile residual lifetimes, where the covariate effects can be directly associated with the quantile failure time among survivors beyond any specific time point, rather than inferring the quantile residual lifetime under some global regression model assumptions such as proportional hazards or accelerated failure time. An estimating equation for the regression coefficients has been formulated and justified under the proposed model. Asymptotic distributions of the proposed regression parameter estimator and test statistic to evaluate the covariate effects have been derived, and it was shown that they had reasonable finite sample properties through simulation studies. The grid search method that was used to find the roots of the estimating equation also provided reasonable solutions in our simulation studies and real example. The variance of the proposed estimators depends on the probability density function of the survival variables. However, the variance of the estimating function can be consistently estimated without specification of the survival distribution. Hence, we used the minimum dispersion statistic (Basawa and Koul, 1988) based on the distribution of the estimating function. The variance of the regression coefficient estimator may be estimated by using either the proposed confidence interval approach or the bootstrap resampling method.

Our theoretical results in Section 3 are based on the uniform consistency of the Kaplan-Meier estimator *Ŝ*(*t*) over 0 < *t* < *ζ* = sup{*t: S*(*t*)*G*(*t*) > 0}, which leads to a condition that
${t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}\mathbf{Z})<\zeta $ in our case. Here *ζ* may be considered as the maximum follow-up time, which has the lower bound
${t}_{0}+exp({\mathit{\beta}}_{\xi \mid {t}_{0}}^{\prime}\mathbf{Z})$.

As noted in Ying *et al.* (1995), to check the model assumption of (3) at a fixed time *t*_{0}, a zero-mean Gaussian process of cumulative sums of median residuals can be defined as
$W(s)={n}^{-1/2}{\sum}_{i=1}^{n}{e}_{i}I({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i}\le s)$, where

$${e}_{i}=\frac{I\{{Y}_{i}\ge {t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}}{\widehat{G}\{{t}_{0}+exp({\widehat{\mathit{\beta}}}_{\xi \mid {t}_{0}}^{\prime}{\mathbf{Z}}_{i})\}}-(1-\xi )\frac{I({Y}_{i}\ge {t}_{0})}{\widehat{G}({t}_{0})}.$$

A plot of *W*(*s*) against the predicted *ξ*–quantile residual lifetimes may be used as a graphical tool to check the model assumption at time *t*_{0}.

We have assumed that survival and censoring variables are conditionally independent given covariates. By using a censoring distribution model that is independent of covariates, however, the conditional independence assumption is reduced to an unconditional independence assumption, as a referee pointed out. The latter stronger assumption holds for most well conducted clinical trials, including the NSABP trial discussed in this paper, where the major cause of censoring is administrative. If the censoring distribution depends on a discrete covariate, we may partition the data into a number of strata defined by the covariate values so that the censoring variables have an identical distribution within each stratum. If the censoring distribution depends on a continuous covariate, we may consider modeling the dependency of censoring distribution on the covariate. For example, for a continuous covariate *z _{i}*, suppose that

An associate editor raised an issue regarding how to choose quantiles and how to interpret the results if the covariate effects differ across quantiles. In theory, any quantile *ξ* such that *ξ* < {*Ŝ* (*t*_{0}) − *Ŝ*(*t _{max}*)}/

We are grateful to an associate editor and a referee for their valuable comments. Dr. Jong-Hyeon Jeong’s work was supported in part by the Department of Defense (DOD) Grant W81XWH-04-1-0605. Dr. Jong-Hyeon Jeong and Dr. Hanna Bandos’s work was supported in part by the National Health Institute (NIH) Grants 5-U10-CA69974-09 and 5-U10-CA69651-11.

Supplementary Materials

The proofs of consistency and asymptotic normality of _{ξ|t0} and asymptotic normality of
${n}^{-1/2}{\mathbf{S}}_{\xi \mid {t}_{0},n}({\mathit{\beta}}_{\xi \mid {t}_{0}}^{0})$ in Section 3 are available under the Paper Information link at the Biometrics website http://www.tibs.org/biometrics.

- Alam K, Kulasekera KB. Estimation of the quantile function of residual life time distribution. Journal of Statistical Planning and Inference. 1993;37:327–337.
- Arnold BC, Brockett PL. When does the
*β*th percentile residual life function determine the distribution? Operations Research. 1983;31:391–396. - Basawa IV, Koul HL. Large-sample statistics based on quadratic dispersion. International Statistical Review. 1988;56:199–219.
- Breslow NE. Covariate analysis of censored survival data. Biometrics. 1974;30:89–99. [PubMed]
- Coombes RC, Hall E, Gibson LJ, et al. A randomized trial of exemestane after two to three years of tamoxifen therapy in postmenopausal women with primary breast cancer. The New England Journal Of Medicine. 2004;350:1081–1092. [PubMed]
- Cox DR. Regression models and life-tables (with discussion) Journal of the Royal Statistical Society Series B. 1972;34:187–220.
- Cox DR. Partial likelihood. Biometrika. 1975;62:269–276.
- Cox DR, Oakes D. Analysis of Survival Data. London: Chapman & Hall; 1984.
- Csörgö M, Csörgö S. Estimation of percentile residual life. Operations Research. 1987;35:598–606.
- Csörgö S, Horvath L. The rate of strong uniform consistency for the product-limit estimator. Z Wahrscheinlichkeitstheorie verw Gebiete. 1983;62:411–426.
- Efron B. Bootstrap methods: another look at the jackknife. Annals of Statistics. 1979;7:1–26.
- Fisher B, Jeong J, Anderson S, et al. Twenty-five year findings from a randomized clinical trial comparing radical mastectomy with total mastectomy and with total mastectomy followed by radiation therapy. The New England Journal Of Medicine. 2002;347:567–575. [PubMed]
- Fleming TR, Harrington DP. Counting Processes and Survival Analysis. New York: Wiley; 1991.
- Gelfand AE, Kottas A. Bayesian semiparametric regression for median residual life. The Scandinavian Journal of Statistics. 2003;30:651–665.
- Goss PE, Ingle JN, Martino S, et al. A randomized trial of letrozole in postmenopausal women after five years of tamoxifen therapy for early-stage breast cancer. The New England Journal Of Medicine. 2003;349:1793–1802. [PubMed]
- Grambsch P, Therneau T. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81:515–26.
- Gupta RC, Langford ES. On the determination of a distribution by its median residual life function: a functional equation. Journal of Applied Probability. 1984;21:120–128.
- Jeong J-H, Jung S-H, Costantino JP. Nonparametric inference on median residual life function. Biometrics. 2008;64:157–163. [PubMed]
- Joe H, Proschan F. Percentile residual life functions. Operations Research. 1984;32:668–678.
- Jung S-H. Quasi-likelihood for median regression models. Journal of the American Statistical Association. 1996;91:251–257.
- Kaplan EL, Meier P. Nonparametric estimator from incomplete observations. Journal of the American Statistical Association. 1958;53:457–481.
- Koenker R, Bassett GJ. Regression quantiles. Econometrica. 1978;46:33–50.
- Mamounas EP, Jeong J-H, Wickerham L, et al. Benefit from exemestane as extended adjuvant therapy after 5 years of adjuvant tamoxifen: Intention-to-treat analysis of the National Surgical Adjuvant Breast and Bowel Project B-33 trial. Journal of Clinical Oncology. 2008;26:1965–1971. [PubMed]
- Mu Y, He X. Power transformation toward a linear regression quantile. Journal of the American Statistical Association. 2007;102:269–279.
- Portnoy S, Koenker R. The Gaussian hare and the Laplacian tortoise: computability of squared-error versus absolute-error estimators. Statistical Science. 1997;12:279–296.
- Schmittlein DC, Morrison DG. The median residual lifetime: a characterization theorem and an application. Operations Research. 1981;29:392–399.
- Song J, Cho G. A note on percentile residual life. Sankhya. 1995;57:333–335.
- Su JQ, Wei LJ. Nonparametric estimation for the difference or ratio of median failure times. Biometrics. 1993;49:603–607. [PubMed]
- Ying Z, Jung S-H, Wei LJ. Survival analysis with median regression models. Journal of the American Statistical Association. 1995;90:178–184.
- Wei LJ, Ying Z, Lin DY. Linear regression analysis of censored survival data based on rank tests. Biometrika. 1990;19:845–851.
- Wei Y, Pere A, Koenker R, He X. Quantile regression methods for reference growth charts. Statistics in Medicine. 2006;25:1369–1382. [PubMed]
- Whang YJ. Smoothed empirical likelihood methods for quantile regression models. Econometric Theory. 2006;22:173–205.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |