PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Ann Stat. Author manuscript; available in PMC 2010 June 29.
Published in final edited form as:
Ann Stat. 2010 June 1; 38(3): 1607–1637.
PMCID: PMC2893401
NIHMSID: NIHMS173372

QUANTILE CALCULUS AND CENSORED REGRESSION

Abstract

Quantile regression has been advocated in survival analysis to assess evolving covariate effects. However, challenges arise when the censoring time is not always observed and may be covariate-dependent, particularly in the presence of continuously-distributed covariates. In spite of several recent advances, existing methods either involve algorithmic complications or impose a probability grid. The former leads to difficulties in the implementation and asymptotics, whereas the latter introduces undesirable grid dependence. To resolve these issues, we develop fundamental and general quantile calculus on cumulative probability scale in this article, upon recognizing that probability and time scales do not always have a one-to-one mapping given a survival distribution. These results give rise to a novel estimation procedure for censored quantile regression, based on estimating integral equations. A numerically reliable and efficient Progressive Localized Minimization (PLMIN) algorithm is proposed for the computation. This procedure reduces exactly to the Kaplan–Meier method in the k-sample problem, and to standard uncensored quantile regression in the absence of censoring. Under regularity conditions, the proposed quantile coefficient estimator is uniformly consistent and converges weakly to a Gaussian process. Simulations show good statistical and algorithmic performance. The proposal is illustrated in the application to a clinical study.

AMS 2000 subject classifications: Primary 62N02, secondary 62N01
Keywords and phrases: Differential equation, estimating integral equation, piecewise-linear programming, PLMIN algorithm, quantile equality fraction, regression quantile, relative quantile, varying-coefficient model

1. Introduction

Quantile regression (Koenker & Bassett, 1978), concerning models for conditional quantile functions, has developed into a primary statistical methodology to investigate functional relationship between a response and covariates. Targeting the full spectrum of quantiles, it provides a far more complete statistical analysis than, say, classical linear regression. This technique has a long history in econometric applications. More recently, quantile regression has also been advocated for survival analysis to address evolving covariate effects which is a common phenomenon in demographic and clinical research among others. For instance, the aging process as well as the effects of its determinants can be vastly different at various stages of life (cf. Koenker & Geling, 2001). On the other hand, a clinical intervention can rarely be expected to have a constant effect, due to the time lag in reaching full effect and to drug resistance. The quantile regression model allows for varying regression coefficients and thus suits these applications well. However, a main challenge arises from censoring.

Denote the survival time by T and the censoring time by C. As a result of censoring, T is not directly observed but through follow-up time X = TC and censoring indicator Δ = I(TC), where ∧ is the minimization operator and I(·) is the indicator function. Of interest is the relationship between T and a p × 1 covariate vector Z with constant 1 as the leading component. Quantile function is an inverse of the distribution function FZ(t) [equivalent] Pr(Tt | Z). However, ambiguities arise in the presence of zero-density intervals; for example, zero mortality is not uncommon at the beginning of many clinical trials since new enrollees are relatively healthy. To be definitive, we adopt the cadlag inverse, i.e., the inverse function that is right-continuous with left-hand limits. The τ -th conditional quantile of T given Z is defined as

QZ(τ)sup{t:FZ(t)τ}   τ[0,1).
(1)

The quantile regression model postulates that

QZ(τ)=Zβ0(τ)   τ[0,1),
(2)

where β0(τ), referred to as the quantile coefficient, is a function of probability τ. This model is semiparametric in general, but nonparametric in the k-sample problem. The interest in evolving covariate effects necessitates the functional modeling of (2), which distinguishes itself from the modeling on a specific quantile as in, e.g., median regression; see Section 6 for further discussion. Note that the time scale may be, say, logarithm-transformed, and accordingly the supports of T, C and X are not necessarily restricted to the nonnegative half line. When all components of β0(τ) except the intercept are constant in τ, this model reduces to the accelerated failure time mode as studied by Buckley & James (1979) and Tsiatis (1990) among others. In this regard, the quantile regression model is a varying-coefficient generalization. To provide an interpretation, suppose for the moment that model (2) holds on the logarithmic time scale. Then, the effect of a covariate, other than the leading 1 of Z, is to stretch or compress the baseline survival time (on the original scale) with a quantile-dependent stretching or compressing factor.

With uncensored data, Koenker & Bassett (1978) generalized sample quantile and proposed regression quantile as a quantile coefficient estimator via a convex objective function. An adaptation of the well-known Barrodale–Roberts algorithm was later suggested by Koenker & D’Orey (1987) for the computation. The reliability and efficiency of this algorithm contribute to broader acceptance of the standard methodology. In the presence of censoring, Powell (1984, 1986) proposed an estimation procedure when censoring time C is always observed. His approach applies uncensored quantile regression to X as the τ-th quantile of X turns out to be QZ(τ) ∧C = Z[top top]β0(τ) ∧C.

However, in most survival studies, not only is the survival time subject to censoring but also the censoring time is unobserved for uncensored individuals. Taking the missing-data perspective of censoring, Ying, Jung & Wei (1995) and Honoré, Khan & Powell (2002) developed different methods but with the common consistent estimation requirement of the censoring distribution given covariates. This amounts to either an unconditional independence censoring mechanism, or a finite-number limitation on covariate values, or additional censoring-time modeling to achieve a root-n convergence rate of the estimated censoring distribution. Obviously, none of these is desirable. Ying et al. (2005) indicated that such a restriction may be relieved by employing smoothing techniques to nonparametrically estimate the conditional censoring distribution. As an alternative, Wang & Wang (2009) developed a method by nonparametrically estimating the conditional survival distribution via kernel smoothing. Nevertheless, Robins & Ritov (1997) argued that these smoothing-based methods may not be practical with moderate sample size in the presence of multiple continuously-distributed covariates; see also Portnoy (2003).

Our investigation focuses on the same preceding data structure, but aims to allow for generalities on both the censoring mechanism and covariates. Specifically, we consider conditional independence censoring mechanism:

TC|Z,
(3)

where [perpendicular] denotes statistical independence. This problem was first investigated by Portnoy (2003), who suggested the pivoting method employing the redistribution-to-the-right imputation scheme for censoring (Efron, 1967). The mass of censored observations is recursively redistributed to adopt standard uncensored quantile regression. However, one premise is the quantile monotonicity so that the “right”, or future, is unequivocal in the redistribution. In the k-sample case, the monotonicity holds in uncensored sample quantile, and the method reduces to the Kaplan–Meier method, i.e., taking an inverse of the Kaplan–Meier estimator. Unfortunately, uncensored quantile regression in general does not respect the monotonicity, leading to both algorithmic and analytic difficulties with Portnoy (2003). Indeed, the asymptotic properties of the estimator have not yet been established; see Neocleous, Vanden Branden & Portnoy (2006). As an alternative, Neocleous et al. (2006) advocated a closely related grid method. Most recently, Peng & Huang (2008) proposed a functional estimating function upon discovering a martingale structure, and developed a grid-based quantile coefficient estimator. Both uniform consistency and weak convergence have been established. As for the last two methods, the grid dependence, however, might not be completely satisfactory.

This article makes two main contributions to this problem. First of all, fundamental and general quantile calculus is developed on probability scale, establishing the probability-scale dynamics with allowance for zero-density intervals and discontinuities in a distribution. Second, from quantile calculus a well-defined estimator and a reliable and efficient algorithm for censored quantile regression naturally emerge on the basis of estimating integral equations. As compared with Portnoy (2003), Neocleous et al. (2006) and Peng & Huang (2008), this new approach entails neither algorithmic complications nor a probability grid. For the rest of this article, quantile calculus is presented in Section 2, and the proposed estimator and algorithm in Section 3. The asymptotic properties are investigated and an inference procedure suggested in Section 4. Section 5 presents simulation results on statistical and algorithmic performance, and an illustration with a clinical study. Section 6 concludes with discussion. All proofs are collected in the Appendices.

2. Quantile calculus

Given a survival distribution, a one-to-many mapping from probability to time scale may arise from zero-density intervals; adopting the cadlag definition of quantile function is a solution given in Section 1. Reciprocally, a one-to-many mapping from time to probability scale may also arise, resulting from distributional discontinuities. Thus, time-scale theories including counting-process martingales cannot be applied to probability scale, unless continuity restriction is imposed on the distribution. In uncensored quantile regression, this issue may be bypassed by formulating the estimation as an optimization problem. However, such an approach may not be feasible in censored quantile regression, which calls for the development of quantile calculus.

2.1. The one-sample case

Drop Z from the notation in this case. By definition, Pr{T < Q(τ)} ≤ τ ≤ Pr{TQ(τ)}. Thus, Q(τ) does not correspond to a unique probability τ when Pr{T = Q(τ)} > 0. To fill in the missing piece, we introduce the τ-th quantile equality fraction:

ξ(τ)=τPr{T<Q(τ)}Pr{T=Q(τ)},

which is the fraction of the probability mass at the quantile that brings the cumulative probability up to τ. Here and throughout, we define 0/0 [equivalent] 0. Elementary algebra then gives

Pr{T<Q(τ)}+Pr{T=Q(τ)}ξ(τ)=0T[Pr{TQ(ν)}Pr{T=Q(ν)}ξ(ν)]dν1ν   τ[0,1).
(4)

This result establishes the quantile dynamics on probability scale. More significantly, it can be readily exploited to accommodate censoring. Denote the limit of identifiability by [tau with overline] = sup{τ : Pr{C ≥ Q(τ)} > 0}.

Proposition 1. Suppose that T and C are independent and their distributions do not have jump points in common. Consider integral equation

E(Δ[I{X<q(τ)}+I{X=q(τ)}η(τ)])=E0T[I{Xq(ν)}I{X=q(ν)}η(ν)]dν1ν   τ[0,1),
(5)

where q(·) is a cadlag function and η(·) takes values in [0, 1].

  1. If q(·) = Q(·) and η(τ) = ξ(τ) for all τ such that Pr{T = Q(τ)} > 0, then equation (5) holds.
  2. If equation (5) holds, then
    q(τ)=Q(τ),
    (6)
    E[ΔI{X=q(τ)}η(τ)]=E[ΔI{X=Q(τ)}ξ(τ)],
    (7)
    for all τ [set membership] (0, [tau with overline]).

Remark 1. The condition that the distributions of T and C do not share jump points is necessary for the identifiability of the former and therefore the corresponding quantile function as well. The role of η(τ) is to split probability mass in the case of Pr{T = Q(τ)} > 0. Equation (5), however, does not determine η(τ) elsewhere. But instead of, say, setting η(τ) to 0 in those occasions, keeping the more general form would be advantageous for later developments.

2.2. Quantile coefficient dynamics

Similar to the one-sample case, we assume that

Assumption 1. The conditional distribution functions of T and C given Z do not have jump points in common for all values of Z.

Write ξZ(τ) as the τ -th quantile equality fraction for the distribution of T given Z. Generalize the definition of identifiability limit as

τ¯=sup{τ:E[Z2I{CZβ0(τ)}]isnonsingular},

where v[multiply sign in circle]2 [equivalent] vv[top top]. The one-sample result of Proposition 1 can then be extended.

Proposition 2. Suppose that quantile regression model (2) and censoring mechanism (3) hold along with Assumption 1. Consider integral equation

  E(ZΔ[I{X<Zβ(τ)}+I{X=Zβ(τ)}ηZ(τ)])=E0τZ[I{XZβ(ν)}I{X=Zβ(ν)}ηZ(ν)]dν1νλ[0,1),
(8)

where β(·) is a cadlag function and Z-dependent ηZ(·) takes values in [0, 1].

  1. If β(·) = β0(·) and, for any given Z, ηZ(τ) = ξZ(τ) for all τ such that Pr{T = QZ(τ) | Z} > 0, then equation (8) holds.
  2. In the case that both C and Z are discretely distributed, if equation (8) holds, then
    β(τ)=β0(τ),
    (9)
    E[ZΔI{X=Zβ(τ)}ηZ(τ)]=E[ZΔI{X=Zβ0(τ)}ξZ(τ)]
    (10)
    for all τ [set membership] (0, [tau with overline]).

Remark 2. The admission of β0(·) as a solution to integral equation (8) is general; no restriction on the survival and censoring distributions is imposed except for a necessary identifiability condition. But the uniqueness result of β0(·) is provided only for the case that C and Z are discretely distributed. It is also established under some other conditions in Section 4. However, we do not yet have a proof for the most general case.

Remark 3. Consider the censoring-absent special case with nonsingular E(Z[multiply sign in circle]2). Then, integral equation (8) reduces to

D(τ)=0τ{EZD(ν)}dν1ν,

where D(τ) is the left-hand side of (8). This equation has a unique and closed-form solution D(τ) = τEZ, or

E(Z[I{T<Zβ(τ)}+I{T=Zβ(τ)}ηZ(τ)])=τEZ.
(11)

Note that ηZ(τ) affects the left-hand side at a nonsmooth point only, i.e., when Pr{T = Z[top top]β(τ)} ≠ 0. With fixed ηZ(τ), the left-hand side may not be smooth in β(τ). Nonetheless, thanks to ηZ(τ), it can always be smooth in τ and therefore, the equality of (11) is attainable. As far as β(τ) is concerned, the equation is equivalent to the minimization problem with E[{TZ[top top]β(τ)}+τ{TZ[top top]β(τ)}], where a [equivalent]aI (a < 0). Thus, Proposition 2 reduces to a well-known result in uncensored quantile regression.

Remark 4. Quantile equality fraction ξZ(τ) is a nuisance parameter. When Pr(Z = z) = 0 for a given value z, Pr{T = QZ(τ) | Z = z} and thus ξz(τ) are not identifiable. Nevertheless, only quantity E[ZΔI{X = Z[top top]β0(τ)}ξZ(τ)] as a whole is relevant to integral equation (8) and it is identifiable. As evident from Remark 3, the notion of ξZ(τ) might not be necessary for uncensored quantile regression, by employing minimization. But it is instrumental for our development of censored quantile regression.

Remark 5. Proposition 2 is more general than the martingale result of Peng & Huang (2008, equation 4), whose validity is limited to the circumstance of continuous survival distribution. In that special case, the former may reduce to the latter since the mapping between time and probability scales becomes one to one and all terms involving ηZ(·) in integral equation (8) may vanish. Even so, the more general form of equation (8) is still desirable in order to derive a natural estimating integral equation by the plug-in principle. After all, an empirical distribution is always discrete, i.e., full of discontinuities and zero-density intervals.

2.3. Relative quantile

To facilitate probability-scale analysis both conceptually and algebraically, we introduce the notion of relative quantile. Anchored at the τ -th quantile, the {τ + λ(1 − τ)}-th quantile coefficient for λ [set membership] [0, 1) is referred to as the λ-th relative quantile coefficient, written as β0(λ, τ) [equivalent] β0{τ + λ(1 − τ)}. This notion provides a convenient vehicle to study quantile coefficient β0(·) forward from a given probability, similar in spirit to the concept of hazard in survival analysis.

An integral equation for β0(λ, τ) can be derived with given D(τ), the left-side of equation (8) at τ. By algebraic manipulation, integral equation (8) implies

  E(ZΔ[I{X<Zβ(λ,τ)}+I{X=Zβ(λ,τ)}ηZ(λ,τ)])D(τ)=E0τZ[I{XZβ(ν,τ)}I{X=Zβ(ν,τ)}ηZ(ν,τ)]dν1νλ[0,1),
(12)

where β(λ, τ) [equivalent] β{τ+λ(1−τ)} and ηZ(λ, τ) [equivalent] ηZ{τ+λ(1−τ)}. Apparently, this is a dual equation since equation (8) becomes a special case when τ = 0.

3. Proposed estimator and algorithm

3.1. Estimating integral equation

The data consist of (Xii,Zi), i = 1, (...), n, as n iid replicates of (X,Δ,Z). Proposition 2 leads naturally to our proposed estimation procedure based on the empirical version of integral equation (8):

i=1nZiΔi[I{Xi<Ziβ(τ)}+I{Xi=Ziβ(τ)}wi(τ)]=i=1n0τZi[I{XiZiβ(ν)}I{Xi=Ziβ(ν)}wi(ν)]dν1ν,
(13)

where wi(·) takes values in [0, 1]. Representing a convenient reparameterization of ηZ(τ) in equation (8), fraction wi(τ) serves the purpose of splitting the empirical probability mass associated with individual i when and only when Xi=Ziβ(τ). For an uncensored individual, this ensures the continuity of ϕi(τ)I{Xi<Ziβ(τ)}+I{Xi=Ziβ(τ)}wi(τ).

We shall say that b interpolates an observation (X,Δ,Z) if X = Z[top top]b.

Theorem 1. Suppose that i=1nZi2 is nonsingular. Estimating integral equation (13) admits a solution [beta](·) over τ [set membership] [0, 1) with the following properties: (i) [beta](·) is cadlag; and (ii) [beta](τ) interpolates at least p individuals and the covariate matrix for the interpolated set is of full rank, for each and every τ [set membership] [0, 1).

Remark 6. Estimating integral equation (13) also admits a solution in the case of singular i=1nZi2 by Theorem 1 upon eliminating parametrization redundancy of the quantile coefficient.

Remark 7. A subtle issue concerns the fact that identifiability limit [tau with overline] is unknown. Empirically, [tau with overline] cannot even be determined definitively to exceed any τ > 0, which may be easily seen in the one-sample case. Although it is possible to estimate [tau with overline], we do not terminate the estimation of β0(·) at such an estimate but rather provide [beta](τ) for all τ [set membership] [0, 1); the properties of [beta](·) would otherwise become more complicated. Precisely speaking, [beta](τ) is an estimator of β0(τ) provided τ < [tau with overline]. This strategy of separating the estimation of β0(τ) provided τ < [tau with overline] from that of [tau with overline] is similar to that adopted by Peng & Huang (2008). In contrast, Portnoy (2003) and Neocleous et al. (2006) terminated their estimation of β0(·) once the estimate becomes non-unique, which might partly explain the difficulties in their interval estimation.

Geometrically, [beta](τ) for each τ is a hyperplane, partitioning the sample into

{{i:Xi<Ziβ^(τ)}belowset{i:Xi=Ziβ^(τ)}interpolatedset.{i:Xi>Ziβ^(τ)}aboveset

Each interpolated individual on the hyperplane may be split in a ratio of wi(τ) : {1 − wi(τ)} to be associated with the below and above sets, respectively. This gives rise to a sample bi-partition indexed by τ, and estimating integral equation (13) governs its evolution.

3.2. Structuring the computation

Estimating integral equation (13) may be solved exactly with the proposed Progressive Localized Minimization (PLMIN) algorithm. The algorithm proceeds from the 0th quantile coefficient upward in a progressive fashion. Due to sample discreteness, [beta](·) is piecewise constant. We thus conveniently decompose the computation into sequential rounds, each involving that of a 0th relative quantile coefficient and a potential breakpoint.

Suppose that equation (13) is solved up to τ−, and thus ϕi(τ−) of every uncensored individual is available. Then, by continuity ϕi(τ) = ϕi(τ−) of uncensored individual i is determined; obviously ϕi(τ) = 0 in the case of τ = 0. Inherited from the relationship between integral equations (8) and (12), estimating integral equation (13) is equivalent to the following equation for relative quantile coefficient:

i=1nZiΔi[I{Xi<Ziβ(λ,τ)}+I{Xi=Ziβ(λ,τ)}wi(λ,τ)ϕi(τ)]=i=1n0λZi[I{XiZiβ(ν,τ))}I{Xi=Ziβ(ν,τ)}wi(ν,τ)]dν1ν,
(14)

where wi(λ, τ) [equivalent] wi{τ + λ(1 − τ)}. Since β(λ, τ) remains constant from λ = 0 up to a potential relative breakpoint, say, λb, H=i=1nZi[I{XiZiβ(λ,τ)}I{Xi=Ziβ(λ,τ)}wi(τ)] is locally constant, i.e., for λ [set membership] [0, λb). In the case that a censored individual becomes interpolated, adopt the convention that its wi(λ, τ) remains constant locally. Write L(λ) as the left-hand side of (14). Then, estimating integral equation (14) is locally equivalent to

L(λ)=0λ{HL(ν)}dν1ν    λ[0,λb),

which admits a unique solution L(λ) = λH or equivalently,

i=1nZiΔi[I{Xi<Ziβ(λ,τ)}+I{Xi=Ziβ(λ,τ)}wi(λ,τ)ϕi(τ)]=λi=1nZi[I{XiZiβ(λ,τ)}I{Xi=Ziβ(λ,τ)}wi(τ)].
(15)

Write [beta](λ, τ) [equivalent] [beta]{τ+λ(1−τ)}. Since [beta](·) is cadlag, [beta](0, τ) is the solution to the above equation with λ ↓ 0. Subsequently, λb is a λ, typically the supremum λ, such that the equation holds with β(λ, τ) = [beta](0, τ). Furthermore, wib−, τ) of every interpolated uncensored individual will be determined. Thus, solving equation (13) moves forward to τ + λb(1 − τ). The PLMIN algorithm is so named since the computation will be conveniently carried out via minimization.

3.3. Computing 0th relative quantile coefficient and potential breakpoint

With the same arguments following equation (11), solving (15) for [beta](0, τ) can be reformulated as a minimization problem:

β^(0,τ)=limλ0argminbi=1n(XiZib)    ×[I(XiZib)λ1Δi{I(Xi<Zib)ϕi(τ)}],

which no longer involves wi(·, τ). Further algebraic simplification gives

β^(0,τ)=argminbi=1n(XiZib)+
(16)

subjecttoXiZib   i𝔻{j:Δj=1,ϕj(τ)=1}Xi=Zib   i𝔻0{j:Δj=1,ϕj(τ)(0,1)},XiZib   i𝔻+{j:Δj=1,ϕj(τ)=0}

where a+ [equivalent] aI(a > 0). For the special case of the 0th quantile coefficient,

β^(0)=argminbi=1n(XiZib)+
(17)

subjecttoXiZib  i:Δi=1.

The minimization of (16) is a piecewise-linear programming problem with convex objective function, characterized by the following lemma to Theorem 1. Note that, once [beta](0, τ) is determined, so is wi(τ) of a [beta](0, τ)-interpolated uncensored individual by continuity of ϕi(τ).

Lemma 1. Suppose that the condition of Theorem 1 holds and that covariates Zi, i [set membership] D0, are linearly independent. There exists a minimizer [beta](0, τ) for problem (16) such that the covariate matrix for [beta](0, τ)-interpolated observations is of full rank. Furthermore, there exist (i) a p-member subset S of [beta](0, τ)-interpolated observations with D0 [subset or is implied by] S; and (ii) for any [beta](0, τ)-interpolated censored individual i,

wi(τ){  [0,1]ifi𝕊{0,1}otherwise

such that (iii) ZS = {Zi : i [set membership] S} is of full rank; and (iv) H^i=1nZi[I{XiZiβ^(0,τ)}I{Xi=Ziβ^(0,τ)}wi(τ)] as determined satisfies

i𝕊ZiΔiγi=H^
(18)

for some γi, where γi ≤ 0 for i [set membership] D and γi ≥ 0 for i [set membership] D+.

Piecewise-linear programming can be viewed as extended linear programming, although a [beta](0, τ)-interpolated individual may be a censored one and thus not involved in the constraints. We devise an algorithm aiming at the determination of the p-member interpolated subset S, the same strategy as the simplex method of linear programming (e.g., Gill, Murray & Wright, 1991). To locate a candidate member of S, the method of steepest descent is used. Note that a feasible value for [beta](0, τ) is readily available. In the case of τ = 0, any value with a sufficiently small intercept component is feasible. Subsequently, [beta](τ−) is feasible as necessary by continuity of ϕi(·) for uncensored individuals. The minimization along a given feasible direction is reached once an uncensored observation becomes interpolated, or potentially so if the interpolated observation is a censored one instead. The constrained space is of dimension p minus the size of D0. For [beta](0), there is no equality constraint and the dimension is p. For following 0th relative quantile coefficients, typically the dimension is 1 in which case the minimization involves only a line search. To deal with the possibility of more than p interpolated individuals, the perturbation anti-cycling technique in linear programming (e.g., Gill et al., 1991, Section 8.3.3) can be adapted. In the perturbation, one may follow a tie-breaking convention to let individuals in D+ precede censored ones, which in turn precede those in D. This minimization is numerically reliable and efficient.

The minimization determines [beta](0, τ), S, wi(τ) for each in S, and γi for each uncensored in S. Plugging them into equation (15) yields

i𝕊ZiΔi{wi(λ,τ)wi(τ)}=λi𝕊ZiΔiγi.
(19)

Simple algebra then gives the potential relative breakpoint

λb={mini𝕊:Δi=1,γi0I(γi>0)wi(τ)γi{i𝕊:Δi=1,γi0}1otherwise,
(20)

which is proper in the sense of 0 < λb ≤ 1. The lower bound of λb is obvious, whereas the upper bound can easily be established by analyzing the intercept component of equations (18) and (19). If λb = 1, the final quantile is reached. Otherwise, for those uncensored,

wi(λb,τ)=wi(τ)+λbγi  i𝕊:Δi=1.

At least one wib−, τ) above reaches 0 or 1, so is the corresponding ϕi{τ + λb(1 − τ)}. Note that λb is a breakpoint if [beta](τ) interpolates exactly p individuals; but not necessarily so otherwise. Nevertheless, of importance in both cases is that the solution moves forward in a sensible fashion.

When τ is small, S typically consists of uncensored individuals only. But as τ becomes larger, interpolated censored individuals could emerge when [beta](τ) might still be uniquely determined nonetheless. Eventually, the computation could reach a point beyond which [beta](τ) is no longer unique. Apparently, this phenomenon relates to the identifiability issue; see Remark 7. On a different note, just like uncensored quantile regression, this censored quantile regression may not respect quantile monotonicity in general.

3.4. Relationships with standard methods in special cases

In the absence of censoring, estimating integral equation (13) reduces to

i=1nZi[I{Ti<Ziβ(τ)}+I{Ti=Ziβ(τ)}wi(τ)]=τi=1nZi,
(21)

by the same approach to obtaining (11) from (8). Thus, [beta](·) is the cadlag function β(·) that minimizes i=1n[{TiZiβ(τ)}+τ{TiZiβ(τ)}], reducing to one regression quantile of Koenker & Bassett (1978); note that the Koenker–Bassett estimator is not always uniquely defined. In the mean time, 1 − ϕi(τ) becomes I{TiZiβ(τ)}I{Ti=Ziβ(τ)}wi(τ), which is the regression rank score of Gutenbrunner & Jurecková (1992).

On the other hand, in the one-sample case, [beta](·) reduces exactly to the cadlag inverse of the Kaplan–Meier estimator. It is clear from (17) that [beta](0) is the first failure time and from (20) that the breakpoint is the Nelson–Aalen estimate of the hazard at [beta](0). Subsequently and more generally, each estimated 0th relative quantile is a failure time and the relative breakpoint is the Nelson–Aalen hazard estimate. In case that the last observation is censored, the final estimated quantile is defined as this last follow-up time by convention. More generally, in the k-sample problem, [beta](·) is a linear combination of cadlag inverses of the k Kaplan–Meier estimators.

4. Asymptotic study and inference

In our developments thus far, we have kept our assumptions to minimal. But the generality challenges large-sample developments in both exposition and technicalities; see Section 6 for further discussion. In this section, we shall focus on the situation that FZ is continuous and free of zero-density intervals, and additionally C is continuously distributed. These regularity conditions were also adopted in previous investigations (Portnoy, 2003; Neocleous et al., 2006; Peng & Huang, 2008). Nevertheless, Portnoy (2003) and Neocleous et al. (2006) required the absence of censoring prior to and around the 0th quantile. On the other hand, Peng & Huang (2008) presumed that the 0th quantile is −∞. In contrast, we do not impose any conditions on the 0th quantile.

A parameter space needs to be specified. In light of the interpolation property of the estimator by Theorem 1, we require that any b in such a parameter space satisfies that E[Z[multiply sign in circle]2I{Z[top top]β0(0) ≤ Z[top top]bZ[top top]β0(1−) ∧ C}] is nonsingular. Write eigmin as the minimum eigenvalue of a matrix. Specifically, a parameter space containing β0(τ) for all τ [set membership] [0, u] is given by

𝔹(u)={b×p1:   eigminE[Z2I{Zβ0(0)ZbZβ0(1)C}]>c(u)},

where constant u < [tau with overline], compact space Cp−1 [subset or is implied by] Rp−1, and positive constant c(u) < eigminE[Z[multiply sign in circle]2I{CZ[top top] β0(u)}]. Thus, all slope components are bounded but the intercept may be −∞.

Write ‖ · ‖ as the Euclidean norm. Let [F with macron]Z(t) = 1 − FZ(t) and GZ(t) = 1 − GZ(t) = Pr(C > t | Z). Adopt the following regularity conditions:

  • C1. [tau with overline] > 0 and ‖Z‖ is bounded;
  • C2. FZ and GZ have density functions fZ and gZ, which both are continuous and bounded, uniformly in t and Z;
  • C3. β0(·) is Lipschitz-continuous on [τ1, τ2] for any τ1 and τ2 such that 0 < τ1 < τ2 < 1;
  • C4. There exist u [set membership] (0, [tau with overline]) and a parameter space B(u) such that the maximum singular value of
    Ψ(b)=E{Z2F¯Z(Zb)gz(Zb)}[E{Z2G¯Z(Zb)fz(Zb)}]1
    is bounded uniformly in b [set membership] B(u) \ [partial differential]B(u), where [partial differential] denotes the boundary.

The first two conditions are self-explanatory. Conditions C3 implies that the survival distribution does not have zero-density intervals between QZ(0) and QZ(1−). Imposing constraints on censoring, condition C4 is a sufficient and technical one to accommodate the possibility of unbounded β0(0).

Throughout this section, [beta](·) is any cadlag solution to estimating integral equation (13). The solution may not be unique, nor is the interpolation property in Theorem 1 necessary.

Theorem 2. Suppose that quantile regression model (2) and censoring mechanism (3) hold along with conditions C1–C4. Equation (8) implies β(τ) = β0(τ) for all τ [set membership] (0, u]. For any l [set membership] (0, u), supτ [set membership][l,u][beta](τ)−β0(τ) ‖ → 0 almost surely. Furthermore, n1/2{[beta](τ) − β0(τ)} converges weakly to a Gaussian process on [l, u].

Remark 8. Integral equation (8) is an initial value problem, and estimating integral equation (13) is its empirical version. Accordingly, the large-sample study as provided in Appendix D exploits classical differential equation theory and modern empirical process theory. Our study bears similarities with that of Peng & Huang (2008). Indeed, under the continuity condition of C2, equation (13) is essentially equivalent to the estimating function of Peng & Huang (2008, equation 5) since wi(·) becomes negligible; see also Remark 5. Nevertheless, we spare the inductive arguments of Peng & Huang (2008) in their asymptotic study as typically necessary for a grid method, by virtue of the fact that [beta](·) is an exact solution to equation (13). Equally noteworthy is that the generality here on the 0th quantile requires a more delicate treatment.

Remark 9. In the case that Z[top top]β0(0) is −∞ for all Z, our estimator [beta](·) is asymptotically equivalent to that of Peng & Huang (2008) provided that mesh size of the grid as required by the latter is of order o(n−1/2).

To make inference, the distribution of n1/2{[beta](·) − β0(·)} needs to be estimated. For their estimator, Peng & Huang (2008) adapted the resampling approach of Jin, Ying & Wei (2001). We adopt the same approach by perturbing estimating integral equation (13). This procedure is equivalent to a multiplier bootstrap as described in Kosorok (2008, section 2.2.3).

Theorem 3. Suppose that the conditions of Theorem 2 hold, and that nonnegative random variable ξ has unit mean and unit variance and satisfies 0Pr(ξ>x)1/2dx<. Perturb estimating integral equation (13) by assigning iid random variables of the same distribution as ξ and independent of the data to individuals in the sample, and denote a solution to the perturbed equation by [beta]*(·). On [l, u], n1/2 {[beta](·) − β0(·)} has the same asymptotic distribution as n1/2 {[beta]*(·) − [beta](·)} conditionally on the data.

The standard exponential distribution, for example, may be used to generate these perturbing random variables. By repeatedly simulating perturbed samples, the conditional distribution of [beta]*(·) can be obtained as an approximation for the distribution of [beta](·).

5. Numerical studies

The quantile regression model is formulated in β0(·). But alternative covariate-effect measures can be practically useful and were used in our application (Section 5.3). Write

μ0(τ1,τ2)(τ2,τ1)1τ1τ2β0(ν)dν.

Model (2) implies

(τ2τ1)1τ1τ2QZ(ν)dν=Zμ0(τ1,τ2),
(22)

where the left-hand side is a trimmed mean of T. Therefore, μ01, τ2) measures trimmed mean effect. This measure is versatile through the choices of τ1 and τ2. In fact, β0(τ) = limν↓τ μ0(τ, ν) is a special case. On the other hand, μ0(0, 1) is the mean effect, i.e., the regression coefficient in the linear regression model. Originally suggested as an average effect measure by Peng & Huang (2008), μ01, τ2) becomes even more appealing in light of its specific interpretation as revealed. With censored data, μ01, τ2) is identifiable when τ2[tau with overline], and a natural estimator is given by

μ^(τ1,τ2)=(τ2τ1)1τ1τ2β^(ν)dν.

Obviously, [mu]1, τ2) with 0 < τ1 < τ2u is strongly consistent and asymptotically normal under the conditions of Theorem 2. The variance can be estimated by using the simulated distribution of [beta]*(·). Our numerical experience suggested that [mu]1, τ2) behaves reasonably well even when τ1 takes 0.

5.1. Finite-sample statistical performance

Simulations were conducted to mimic a clinical trial. On the original time scale, the baseline survival distribution was standard exponential and the censoring distribution was uniform on [0, 5]. The quantile regression model held on the logarithmic time scale, with two non-constant covariates: Z1 was Bernoulli of probability 0.5 and Z2 uniform on [0, 1]. We considered two scenarios with the following conditional quantile functions:

Qz(τ)= log{log(1τ)}+(1.25τ0.5)Z1+0.5Z2,QZ(τ)= log{log(1τ)}+0.5Z1+0.5Z2.

Scenario 1 above involved a ramp-up effect of Z1, going from none to full linearly with probability τ and staying constant afterwards. In contrast, Scenario 2 followed the accelerated failure time model.

The sample size was 200. Under each scenario, simulations were conducted with 1,000 iterations. For both scenarios, the censoring rate was approximately 32%. Table 1 reports the summary statistics for the proposed τ -th quantile coefficient estimates ranging from τ = 0.1 to 0.7, along with estimates based on the pivoting method of Portnoy (2003), the grid method of Portnoy (Neocleous et al., 2006), and Peng & Huang (2008). The two Portnoy’s methods are available in R Quantreg package, of which the latest version at the time of this research, 4.20, was used. The default mesh size, 0.01 in this case, was adopted for the grid method of Portnoy, and the same mesh size for Peng & Huang. For point estimation, the pivoting method of Portnoy had large bias and variance at τ = 0.7 under both scenarios. Other than that, these estimators all had negligible bias and similar efficiency. But the bias of the proposed estimator seemed smaller. These findings are not surprising since the estimator of Peng & Huang is asymptotically equivalent to the proposed in the settings under study; see Remark 9. In addition, similar efficiency between Peng & Huang and the grid method of Portnoy was already observed in Peng & Huang (2008). For interval estimation, Peng & Huang employs the same procedure as the proposed, whereas the methods of Portnoy use bootstrap. The resampling size was set to 200 for all these methods. The standard error of the proposed estimator agreed with the standard deviation well. The Wald-type 95% confidence intervals of both the proposed and Peng & Huang achieved reasonably accurate coverage probability. In contrast, the bootstrap confidence intervals of Portnoy’s methods had undercoverage particularly for the intercept, a finding consistent with that reported in Peng & Huang (2008).

Table 1
Statistical assessment under models with two non−constant covariates

These preceding stimulation settings conform to the conditions of the asymptotic study in Section 4. Additional settings with distributional discontinuities and zero-density intervals of the survival time were also considered. One such simulation involved a setting similar to the preceding ones but having a discontinuous baseline survival distribution:

QZ(τ)=log{log(1τ0.4)}+(τ0.4)Z1+0.5Z2,

where [logical or] denotes the maximization operator. Unfortunately, the pivoting and grid methods of Portnoy as implemented in R Quantreg package had numerical difficulties and their appropriate evaluation was not permitted. Both the estimator of Peng & Huang and the proposed had negligible bias at τ = 0.1 and 0.3. However, for τ = 0.5 and 0.7, the absolute median-bias of Peng & Huang reached 0.136, 0.065, and 0.041 for the intercept and two slopes, respectively. In comparison, the absolute median-bias of the proposed estimator was no larger than 0.026 for all three coefficients. Here, medianbias is a more appropriate summary than mean-bias due to the skewness of these estimators resulting from discontinuity in the survival distribution. These results were expected since the validity of Peng & Huang is tied to the assumption of continuous survival distribution; see Remark 5.

5.2. Algorithmic performance

The proposed method was compared against the pivoting method of Portnoy, the grid method of Portnoy, and the Peng & Huang method implemented by Koenker (2008). All these existing methods are implemented with Fortran source code in R Quantreg package. The original implementation of Peng & Huang in R language was inappropriate for comparison due to the inherent slower speed of R. For the two grid methods, the default mesh size, 0.01 ∧ n−0.7/2 for sample size n, was adopted. The proposed method was also implemented in R with Fortran source code. To minimize the impact of R overhead, calling the Fortran function of a method from R was timed. The computation was performed on a Dell 2950 computer with 3.0 GHz Pentium Xeon X5365 CPUs.

The survival time followed the accelerated failure time model with p − 1 non-constant covariates:

logT=ε+m=2p(1)m12Z(m),

where ε followed the extreme-value distribution, and Z(m), m = 2,(...), p, were independent and uniformly distributed on [0, 1]. The number of non-constant covariates ranged from 1 to 8, and the sample size from 100 to 1,600. Three levels of censoring, 0%, 25%, and 50%, were investigated. Unless there was no censoring, the censoring time followed the uniform distribution between 0 and a censoring rate-determined upper bound. Computational reliability and efficiency of various methods for point estimation of the quantile coefficient process were assessed with 1,000 iterations, shown in Table 2.

Table 2
Algorithmic evaluation of timing and reliability

Both the proposed and Peng & Huang methods were reliable. However, the pivoting and grid methods of Portnoy tended to cause frequent R session crashes in case of no censoring and more covariates. Furthermore, in the presence of censoring, the pivoting method of Portnoy might terminate with warning or error messages. This rate increased with the number of covariates and censoring rate, up to 67%.

The computer time of the proposed method was roughly constant over different censoring levels, given sample size and number of covariates. Comparatively, the proposed is faster than other methods uniformly in all scenarios considered. Specifically, the pivoting method of Portnoy cost 1.6 to 6.7 times as much computer time, the grid method of Portnoy cost 1.8 to 5.9 times, and the Peng & Huang method cost 10.5 to 47.5 times. This result is remarkable since the grid methods involve much fewer grid points than breakpoints of the proposed method at larger sample size, suggesting that a grid point could be much more costly to compute.

5.3. Application to a clinical study

For illustration, we applied the proposed estimation procedure to the Mayo primary biliary cirrhosis dataset (Fleming & Harrington, 1991, app. D). Conducted at Mayo Clinic between 1974 and 1984, the study followed 418 patients with primary biliary cirrhosis, a rare but fatal chronic liver disease. One question of interest was concerned with prognostic factors associated with survival. In this analysis, we considered five baseline measures: age, edema, log(bilirubin), log(albumin), and log(prothrombin time). Two participants had incomplete measures and were thus removed. Our analysis data consisted of 416 patients, with a median follow-up time of 4.74 years and a censoring rate of 61.5%.

We adopted the quantile regression model on the logarithmic time scale, with the five baseline measures as covariates. The estimated quantile coefficient processes are shown in Figure 1, along with pointwise Wald 95% confidence intervals. The resampling size for the interval estimation was 200. The maximum cumulative probability up to which the estimated quantile coefficient was unique is 0.91. Among the five covariates, log(prothrombin time) in particular exhibited a prominent varying effect. It was negatively associated with survival time for short survivors, but the effect diminished gradually for longer survivors. This result echos findings from analyses of this dataset with other varying-coefficient models, e.g., by Tian, Zucker & Wei (2005) using the varying-coefficient Cox model. Nevertheless, this varying effect was not apparent in the model with log(prothrombin time) as the only covariate, shown in Figure 2.

Fig. 1
Analysis of the Mayo primary biliary cirrhosis data. Estimated quantile coefficient processes are shown in rugged lines, along with pointwise Wald 95% confidence intervals given by vertical bars. The three regions on which the estimated quantile coefficient ...
Fig. 2
Analysis of the Mayo primary biliary cirrhosis data with log(prothrombin time) as the only covariate. Dots and open circles represent uncensored and censored individuals, respectively. Estimated decile coefficients are shown from τ = 0 up to 0.8. ...

The graphical presentation is revealing of the covariate effect evolution. To summarize, estimated upper-trimmed mean effects and standard errors are given in Table 3. For comparison, the estimates based on the accelerated failure time model using the log-rank and Gehan estimating functions are also included. Notice that the two estimated regression coefficients deviate from each other for log(prothrombin time) with the accelerated failure time model. This disparity also suggests a lack of fit of this sub-model. In this situation, estimates from the accelerated failure time model are difficult to interpret. In contrast, the estimated upper-trimmed mean effects from the quantile regression model are meaningful, for covariates with constant or varying effects alike.

Table 3
Analysis results of the Mayo primary biliary cirrhosis data

6. Discussion

Quantile calculus as developed proves useful and effective for quantile regression. With uncensored data, it offers a new perspective of the standard regression procedure. Most importantly, censoring can be naturally accommodated, and it gives rise to our proposed censored regression via solving a well-defined estimating integral equation. To focus on the main ideas, we have not addressed second-stage inference and model diagnostics, which are practically useful. They can be developed along the lines similar to those in Peng & Huang (2008).

For survival data, alternative models exist to address varying covariate effects. One better known varying-coefficient model is the additive hazards model of Aalen (1980). There is also an extensive literature on the varying-coefficient Cox model, but most available estimation methods require smoothing; see Tian et al. (2005) and the references therein. More recently, Peng & Huang (2007) extended the class of semiparametric linear transformation models to allow for varying coefficients. In comparison with all these alternatives, the quantile regression model is particularly attractive with its simple interpretation; see Section 1.

When the inferential goal is on a specific quantile, the quantile regression model for the given τ only, or the singleton model, is of direct interest. In this case, methods for uncensored quantile regression (Koenker & Bassett, 1978) and censored one with always-observed censoring time (Powell, 1984; 1986) are still applicable. But when censoring time is only observed for censored individuals, the proposed method as well as Portnoy (2003), Neocleous et al. (2006) and Peng & Huang (2008) may not be applied unless the quantile regression model holds from the 0th through the τ -th quantile. In contrast, the approaches of Ying et al. (1995) and Honoré et al. (2002) do not necessarily require the functional model but at the price of a more restrictive censoring mechanism. A choice between these two classes of methods depends on whether functional survival-time modeling or censoring-time modeling might be more reasonable and justifiable in a specific application. The method of Wang & Wang (2009) is appealing in this regard, but might have practicality concerns in the case of multiple continuously-distributed covariates, as discussed in Section 1.

Generalizing the asymptotic results given in Section 4 is of interest, say, to allow for zero-density intervals and discontinuities in the survival distribution. Unfortunately, difficulties include the open question on solution uniqueness for integral equation (8), as indicated in Remark 2, and more. These additional ones can be readily seen in the one-sample case. First, the notion of consistency might not even be appropriate in evaluating the estimated quantile corresponding to a zero-density interval. Indeed, consistent estimation might be impossible in this circumstance but the estimated quantile is nonetheless informative of the estimand. Second, a distributional discontinuity might ruin asymptotic normality of the corresponding estimated quantile. These issues become much more complex and also have broader implication in the general case. Due to the sequential nature of the estimation, of concern are not only those corresponding quantile coefficients but also the subsequent ones. Nevertheless, it seems reasonable to conjecture that consistency and asymptotic normality might still hold for estimated quantile coefficients other than those corresponding to zero-density intervals and distributional discontinuities.

Several additional problems are also worth further investigation. First, our focus has been on the estimation of β0(τ) provided τ < [tau with overline], and the estimation of identifiability limit [tau with overline] remains to be addressed; see Remark 7. Second, the sub-model with a mixture of constant- and varying-coefficients would be useful when constant effects are determined a priori for some covariates. Efficiency gain might be expected over the more general method in this article. Third, in addition to right censoring, quantile regression with other types of censoring and truncation is also of interest. But new techniques might be needed. Finally, the proposed 0th quantile coefficient estimator as given by (17) might be of interest in its own right. In the absence of censoring, our estimator reduces to the extreme regression quantile studied by Smith (1994), Portnoy & Jurecková (1999) and Chernozhukov (2005) among others. Some of their results may be extended.

ACKNOWLEDGEMENT

The author extends his gratitude to Limin Peng for discussions over the course of this research and her review of an earlier version of this paper. He also thanks the seminar participants at the University of North Carolina at Charlotte and the reviewers for helpful comments.

APPENDIX A: PROOF OF PROPOSITION 1

Consider assertion (i). Given that η(τ) = ξ(τ) for all τ such that Pr{T = Q(τ)} > 0, equation (4) still holds when ξ(·) is replaced by η(·). Therefore,

0τ[Pr{CQ(ν)}Pr{C=Q(ν)}η(ν)]×d[Pr{T<Q(ν)}+Pr{T=Q(ν)}η(ν)]  =0τ[Pr{CQ(ν)}Pr{C=Q(ν)}η(ν)]×[Pr{TQ(ν)}Pr{T=Q(ν)}η(ν)]dν1ν   τ[0,1).

The above equation simplifies to (5) under the given conditions.

For assertion (ii), only the case of [tau with overline] > 0 needs to be considered. The definition of [tau with overline] implies E(Δ[I{XQ(τ)} − I{X = Q([tau with overline] )}ξ([tau with overline] )]) = 0. Thus,

E(Δ[I{X<Q(τ¯)}+I{X=Q(τ¯)}ξ(τ¯)])=EΔ.
(A.1)

Define τ* = sup{τ : Pr{Cq(ν)} > 0 ∀ ν [set membership] [0, τ ]}. The same argument as before gives

E(Δ[I{X<q(τ*)}+I{X=q(τ*)}η(τ*)])=EΔ.
(A.2)

Given the continuity of the left-hand side of equation (5) in τ, the above equation implies τ* > 0. Since Pr{Cq(τ)} > 0 for any τ [set membership] [0, τ*), equation (5) under the given conditions implies

Pr{T<q(τ)}+Pr{T=q(τ)}η(τ)   =0τ[1Pr{T<q(ν)}Pr{T=q(ν)}η(ν)]dν1ν   τ[0,τ*).

The above integral equation has a unique solution:

Pr{T<q(τ)}+Pr{T=q(τ)}η(τ)=τ   τ[0,τ*),

from which (6) and (7) follow for τ [set membership] (0, τ*). Furthermore, equations (A.1) and (A.2) imply τ* = [tau with overline]. This completes the proof.

APPENDIX B: PROOF OF PROPOSITION 2

Existence result (i) follows directly from Proposition 1. We now prove uniqueness result (ii) by construction. Start from τ = 0. Write H as the discrete distributional support of (C,Z), and define

τ1=sup{T:I{czβ(ν1)}=limν20I{czβ(ν2)}andczβ(ν1)ν1(0,τ]and(c,z)}.

Thus, I{CZ[top top]β(τ)} remains constant and CZ[top top]β(τ) over τ [set membership] (0, τ1) almost surely. Locally, equation (8) reduces to

D(τ)=0τ{Y(τ)D(ν)}dν1ν   τ[0,τ1),

where D(τ) is the left-hand side of (8) and Y (τ) [equivalent] E[ZI{CZ[top top]β(τ)}] is constant over τ [set membership] (0, τ1). The above equation admits a unique solution D(τ) = τY (τ), or equivalently

E(ZI{CZβ(τ)}[I{T<Zβ(τ)}+I{T=Zβ(τ)}ηZ(τ)τ])=0τ(0,τ1).

By arguments similar to Remark 3, β(τ) is the minimizer of E[{XZ[top top]β(τ) ∧ C} + τ{XZ[top top]β(τ) ∧ C}]. Recognizing that this minimization problem is the basis for Powell’s (1984, 1986) estimator, we then obtain (9) for τ [set membership] (0, τ1) so long as [tau with overline] > 0. Given (9), integral equation (8) implies

J(τ)=0τJ(ν)dν1ν   τ[0,τ1),

where J(τ) is the difference between the two sides of (10). Thus, (10) is obtained for τ [set membership] (0, τ1) by an application of the Gronwall’s inequality.

Under Assumption 1, one can show

limττ1E(Z[I{XZβ(τ)}I{X=Zβ(τ)}ηZ(τ)])=(1τ1)limττ1E[ZI{CZβ(τ)}].

Then, by taking advantage of the notion of relative quantile and integral equation (12), results (9) and (10) can be established inductively beyond τ1, up to [tau with overline].

APPENDIX C: PROOF OF LEMMA 1 AND THEOREM 1

With the developments in Section 3, it remains to establish Lemma 1. Given the existence of a feasible value for [beta](0, τ), nonnegativity of the objective function in (16) ensures the existence of a minimizer. Furthermore, note that the objective function becomes linear upon adding XiZiborXiZib for each censored individual to the constraints. Therefore, this piecewise-linear programming problem becomes the minimization of a set of linear programming problems, where each member involves additional constraints concerning censored individuals. It is known that a linear programming problem has a vertex solution if a bounded solution exists (e.g., Gill et al., 1991, Section 7.8.2). Assertion (iii) of Lemma 1 then follows.

For assertion (iv), we only consider the situation that the interpolated set is of size p; otherwise one may work with the corresponding perturbed problem. Write SC as the subset of censored individuals in S. The following two linear programming problems have the same solution as (16):

minbAbminb(A+i𝕊CZi)bsubjecttoXiZib  i𝔻subjecttoXiZib  i𝔻Xi=Zib  i𝔻0,Xi=Zib  i𝔻0,XiZib  i𝔻+XiZib  i𝔻+XiZib  i𝕊CXiZib  i𝕊C

where A=i𝔻0{1wi(τ)}Zi+i𝔻+Zi+i:Δi=0,Xi>Ziβ^(0,τ)Zi. Of course, the above two coincide in the case of SC = [empty]. Applying Gill et al. (1991, theorem 7.8.1) yields

A=i𝕊Ziγi(1),   A+i𝕊CZi=i𝕊Ziγi(2),

for some γi(·), where γi(·)0 for i [set membership] D, γi(·)0 for i [set membership] D+, and γi(1)0andγi(2)0 for i [set membership] SC. Since ZS is of full rank, γi(1)=γi(2) for i [set membership] S \ SC and γi(1)=γi(2)1 for i [set membership] SC. Therefore, Ĥ as determined upon setting wi(τ)=γi(2)[0,1] for i [set membership] SC satisfies equation (18), with γi=γi(·) for i [set membership]S \ SC. This completes the proof.

APPENDIX D: PROOF OF THEOREM 2

Similar to Peng & Huang (2008), we introduce monotone maps Γ1(b) = E{ZΔI(XZ[top top]b)} and Γ2(b) = E{ZI(XZ[top top]b)}. Write their empirical counterparts as Γ^1(b)=n1i=1nZiΔiI(XiZib)andΓ^2(b)=n1i=1nZiI(XiZib) Under condition C3, Γ1(b) is a one-to-one map for b [set membership] B(u) and Γ11(·) exists. Write H(a)=Γ2{Γ11(a)} and note H(a)/a=Ψ{Γ11(a)}Π, where Ψ(·) is defined in condition C4 and Π is the p × p identity matrix.

Identifiability. Write α(τ) = Γ1{β(τ)}, and integral equation (8) can be written as

α(τ)=0τH{α(ν)}dν1ν;

note that terms involving ηZ(·) vanish under condition C2. Condition C4 implies that H(a) is Lipschitz-continuous. The Picard–Lindelöf theorem then asserts the solution uniqueness, i.e., α(·) = α0(·) [equivalent] Γ1{β0(τ)}. It follows that β(τ) = β0(τ) for all τ [set membership] (0, u].

Consistency. It is known that {I(XZ[top top]b) : b [set membership] Rp} is Donsker (e.g., Kosorok, 2008, lemma 9.12). Furthermore, Z is bounded under condition C1. By permanence property of the Donsker class, {ZΔI(XZ[top top]b) : b [set membership] Rp} is Donsker. So is {ZI(XZ[top top]b) : b [set membership] Rp} by similar arguments. Since Donsker implies Glivenko–Cantelli, almost surely

supbpΓj^(b)Γj(b)=0(1)   j=1,2.
(A.3)

On the other hand, condition C2 implies that supb[set membership]Rp i=1nI(Xi=Zib)p almost surely. Then, coupled with condition C1, with any wi [set membership] [0, 1], almost surely

supbpn1i=1nZiΔiI(Xi=Zib)(wi1)=O(n1),supbpn1i=1nZiI(Xi=Zib)wi=O(n1).

Therefore, almost surely

supτ[0,u]Γ^1{β^(τ)}0τΓ^2{β^(ν)}dν1ν=O(n1),
(A.4)

since equation (13) can be written as

Γ^1{β(τ)}+n1i=1nZiΔiI{Xi=Ziβ(τ)}{wi(τ)1}=0τ[Γ^2{β(ν)}n1i=1nZiI{Xi=Ziβ(ν)}wi(ν)]dν1ν

and [beta](·) is a solution.

Following equations (A.3) and (A.4), almost surely

supτ[0,u]α^(τ)0τH{α^(ν)}dν1ν=o(1),

where [alpha](τ) = Γ1{[beta](τ)}. Write L as the Lipschitz constant of H(·). Thus, for every ε > 0 and with sufficiently large n, almost surely

α^(τ)αo(τ)0τH{α^(ν)}H{α0(ν)}dν1ν+ε0τLα^(ν)α0(ν)dν1ν+ε,

which leads to

α^(τ)α0(τ)ε(1τ)L   τ[0,u],
(A.5)

by the Gronwall’s inequality. Therefore, [alpha](τ) is strongly consistent for α0(τ) uniformly in τ [set membership] [0, u].

It remains to show that, for any ε > 0, there exists δ > 0 such that supτ[set membership][l,u] ‖α(τ) − α0(τ) ‖ < δ implies supτ [set membership][l,u] ‖β(τ) − β0(τ) ‖ < ε. Suppose that the assertion is false. Thus, for each δ > 0, there exists (b, ν) such that ‖Γ1(b) − α0(ν) ‖ < δ and ‖b − β0(ν) ‖ > d for some constant d > 0. Then, there is a subsequence of (b, ν) that converges to, say, (b0, ν0). This means that Γ1(b0) = Γ100)} but b0 ≠ β00). However, conditions C1 and C3 imply that fZ{Z[top top]β0(τ)} is bounded below away from 0 uniformly in τ [set membership] [l, u] and Z. Therefore, [partial differential]Γ1(b)/[partial differential]b[top top] = E{Z[multiply sign in circle]2GZ(Z[top top]b)fZ(Z[top top]b)} at b = β00) is positive definite, which along with the monotonicity of Γ1(·) gives rise to a contradiction.

Weak convergence.

Lemma A.1 Under the conditions in Theorem 2,

supτ[0,u]Γ^1{β^(τ)}Γ1{β^(τ)}Γ^1{β0(τ)}+Γ1{β0(τ)}=op(n1/2),
(A.6)

supτ[0,u]0τ[Γ^2{β^(ν)}Γ2{β^(ν)}Γ^2{β0(ν)}+Γ2{β0(ν)}]dν1ν=op(n1/2).
(A.7)

Proof of Lemma A.1. Consider (A.6) first. Since {ZΔI(XZ[top top]b) : b [set membership] Rp} is Donsker, n1/2{[Gamma]1(b) − Γ1(b)} converges weakly to a tight Gaussian process. The tightness implies that, for every ε > 0 and m = 1, (...), p,

limδ0limsupnPr(supb1,b2:σ[n1/2{Γ^1(m)(b1)Γ^1(m)(b2)}]<δ   n1/2|Γ^1(m)(b1)Γ1(m)(b1)Γ^1(m)(b2)+Γ1(m)(b2)|>ε)=0,

where σ(·) denotes standard deviation and superscript (m) the m-th component of a vector; see, e.g., Kosorok (2008). Furthermore, note that

σ2[n1/2{Γ^1(m)(b1)Γ^1(m)(b2)}]     =σ2[Z(m)Δ{I(XZb1)I(XZb2)}]     E{Z(m)2Δ|I(XZb1)I(XZb2)|}.

Write [Upsilon]0, [beta], τ) = E[Δ|I(XZ[top top]b) − I{X ≤ Z[top top]β0(τ)}|] |b=[beta](τ). Given condition C1, it then suffices to show sup

supτ[0,u]ϒ(β0,β^,τ)=o(1)
(A.8)

almost surely. Let cf be the upper bound of fZ(·). Apparently,

ϒ(β0,β^,τ)cfE[|Z{bβ0(τ)}|]|b=β^(τ)cfβ^(τ)β0(τ)EZ.

Following the consistency of [beta](·), for every l > 0,

supτ[l,u]ϒ(β0,β^,τ)=o(1)
(A.9)

almost surely. On the other hand,

ϒ(β0,β^,τ)Γ1(1){β^(τ)}+Γ1(1){β0(τ)}2Γ1(1){β0(τ)}+|Γ1(1){β^(τ)}Γ1(1){β0(τ)}|.

Therefore, following the consistency of [alpha](·), for every ε > 0, there exists τε > 0 such that

supτ[0,τe]ϒ(β0,β^,τ)<ε
(A.10)

almost surely for sufficiently large n. Combining (A.9) and (A.10) gives (A.8).

Now consider (A.7). Arguments similar to the above establish that, for every l > 0,

supτ[l,u]Γ^2{β^(τ)}Γ2{β^(τ)}Γ^2{β0(τ)}+Γ2{β0(τ)}=op(n1/2).
(A.11)

Since supb[set membership]Rp[Gamma]2(b) − Γ2(b)‖ = Op(n−1/2), for every ε > 0, there exists τε > 0 such that

Pr(supτ[0,τe]0τn1/2[Γ^2{β^(ν)}Γ2{β^(ν)}      Γ^2{β0(ν)}+Γ2{β0(ν)}]dν1ν>ε)0.
(A.12)

Then, (A.7) follows from (A.11) and (A.12).

Plugging equations (A.6) and (A.7) into (A.4) yields that, for τ [set membership] [0, u],

α^(τ)α0(τ)0τ[H{α^(ν)}H{α0(ν)}]dν1ν+op(n1/2)      =Γ^1{β0(τ)}+0τΓ^2{β0(ν)}dν1ν,

where op(·) is uniform for τ [set membership] [0, u]. Under conditions C2 and C4, H{[alpha](τ)}− H0(τ)} = −[Ψ {β0(τ)}+Π+o(1)][[alpha](τ)−α0(τ)] almost surely. Therefore,

α^(τ)α0(τ)+0τ[Ψ{β0(ν)}+Π][α^(ν)α0(ν)]dν1ν     +op(n1/2+α^(·)α0(·))  =Γ^1{β0(τ)}+0τΓ^2{β0(ν)}dν1ν.
(A.13)

The remaining proof is sketched since it essentially follows that of theorem 2 in Peng & Huang (2008), where more details can be found. Note that the right-hand side of (A.13) is a martingale and converges weakly to a Gaussian process by the martingale central limit theorem (e.g., Fleming & Harrington, 1991). Furthermore, (A.13) as a differential equation of [alpha](·) − α0(·) can be solved by using product integration theory (Gill & Johansen, 1990), establishing that [alpha](·)−α0(·) as a linear map of the right-hand side converges weakly to a Gaussian process. The weak convergence of [beta](·) then follows by the functional delta method.

APPENDIX E: PROOF OF THEOREM 3

Throughout, a quantity based on the perturbed sample is denoted by adding an asterisk. For example, Γ^1*(b) is the perturbed version of [Gamma]1(b).

The same arguments of the consistency proof in Appendix D may be used to show the strong consistency of [alpha]*(·) for α0(·) on [0, u] and that of [beta]*(·) for β0(·) on [l, u], upon establishment of the following two results. First, by Kosorok (2008, theorem 10.13), almost surely

supbpΓ^j*(b)Γj(b)=o(1)   j=1,2.

Second, the terms involving wi(·) in the perturbed version of equation (13) are negligible, by the fact that the maximum of the n iid perturbing random variables is almost surely o(n1/2) (Owen, 1990, lemma 3).

By an unconditional multiplier central limit theorem (Kosorok, 2008, theorem 10.1 and corollary 10.3), n1/2{Γ^j*(·)Γj(·)},j=1,2 converge weakly to tight processes. The arguments in the proof of Lemma A.1 then can be used to establish

supτ[0,u]Γ^1*{β^*(τ)}Γ1{β^*(τ)}Γ^1*{β0(τ)}+Γ1{β0(τ)}=op(n1/2),supτ[0,u]0τ[Γ^2*{β^*(ν)}Γ2{β^*(ν)}Γ^2*{β0(ν)}+Γ2{β0(ν)}]dν1ν=op(n1/2).

Thus, along the lines to establish equation (A.13), one obtains

α^*(τ)α0(τ)+0τ[Ψ{β0(ν)}+Π][α^*(ν)α0(ν)]dν1ν+op(n1/2+α^*(·)α0(·))=Γ^1*{β0(τ)}+0τΓ^2*{β0(ν)}dν1ν,
(A.14)

given that a perturbing random variable is almost surely o(n1/2).

Following from equations (A.13) and (A.14),

  α^*(τ)α^(τ)+0τ[Ψ{β0(ν)}+Π][α^*(ν)α^(ν)]dν1ν   +op(n1/2+α^*(·)α^(·))=Γ^1*{β0(τ)}+Γ^1{β0(τ)}+0τ[Γ^2*{β0(ν)}Γ^2{β0(ν)}]dν1ν,
(A.15)

since ‖[alpha](·) − α0(·) ‖ = Op(n−1/2). Note that both ΔI{XZ[top top]β0(τ)} and 0τI{XZβ0(ν)}(1ν)1dν are monotone in τ. Therefore, {ΔI{XZβo(τ)}0τI{XZβ0(ν)}(1ν)1dν:τ[0,u]} is Donsker and so is {Z[ΔI{XZβ0(τ)}0τI{XZβ0(ν)}(1ν)1dν]:τ[0,u]}. By a conditional multiplier central limit theorem (Kosorok, 2008, theorem 10.4), the right-hand side of (A.15) conditionally on the data converges weakly to the same Gaussian process as the right-hand side of equation (A.13). Then, the assertion of Theorem 3 follows the arguments at the end of Appendix D.

Footnotes

*Supported in part by grants R01 CA090747 and R01 AI078835 from the U.S. National Institutes of Health

References

  • Aalen OO. A model for nonparametric regression analysis of counting processes. In: Klonecki W, Kozek A, Rosiński J, editors. Lecture Notes in Statistics, Volume 2. New York: Springer-Verlag; 1980. pp. 1–25.
  • Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436.
  • Chernozhukov V. Extremal quantile regression. Ann. Statist. 2005;33:806–839.
  • Efron B. Proceedings of the Fifth Berkeley Symposiums in Mathematical Statistics IV. New York: Prentice-Hall; 1967. The two sample problem with censored data; pp. 831–853.
  • Fleming TR, Harrington DP. Counting Processes and Survival Analysis. New York: Wiley; 1991.
  • Gill PE, Murray W, Wright MH. Numerical Linear Algebra and Optimization. Redwood City, CA: Addison–Wesley; 1991.
  • Gill RD, Johansen S. A survey of product-integration with a view towards application in survival analysis. Ann. Statist. 1990;18:1501–1555.
  • Gutenbrunner C, Jurecková J. Regression rank scores and regression quantiles. Ann. Statist. 1992;20:305–330.
  • Honoré B, Khan S, Powell JL. Quantile regression under random censoring. J. Economet. 2002;109:67–105.
  • Jin Z, Ying Z, Wei LJ. A simple resampling method by perturbing the minimand. Biometrika. 2001;88:381–390.
  • Koenker R. Censored quantile regression redux. J. Statist. Software. 2008;27(Issue 6):1–25.
  • Koenker R, Bassett G. Regression quantiles. Econometrica. 1978;46:33–50.
  • Koenker R, D’Orey V. Computing regression quantiles. Appl. Statist. 1987;36:383–393.
  • Koenker R, Geling O. Reappraising medfly longevity: a quantile regression survival analysis. J. Am. Statist. Assoc. 2001;96:458–468.
  • Kosorok MR. Introduction to Empirical Processes and Semi-parametric Inference. New York: Springer; 2008.
  • Neocleous T, Vanden Branden K, Portnoy S. Correction to censored regression quantiles by S. Portnoy, 98 (2003), 1001–1012. J. Am. Statist. Assoc. 2006;101:860–861.
  • Owen A. Empirical likelihood ratio confidence regions. Ann. Statist. 1990;18:90–120.
  • Peng L, Huang Y. Survival Analysis with Temporal Covariate Effects. Biometrika. 2007;94:719–733.
  • Peng L, Huang Y. Survival analysis with quantile regression models. J. Am. Statist. Assoc. 2008;103:637–649.
  • Portnoy S. Censored regression quantiles. J. Am. Statist. Assoc. 2003;98:1001–1012.
  • Portnoy S, Jurecková J. On extreme regression quantiles. Extremes. 1999;2:227–243.
  • Powell JL. Least absolute deviations estimation for the censored regression model. J. Economet. 1984;25:303–325.
  • Powell JL. Censored regression quantiles. J. Economet. 1986;32:143–155.
  • Robins JM, Ritov Y. Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statist. Med. 1997;16:285–319. [PubMed]
  • Smith R. Nonregular regression. Biometrika. 1994;81:173–183.
  • Tian L, Zucker D, Wei LJ. On the Cox model with time-varying regression coefficients. J. Am. Statist. Assoc. 2005;100:172–183.
  • Tsiatis AA. Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 1990;18:354–372.
  • Wang H, Wang L. Locally weighed censored quantile regression. J. Am. Statist. Assoc. 2009;104:1117–1128.
  • Ying Z, Jung SH, Wei LJ. Survival analysis with median regression models. J. Am. Statist. Assoc. 1995;90:178–184.