HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Sci China Ser A Math Phys Astron. Author manuscript; available in PMC 2010 April 21.
Published in final edited form as:
Sci China Ser A Math Phys Astron. 2009 June; 52(6): 1131–1133.
doi:  10.1007/s11425-009-0134-3
PMCID: PMC2857738

A note on Optimal weights and variable selections for multivariate survival data

Fan et al. are to be congratulated for this important contribution to the analysis of multivariate failure time data. They have provided three regression parameter estimators for multiple covariates in the marginal hazard model. Using the weighted estimating equation approach, they proposed sets of weights to:

  1. Minimize the componentwise variance of each parameter estimate,
  2. Minimize the sum of the variances of each parameter estimate, or
  3. Minimize each element of the covariance matrix.

They showed that each of their proposed estimators can consistently outperform estimates derived using the working independence model.

In this short note, we show that, in the presence of high-dimensional covariates, Fan et al.’s ideas can be combined with those of [1] to achieve these optimal estimates along with simultaneous variable selection. That is, our interest lies in controlling the variances of the estimates of β = (β1,…,βp)T associated with high-dimensional covariates, while simultaneously selecting the “important” covariates in order to construct a parsimonious model. Here, we consider p as a large but fixed constant, as opposed to [1], which considered the situation where p may increase with the sample size n.

The key idea is to add a penalty function pλj (|βj|) to Fan et al.’s weighted partial likelihood function (10), which leads to the following penalized likelihood function
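
As a rough sketch of the shape such a criterion takes, suppose Fan et al.’s weighted log partial likelihood in (10) can be written as $\sum_{j=1}^{J} w_j \ell_j(\beta)$, where $\ell_j$ is a label introduced here for the marginal log partial likelihood of the $j$-th failure type, and suppose the penalty is scaled by $n$ as in [1]. Both are assumptions made for illustration. Under them, the penalized criterion would read:

```latex
\[
Q(\beta) \;=\; \sum_{j=1}^{J} w_j\,\ell_j(\beta) \;-\; n \sum_{k=1}^{p} p_{\lambda_k}(|\beta_k|).
\]
```

The first term rewards fit under the chosen weights; the second shrinks small coefficients toward zero, which is what makes variable selection possible.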


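Denote by $\boldsymbol{\beta}_0 = (\beta_{01},\dots,\beta_{0p})^T$ the true value of $\boldsymbol{\beta}$ and suppose, without loss of generality, that $\beta_{0k} \neq 0$ for $k \le s$ and $\beta_{0k} = 0$ for $k > s$, for some $s \le p$. Let $\boldsymbol{\beta}_{01} = (\beta_{01},\dots,\beta_{0s})^T$. Finally, let $\hat{\boldsymbol{\beta}} = (\hat{\boldsymbol{\beta}}_1^T, \hat{\boldsymbol{\beta}}_2^T)^T$ be the solution that maximizes (1), where $\hat{\boldsymbol{\beta}}_1 = (\hat\beta_1,\dots,\hat\beta_s)^T$ and $\hat{\boldsymbol{\beta}}_2 = (\hat\beta_{s+1},\dots,\hat\beta_p)^T$.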

With an appropriate penalty function, our estimator $\hat{\boldsymbol{\beta}}$ may enjoy the oracle property. That is, the procedure should select the true model with probability tending to 1 and, given the true model, the coefficient estimates should asymptotically behave like maximum (partial) likelihood estimators. More specifically, consider

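\[
a_n = \max_{1 \le j \le s}\bigl\{\,|p'_{\lambda_{jn}}(|\beta_{0j}|)| : \beta_{0j} \neq 0\,\bigr\},
\qquad
b_n = \max_{1 \le j \le s}\bigl\{\,|p''_{\lambda_{jn}}(|\beta_{0j}|)| : \beta_{0j} \neq 0\,\bigr\}.
\]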

Here, since $\lambda_j$ depends on $n$, we write it as $\lambda_{jn}$. Along the lines of Theorem 2 of [1], we can show that if the penalty function is such that $a_n = O(n^{-1/2})$ and $b_n \to 0$, and in addition $\lambda_{jn} \to 0$ and $\sqrt{n}\,\lambda_{jn} \to \infty$, then

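\[
\hat{\boldsymbol{\beta}}_2 = 0 \ \text{with probability approaching 1, and}
\]
\[
\sqrt{n}\,(A_{W11} + \Sigma_{11})\bigl\{\hat{\boldsymbol{\beta}}_1 - \boldsymbol{\beta}_{01} + (A_{W11} + \Sigma_{11})^{-1}\mathbf{b}_n\bigr\} \xrightarrow{d} N(0, V_{W11}),
\tag{2}
\]

where

  • $A_{W11}$ is the first $s \times s$ submatrix of $A_W(\beta) = \sum_{j=1}^{J} w_j \Sigma_j(\beta)$,
  • $\Sigma_{11}$ is the first $s \times s$ submatrix of $\mathrm{diag}\{p''_{\lambda_{1n}}(|\beta_{01}|)\,\mathrm{sgn}(\beta_{01}), \dots, p''_{\lambda_{pn}}(|\beta_{0p}|)\,\mathrm{sgn}(\beta_{0p})\}$,
  • $V_{W11}$ is the first $s \times s$ submatrix of $\Sigma_W(\beta)$, which was defined in Fan et al.’s (11), and
  • $\mathbf{b}_n = (p'_{\lambda_{1n}}(|\beta_{01}|), \dots, p'_{\lambda_{sn}}(|\beta_{0s}|))^T$.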
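
To make the role of each term in (2) concrete, the following minimal sketch computes the centering shift and the covariance that the limit implies for the nonzero block of coefficients. The function name `limiting_moments` and all numeric inputs are hypothetical toy values chosen for illustration, not quantities from Fan et al. or from [1].

```python
import numpy as np

def limiting_moments(A11, Sigma11, V11, b_vec):
    """Shift and covariance implied by a limit of the form
    sqrt(n) * (A11 + Sigma11) * (est - truth + (A11 + Sigma11)^{-1} b_vec) -> N(0, V11).
    """
    M_inv = np.linalg.inv(A11 + Sigma11)
    cov = M_inv @ V11 @ M_inv.T      # covariance of sqrt(n) * est
    shift = -M_inv @ b_vec           # first-order shift of est away from truth
    return cov, shift

# Toy 2 x 2 inputs (assumptions for illustration only).
A11 = np.array([[2.0, 0.3], [0.3, 1.5]])
V11 = np.array([[1.0, 0.2], [0.2, 0.8]])
Sigma11 = np.diag([0.05, 0.02])
b_vec = np.array([0.01, -0.02])

cov, shift = limiting_moments(A11, Sigma11, V11, b_vec)
# With no penalty curvature and no penalty slope, the unpenalized sandwich is recovered.
cov0, shift0 = limiting_moments(A11, np.zeros((2, 2)), V11, np.zeros(2))
print(cov, shift)
```

Setting the penalty terms to zero recovers the familiar unpenalized sandwich form, which is one way to see how the penalty enters only through the curvature and slope terms.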

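Therefore, by (2), the asymptotic covariance of $\sqrt{n}\,\hat{\boldsymbol{\beta}}_1$ is

\[
\Sigma_W^{*P} = (A_{W11} + \Sigma_{11})^{-1}\, V_{W11}\, (A_{W11} + \Sigma_{11})^{-1},
\]
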
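where $\Sigma_W^{*P}$ is the penalized version of $\mathrm{var}(\hat{\boldsymbol{\beta}}_W)$ defined in Section 2.3 of Fan et al. One example of an appropriate $p_{\lambda_j}(\theta)$ is the smoothly clipped absolute deviation (SCAD) penalty of [2], where

\[
p'_{\lambda}(\theta) = \lambda\left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\}
\]

for some $a > 2$ and $\theta > 0$. More examples can be found in [3].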
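
As an illustration of why SCAD suits the conditions on $a_n$, the sketch below evaluates the SCAD derivative of [2] and the quantity $a_n$ for a toy coefficient vector. The default a = 3.7 is the value suggested in [2]; the helper names and the coefficient values are assumptions made here.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty, evaluated at theta >= 0."""
    theta = np.asarray(theta, dtype=float)
    # Linear (lasso-like) part for theta <= lam, tapering to zero at theta = a * lam.
    tail = np.maximum(a * lam - theta, 0.0) / (a - 1.0)
    return np.where(theta <= lam, lam, tail)

def a_n(beta_true, lam, a=3.7):
    """Largest penalty slope over the nonzero true coefficients."""
    beta_true = np.asarray(beta_true, dtype=float)
    nonzero = np.abs(beta_true[beta_true != 0.0])
    return float(np.max(scad_derivative(nonzero, lam, a)))

beta0 = np.array([1.5, -0.8, 0.0, 0.0])   # hypothetical true coefficients
for lam in (1.0, 0.5, 0.1):
    print(lam, a_n(beta0, lam))
```

Because the derivative vanishes beyond a·λ, $a_n$ is exactly zero once λ is small relative to the smallest nonzero coefficient, so large coefficients are left essentially unpenalized.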

We can now follow Fan et al. and define optimality criteria that will allow us to simultaneously estimate the optimal weights $w = (w_1,\dots,w_J)$. Minimizing the component-wise variance may not be ideal because minimizing the variance of $\hat\beta_j$, $j > s$, is irrelevant if the true $\beta_{0j} = 0$. Minimizing the variance of any arbitrary linear function of the parameter estimates is also not always feasible, as was explained in Subsection 3.3 of Fan et al. Hence, we focus on minimizing the total variance:

\[
\min_{w}\ \mathrm{tr}\bigl(\Sigma_W^{*P}\bigr),
\]

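analogous to (14) of Fan et al. Following the derivation in their Subsection 3.2, we assume that $\Sigma_j(\beta) \approx b_j\Gamma$ for some $\Gamma$. Thus, if we constrain $\sum_{j=1}^{J} w_j b_j = 1$, we are to minimize

\[
\mathrm{tr}\Bigl([\Gamma_{11} + \Sigma_{11}]^{-1}\Bigl\{\sum_{j=1}^{J} w_j^2\,\Sigma_{j11} + \sum_{k \neq l} w_k w_l\, D_{kl11}\Bigr\}[\Gamma_{11} + \Sigma_{11}]^{-1}\Bigr)
\]
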
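over $w$, where $\Gamma_{11}$ and $D_{kl11}$ are the first $s \times s$ submatrices of $\Gamma$ and $D_{kl}(\beta)$, respectively. If $H_P$ is a symmetric $J \times J$ matrix with diagonal elements $\mathrm{tr}([\Gamma_{11} + \Sigma_{11}]^{-1}\Sigma_{j11}[\Gamma_{11} + \Sigma_{11}]^{-1})$ and off-diagonal elements $\mathrm{tr}([\Gamma_{11} + \Sigma_{11}]^{-1}D_{kl11}[\Gamma_{11} + \Sigma_{11}]^{-1})$, the solution is given by

\[
w = \frac{H_P^{-1}\, b}{b^T H_P^{-1}\, b},
\tag{4}
\]
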
where $b = (b_1,\dots,b_J)^T$. $\Gamma_{11}$ can be estimated by $\hat\Gamma_{11} = \frac{1}{J}\sum_{j=1}^{J}\hat\Sigma_{j11}$, and suggestions for possible choices for $b$ were given in Subsection 3.2 of Fan et al.
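
The constrained minimization described above is a quadratic program with a single linear constraint, so it has a closed-form Lagrangian solution. The sketch below assembles a trace-based matrix of the kind described and solves for the weights on simulated inputs. Everything numeric is hypothetical: the matrices are random stand-ins, and taking b_j proportional to the trace of each Σ_j block is only one illustrative choice, not a recommendation from Fan et al.

```python
import numpy as np

def build_H(M_inv, Sigma_list, D):
    """Assemble the J x J matrix of traces; D[(k, l)] holds the cross block for k != l."""
    J = len(Sigma_list)
    H = np.empty((J, J))
    for k in range(J):
        for l in range(J):
            mid = Sigma_list[k] if k == l else D[(k, l)]
            H[k, l] = np.trace(M_inv @ mid @ M_inv)
    return 0.5 * (H + H.T)  # symmetrize against rounding error

def optimal_weights(H, b):
    """Minimize w' H w subject to b' w = 1 (Lagrange multiplier solution)."""
    x = np.linalg.solve(H, b)
    return x / (b @ x)

# Simulated s x s blocks built from common factors so that H is positive definite.
rng = np.random.default_rng(0)
s, J, m = 2, 3, 5
U = [rng.normal(size=(s, m)) for _ in range(J)]
Sigma_list = [U[j] @ U[j].T / m for j in range(J)]
D = {(k, l): U[k] @ U[l].T / m for k in range(J) for l in range(J) if k != l}

Gamma11 = sum(Sigma_list) / J                 # average of the marginal blocks
Sigma11 = np.diag([0.05, 0.02])               # toy penalty-curvature block
M_inv = np.linalg.inv(Gamma11 + Sigma11)
b = np.array([np.trace(S) / np.trace(Gamma11) for S in Sigma_list])

H = build_H(M_inv, Sigma_list, D)
w = optimal_weights(H, b)
print("weights:", w, "constraint value:", b @ w)
```

Using `np.linalg.solve` rather than forming an explicit inverse is the usual numerically safer choice when J is moderate and H may be ill-conditioned.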

Finally, we can choose the parameters λj by iterating between two steps: minimizing the generalized cross-validation statistic of Cai et al. [1] and solving for the optimal weights wj. Let


If λ = (λ1,…,λp)T, we choose


We can first assume the working independence model to choose the initial λj with (5), then use those λj to solve for the wj with (4). We can then iterate between (5) and (4) until λj and wj converge.
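
The alternation just described can be sketched as a generic fixed-point loop. The two callbacks below, `select_lambda` and `solve_weights`, are hypothetical placeholders for the tuning-parameter step and the weight step; their toy bodies exist only so the loop runs and are not the statistics used in this note. Equal starting weights stand in for the working independence model, which is also an assumption.

```python
import numpy as np

def alternate(select_lambda, solve_weights, w_init, tol=1e-8, max_iter=100):
    """Alternate between tuning-parameter selection and weight solving until both settle."""
    w = np.asarray(w_init, dtype=float)
    lam = np.asarray(select_lambda(w), dtype=float)
    for iteration in range(1, max_iter + 1):
        w_new = np.asarray(solve_weights(lam), dtype=float)
        lam_new = np.asarray(select_lambda(w_new), dtype=float)
        change = max(np.max(np.abs(w_new - w)), np.max(np.abs(lam_new - lam)))
        w, lam = w_new, lam_new
        if change < tol:
            return lam, w, iteration
    raise RuntimeError("alternation did not converge within max_iter iterations")

# Toy stand-ins (hypothetical): each step depends smoothly on the other's output.
def select_lambda(w):
    return (0.1 + 0.2 * w[0]) * np.ones(2)

def solve_weights(lam):
    v = np.array([1.0, 1.0 + lam[0], 1.0 + 2.0 * lam[0]])
    return v / v.sum()

lam, w, n_iter = alternate(select_lambda, solve_weights, w_init=np.ones(3) / 3)
print(lam, w, n_iter)
```

Stopping on the largest change in either quantity mirrors the requirement that both the λj and the wj converge; in practice a cap on iterations guards against cycling.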

To conclude, we have suggested a way in which the work of Fan et al. can be extended to perform variable selection on models for multivariate survival times with high-dimensional covariates, while simultaneously providing optimally efficient parameter estimates for the selected covariates. Our future direction lies in using empirical data and simulations to evaluate the accuracy and stability of the variable selection process, as well as the performance of the estimates derived using these optimal weights.


This work was supported by the U.S. National Cancer Institute (Grant No. R01 CA95747).


1. Cai J, Fan J, Li R, et al. Variable selection for multivariate failure time data. Biometrika. 2005;92:303–316.
2. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Amer Statist Assoc. 2001;96:1348–1360.
3. Johnson BA, Li DY, Zeng D. Penalized estimating functions and variable selection in semiparametric regression models. J Amer Statist Assoc. 2008;103:672–680.