Sci China Ser A Math Phys Astron. Author manuscript; available in PMC 2010 April 21.

Published in final edited form as:

Sci China Ser A Math Phys Astron. 2009 June; 52(6): 1131–1133.

doi: 10.1007/s11425-009-0134-3; PMCID: PMC2857738

NIHMSID: NIHMS186527

Zhao Sihai Dave and LI Yi

Department of Biostatistics, Harvard School of Public Health and Dana-Farber Cancer Institute, Boston, MA 02115, USA

Zhao Sihai Dave: ude.dravrah.saf@oahzs; LI Yi: ude.dravrah.ymmij@iliy

Fan et al. are to be congratulated for this important contribution to the analysis of multivariate failure time data. They have provided three regression parameter estimators for multiple covariates in the marginal hazard model. Using the weighted estimating equation approach, they proposed sets of weights to:

- Minimize the componentwise variance of each parameter estimate,
- Minimize the sum of the variances of each parameter estimate, or
- Minimize each element of the covariance matrix.

They showed that each of their proposed estimators can consistently outperform estimates derived using the working independence model.

In this short note, we show that in the presence of high-dimensional covariates Fan et al.’s ideas can be combined with those of [1] to achieve these optimal estimates along with simultaneous variable selection. That is, our interest lies in controlling the variances of the estimates of **β** = (β_{1},…,β_{p})^{T} associated with high dimensional covariates, while simultaneously selecting the “important” covariates in order to construct a parsimonious model. Here, we consider *p* as a large but fixed constant, as opposed to [1] which considered the situation where *p* may increase with the sample size *n*.

The key idea is to add a penalty function *p*_{λj} (|β_{j}|) to Fan et al.’s weighted partial likelihood function (10), which leads to the following penalized likelihood function

$${l}_{W}^{P}(\beta )={\displaystyle \sum _{j=1}^{J}{w}_{j}{l}_{j}(\beta )-n{\displaystyle \sum _{j=1}^{p}{p}_{{\lambda}_{j}}(|{\beta}_{j}|)}}.$$

(1)

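Denote by **β**_{0} = (β_{01},…,β_{0p})^{T} the true value of **β** and suppose, without loss of generality, that β_{0k} ≠ 0 for *k* ≤ *s* and β_{0k} = 0 for *k* > *s*, for some *s* ≤ *p*. Let **β**_{01} = (β_{01},…,β_{0s})^{T}. Finally, let $\widehat{\boldsymbol{\beta}}=({\widehat{\boldsymbol{\beta}}}_{1}^{\mathrm{T}},{\widehat{\boldsymbol{\beta}}}_{2}^{\mathrm{T}})^{\mathrm{T}}$ be the solution that maximizes (1), where ${\widehat{\boldsymbol{\beta}}}_{1}=({\widehat{\beta}}_{1},\dots ,{\widehat{\beta}}_{s})^{\mathrm{T}}$ and ${\widehat{\boldsymbol{\beta}}}_{2}=({\widehat{\beta}}_{s+1},\dots ,{\widehat{\beta}}_{p})^{\mathrm{T}}$.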

With an appropriate penalty function, our estimator may enjoy the oracle property. That is, the procedure should select the true model with probability tending to 1 and, given the true model, the coefficient estimates should asymptotically behave like maximum (partial) likelihood estimators. More specifically, consider

$$\begin{array}{l}{a}_{n}=\underset{1\leqslant j\leqslant s}{\max}\{|{p}_{{\lambda}_{jn}}^{\prime}(|{\beta}_{0j}|)|:{\beta}_{0j}\ne 0\},\\ {b}_{n}=\underset{1\leqslant j\leqslant s}{\max}\{|{p}_{{\lambda}_{jn}}^{\prime\prime}(|{\beta}_{0j}|)|:{\beta}_{0j}\ne 0\}.\end{array}$$

Here, since λ_{j} depends on *n*, we write it as λ_{jn}. Along the lines of Theorem 2 of [1], we can show that if the penalty function is such that
${a}_{n}=O({n}^{-1/2})$ and *b*_{n} → 0, and in addition λ

$$\begin{array}{l}{\widehat{\mathit{\beta}}}_{2}=0\ \text{with probability approaching}\ 1,\ \text{and}\\ {n}^{1/2}({\mathbf{A}}_{W11}+{\mathbf{\Sigma}}_{11})\{{\widehat{\mathit{\beta}}}_{1}-{\mathit{\beta}}_{01}+{({\mathbf{A}}_{W11}+{\mathbf{\Sigma}}_{11})}^{-1}{\mathbf{b}}_{n}\}{\to}_{d}N(0,{\mathbf{V}}_{W11}),\end{array}$$

(2)

where

- **A**_{W11} is the first *s* × *s* submatrix of ${\mathbf{A}}_{W}(\mathit{\beta})={\sum}_{j=1}^{J}{w}_{j}{\mathbf{\Sigma}}_{j}(\mathit{\beta})$,
- **Σ**_{11} is the first *s* × *s* submatrix of $\text{diag}\{{p}_{{\lambda}_{1n}}^{\prime\prime}(|{\beta}_{01}|)\,\text{sgn}({\beta}_{01}),\dots ,{p}_{{\lambda}_{pn}}^{\prime\prime}(|{\beta}_{0p}|)\,\text{sgn}({\beta}_{0p})\}$,
- **V**_{W11} is the first *s* × *s* submatrix of **Σ**_{W}(**β**), which was defined in Fan et al.’s (11), and
- ${\mathbf{b}}_{n}={({p}_{{\lambda}_{1n}}^{\prime}(|{\beta}_{01}|),\dots ,{p}_{{\lambda}_{sn}}^{\prime}(|{\beta}_{0s}|))}^{\mathrm{T}}$.

Therefore, by (2) we have that the asymptotic covariance of $\sqrt{n}{\widehat{\mathit{\beta}}}_{1}$ is

$${\mathbf{\Sigma}}_{W}^{*P}={({\mathbf{A}}_{W11}+{\mathbf{\Sigma}}_{11})}^{-1}{\mathbf{V}}_{W11}{({\mathbf{A}}_{W11}+{\mathbf{\Sigma}}_{11})}^{-1},$$

(3)

where
${\mathbf{\Sigma}}_{W}^{*P}$ is the penalized version of var(${\widehat{\boldsymbol{\beta}}}_{W}$) defined in Section 2.3 of Fan et al. One example of an appropriate *p*_{λj}(θ) is the smoothly clipped absolute deviation (SCAD) penalty of [2], where

$${p}_{\lambda}^{\prime}(\theta )=\lambda I(\theta \leqslant \lambda)+\frac{{(a\lambda-\theta )}_{+}}{a-1}I(\theta >\lambda).$$

Here *a* > 2 is a tuning constant. More examples can be found in [3].
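The SCAD derivative above can be sketched numerically as follows. This is only an illustration: the function name `scad_derivative` is hypothetical, and the default `a = 3.7` is the value suggested in [2], assumed here rather than taken from this note.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative p'_lambda(theta) of the SCAD penalty, for theta >= 0.

    `a` must exceed 2; the default 3.7 is an assumption borrowed from [2].
    """
    theta = np.asarray(theta, dtype=float)
    # Lasso-like region: constant slope lam while theta <= lam.
    inner = lam * (theta <= lam)
    # Tapering region: slope decays linearly and reaches zero at theta = a * lam.
    outer = np.maximum(a * lam - theta, 0.0) / (a - 1.0) * (theta > lam)
    return inner + outer

# Evaluate the derivative on a small grid of |beta| values.
grid = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
slopes = scad_derivative(grid, lam=1.0)
```

Because the slope vanishes for θ ≥ *a*λ, large coefficients are left essentially unpenalized, which is what keeps *a*_{n} and *b*_{n} small when the nonzero coefficients are well separated from zero.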

We can now follow Fan et al. and define optimality criteria that will allow us to simultaneously estimate the optimal weights **w** = (*w*_{1},…,*w*_{J}). Minimizing the component-wise variance may not be ideal because minimizing the variance of the

$$\underset{\mathbf{w}}{\text{min}}\text{tr}({\mathbf{\Sigma}}_{W}^{*P}),$$

analogous to (14). Following the derivation in Subsection 3.2, we assume that **Σ**_{j}(β) ≈ *b*_{j}

$$\text{tr}\left({\displaystyle \sum _{j=1}^{J}{w}_{j}^{2}{[{\mathbf{\Gamma}}_{11}+{\mathbf{\Sigma}}_{\mathbf{11}}]}^{-1}{\mathbf{\Sigma}}_{j11}{[{\mathbf{\Gamma}}_{11}+{\mathbf{\Sigma}}_{11}]}^{-1}+{\displaystyle \sum _{k\ne l}^{J}{w}_{k}{w}_{l}}{[{\mathbf{\Gamma}}_{11}+{\mathbf{\Sigma}}_{11}]}^{-1}{\mathbf{D}}_{\mathit{\text{kl}}11}{[{\mathbf{\Gamma}}_{11}+{\mathbf{\Sigma}}_{11}]}^{-1}}\right)$$

over **w**, where **Γ**_{11} and **D**_{kl11} are the first *s* × *s* submatrices of **Γ** and **D**_{kl}(β), respectively. If **H**^{P} is a symmetric matrix with diagonal elements tr([**Γ**_{11} + **Σ**_{11}]^{−1}**Σ**_{j11}[**Γ**_{11} + **Σ**_{11}]^{−1}) and off-diagonal elements tr([**Γ**_{11} + **Σ**_{11}]^{−1}**D**_{kl11}[**Γ**_{11} + **Σ**_{11}]^{−1}), the solution is given by

$$\mathbf{w}={({\mathbf{H}}^{P})}^{-1}\mathbf{b}/{\mathbf{b}}^{\mathrm{T}}{({\mathbf{H}}^{P})}^{-1}\mathbf{b},$$

(4)

where **b** = (*b*_{1},…,*b*_{J})^{T}.
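As a rough numerical sketch of (4), the code below assembles **H**^{P} from the trace expressions above and then solves for **w**. The function names, the argument layout (a list for the **Σ**_{j11} and a dictionary keyed by (*k*, *l*) for the **D**_{kl11}) and the toy inputs are all assumptions made for illustration.

```python
import numpy as np

def build_H(Gamma11, Sigma11, Sigma_j11, D_kl11):
    """Assemble the J x J matrix H^P from s x s blocks.

    Sigma_j11: list of J arrays (hypothetical layout).
    D_kl11: dict mapping (k, l) with k != l to an array (hypothetical layout).
    """
    J = len(Sigma_j11)
    M = np.linalg.inv(Gamma11 + Sigma11)  # [Gamma_11 + Sigma_11]^{-1}
    H = np.empty((J, J))
    for k in range(J):
        for l in range(J):
            middle = Sigma_j11[k] if k == l else D_kl11[(k, l)]
            H[k, l] = np.trace(M @ middle @ M)
    return H

def optimal_weights(H, b):
    """Evaluate w = H^{-1} b / (b^T H^{-1} b) without forming H^{-1}."""
    Hinv_b = np.linalg.solve(H, b)
    return Hinv_b / (b @ Hinv_b)

# Toy inputs with J = 2 failure types and s = 2 selected covariates.
I2 = np.eye(2)
H = build_H(Gamma11=I2, Sigma11=np.zeros((2, 2)),
            Sigma_j11=[I2, 2.0 * I2],
            D_kl11={(0, 1): 0.5 * I2, (1, 0): 0.5 * I2})
b = np.array([1.0, 1.0])
w = optimal_weights(H, b)
```

By construction the resulting weights satisfy **b**^{T}**w** = 1, since (4) has the form of the minimizer of **w**^{T}**H**^{P}**w** under that normalization.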

Finally, we can choose the parameters λ_{j} by iteratively minimizing the generalized cross-validation statistic of Cai et al. [1] and solving for the optimal weights *w*_{j}. Let

$$\begin{array}{l}{\mathbf{\Sigma}}_{\lambda}(\widehat{\mathit{\beta}})=\text{diag}\left({p}_{{\lambda}_{1}}^{\prime}(|{\widehat{\beta}}_{1}|)/|{\widehat{\beta}}_{1}|,\dots ,{p}_{{\lambda}_{p}}^{\prime}(|{\widehat{\beta}}_{p}|)/|{\widehat{\beta}}_{p}|\right),\\ e({\lambda}_{1},\dots ,{\lambda}_{p})=\text{tr}\left\{{\left[\sum _{j=1}^{J}{w}_{j}{l}_{j}^{\prime\prime}(\widehat{\mathit{\beta}})-{\mathbf{\Sigma}}_{\lambda}(\widehat{\mathit{\beta}})\right]}^{-1}\left[\sum _{j=1}^{J}{w}_{j}{l}_{j}^{\prime\prime}(\widehat{\mathit{\beta}})\right]\right\}.\end{array}$$

If **λ** = (λ_{1},…,λ_{p})^{T}, we choose

$$\mathbf{\lambda}={\text{argmin}}_{\mathbf{\lambda}}\,\text{GCV}(\mathbf{\lambda}),\qquad \text{GCV}(\mathbf{\lambda})=\frac{-{\sum}_{j=1}^{J}{w}_{j}{l}_{j}(\widehat{\mathit{\beta}})}{n{[1-e(\mathbf{\lambda})/n]}^{2}}.$$

(5)
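To make the quantities entering (5) concrete, here is a minimal sketch that evaluates the GCV statistic for one candidate **λ**. It assumes the weighted log partial likelihood and its Hessian have already been computed elsewhere and are passed in as numbers. It also uses the SCAD penalty with *a* = 3.7, and drops coefficients estimated as exactly zero because the ratio *p*′(|β̂_{j}|)/|β̂_{j}| is undefined there. These choices, and all function and argument names, are assumptions rather than part of the note.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """SCAD derivative p'_lambda(theta) for theta >= 0 (a = 3.7 is assumed)."""
    theta = np.asarray(theta, dtype=float)
    return (lam * (theta <= lam)
            + np.maximum(a * lam - theta, 0.0) / (a - 1.0) * (theta > lam))

def gcv(weighted_loglik, weighted_hessian, beta_hat, lam, n, a=3.7):
    """GCV statistic of (5) for one candidate lambda (scalar or length-p).

    weighted_loglik:  sum_j w_j l_j(beta_hat), supplied by the caller.
    weighted_hessian: sum_j w_j l_j''(beta_hat), a p x p array.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    lam = np.broadcast_to(np.asarray(lam, dtype=float), beta_hat.shape)
    # Assumption: restrict to nonzero estimates to avoid dividing by zero.
    active = beta_hat != 0
    abs_beta = np.abs(beta_hat[active])
    hess = np.asarray(weighted_hessian, dtype=float)[np.ix_(active, active)]
    sigma_lam = np.diag(scad_derivative(abs_beta, lam[active], a) / abs_beta)
    # e(lambda) = tr{ [hess - Sigma_lambda]^{-1} hess }
    e = np.trace(np.linalg.solve(hess - sigma_lam, hess))
    return -weighted_loglik / (n * (1.0 - e / n) ** 2)

# Toy call: two covariates, the second estimated as exactly zero.
value = gcv(weighted_loglik=-8.1,
            weighted_hessian=[[-4.0, 0.0], [0.0, -1.0]],
            beta_hat=[2.0, 0.0], lam=0.1, n=10)
```

In practice one would evaluate `gcv` over a grid of candidate **λ** values, refitting **β̂** for each, and keep the minimizer.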

We can first assume the working independence model to choose the initial λ_{j} with (5), then use those λ_{j} to solve for the *w*_{j} with (4). We can then iterate between (5) and (4) until the λ_{j} and *w*_{j} converge.
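The alternation just described might be organized as in the sketch below. The two callables are hypothetical placeholders: `choose_lambda(w)` stands for the minimization in (5) at fixed weights and `solve_weights(lam)` for the evaluation of (4) at fixed **λ**. The stopping rule and iteration cap are assumptions, and convergence is not guaranteed in general.

```python
import numpy as np

def alternate(choose_lambda, solve_weights, w_init, tol=1e-6, max_iter=100):
    """Alternate between the lambda step and the weight step until both settle.

    w_init: starting weights, e.g. those representing working independence.
    Returns (lam, w, converged).
    """
    w = np.asarray(w_init, dtype=float)
    lam = np.asarray(choose_lambda(w), dtype=float)      # initial lambda via (5)
    for _ in range(max_iter):
        w_new = np.asarray(solve_weights(lam), dtype=float)      # step (4)
        lam_new = np.asarray(choose_lambda(w_new), dtype=float)  # step (5)
        converged = (np.max(np.abs(w_new - w)) < tol
                     and np.max(np.abs(lam_new - lam)) < tol)
        w, lam = w_new, lam_new
        if converged:
            return lam, w, True
    return lam, w, False

# Toy stand-ins for the two steps: a simple contraction with a known fixed point.
lam, w, ok = alternate(
    choose_lambda=lambda w: np.array([0.5 * w[0] + 1.0]),
    solve_weights=lambda lam: np.array([0.5 * lam[0]]),
    w_init=[1.0],
)
```

Returning the convergence flag lets the caller detect the case where the iteration cap was reached before the λ_{j} and *w*_{j} stabilized.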

To conclude, we have suggested a way in which the work of Fan et al. can be extended to perform variable selection on models for multivariate survival times with high dimensional covariates, while simultaneously providing optimally efficient parameter estimates for the selected covariates. Our future direction lies in utilizing empirical data and simulations to evaluate the accuracy and stability of the variable selection process, as well as the performance of the estimates derived using these optimal weights.

This work was supported by the U.S. National Cancer Institute (Grant No. R01 CA95747).

1. Cai J, Fan J, Li R, et al. Variable selection for multivariate failure time data. Biometrika. 2005;92:303–316.

2. Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. J Amer Statist Assoc. 2001;96:1348–1360.

3. Johnson BA, Li DY, Zeng D. Penalized estimating functions and variable selection in semiparametric regression models. J Amer Statist Assoc. 2008;103:672–680.
