Lifetime Data Anal. Author manuscript; available in PMC 2010 November 20.

Published in final edited form as: Lifetime Data Anal. Published online 2009 December 16. doi: 10.1007/s10985-009-9144-2

PMCID: PMC2989175

NIHMSID: NIHMS251151

Jian Huang, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA; Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA;

Jian Huang: jian-huang@uiowa.edu



In high throughput genomic studies, an important goal is to identify a small number of genomic markers that are associated with the development and progression of diseases. A representative example is microarray prognostic studies, where the goal is to identify genes whose expressions are associated with disease-free or overall survival. Because of the high dimensionality of gene expression data, standard survival analysis techniques cannot be directly applied. In addition, among the thousands of genes surveyed, only a subset are disease-associated, so gene selection is needed along with estimation. In this article, we model the relationship between gene expressions and survival using the accelerated failure time (AFT) model. We use the bridge penalty for regularized estimation and gene selection, and propose an efficient iterative computational algorithm. Tuning parameters are selected using V-fold cross validation. We use a resampling method to evaluate the prediction performance of the bridge estimator and the relative stability of the identified genes. We show that the proposed bridge estimator is selection consistent under appropriate conditions. Analysis of two lymphoma prognostic studies suggests that the bridge estimator can identify a small number of genes and can have better prediction performance than the Lasso.

High throughput technologies make it possible to identify genomic markers that are associated with disease development and progression. Gene profiling studies have been extensively conducted using microarrays. Identification of genomic markers from analysis of microarray data may lead to a better understanding of the genomic mechanisms underlying disease development and assist future clinical diagnosis and prognosis. Among the many disease outcomes measured in microarray studies, censored disease-free or overall survival has attracted much attention. See Alizadeh et al. (2000), Rosenwald et al. (2003), and Dave et al. (2004) for representative examples. Because of the high dimensionality of gene expression data, standard survival analysis techniques cannot be directly used. In addition, among the thousands of genes surveyed, only a subset may be associated with disease. Thus, gene selection is needed along with survival model construction.

When analyzing censored survival data with microarray gene expression measurements, the Cox proportional hazards model and the additive risk model have been adopted (Gui and Li 2005; Ma and Huang 2007). An alternative to those models is the accelerated failure time (AFT) model. Unlike the Cox and additive models, the AFT model is a linear regression model, in which logarithm (or in general a known monotone transformation) of the failure time is directly regressed on gene expressions (Kalbfleisch and Prentice 1980). Compared with the Cox and additive models, the AFT model has an intuitive linear regression interpretation (Wei 1992). In this article, we apply the method of Zhou (1992) and Stute (1993), which uses the Kaplan–Meier weights to account for censoring and has a weighted least squares loss function. The simple form of the loss function makes this estimation approach especially suitable for high dimensional data.

To tackle the high dimensionality problem of gene expression data, various dimension reduction or variable selection techniques have been employed. Previously used dimension reduction techniques include principal component analysis, singular value decomposition, partial least squares, and others. Among the many variable selection techniques developed, penalized selection has attracted extensive attention. Penalization methods put penalties on the regression coefficients. By properly balancing goodness of fit and model complexity, penalization approaches can lead to parsimonious models with reasonable fit.

The most famous example of penalization methods is the Lasso (Tibshirani 1996), which has been used in gene expression analysis with survival data (Gui and Li 2005; Ma and Huang 2007; Wang et al. 2008). However, it has been shown that the Lasso is in general not variable selection consistent (Leng et al. 2006). Various penalization methods that can have consistent selection have been proposed. Examples include the adaptive Lasso and the SCAD. Another penalty that also enjoys consistent selection is the bridge penalty. Under the conventional setup, i.e., when the number of observations is much larger than the number of covariates, the bridge penalty has been investigated; see, for example, Fu (1998). Huang et al. (2008a) show that the bridge penalty can have the oracle estimation and selection properties in linear regression models with a divergent number of covariates.

In this article, we consider genomic studies where gene expressions are measured along with censored disease survival. The bridge penalization approach is used for regularized estimation and gene selection. The rest of the article is organized as follows. The AFT model and bridge estimation are introduced in Sect. 2. An efficient computational algorithm is proposed in Sect. 3. Resampling based methods are proposed in Sect. 4 to evaluate prediction performance and the relative stability of selected genes. Asymptotic selection consistency is established in Sect. 5. Analyses of two lymphoma studies are provided in Sect. 6 to illustrate the proposed method. The article concludes with a discussion in Sect. 7. Proofs are given in the Appendix.

Let *T _{i}* be the logarithm of the failure time and

$${T}_{i}=\alpha +{X}_{i}^{\prime}\beta +{\epsilon}_{i},\phantom{\rule{thinmathspace}{0ex}}i=1,\dots ,n,$$

(1)

where α is the intercept, β ∈ IR^{p} is the vector of regression coefficients, and ε_{i} is the error term. When *T _{i}* is subject to right censoring, we observe (*Y _{i}*, δ_{i}, *X _{i}*), where *Y _{i}* = min(*T _{i}*, *C _{i}*), *C _{i}* is the logarithm of the censoring time, and δ_{i} = 1{*T _{i}* ≤ *C _{i}*} is the censoring indicator.

Estimation in the AFT model with an unspecified error distribution has been studied extensively. The following two approaches have received special attentions. The first is the Buckley-James estimator which adjusts censored observations using the Kaplan–Meier estimator (Buckley and James 1979); and the second is the rank based estimator motivated by the score function of the partial likelihood function (Ying 1993). Although they both perform well when there are a small number of covariates, with high dimensional gene expression data, both approaches have high computational cost.

A computationally more feasible approach is the weighted least squares (LS) approach (Zhou 1992; Stute 1993). Let ${\widehat{F}}_{n}$ be the Kaplan–Meier estimator of *F*, the distribution function of *T*. Let *Y*_{(1)} ≤ ⋯ ≤ *Y*_{(n)} be the order statistics of the *Y _{i}*’s, and let δ_{(i)} and *X*_{(i)} be the censoring indicator and covariate vector associated with *Y*_{(i)}. Let *w _{i}* be the jump of ${\widehat{F}}_{n}$ at *Y*_{(i)}, that is, ${w}_{i}={\widehat{F}}_{n}({Y}_{(i)})-{\widehat{F}}_{n}({Y}_{(i-1)})$. The weighted LS estimator minimizes

$$\frac{1}{2}{\displaystyle \sum _{i=1}^{n}{w}_{i}{({Y}_{(i)}-\alpha -{X}_{(i)}^{\prime}\beta )}^{2}.}$$

We center *X*_{(i)} and *Y*_{(i)} with their *w _{i}*-weighted means, respectively. Let ${\overline{X}}_{w}={\displaystyle {\sum}_{i=1}^{n}{w}_{i}{X}_{(i)}}/{\displaystyle {\sum}_{i=1}^{n}{w}_{i}}$ and ${\overline{Y}}_{w}={\displaystyle {\sum}_{i=1}^{n}{w}_{i}{Y}_{(i)}}/{\displaystyle {\sum}_{i=1}^{n}{w}_{i}}$. Denote ${X}_{(i)}^{*}={(n{w}_{i})}^{1/2}({X}_{(i)}-{\overline{X}}_{w})$ and ${Y}_{(i)}^{*}={(n{w}_{i})}^{1/2}({Y}_{(i)}-{\overline{Y}}_{w})$. We can then rewrite the weighted LS loss function as

$${Q}_{n}(\beta )=\frac{1}{2}{\displaystyle \sum _{i=1}^{n}{({Y}_{(i)}^{*}-{X}_{(i)}^{*\prime}\beta )}^{2}.}$$

(2)
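For concreteness, the weights and rescaled variables can be computed in a few lines. The sketch below (our notation and function names, not code from the paper) uses the standard jump formula for the Kaplan–Meier weights, assuming no ties for simplicity.

```python
import numpy as np

def stute_weights(y, delta):
    """Stute's Kaplan-Meier weights: w_i is the jump of the Kaplan-Meier
    estimator at the i-th order statistic of y; delta = 1 marks an event.
    Returns the sorting order and the weights (in sorted order)."""
    y, delta = np.asarray(y, float), np.asarray(delta, int)
    n = len(y)
    order = np.argsort(y, kind="stable")
    d = delta[order]
    w = np.zeros(n)
    surv = 1.0  # running product prod_{j<i} ((n-j)/(n-j+1))^(delta_(j))
    for i in range(n):
        w[i] = d[i] / (n - i) * surv
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    return order, w

def transform(X, y, delta):
    """Center by the w-weighted means and rescale by (n*w_i)^(1/2),
    producing the X*_(i), Y*_(i) that appear in the loss (2)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    order, w = stute_weights(y, delta)
    Xs, ys = X[order], y[order]
    xbar = (w[:, None] * Xs).sum(axis=0) / w.sum()
    ybar = (w * ys).sum() / w.sum()
    scale = np.sqrt(n * w)[..., None] if False else np.sqrt(n * w)
    return scale[:, None] * (Xs - xbar), scale * (ys - ybar)
```

With no censoring the weights reduce to the uniform 1/*n*, and a censored largest observation receives zero weight, as expected for Kaplan–Meier jumps.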

The bridge penalized objective function is

$${L}_{n}(\beta )={Q}_{n}(\beta )+\lambda {\displaystyle \sum _{j=1}^{p}|{\beta}_{j}{|}^{\gamma}},$$

(3)

where λ is a data dependent tuning parameter and γ > 0 is the bridge index. The value ${\widehat{\beta}}_{n}$ that minimizes (3) is called the bridge estimator (Frank and Friedman 1993; Fu 1998).

The bridge estimator includes two important special cases. When γ = 2, it is the familiar ridge estimator, which does not have a “built-in” variable selection mechanism. When γ = 1, it is the Lasso estimator. In this article, we focus on the case with 0 < γ < 1.

Direct minimization of *L _{n}*(β) is difficult, since the bridge penalty is not convex. An approximation approach is proposed in Huang et al. (2008a). As an alternative, we consider the following approach, which is more efficient and does not need any approximation.

For 0 < γ < 1, define

$${S}_{n}(\beta ,\theta )={Q}_{n}(\beta )+{\displaystyle \sum _{j=1}^{p}{\theta}_{j}^{1-1/\gamma}}|{\beta}_{j}|+{\tau}_{n}{\displaystyle \sum _{j=1}^{p}{\theta}_{j}},$$

(4)

where τ_{n} is a penalty parameter.

**Proposition 1** *If* $\lambda ={\tau}_{n}^{1-\gamma}{\gamma}^{-\gamma}{(1-\gamma )}^{\gamma -1}$, *then* ${\widehat{\beta}}_{n}$ *minimizes* *L _{n}*(β) *if and only if* $({\widehat{\beta}}_{n},{\widehat{\theta}}_{n})$ *minimizes* *S _{n}*(β, θ) *subject to* θ_{j} ≥ 0, 1 ≤ *j* ≤ *p*, *for some* ${\widehat{\theta}}_{n}$.

This proposition can be proved as in Huang et al. (2009), in which a similar result is shown for the group bridge estimator in linear regression without censoring. Based on Proposition 1, we propose the following iterative algorithm for computing the bridge estimate in the AFT models.
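The value of λ in Proposition 1 can be checked directly by profiling θ out of (4): for fixed β, minimizing over each θ_{j} ≥ 0 gives

```latex
% minimize theta_j^{1-1/gamma}|beta_j| + tau_n theta_j over theta_j >= 0:
\hat{\theta}_j=\left(\frac{(1-\gamma)|\beta_j|}{\tau_n\gamma}\right)^{\gamma}
\;\Longrightarrow\;
\hat{\theta}_j^{\,1-1/\gamma}|\beta_j|+\tau_n\hat{\theta}_j
=\tau_n^{1-\gamma}\gamma^{-\gamma}(1-\gamma)^{\gamma-1}\,|\beta_j|^{\gamma}
=\lambda\,|\beta_j|^{\gamma},
```

so the profiled objective is exactly *L _{n}*(β). The minimizing $\hat{\theta}_j$ is also the θ update in Step 2 of the algorithm.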

- Step 1: Compute an initial estimate β^{(0)}. Specifically, we propose using the Lasso estimate, i.e., the minimizer of (3) with γ = 1.
- Step 2: For *s* = 1, 2, …, compute ${\theta}_{j}^{(s)}={\left(\frac{1-\gamma}{{\tau}_{n}\gamma}\right)}^{\gamma}{|{\beta}_{j}^{(s-1)}|}^{\gamma}$, *j* = 1, …, *p*.
- Step 3: Compute ${\beta}^{(s)}={\text{arg min}}_{\beta}\{{Q}_{n}(\beta )+{\displaystyle {\sum}_{j=1}^{p}{({\theta}_{j}^{(s)})}^{1-1/\gamma}|{\beta}_{j}|}\}$.
- Step 4: Repeat Steps 2–3 until convergence.

The proposed algorithm always converges, since the non-negative objective function (4) decreases at each step. In our numerical studies, convergence is usually achieved within ten iterations. In Step 1, we choose the Lasso estimate as the initial value, with the penalty parameter in the Lasso criterion determined by *V*-fold cross validation. Theorem 1 in Sect. 5 establishes that the Lasso tends to select all the important genes plus a few false positives. Thus, using the Lasso as the starting value will not miss any important genes. The main computational cost comes from Step 3, which computes a weighted Lasso estimate and can be carried out with many existing algorithms, such as the LARS (Efron et al. 2004) or boosting (Ma and Huang 2007). In this article, we adopt boosting, since its computational cost is relatively insensitive to the number of genes.
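To make the iteration concrete, here is a small self-contained numerical sketch. It substitutes a plain coordinate-descent solver for the boosting/LARS implementations mentioned above; the function names, the default γ = 1/2, and the recovery of τ_n from λ via Proposition 1 are our illustrative choices, not code from the paper.

```python
import numpy as np

def _soft(z, t):
    """Soft-thresholding operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, v, n_sweeps=200):
    """Coordinate descent for 0.5*||y - X b||^2 + sum_j v[j]*|b[j]|."""
    n, p = X.shape
    b = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()      # residual y - X b (b = 0 initially)
    for _ in range(n_sweeps):
        for j in range(p):
            if col_norms[j] == 0.0:
                continue
            r += X[:, j] * b[j]     # remove j's contribution
            b[j] = _soft(X[:, j] @ r, v[j]) / col_norms[j]
            r -= X[:, j] * b[j]     # put it back
    return b

def bridge_aft(Xstar, ystar, lam, gamma=0.5, n_outer=10):
    """Iterative bridge estimate (Sect. 3): Lasso start (Step 1), then
    alternate the theta update (Step 2) and a weighted Lasso fit (Step 3)."""
    p = Xstar.shape[1]
    # invert lambda = tau^(1-gamma) * gamma^(-gamma) * (1-gamma)^(gamma-1)
    tau = (lam * gamma ** gamma * (1.0 - gamma) ** (1.0 - gamma)) ** (1.0 / (1.0 - gamma))
    b = weighted_lasso(Xstar, ystar, np.full(p, lam))   # Step 1: Lasso
    for _ in range(n_outer):
        theta = ((1.0 - gamma) / (tau * gamma)) ** gamma * np.abs(b) ** gamma
        # theta = 0 means an infinite weight; cap it so the coefficient stays 0
        v = np.minimum(np.maximum(theta, 1e-300) ** (1.0 - 1.0 / gamma), 1e12)
        b = weighted_lasso(Xstar, ystar, v)
    return b
```

In practice `Xstar`, `ystar` would be the rescaled variables of Sect. 2 and λ would be chosen by 5-fold cross validation; note that coefficients set to zero by the Lasso start stay at zero, matching the subset property observed in the data analyses.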

We use *V*-fold cross validation to determine the tuning parameter λ. For a pre-defined integer *V*, partition the data randomly into *V* non-overlapping subsets of equal size. For a given λ, we define $\mathit{\text{CV score}}={\displaystyle {\sum}_{\upsilon =1}^{V}{Q}^{(\upsilon )}({\widehat{\beta}}^{(-\upsilon )})}$, where ${\widehat{\beta}}^{(-\upsilon )}$ is the bridge estimator of β based on the data without the υ^{th} subset and *Q*^{(υ)} is the function defined in (2) evaluated on the υ^{th} subset. The optimal tuning parameter is defined as the minimizer of the CV score. In this article, we set *V* = 5.

With gene expression data, *p* ≫ *n*. Most conventional evaluation techniques are valid only under the *p* ≪ *n* scenario and cannot be applied here. In this study, we are most interested in two aspects: (1) prediction performance, that is, whether the identified genes and the corresponding AFT models can make proper predictions for subjects not used in the model estimation; and (2) stability of the identified genes. Previous studies have shown that gene signatures identified from analysis of gene expression data may suffer from low reproducibility; that is, genes identified using different data sets may differ significantly. Ideally, evaluation should be based on independent data, which are usually not available. As an alternative, we propose the following resampling based approaches.

We propose prediction evaluation based on random partitions as follows.

- Step 1: Partition the data randomly into a training set of size *n*_{1} and a testing set of size *n*_{2} with *n*_{1} + *n*_{2} = *n*. We use *n*_{1} = 2*n*/3.
- Step 2: Compute the bridge estimate using the training set data. Cross validation is needed to select the optimal tuning for the training set.
- Step 3: Use the training set estimate to make predictions for subjects in the testing set. Specifically, we first compute the risk scores *X*′$\widehat{\beta}$. We then dichotomize the risk scores at the median and create two risk groups (referred to as the high and low risk groups, respectively). Compare the survival functions of the two risk groups and compute the logrank statistic.
- Step 4: To avoid over-fitting caused by a “lucky” partition, we repeat Steps 1–3 *B* = 500 times. Each time a new partition is made and the value of the logrank statistic is computed.

We partition the dataset in Step 1. To generate a fair evaluation, we recompute the tuning parameters for each individual partition in Step 2. In Step 3, we adopt the logrank statistic as the evaluation measure. A larger logrank statistic suggests that the high and low risk groups are better separated and hence that the proposed approach is more effective. We create only two risk groups, mainly because of the small sample sizes. By repeating the partitioning process many times, we obtain a Monte Carlo estimate of the *distribution* of the logrank statistic (as opposed to the single logrank statistic reported in several earlier studies). We call it the *observed predictive distribution* (OPD) of the logrank statistic.
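The logrank statistic used in Step 3 has the standard two-sample form: observed minus expected deaths, summed over event times, squared and divided by the summed hypergeometric variances. The following numpy sketch is our illustration of that standard statistic, not code from the paper.

```python
import numpy as np

def logrank_stat(time, event, group):
    """Two-sample logrank chi-square statistic (1 d.f.).
    time: observed times; event: 1 = death, 0 = censored;
    group: 0/1 risk-group labels (e.g. dichotomized risk scores)."""
    time, event, group = map(np.asarray, (time, event, group))
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                                  # total at risk
        n1 = (at_risk & (group == 1)).sum()                # at risk in group 1
        d = ((time == t) & (event == 1)).sum()             # deaths at t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:                                          # hypergeometric variance
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var
```

The statistic is symmetric in the two group labels, and under *n* ≫ *p* it is asymptotically χ^{2} with one degree of freedom.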

When *n* ≫ *p*, the logrank statistic is asymptotically χ^{2} distributed. With gene expression data and *n* ≪ *p*, it is not clear how accurate the χ^{2} approximation is. To tackle this problem, we propose the following permutation based approach to generate a reference distribution for the OPD. We first randomly permute the event times together with the censoring indicators. We then follow the same procedure as for the OPD and obtain a Monte Carlo estimate of the distribution of the logrank statistic under permutation. We call it the *permutation predictive distribution* (PPD) of the logrank statistic. After permutation, the event times and gene expressions are expected to be independent. The distribution of logrank statistics so computed can serve as the reference distribution for the OPD.

Calculations of the OPD and PPD are parallel: the OPD is calculated from the *observed* data, whereas the PPD is calculated from the *permuted* data. Well separated OPD and PPD may indicate that the proposed approach can identify genes and models with satisfactory prediction performance, whereas substantially overlapping distributions suggest that either the proposed approach is not effective or the gene expressions do not have good discriminant power.

The prediction evaluation described above assesses the overall performance of the proposed approach and the selected genes/models. In what follows, we evaluate the relative stability of each identified gene. The rationale behind the proposed approach is that, if a gene is more “important” or more “stable”, it should be identified more often in analyses of multiple data sets. Since multiple independent data sets are not available, we resort to random sampling again.

We first randomly sample *n*_{1} = 2*n*/3 subjects. We then use the bridge approach to identify genes in the sampled subset. We repeat this procedure *B* = 500 times. For the *j*th gene, we count *c _{j}*, the number of times it is identified. The proportion *c _{j}*/*B* is referred to as the observed occurrence index (OOI) of gene *j*: genes with larger OOI are identified more consistently across subsamples. Applying the same procedure to permuted data yields the permutation occurrence index (POI), which serves as a reference.
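The occurrence index computation can be sketched as follows. In the actual procedure, the selector passed in would be the bridge estimator with cross-validated tuning (including re-screening on each subsample); the function signature and the toy correlation-threshold selector used in the usage example are our hypothetical simplifications.

```python
import numpy as np

def occurrence_index(X, y, select_fn, B=500, frac=2 / 3, seed=0):
    """Fraction of B random subsamples (of size frac*n) in which each
    gene is selected. select_fn(X_sub, y_sub) must return a boolean
    mask of length p marking the selected genes."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(frac * n)
    counts = np.zeros(p)
    for _ in range(B):
        idx = rng.choice(n, size=m, replace=False)  # subsample without replacement
        counts += select_fn(X[idx], y[idx])
    return counts / B
```

A stable, strongly associated gene should have an occurrence index near 1, while noise genes should have indexes near 0.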

In this section, we investigate asymptotic properties of the proposed bridge approach with *p* ≫ *n*. We are especially interested in the gene selection consistency property, because once genes are consistently selected, standard approaches can lead to consistent estimates. We note that for fixed *p*, the asymptotic results can be obtained easily using standard approaches. Since the case with fixed *p* is not relevant to our data applications, we do not consider it here.

We note that, with the proposed iterative algorithm, the Lasso estimate is used as the starting value. Genes not selected by the Lasso will not be selected in the final model. Thus, it is crucial to first establish properties of the Lasso estimate under the present data/model setup. Careful inspection of the proposed computational algorithm suggests that, once the initial estimate is obtained, in each step, an adaptive Lasso estimate is computed. Thus, we are able to use similar methods as in Huang et al. (2008b), which studies properties of the adaptive Lasso in high dimensional linear regression models, to establish properties of the bridge estimate.

We consider the rescaled ${X}_{(i)}^{*}$ and ${Y}_{(i)}^{*}$ defined in Sect. 2. For simplicity of notation, we use *X*_{(i)} and *Y*_{(i)} to denote ${X}_{(i)}^{*}$ and ${Y}_{(i)}^{*}$ hereafter. Let *Y* = (*Y*_{(1)}, …, *Y*_{(n)})′. Let *X* be the *n* × *p* covariate matrix consisting of row vectors ${X}_{(1)}^{\prime},\dots ,{X}_{(n)}^{\prime}$, and let *X*_{1}, …, *X _{p}* be the columns of *X*.

Let β_{0} = (β_{01}, …, β_{0p})′ be the true value of the regression coefficients. Let *A*_{1} = {*j* : β_{0j} ≠ 0} be the set of nonzero coefficients and let *q* = |*A*_{1}|. We make the following assumptions.

- (A1) The number of nonzero coefficients *q* is finite.
- (A2) (a) The observations (*Y _{i}*, *X _{i}*, δ_{i}), 1 ≤ *i* ≤ *n*, are independent and identically distributed. (b) The errors ε_{1}, …, ε_{n} are independent and identically distributed with mean 0 and finite variance σ^{2}. Furthermore, they are subgaussian, in the sense that there exist *K*_{1}, *K*_{2} > 0 such that the tail probabilities of ε_{i} satisfy *P*(|ε_{i}| > *x*) ≤ *K*_{2} exp(−*K*_{1}*x*^{2}) for all *x* ≥ 0 and all *i*.
- (A3) (a) The errors (ε_{1}, …, ε_{n}) are independent of the Kaplan–Meier weights (*w*_{1}, …, *w _{n}*). (b) The covariates are bounded: there is a constant *M* > 0 such that |*X _{ij}*| ≤ *M*, 1 ≤ *i* ≤ *n*, 1 ≤ *j* ≤ *p*.
- (A4) The covariate matrix satisfies the sparse Riesz condition (SRC) with rank *q*^{*}: there exist constants 0 < *c*_{*} < *c*^{*} < ∞ such that, for *q*^{*} = (3 + 4*C*)*q* and *C* = *c*^{*}/*c*_{*}, with probability converging to 1, ${c}_{*}\le \frac{{\nu}^{\prime}{\mathrm{\Sigma}}_{A}\nu}{{\Vert \nu \Vert}^{2}}\le {c}^{*}$ for all *A* with |*A*| = *q*^{*} and all ν ∈ IR^{q*}, where Σ_{A} is the submatrix of *X*′*X*/*n* with rows and columns in *A* and ‖ · ‖ is the ℓ_{2} norm.

By (A1), the model is sparse in the sense that, although the total number of covariates may be large, the number of covariates with nonzero coefficients is still small. The tail probability assumption in (A2) has been made in the study of high-dimensional linear regression models; see, for example, Zhang and Huang (2008). With assumption (A3), it can be shown that the subgaussian tail property still holds under censoring. The SRC condition (A4) has been formulated in the study of the Lasso in linear regression without censoring (Zhang and Huang 2008). This condition implies that all the eigenvalues of any *d* × *d* submatrix of *X′WX/n* with *d* ≤ *q*^{*} lie between *c*_{*} and *c*^{*}. It ensures that any model with dimension no greater than *q*^{*} is identifiable.
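The SRC involves all index sets of size *q*^{*}, so verifying it exactly is combinatorial. As an illustrative diagnostic only (not part of the paper's procedure, and the function name and sampling scheme are ours), one can spot-check the eigenvalue bounds over randomly sampled submatrices:

```python
import numpy as np

def src_bounds(X, q_star, n_draws=200, seed=0):
    """Monte Carlo spot-check of the sparse Riesz condition: smallest
    and largest eigenvalues of X_A' X_A / n over random index sets A
    of size q_star (exhaustive checking is combinatorial)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    c_lo, c_hi = np.inf, 0.0
    for _ in range(n_draws):
        A = rng.choice(p, size=q_star, replace=False)
        ev = np.linalg.eigvalsh(X[:, A].T @ X[:, A] / n)  # ascending order
        c_lo, c_hi = min(c_lo, ev[0]), max(c_hi, ev[-1])
    return c_lo, c_hi
```

For well-conditioned designs the returned bounds bracket 1 and stay bounded away from 0 and ∞, consistent with the (*c*_{*}, *c*^{*}) required by (A4).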

We first consider the Lasso estimator defined as $\tilde{\beta}=\text{arg min}\{{Q}_{n}(\beta )+\lambda {\displaystyle {\sum}_{j=1}^{p}|{\beta}_{j}|}\}$. With $\tilde{\beta}=({\tilde{\beta}}_{1},\dots ,{\tilde{\beta}}_{p}{)}^{\prime}$, let *Ã*_{1} = {*j* : ${\tilde{\beta}}_{j}$ ≠ 0} be the set of nonzero Lasso estimated coefficients.

**Theorem 1** *Suppose that (A1)–(A4) hold and* $\lambda \ge O(1)\sqrt{n\phantom{\rule{thinmathspace}{0ex}}\text{log}\phantom{\rule{thinmathspace}{0ex}}p}$. *Then*

- (i) *With probability converging to 1,* |*Ã*_{1}| ≤ (2 + 4*C*)*q*.
- (ii) *If* λ*/n* → 0 *and* (log *p*)*/n* → 0, *then with probability converging to 1, all the covariates with nonzero coefficients are selected.*
- (iii) ${\Vert \tilde{\beta}-{\beta}_{0}\Vert}_{2}^{2}\le \frac{16{\lambda}^{2}q}{{n}^{2}{c}_{*}^{2}}+{O}_{p}\left(\frac{|{\tilde{A}}_{1}|\phantom{\rule{thinmathspace}{0ex}}\text{log}\phantom{\rule{thinmathspace}{0ex}}p}{n{c}_{*}^{2}}\right)$. *In particular, if* $\lambda =O(\sqrt{n\phantom{\rule{thinmathspace}{0ex}}\text{log}\phantom{\rule{thinmathspace}{0ex}}p})$, *then* ${\Vert \tilde{\beta}-{\beta}_{0}\Vert}_{2}^{2}={O}_{p}(\text{log}\phantom{\rule{thinmathspace}{0ex}}p/n)$.

This theorem suggests that, with high probability, the number of covariates selected by the Lasso is a finite multiple of the number of covariates with nonzero coefficients. Moreover, all the covariates with nonzero coefficients are selected with probability converging to one. This justifies using the Lasso as the initial estimator in the algorithm proposed in Sect. 3.1. In addition, the Lasso estimator is estimation consistent.

Starting from the initial Lasso estimator $\tilde{\beta}$, we denote by $\widehat{\beta}$ the estimate after one iteration (in the algorithm described in Sect. 3.1). Simple algebra shows that, for γ = 1/2, the weight ${({\theta}_{j}^{(1)})}^{1-1/\gamma}$ computed from Step 2 of the proposed algorithm equals $(\lambda /2)|{\tilde{\beta}}_{j}{|}^{-1/2}$. Thus Step 3 of the proposed algorithm is

$$\widehat{\beta}=\text{arg min}\phantom{\rule{thinmathspace}{0ex}}\left\{{Q}_{n}(\beta )+\frac{\lambda}{2}{\displaystyle \sum _{j=1}^{p}|\tilde{{\beta}_{j}}{|}^{-1/2}|}{\beta}_{j}|\right\}.$$

$\widehat{\beta}$ computed above takes the form of an adaptive Lasso estimator. Of note, the penalty parameter here is the same λ as used in the Lasso estimator.

For any vector *x* = (*x*_{1}, *x*_{2}, …), denote its sign vector by sgn(*x*) = (sgn(*x*_{1}), sgn(*x*_{2}), …), where sgn(*x _{i}*) = 1, 0, or −1 according as *x _{i}* > 0, *x _{i}* = 0, or *x _{i}* < 0.

**Theorem 2** *Suppose that (A1)–(A4) are satisfied,* (log *p*)*/n* → 0, *and* $\lambda =O\phantom{\rule{thinmathspace}{0ex}}(\sqrt{n\text{log}p})$. *Then*

$$\mathrm{P}(\mathit{\text{sgn}}(\widehat{\beta})=\mathit{\text{sgn}}({\beta}_{0}))\to 1.$$

The above theorem shows that the one-step estimator is sign consistent. Thus, the one-step estimator is selection consistent, in the sense that it can correctly distinguish covariates with zero and nonzero coefficients with probability converging to 1. Following similar arguments, we can prove that any finite-step estimator (computed from the algorithm described in Sect. 3.1) is sign consistent and hence selection consistent. We note that, although the one-step estimator is selection consistent, our numerical studies suggest that iterating until convergence tends to improve finite sample performance.

We note that in Theorem 2, we allow log *p* = *o*(*n*) or *p* = exp(*o*(*n*)). Thus the dimension of covariates can be larger than the sample size, which accommodates gene expression data.

A study using microarray expression analysis of mantle cell lymphoma (MCL) is reported in Rosenwald et al. (2003). The primary goal of this study is to identify genes that have good predictive power of patients’ survival risk. Among 101 untreated patients with no history of previous lymphoma, 92 were classified as having MCL based on established morphologic and immunophenotypic criteria. Survival times of 64 patients were available and the other 28 patients were censored. The median survival time was 2.8 years (range 0.02 to 14.05 years). Lymphochip DNA microarrays were used to quantify mRNA expression in the lymphoma samples from the 92 patients. The gene expression data that contains expression values of 8810 cDNA elements is available at http://llmpp.nih.gov/MCL/.

We model survival with the AFT model and use the proposed bridge approach for gene selection. Although there is no limit on the number of genes that the proposed approach can accommodate, we pre-process the data as follows to exclude noise and gain further stability: (1) unsupervised screening: compute the interquartile ranges (IQRs) of all gene expressions and remove genes whose IQRs are smaller than the first quartile of all IQRs; 6608 genes pass this screening. (2) Supervised screening: compute the correlation coefficients of the uncensored survival times with gene expressions and select the 500 genes with the largest absolute correlation coefficients. We then standardize these 500 gene expressions to have zero mean and unit variance. We note that the supervised screening utilizes the survival information; in the random sampling based evaluation, to guarantee a fair evaluation, the supervised screening needs to be conducted for each sampled dataset.
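The two screening steps can be sketched as follows; the function name and tie-handling details are ours, and the "first quartile of all IQRs" rule is our reading of the unsupervised screen described above.

```python
import numpy as np

def screen_genes(X, y, delta, n_keep=500):
    """Two-step screening before fitting: (1) unsupervised -- drop genes
    whose interquartile range falls below the first quartile of all IQRs;
    (2) supervised -- keep the n_keep genes with the largest absolute
    correlation with the uncensored log survival times; then standardize."""
    iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
    idx = np.where(iqr >= np.percentile(iqr, 25))[0]   # unsupervised screen
    unc = np.asarray(delta) == 1                       # uncensored subjects only
    corr = np.array([abs(np.corrcoef(X[unc, j], y[unc])[0, 1]) for j in idx])
    top = idx[np.argsort(corr)[::-1][:n_keep]]         # supervised screen
    Xs = X[:, top]
    Xs = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)       # standardize
    return Xs, top
```

In the resampling-based evaluation, this function would be re-run on each subsample so that the supervised screen never sees the held-out data.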

We employ the proposed approach and select the optimal tuning with 5-fold cross validation. Genes selected with the bridge approach and their corresponding estimates are shown in Table 1. For comparison, we also provide the Lasso estimate. 40 and 34 genes are identified using the Lasso and bridge approaches, respectively. Because of the special setup of the computational algorithm, genes identified using the bridge are a subset of those identified using the Lasso.

We evaluate prediction performance using the approach described in Sect. 4.1. For comparison, we also evaluate the Lasso approach using the same evaluation technique. We show in Fig. 1 (upper panels) the density estimates of the OPD and PPD. We can see that (1) the bridge yields well separated OPD and PPD (Wilcoxon test *p*-value < 0.001), which suggests satisfactory prediction performance: the 90th percentile of the PPD is 3.32, and 62% of the logrank statistics from the OPD are larger than that value; (2) the PPD is close to the χ^{2} distribution, although a small discrepancy remains: the 95th and 90th percentiles of the PPD are 4.11 and 3.32, respectively, slightly larger than their χ^{2} counterparts (3.84 and 2.71); (3) the prediction performance of the Lasso is also satisfactory, but inferior to that of the bridge: the mean and median of the Lasso OPD are 4.617 and 3.616, smaller than their bridge counterparts (5.319 and 4.404).

MCL data. *Left-upper panel*: Lasso estimation; *right-upper panel*: bridge estimation. Green solid line: OPD; red dash-dotted line: PPD; black dotted line: density estimate from 500 χ^{2} distributed random variables. *Lower panels*: occurrence indexes of the 500 screened genes.

We evaluate the stability of the identified genes using the approach described in Sect. 4.2. Results are shown in the lower panels of Fig. 1. For a better view, we only plot the 500 genes that pass the screening in the whole dataset. We can see that (1) most of the identified genes have relatively large OOI; (2) a few genes are not identified but have moderate OOI. It is still not clear why those genes are not identified; this question is worth investigating in future studies; and (3) with permuted data, the POIs of all genes are small, with no gene having a significantly larger occurrence index than the others.

The DLBCL (diffuse large B-cell lymphoma) data were first analyzed in Rosenwald et al. (2002). This dataset consists of 240 patients with DLBCL, including 138 patient deaths during follow-up. Expression profiles of 7399 genes are available. Missing values are imputed using a K-nearest neighbors approach. We carry out supervised screening and select the 500 genes with the largest absolute marginal correlation coefficients with the uncensored event times to gain further stability. Gene expressions are then normalized to have zero mean and unit variance. We note that, in the random sampling based evaluation, the supervised screening is conducted for each sampled dataset.

With the proposed approach and optimal tuning selected using 5-fold cross validation, 44 genes are identified. As a comparison, the Lasso identifies 46 genes. The UNI-QID, gene names, and bridge and Lasso estimated coefficients are shown in Table 2.

We evaluate prediction performance and show the results in Fig. 2. Conclusions similar to those in Sect. 6.1 can be drawn. With the bridge, the OPD has mean 4.416 and median 3.674, which are larger than their Lasso counterparts (3.53 and 2.59). 61% of the logrank statistics from the OPD are greater than the 90th percentile of the PPD. The Wilcoxon test suggests that the OPD and PPD are well separated (*p*-value < 0.001). Evaluation of stability using the occurrence index is presented in Fig. 2 (lower panels). The observations are similar to those summarized in Sect. 6.1.

Analyses of the MCL and DLBCL data suggest that the bridge approach is capable of identifying a smaller number of genes than the Lasso. With gene expression data, a smaller number of identified genes means more focused hypotheses for future confirmation studies, and is thus preferred. In addition, the prediction performance of the bridge is better than that of the Lasso. We note that, although prediction is not based on completely independent data, by properly using resampling and comparing the bridge and Lasso on the same basis, the prediction comparison is expected to be valid.

Genomic studies with high dimensional markers measured along with censored survival outcomes are becoming more and more common. In this article, we model the relationship between gene expressions and censored survival with AFT models. AFT models have been commonly adopted and provide useful alternatives to the Cox and additive hazards models. Of note, since it is still not clear how to compare different models under the “large p, small n” setting, we do not pursue any model comparison. More methodological studies are needed before such a comparison can be conducted.

We propose using the bridge penalty for gene selection. Our numerical studies suggest that the bridge has better performance than the Lasso in terms of variable selection in AFT models. There are other penalties, for example the adaptive Lasso and the SCAD, that can be used in the present setup. Since it is beyond the scope of this paper to compare our proposed method with all existing ones, we only pursue a comparison with the Lasso, which is commonly used as a benchmark.

This work is partially supported by CA120988 from the National Cancer Institute and DMS 0805670 from the National Science Foundation. We thank the editors and reviewers for their helpful and constructive comments on an earlier version of the paper.

Let τ = (τ_{1}, …, τ_{n})′ where ${\tau}_{i}={w}_{i}{\epsilon}_{(i)}\equiv {w}_{i}({Y}_{(i)}-{X}_{(i)}^{\prime}{\beta}_{0})$.

**Lemma 1** *Suppose that conditions (A2) and (A3) hold. Let* ${\xi}_{j}={\displaystyle {\sum}_{i=1}^{n}{X}_{\mathit{\text{ij}}}{\tau}_{i},1\le j\le p}$. *Let* ξ_{n} = max_{1≤j≤p} |ξ_{j}|. *Then*

$$\mathrm{E}({\xi}_{n})\le {C}_{1}\sqrt{\text{log}(p)}{\left(\sqrt{2{C}_{2}n\phantom{\rule{thinmathspace}{0ex}}\text{log}(p)}+4\phantom{\rule{thinmathspace}{0ex}}\text{log}(2p)+{C}_{2}n\right)}^{1/2},$$

*where* *C*_{1} *and* *C*_{2} *are two positive constants. In particular, when* log (*p*)*/n* → 0,

$$\mathrm{E}({\xi}_{n})=O(1)\sqrt{n\text{log}p}.$$

*Proof of Lemma 1* Let ${s}_{\mathit{\text{nj}}}^{2}={\displaystyle {\sum}_{i=1}^{n}{X}_{\mathit{\text{ij}}}^{2}}$ and ${s}_{n}^{2}={\text{max}}_{1\le j\le p}\phantom{\rule{thinmathspace}{0ex}}{s}_{\mathit{\text{nj}}}^{2}$. Conditional on the *X _{ij}*’s, assumptions (A2) and (A3) imply that the ξ_{j} are subgaussian. By the maximal inequality for subgaussian random variables,

$$\mathrm{E}\phantom{\rule{thinmathspace}{0ex}}(\underset{1\le j\le p}{\text{max}}|{\xi}_{j}||{X}_{\mathit{\text{ij}}},1\le i\le n,1\le j\le p)\le {C}_{1}{s}_{n}\sqrt{\text{log}(p)},$$

for a constant *C*_{1} > 0. Therefore,

$$\mathrm{E}\phantom{\rule{thinmathspace}{0ex}}\left(\underset{1\le j\le p}{\text{max}}|{\xi}_{j}|\right)\le {C}_{1}\sqrt{\text{log}(p)}\phantom{\rule{thinmathspace}{0ex}}\mathrm{E}({s}_{n}).$$

(5)

Since

$${\displaystyle \sum _{i=1}^{n}\mathrm{E}{[{X}_{\mathit{\text{ij}}}^{2}-\mathrm{E}{X}_{\mathit{\text{ij}}}^{2}]}^{2}}\le 4{C}_{2}n,$$

(6)

and

$$\underset{1\le j\le p}{\text{max}}{\displaystyle \sum _{i=1}^{n}\mathrm{E}{X}_{\mathit{\text{ij}}}^{2}\le {C}_{2}n},$$

(7)

by Lemma 4.2 of Van de Geer (2008), (6) implies

$$\mathrm{E}\phantom{\rule{thinmathspace}{0ex}}\left(\underset{1\le j\le p}{\text{max}}\phantom{\rule{thinmathspace}{0ex}}\left|{\displaystyle \sum _{i=1}^{n}\{{X}_{\mathit{\text{ij}}}^{2}-\mathrm{E}{X}_{\mathit{\text{ij}}}^{2}\}}\right|\right)\le \sqrt{2{C}_{2}n\phantom{\rule{thinmathspace}{0ex}}\text{log}(p)}+4\phantom{\rule{thinmathspace}{0ex}}\text{log}(2p).$$

Therefore, by (7) and the triangle inequality,

$$\mathrm{E}{s}_{n}^{2}\le \sqrt{2{C}_{2}n\phantom{\rule{thinmathspace}{0ex}}\text{log}(p)}+4\phantom{\rule{thinmathspace}{0ex}}\text{log}(2p)+{C}_{2}n.$$

Now since $\mathrm{E}{s}_{n}\le {(\mathrm{E}{s}_{n}^{2})}^{1/2}$, we have

$$\mathrm{E}{s}_{n}\le {\left(\sqrt{2{C}_{2}n\phantom{\rule{thinmathspace}{0ex}}\text{log}(p)}+4\phantom{\rule{thinmathspace}{0ex}}\text{log}(2p)+{C}_{2}n\right)}^{1/2}.$$

(8)

The lemma follows from (5) and (8).
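As a quick empirical sanity check of the √(*n* log *p*) rate in Lemma 1 (illustrative only, not part of the proof; standard normal covariates and errors are used as convenient subgaussian examples), one can simulate ξ_{n} and watch the ratio ξ_{n}/√(*n* log *p*) stay bounded as *n* and *p* grow:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_xi_n(n, p, reps=20):
    """Monte Carlo average of xi_n = max_j |sum_i X_ij tau_i|."""
    vals = []
    for _ in range(reps):
        X = rng.standard_normal((n, p))    # covariates, bounded 2nd moments as in (A3)
        tau = rng.standard_normal(n)       # subgaussian tau_i as in (A2)
        vals.append(np.max(np.abs(X.T @ tau)))
    return float(np.mean(vals))

# If E(xi_n) = O(sqrt(n log p)), the ratio below should stay bounded
# as n and p grow together.
ratios = [mean_xi_n(n, p) / np.sqrt(n * np.log(p))
          for n, p in [(100, 50), (400, 200), (1600, 800)]]
print([round(r, 2) for r in ratios])
```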

In the proofs below, let *Y*^{*} = *W*^{1/2}*Y* and *X*^{*} = *W*^{1/2}*X*. Then

$${Q}_{n}(\beta )=\frac{1}{2}{\displaystyle \sum _{i=1}^{n}{({Y}_{i}^{*}-{X}_{i}^{*\prime}\beta )}^{2}=\frac{1}{2}{\Vert {Y}^{*}-{X}^{*}\beta \Vert}^{2}},$$

where ‖ · ‖ is the ℓ_{2} norm.
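For concreteness, minimizing *Q _{n}*(β) is simply ordinary least squares on the transformed data (*Y*^{*}, *X*^{*}), which agrees with solving the weighted normal equations directly. The following Python sketch is illustrative only (random data, and the weights `w` are arbitrary stand-ins for the weights *w _{i}*):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.standard_normal((n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)
w = rng.uniform(0.5, 1.5, n)          # stand-in positive weights w_i

Xs = np.sqrt(w)[:, None] * X          # X* = W^{1/2} X
Ys = np.sqrt(w) * Y                   # Y* = W^{1/2} Y

# OLS on the transformed data (Y*, X*)
beta_ols = np.linalg.lstsq(Xs, Ys, rcond=None)[0]

# Solving the weighted normal equations X'WX beta = X'WY directly
beta_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * Y))

print(np.allclose(beta_ols, beta_wls))
```

The two solutions coincide, which is why the proofs below can treat the weighted criterion as a least-squares problem in (*Y*^{*}, *X*^{*}).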

*Proof of Theorem 1, part* (i) Part (i) follows from the proof of Theorem 1 of Zhang and Huang (2008). The only difference is that here we use the subgaussian assumption to control certain tail probabilities, instead of the normality condition assumed in Zhang and Huang (2008). Since subgaussian random variables have the same tail behavior as normal random variables, the argument of Zhang and Huang goes through.

*Proof of Theorem 1, part* (ii) Part (ii) follows from part (iii) and the assumption that the number of nonzero coefficients is fixed, so that the absolute values of the nonzero coefficients are bounded away from 0 by a positive constant independent of *n*.

*Proof of Theorem 1, part* (iii) By the definition of $\tilde{\beta}$,

$${\Vert {Y}^{*}-{X}^{*}\tilde{\beta}\Vert}_{2}^{2}+2\lambda {\displaystyle \sum _{j=1}^{{p}_{n}}|{\tilde{\beta}}_{j}|}\le {\Vert {Y}^{*}-{X}^{*}{\beta}_{0}\Vert}^{2}+2\lambda {\displaystyle \sum _{j=1}^{{p}_{n}}|{\beta}_{0j}|}.$$

Thus

$${\Vert {Y}^{*}-{X}^{*}\tilde{\beta}\Vert}_{2}^{2}+2\lambda {\displaystyle \sum _{j\in {A}_{1}}|{\tilde{\beta}}_{j}|}\le {\Vert {Y}^{*}-{X}^{*}{\beta}_{0}\Vert}^{2}+2\lambda {\displaystyle \sum _{j\in {A}_{1}}|{\beta}_{0j}|}.$$

This implies

$${\Vert {Y}^{*}-{X}^{*}\tilde{\beta}\Vert}_{2}^{2}-{\Vert {Y}^{*}-{X}^{*}{\beta}_{0}\Vert}^{2}\le 2\lambda {\displaystyle \sum _{j\in {A}_{1}}|{\tilde{\beta}}_{j}-{\beta}_{0j}|}.$$

That is,

$${\Vert {X}^{*}(\tilde{\beta}-{\beta}_{0})\Vert}^{2}-2\tau \prime {X}^{*}(\tilde{\beta}-{\beta}_{0})\le 2\lambda {\displaystyle \sum _{j\in {A}_{1}}|{\tilde{\beta}}_{j}-{\beta}_{0j}|.}$$

(9)

Let *B* = *A*_{1} ∪ *A*_{2} = {*j* : β_{0j} ≠ 0 or ${\tilde{\beta}}_{j}$ ≠ 0}. Note that |*B*| ≤ *q*^{*} with probability converging to 1 by part (i), where *q*^{*} is given in (A4). Denote ${X}_{B}^{*}=({X}_{j}^{*},j\in B)$, ${\tilde{\beta}}_{B}=({\tilde{\beta}}_{j},j\in B)$, and β_{0B} = (β_{0j}, *j* ∈ *B*). Denote

$${\eta}_{B}={X}_{B}^{*}({\tilde{\beta}}_{B}-{\beta}_{0B}).$$

Since *A*_{1} ⊆ *B*,

$$\sum _{j\in {A}_{1}}|{\tilde{\beta}}_{j}-{\beta}_{0j}|\le \sqrt{|{A}_{1}|}\Vert {\tilde{\beta}}_{{A}_{1}}-{\beta}_{0{A}_{1}}\Vert \le \sqrt{|{A}_{1}|}\Vert {\tilde{\beta}}_{B}-{\beta}_{0B}\Vert .$$

(10)

Combining (9) and (10) gives

$${\Vert {\eta}_{B}\Vert}^{2}-2\tau \prime {\eta}_{B}\le 2\lambda \sqrt{|{A}_{1}|}\Vert {\tilde{\beta}}_{B}-{\beta}_{0B}\Vert .$$

(11)

Let ${\tau}_{B}^{*}$ be the projection of τ to the span of ${X}_{B}^{*}$, i.e., ${\tau}_{B}^{*}={X}_{B}^{*}{({X}_{B}^{*\prime}{X}_{B}^{*})}^{-1}{X}_{B}^{*\prime}\tau $. We have

$$\tau \prime {\eta}_{B}=\tau \prime {X}_{B}^{*}({\tilde{\beta}}_{B}-{\beta}_{0B})=\left\{{({X}_{B}^{*\prime}{X}_{B}^{*})}^{-1/2}{X}_{B}^{*\prime}\tau \right\}\prime \phantom{\rule{thinmathspace}{0ex}}\{{({X}_{B}^{*\prime}{X}_{B}^{*})}^{1/2}({\tilde{\beta}}_{B}-{\beta}_{0B})\}.$$

Therefore, by the Cauchy–Schwarz inequality,

$$2|\tau \prime {\eta}_{B}|\le 2\Vert {\tau}_{B}^{*}\Vert \cdot \Vert {\eta}_{B}\Vert \le 2{\Vert {\tau}_{B}^{*}\Vert}^{2}+\frac{1}{2}{\Vert {\eta}_{B}\Vert}^{2}.$$

(12)

Combining (11) and (12), we obtain

$${\Vert {\eta}_{B}\Vert}^{2}\le 4{\Vert {\tau}_{B}^{*}\Vert}^{2}+4\lambda \sqrt{|{A}_{1}|}\cdot \Vert {\tilde{\beta}}_{B}-{\beta}_{0B}\Vert .$$

(13)

By the SRC condition (A4), ${\Vert {\eta}_{B}\Vert}^{2}\ge {\mathit{\text{nc}}}_{*}{\Vert {\tilde{\beta}}_{B}-{\beta}_{0B}\Vert}^{2}$. Thus (13) implies

$${\mathit{\text{nc}}}_{*}{\Vert {\tilde{\beta}}_{B}-{\beta}_{0B}\Vert}^{2}\le 4{\Vert {\tau}_{B}^{*}\Vert}^{2}+\frac{{(4\lambda \sqrt{|{A}_{1}|})}^{2}}{2{\mathit{\text{nc}}}_{*}}+\frac{1}{2}{\mathit{\text{nc}}}_{*}{\Vert {\tilde{\beta}}_{B}-{\beta}_{0B}\Vert}^{2}.$$

It follows that

$${\Vert {\tilde{\beta}}_{B}-{\beta}_{0B}\Vert}^{2}\le \frac{8{\Vert {\tau}_{B}^{*}\Vert}^{2}}{{\mathit{\text{nc}}}_{*}}+\frac{16{\lambda}^{2}|{A}_{1}|}{{n}^{2}{c}_{*}^{2}}.$$

(14)
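The passage from (13) to the display preceding (14) uses the elementary inequality $xy\le \frac{x^2}{2t}+\frac{t}{2}y^2$ (valid for any $t>0$) with $t=nc_*$; spelling this step out,

```latex
4\lambda\sqrt{|A_1|}\,\|\tilde{\beta}_B-\beta_{0B}\|
  \le \frac{(4\lambda\sqrt{|A_1|})^2}{2nc_*}
    + \frac{nc_*}{2}\,\|\tilde{\beta}_B-\beta_{0B}\|^2
  = \frac{8\lambda^2|A_1|}{nc_*}
    + \frac{nc_*}{2}\,\|\tilde{\beta}_B-\beta_{0B}\|^2 .
```

Absorbing the $\frac{1}{2}nc_*\|\tilde{\beta}_B-\beta_{0B}\|^2$ term into the left side and dividing by $nc_*/2$ yields (14).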

Now

$${\Vert {\tau}_{B}^{*}\Vert}^{2}={\Vert {({X}_{B}^{*\prime}{X}_{B}^{*})}^{-1/2}{X}_{B}^{*\prime}\tau \Vert}^{2}\le \frac{1}{{\mathit{\text{nc}}}_{*}}{\Vert {X}_{B}^{*\prime}\tau \Vert}^{2}\le \frac{1}{{\mathit{\text{nc}}}_{*}}\underset{A:|A|\le {q}^{*}}{\text{max}}{\Vert {X}_{A}^{*\prime}\tau \Vert}^{2}.$$

We have

$$\underset{A:|A|\le {q}^{*}}{\text{max}}{\Vert {X}_{A}^{*\prime}\tau \Vert}^{2}=\underset{A:|A|\le {q}^{*}}{\text{max}}{\displaystyle \sum _{j\in A}|{X}_{j}^{*\prime}\tau {|}^{2}}\le {q}^{*}\underset{1\le j\le p}{\text{max}}{|{X}_{j}^{*\prime}\tau |}^{2}.$$

By Lemma 1 and Markov’s inequality,

$$\underset{1\le j\le p}{\text{max}}|{X}_{j}^{*\prime}\tau {|}^{2}=n\underset{1\le j\le p}{\text{max}}|{n}^{-1/2}{X}_{j}^{*\prime}\tau {|}^{2}={O}_{p}(n\text{log}p).$$

Therefore,

$${\Vert {\tau}_{B}^{*}\Vert}^{2}={O}_{p}\phantom{\rule{thinmathspace}{0ex}}\left(\frac{{q}^{*}\text{log}\phantom{\rule{thinmathspace}{0ex}}p}{{c}_{*}}\right).$$

(15)

The result follows from (14) and (15).

*Proof of Theorem 2* The proof follows the argument of Huang et al. (2008b), so we only outline the main steps. Let ${a}_{j}=|{\tilde{\beta}}_{j}{|}^{\gamma -1}$, 1 ≤ *j* ≤ *p _{n}*. Then $\widehat{\beta}$ satisfies the Karush–Kuhn–Tucker conditions

$$\{\begin{array}{cc}{X}_{j}^{*\prime}({Y}^{*}-{X}^{*}\widehat{\beta})={\lambda}_{n}{a}_{j}\text{sgn}({\widehat{\beta}}_{j}),\hfill & {\widehat{\beta}}_{j}\ne 0\hfill \\ |{X}_{j}^{*\prime}({Y}^{*}-{X}^{*}\widehat{\beta})|\le {\lambda}_{n}{a}_{j},\hfill & {\widehat{\beta}}_{j}=0\hfill \end{array}$$

(16)

and the vectors $\{{X}_{j}^{*},{\widehat{\beta}}_{j}\ne 0\}$ are linearly independent. Recall *A*_{1} = {*j* : β_{0j} ≠ 0}. Let ${\tilde{\mathbf{s}}}_{n1}=({a}_{j}\text{sgn}({\beta}_{0j}),j\in {A}_{1})\prime $.

Define

$${\widehat{\beta}}_{{A}_{1}}={\left({X}_{{A}_{1}}^{*\prime}{X}_{{A}_{1}}^{*}\right)}^{-1}({X}_{{A}_{1}}^{*\prime}{Y}^{*}-{\lambda}_{n}{\tilde{\mathbf{s}}}_{n1})={\beta}_{0{A}_{1}}+{C}_{11}^{-1}({X}_{{A}_{1}}^{*\prime}\tau -{\lambda}_{n}{\tilde{\mathbf{s}}}_{n1})/n,$$

(17)

where ${C}_{11}={X}_{{A}_{1}}^{*\prime}{X}_{{A}_{1}}^{*}/n$. If $\text{sgn}({\widehat{\beta}}_{{A}_{1}})=\text{sgn}({\beta}_{0{A}_{1}})$, then the equation in (16) holds for $\widehat{\beta}=({\widehat{\beta}}_{{A}_{1}}^{\prime},\mathbf{0}\prime )\prime $. Thus, since ${X}^{*}\widehat{\beta}={X}_{{A}_{1}}^{*}{\widehat{\beta}}_{{A}_{1}}$ for this $\widehat{\beta}$,

$$\text{sgn}(\widehat{\beta})=\text{sgn}({\beta}_{0})\phantom{\rule{1em}{0ex}}\text{if}\phantom{\rule{1em}{0ex}}\{\begin{array}{c}\text{sgn}({\widehat{\beta}}_{{A}_{1}})=\text{sgn}({\beta}_{0{A}_{1}})\hfill \\ \left|{X}_{j}^{*\prime}\phantom{\rule{thinmathspace}{0ex}}\left({Y}^{*}-{X}_{{A}_{1}}^{*}{\widehat{\beta}}_{{A}_{1}}\right)\right|\le {\lambda}_{n}{a}_{j},\forall j\notin {A}_{1}.\hfill \end{array}$$

(18)

Let ${H}_{n}={I}_{n}-{X}_{{A}_{1}}^{*}{C}_{11}^{-1}{X}_{{A}_{1}}^{*\prime}/n$. It follows from (17) that ${Y}^{*}-{X}_{{A}_{1}}^{*}{\widehat{\beta}}_{{A}_{1}}=\tau -{X}_{{A}_{1}}^{*}({\widehat{\beta}}_{{A}_{1}}-{\beta}_{0{A}_{1}})={H}_{n}\tau +{X}_{{A}_{1}}^{*}{C}_{11}^{-1}{\tilde{\mathbf{s}}}_{n1}{\lambda}_{n}/n$, so that by (18),

$$\text{sgn}(\widehat{\beta})=\text{sgn}({\beta}_{0})\phantom{\rule{1em}{0ex}}\text{if}\phantom{\rule{1em}{0ex}}\{\begin{array}{cc}\text{sgn}({\beta}_{0j})({\beta}_{0j}-{\widehat{\beta}}_{j})\le |{\beta}_{0j}|,\hfill & \forall j\in {A}_{1}\hfill \\ \left|{X}_{j}^{*\prime}\phantom{\rule{thinmathspace}{0ex}}\left({H}_{n}\tau +{X}_{{A}_{1}}^{*}{C}_{11}^{-1}{\tilde{\mathbf{s}}}_{n1}{\lambda}_{n}/n\right)\right|\le {\lambda}_{n}{a}_{j},\hfill & \forall j\notin {A}_{1}.\hfill \end{array}$$

(19)

Therefore, by (19),

$$P\phantom{\rule{thinmathspace}{0ex}}\{\text{sgn}(\widehat{\beta})\ne \text{sgn}({\beta}_{0})\}\le P\phantom{\rule{thinmathspace}{0ex}}\left\{|{\mathbf{e}}_{j}^{\prime}{C}_{11}^{-1}{X}_{{A}_{1}}^{*\prime}\tau |/n\ge |{\beta}_{0j}|/2\text{ for some }j\in {A}_{1}\right\}+P\phantom{\rule{thinmathspace}{0ex}}\left\{|{\mathbf{e}}_{j}^{\prime}{C}_{11}^{-1}{\tilde{\mathbf{s}}}_{n1}|{\lambda}_{n}/n\ge |{\beta}_{0j}|/2\text{ for some }j\in {A}_{1}\right\}+P\phantom{\rule{thinmathspace}{0ex}}\left\{|{X}_{j}^{*\prime}{H}_{n}\tau |\ge {\lambda}_{n}{a}_{j}/2\text{ for some }j\notin {A}_{1}\right\}+P\phantom{\rule{thinmathspace}{0ex}}\left\{|{X}_{j}^{*\prime}{X}_{{A}_{1}}^{*}{C}_{11}^{-1}{\tilde{\mathbf{s}}}_{n1}|/n\ge {a}_{j}/2\text{ for some }j\notin {A}_{1}\right\}=P\{{B}_{n1}\}+P\{{B}_{n2}\}+P\{{B}_{n3}\}+P\{{B}_{n4}\},\text{ say},$$

where **e**_{j} is the unit vector in the direction of the *j*-th coordinate. Therefore, to prove the theorem, it suffices to show that each probability in the last line converges to zero. The same argument as in Huang et al. (2008b) can be used here and is omitted. This completes the outline of the proof.
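As an algebraic sanity check of the identity used above, ${Y}^{*}-{X}_{{A}_{1}}^{*}{\widehat{\beta}}_{{A}_{1}}={H}_{n}\tau +{X}_{{A}_{1}}^{*}{C}_{11}^{-1}{\tilde{\mathbf{s}}}_{n1}{\lambda}_{n}/n$ for ${\widehat{\beta}}_{{A}_{1}}$ defined in (17), the following Python snippet verifies it numerically on random stand-in data (the dimensions, sign vector, and λ are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 30, 4
X = rng.standard_normal((n, q))      # stand-in for X*_{A1}
tau = rng.standard_normal(n)
beta0 = rng.standard_normal(q)
s = rng.standard_normal(q)           # stand-in for s~_{n1} = (a_j sgn(beta_0j))
lam = 0.7
Y = X @ beta0 + tau                  # Y* = X*_{A1} beta_{0A1} + tau

C = X.T @ X / n                      # C_11
beta_hat = np.linalg.solve(C, (X.T @ Y - lam * s) / n)   # eq. (17)

H = np.eye(n) - X @ np.linalg.solve(C, X.T) / n          # H_n
lhs = Y - X @ beta_hat
rhs = H @ tau + X @ np.linalg.solve(C, s) * lam / n

print(np.allclose(lhs, rhs))
```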

Jian Huang, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA. Department of Biostatistics, University of Iowa, Iowa City, IA 52242, USA.

Shuangge Ma, Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA.

- Alizadeh AA, Eisen MB, Davis RE, Ma C, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–511. [PubMed]
- Buckley J, James I. Linear regression with censored data. Biometrika. 1979;66:429–436.
- Dave SS, Wright G, Tan B, et al. Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. New Engl J Med. 2004;351:2159–2169. [PubMed]
- Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Stat. 2004;32:407–499.
- Frank IE, Friedman JH. A statistical view of some chemometrics regression tools (with discussion). Technometrics. 1993;35:109–148.
- Fu WJ. Penalized regressions: the bridge versus the Lasso. J Comput Graph Stat. 1998;7:397–416.
- Gui J, Li H. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics. 2005;21:3001–3008. [PubMed]
- Huang J, Ma SG, Xie HL. Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics. 2006;62:813–820. [PubMed]
- Huang J, Horowitz JL, Ma S. Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann Stat. 2008a;36:587–613.
- Huang J, Ma SG, Xie HL, Zhang C-H. A group bridge approach for variable selection. Biometrika. 2009;96:339–355. [PMC free article] [PubMed]
- Huang J, Ma S, Zhang C. Adaptive Lasso for high-dimensional regression models. Stat Sinica. 2008b;18:1603–1618.
- Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New York: John Wiley; 1980.
- Leng C, Lin Y, Wahba G. A note on the LASSO and related procedures in model selection. Stat Sinica. 2006;16:1273–1284.
- Ma S, Huang J. Additive risk survival model with microarray data. BMC Bioinform. 2007;8:192. [PMC free article] [PubMed]
- Rosenwald A, Wright G, Chan WC, Conners JM, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large B cell lymphoma. New Engl J Med. 2002;346:1937–1947. [PubMed]
- Rosenwald A, Wright G, Wiestner A, Chan WC, et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell. 2003;3:185–197. [PubMed]
- Stute W. Consistent estimation under random censorship when covariables are available. J Multivar Anal. 1993;45:89–103.
- Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B. 1996;58:267–288.
- van de Geer S. High-dimensional generalized linear models and the Lasso. Ann Stat. 2008;36:614–645.
- Van der Vaart AW, Wellner JA. Weak convergence and empirical processes: with applications to statistics. New York: Springer; 1996.
- Wang S, Nan B, Zhu J, Beer DG. Doubly penalized Buckley-James method for survival data with high-dimensional covariates. Biometrics. 2008;64:132–140. [PubMed]
- Wei LJ. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med. 1992;11:1871–1879. [PubMed]
- Ying ZL. A large sample study of rank estimation for censored regression data. Ann Stat. 1993;21:76–99.
- Zhang C, Huang J. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann Stat. 2008;36:1567–1594.
- Zhou M. M-estimation in censored linear models. Biometrika. 1992;79:837–841.