
Biometrics. Author manuscript; available in PMC 2010 December 1.

PMCID: PMC2794904; NIHMSID: NIHMS88182

A. Gregory DiRienzo, Department of Biostatistics, Harvard University, Boston, Massachusetts 02115, U.S.A. Email: dirienzo@biostat.harvard.edu

The publisher's final edited version of this article is available at *Biometrics*.

**SUMMARY**

Clinicians are often interested in the effect of covariates on survival probabilities at pre-specified study times. Since different factors can be associated with the risk of short-term and long-term failure, a flexible modeling strategy is pursued. Given a set of multiple candidate working models, an objective methodology is proposed that aims to construct consistent and asymptotically normal estimators of regression coefficients and average prediction error for each working model that are free of the nuisance censoring variable. It requires the conditional distribution of censoring given covariates to be modeled. The model selection strategy uses stepup or stepdown multiple hypothesis testing procedures that control either the proportion of false positives or the generalized familywise error rate when comparing models based on estimates of average prediction error. The context can be cast as a missing data problem, where augmented inverse probability weighted complete case (AIPWCC) estimators of regression coefficients and prediction error can be used (Tsiatis, 2006). A simulation study and an analysis of a recent AIDS trial are provided.

**1. Introduction**

Researchers are often interested in evaluating the effects of a set of covariates on a survival time endpoint. In some settings, it is of importance to identify potential treatment by covariate interactions. For example, AIDS Clinical Trials Group (ACTG) study 398 (Hammer, Vaida, et al. 2002) was a randomized study of four salvage therapy regimens for HIV-1 infected subjects with loss of virologic suppression. In addition to a backbone regimen (Amprenavir, Abacavir, Efavirenz and Adefovir Dipivoxil), subjects received one of three protease inhibitors or placebo. Some other baseline covariates of interest were non-nucleoside reverse transcriptase inhibitor (NNRTI) experience (yes or no), log_{10} HIV-1 RNA copies/mL, and reverse-transcriptase codon 103 (mutant or not). The primary endpoint was defined as the time from randomization to going *off-track* for suppression of HIV-1 RNA copies/mL (Huang, DeGruttola, et al. 2001). Going off-track is an empirical definition of virologic failure that has been shown to generally agree with protocol-based definitions. Basically, at each follow-up visit, a subject is classified as either on-track for suppression of HIV-1 RNA, or not (off-track). Going off-track generally refers to either a sharp, unexpected rebound in HIV-1 RNA, or failure to achieve a substantial initial drop in HIV-1 RNA. The censoring time in this study was the time from randomization to the first missed follow-up visit, or study closure when no follow-up visits were missed, which was observed in full for each subject. Since treatment efficacy may depend on subgroups of subjects as defined by baseline covariates, it is of interest to investigate potential treatment by covariate interactions on the risk of going off-track. Furthermore, it would be informative to know if factors associated with the risk of early failure (at or before week 8) differ from those associated with the risk of late failure (at or before week 24).

A common way to model survival data is through the semiparametric approaches suggested by Cox (1972), Cheng, Wei, et al. (1995), and Kalbfleisch and Prentice (2002, Section 7). These methods are useful for a *global* assessment of the effect of covariates on survival. In fact, the Cox (1972) model can allow for dynamic covariate effects by including time-dependent covariates, as nicely illustrated in Kalbfleisch and Prentice (2002, Section 6.4). To compare working models, a desirable criterion is average prediction error (Tian, Cai, et al., 2007, for uncensored outcomes; Uno, Cai, et al., 2007, for censored outcomes). However, with censored survival data, it is in general difficult to obtain estimates of average prediction error because the support of the censoring variable is typically much shorter than that of the survival time (Sinisi, Neugebauer, et al. 2006). Thus a *localized* approach to model selection may be preferred.

Uno, Cai, et al. (2007) considered a localized approach that focused on survival prediction at one or more pre-specified failure times. Their methodology permits flexible modeling of the effect of covariates on survival by allowing regression coefficients and link functions to arbitrarily vary with time in binary regression models. The inference procedures of Uno, Cai, et al. (2007) yield consistent estimation of, and large-sample confidence *intervals* for, regression coefficients and average absolute prediction error that are free of the nuisance censoring variable. Although their results do not require that working survival models be properly specified, Uno, Cai, et al. (2007) work under the assumption that the distribution of the censoring time is independent of both survival time and covariates.

Overall, the purpose of this paper is to develop an objective, statistically formal yet flexible localized survival model selection methodology under the assumption of conditional independence of censoring and survival times given covariates. Furthermore, the proposed methods provide *simultaneous* inference over all time points and model comparisons of interest, based on estimates of average prediction error, by using multiple testing procedures that control either the generalized familywise error rate or proportion of false positives.

More specifically, the proposed methodology acts locally by specifying a set of multiple, varying-sized candidate working models for survival prediction at each of several pre-specified failure times. The conditional distribution of censoring time given covariates needs to be modeled so that consistent and asymptotically normal estimators of regression parameters and average prediction error, that are free of the nuisance censoring variable, can be obtained. This paper assumes that the censoring variable is observed in full for each subject, which allows a flexible semiparametric modeling approach to be suggested. The Discussion addresses how to handle the case when the censoring variable is not observed in full for each subject. Consistent estimates of regression coefficients and average prediction error are obtained when the working model for censoring is properly specified. When the working model for censoring is misspecified, but the working model for survival given covariates is properly specified, estimated regression coefficients are consistent, but estimated average prediction error is in general not. However, simulations demonstrate that this need not preclude consistent selection (and estimation) of the most parsimonious candidate survival model.

At each pre-specified study time, *t*, estimation of regression coefficients and average prediction error of working survival models use so-called inverse probability weighted complete case estimators (IPWCC) (Uno, Cai, et al., 2007). The complete cases here are those subjects for which the value of *I*(*T* ≤ *t*) is able to be assessed, where *T* is the survival time. For those subjects censored at or before time *t*, the variable *I*(*T* ≤ *t*) is missing, otherwise it is observed. As illustrated in this paper, this framework allows use of augmented inverse probability weighted complete case estimators (AIPWCC) (Tsiatis, 2006), which attempt to gain efficiency by using information from subjects with missing *I*(*T* ≤ *t*), thereby making use of all available data.
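As a small illustration of the weighting scheme just described, the following sketch computes the complete-case indicator and the IPWCC weight; the helper names are hypothetical, and `surv_cens` stands in for an estimate of the conditional censoring survivor function (here it ignores *Z* for simplicity).

```python
import math

def complete_case_indicator(x, delta, t):
    """Delta = 1 when I(T <= t) can be assessed: the failure was observed
    at or before t, or follow-up extends past t."""
    return 1 if (x > t or delta == 1) else 0

def ipwcc_weight(x, delta, t, surv_cens):
    """IPWCC weight Delta / P_hat(C > min(X, t) | Z); zero for incomplete
    cases.  `surv_cens` is a stand-in for the estimated conditional
    censoring survivor function."""
    if complete_case_indicator(x, delta, t) == 0:
        return 0.0
    return 1.0 / surv_cens(min(x, t))

# Toy usage with an exponential censoring survivor function:
surv = lambda u: math.exp(-0.1 * u)
w_failure  = ipwcc_weight(2.0, 1, 3.0, surv)   # failure observed before t = 3
w_censored = ipwcc_weight(1.5, 0, 3.0, surv)   # censored before t = 3
```

Subjects censored before *t* receive weight zero (their *I*(*T* ≤ *t*) is missing), while complete cases are up-weighted by the inverse of their probability of remaining uncensored.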

The paper is organized as follows. Section 2 describes the background and notation. Estimation of regression parameters is detailed in Section 3 and of average prediction error in Section 4. Section 5 outlines the multiple testing procedures used in the context of the proposed model selection strategy that is detailed in Section 6. A simulation study is presented in Section 7 and an analysis of ACTG 398 in Section 8. Some final remarks are given in the Discussion.

**2. Background and Notation**

Let *T* denote the survival time and let *C* denote the potential censoring time. Denote the unique observed failure times by ${t}_{1}^{*},\dots ,{t}_{{R}^{*}}^{*},$ and suppose the time points of interest are *t*_{1}, … , *t*_{*R*}. For simplicity in the exposition below, consider only one time point *t*.

Let *X* = min(*T*, *C*), with δ = *I*(*C* > *T*) and *I*(.) the indicator function, and let *Z* denote the column vector of baseline covariates. The methodology presented below requires that the conditional survival function for censoring given covariates be estimated. It is frequently the case in studies of HIV/AIDS that the censoring time is observed in full for each subject, which permits flexible modeling of this conditional distribution.

Consider *L* pre-specified candidate working models for *P*(*T* ≤ *t*|*Z*) of the form ${g}_{\ell}({\beta}_{\ell}^{\top}{\tilde{\mathit{Z}}}_{\ell})$, ℓ = 1, … , *L*, where each *g*_{ℓ} is a known link function and ${\tilde{\mathit{Z}}}_{\ell}$ is a pre-specified function of (1, *Z*^{⊤})^{⊤}.

To evaluate the fit of each of the *L* candidate survival models ${g}_{\ell}({\beta}_{\ell}^{\top}{\tilde{\mathit{Z}}}_{\ell})$, consider average squared prediction error ${d}_{\ell}=E{\{I(T\le t)-{g}_{\ell}({\beta}_{\ell}^{\top}{\tilde{\mathit{Z}}}_{\ell})\}}^{2}$, ℓ = 1, … , *L*, where the expected value *E*(.) is with respect to *P*(*T*, *Z*). Average squared prediction error is used here primarily for mathematical convenience when constructing AIPWCC estimators, as shown below. To evaluate the adequacy of the ℓth working survival model, it is required that *d*_{ℓ} be consistently estimated.

This paper shows how to obtain consistent estimators of β_{ℓ} and *d*_{ℓ} that are asymptotically normal (and free of the nuisance censoring variable *C*).

The construction of AIPWCC estimators for β_{ℓ} and *d*_{ℓ} includes an augmentation term, which requires an estimate of a working model *P**(*T* ≤ *t*|*Z*) for the conditional distribution of survival time given covariates.

When *T* ⊥ *C*|*Z* and the model for $P\left(C>x|\mathit{Z}\right),x\in {\mathcal{T}}_{t}^{*}$, is misspecified, it is not in general the case that the corresponding estimators of β_{ℓ} and *d*_{ℓ} are consistent.

**3. Estimation of β**

Using the results from Tsiatis (2006, Section 10.2), define the AIPWCC estimating equation

$$\widehat{U}_{\ell}(\beta )=\frac{1}{n}\sum _{i=1}^{n}\left[\frac{{\mathrm{\Delta}}_{i}}{{\widehat{P}}^{*}(C>{X}_{i}\wedge t|{\mathit{Z}}_{i})}{\tilde{\mathit{Z}}}_{i,\ell}\{I({X}_{i}\le t)-{g}_{\ell}({\beta}^{\top}{\tilde{\mathit{Z}}}_{i,\ell})\}-\left\{\frac{{\mathrm{\Delta}}_{i}}{{\widehat{P}}^{*}(C>{X}_{i}\wedge t|{\mathit{Z}}_{i})}-1\right\}{\tilde{\mathit{Z}}}_{i,\ell}\{{\widehat{P}}^{*}(T\le t|{\mathit{Z}}_{i})-{g}_{\ell}({\beta}^{\top}{\tilde{\mathit{Z}}}_{i,\ell})\}\right],$$

where Δ_{i} = δ_{i}*I*(*X*_{i} ≤ *t*) + *I*(*X*_{i} > *t*) indicates whether or not *I*(*T*_{i} ≤ *t*) is observed for subject *i*.
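To see how the augmentation term recovers information from subjects with missing *I*(*T* ≤ *t*), consider the intercept-only special case of an estimating equation of this AIPWCC form, which can be solved in closed form. This is a sketch with hypothetical toy inputs, not the paper's general algorithm:

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def aipw_intercept_only(y, w, p_star):
    """Closed-form solution of the intercept-only AIPWCC estimating equation
    with anti-logit link g:
        0 = (1/n) sum_i [ w_i {y_i - g(b)} - (w_i - 1) {p*_i - g(b)} ],
    which rearranges to g(b_hat) = (1/n) sum_i [ w_i y_i - (w_i - 1) p*_i ].
    y_i = I(X_i <= t) (only used when w_i > 0), w_i the IPWCC weight
    (0 for incomplete cases), p*_i = P*(T <= t | Z_i)."""
    n = len(y)
    p_hat = sum(wi * yi - (wi - 1.0) * pi
                for yi, wi, pi in zip(y, w, p_star)) / n
    return logit(p_hat), p_hat

# Toy data: four subjects, the third censored before t (weight 0); its
# p*_i is what lets that subject still contribute information.
y      = [1, 0, 1, 0]
w      = [1.25, 1.25, 0.0, 1.5]
p_star = [0.6, 0.4, 0.7, 0.3]
beta_hat, p_hat = aipw_intercept_only(y, w, p_star)
```

The resulting `p_hat` is the augmented inverse-probability-weighted estimate of *P*(*T* ≤ *t*), and `beta_hat` its logit; the censored subject enters only through its modeled probability *P**(*T* ≤ *t*|*Z*).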

Note that when the covariate vector *Z* is categorical, a saturated model for *P*(*C* > *x*|*Z*) can be used. More generally, consider the working model *P**(*C* > *x*|*Z*) = ${h}_{x}({\gamma}_{x}^{\top}{\tilde{\mathit{Z}}}_{C})$, where ${\widehat{\gamma}}_{x}$ solves

$$0=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\tilde{\mathit{Z}}}_{i,C}\{I\left({C}_{i}>x\right)-{h}_{x}({\widehat{\gamma}}_{x}^{\top}{\tilde{\mathit{Z}}}_{i,C})\},}$$

where ${\tilde{\mathit{Z}}}_{i,C}$ is the value of ${\tilde{\mathit{Z}}}_{C}$ for subject *i*, and *h*_{x} is a pre-specified link function.
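A minimal sketch of fitting the censoring working model at a fixed time *x*, assuming an anti-logit link *h*_{x} (the link actually used in the simulations of Section 7) and a single scalar covariate; the function name and data are illustrative only:

```python
import math

def expit(u):
    return 1.0 / (1.0 + math.exp(-u))

def fit_censoring_model(c_times, z, x):
    """Fit the working model P(C > x | Z) = expit(g0 + g1*z) by solving the
    score equation 0 = (1/n) sum_i Ztilde_i {I(C_i > x) - h_x(g^T Ztilde_i)}
    with Newton-Raphson, for Ztilde_i = (1, z_i) and anti-logit link h_x.
    A sketch for one scalar covariate; real use would handle a vector Z."""
    y = [1.0 if ci > x else 0.0 for ci in c_times]
    g0 = g1 = 0.0
    for _ in range(100):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for yi, zi in zip(y, z):
            p = expit(g0 + g1 * zi)
            r, v = yi - p, p * (1.0 - p)     # score residual, IRLS weight
            s0 += r
            s1 += r * zi
            h00 += v
            h01 += v * zi
            h11 += v * zi * zi
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            break                            # near-singular: sparse covariates
        d0 = (h11 * s0 - h01 * s1) / det     # 2x2 Newton step
        d1 = (h00 * s1 - h01 * s0) / det
        g0, g1 = g0 + d0, g1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return g0, g1

# Toy fit at x = 2 with a binary covariate:
g0, g1 = fit_censoring_model([1.0, 3.0, 2.0, 5.0, 0.5, 4.0],
                             [0, 0, 1, 1, 0, 1], 2.0)
```

With a single binary covariate the fit reproduces the group-specific proportions of *I*(*C* > *x*) exactly, which gives an easy correctness check.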

The working model *P*^{*}(*T* ≤ *t*|*Z*) is estimated using only those observations with Δ = 1, in which case ${\widehat{\gamma}}_{T}$ solves

$$0=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\mathrm{\Delta}}_{i}{\tilde{\mathit{Z}}}_{i,T}\{I({X}_{i}\le t)-{h}_{T}({\widehat{\gamma}}_{T}^{\top}{\tilde{\mathit{Z}}}_{i,T})\}}$$

and ${\tilde{\mathit{Z}}}_{i,T}$ is the value of ${\tilde{\mathit{Z}}}_{T}$, the covariate vector used in the survival working model, for subject *i*, with *h*_{T} a pre-specified link function.

Now, to show consistency of the estimator ${\widehat{\beta}}_{\ell}$, as given by Uno, Cai, et al. (2007), it suffices to show that *E*{*Û*_{ℓ}(β)} = 0 when evaluated at the population parameter β_{ℓ}.

**4. Estimation of d**

Define the empirical average squared prediction error

$$\widehat{D}_{\ell}=\frac{1}{n}\sum _{i=1}^{n}\left[\frac{{\mathrm{\Delta}}_{i}}{{\widehat{P}}^{*}(C>{X}_{i}\wedge t|{\mathit{Z}}_{i})}{\{I({X}_{i}\le t)-{g}_{\ell}({\widehat{\beta}}_{\ell}^{\top}{\tilde{\mathit{Z}}}_{i,\ell})\}}^{2}-\left\{\frac{{\mathrm{\Delta}}_{i}}{{\widehat{P}}^{*}(C>{X}_{i}\wedge t|{\mathit{Z}}_{i})}-1\right\}[{\widehat{P}}^{*}(T\le t|{\mathit{Z}}_{i})\{1-2{g}_{\ell}({\widehat{\beta}}_{\ell}^{\top}{\tilde{\mathit{Z}}}_{i,\ell})\}+{g}_{\ell}^{2}({\widehat{\beta}}_{\ell}^{\top}{\tilde{\mathit{Z}}}_{i,\ell})]\right]=\frac{1}{n}\sum _{i=1}^{n}{A}_{i,\ell}.$$

The statistic ${\widehat{D}}_{\ell}$ is an estimate of the so-called *apparent error* of the ℓth working model (see the Discussion).
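The per-subject terms *A*_{i,ℓ} of *D̂*_{ℓ} can be computed directly from the display above; the sketch below assumes the weights, fitted predictions and augmentation probabilities have already been estimated:

```python
def apparent_error_aipwcc(y, w, p_star, g_pred):
    """AIPWCC estimate of average squared prediction error:
        D = (1/n) sum_i A_i,  with
        A_i = w_i {y_i - g_i}^2 - (w_i - 1) [ p*_i {1 - 2 g_i} + g_i^2 ],
    where g_i is the model's predicted P(T <= t | Z_i), w_i the IPWCC
    weight (0 for incomplete cases), and p*_i = P*(T <= t | Z_i)."""
    terms = [wi * (yi - gi) ** 2
             - (wi - 1.0) * (pi * (1.0 - 2.0 * gi) + gi * gi)
             for yi, wi, pi, gi in zip(y, w, p_star, g_pred)]
    return sum(terms) / len(terms), terms

# With all weights equal to 1 this reduces to the ordinary mean squared error:
D_mse, _ = apparent_error_aipwcc([1, 0], [1.0, 1.0], [0.5, 0.5], [0.8, 0.2])
```

Returning the individual terms *A*_{i,ℓ} is convenient because they are reused when estimating the variance of differences in prediction error between models.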

**5. Simultaneous Inference on (*d*_{ℓ})**

Suppose it is of interest to simultaneously test the sequence of null hypotheses *H*_{ℓ} : *d*_{ℓ} = *d*_{0} against the one-sided alternatives *d*_{ℓ} < *d*_{0}, ℓ = 1, … , *L*, where *d*_{0} denotes the average squared prediction error of a pre-specified reference model.

When the working model for censoring is properly specified, the statistic *Ŝ*_{ℓ} is asymptotically standard normal under *H*_{ℓ}.

The stepup algorithm given by Romano and Shaikh (2006) to control the PFP(θ) at level α is as follows. Suppose there are *B* null hypotheses to be tested. Put the observed p-values in descending order to obtain ${\widehat{\pi}}_{(B)}\ge \cdots \ge {\widehat{\pi}}_{(1)}$, and let *b** be the largest value of *b* for which ${\widehat{\pi}}_{(b)}\le \{\alpha /c(B,\theta )\}{\alpha}_{b}^{\prime}$; that is, *b** satisfies

$${\widehat{\pi}}_{(B)}>\frac{\alpha}{c(B,\theta )}{\alpha}_{B}^{\prime},\dots \phantom{\rule{thinmathspace}{0ex}},{\widehat{\pi}}_{({b}^{*}+1)}>\frac{\alpha}{c(B,\theta )}{\alpha}_{{b}^{*}+1}^{\prime},$$

and ${\alpha}_{b}^{\prime}=\{\text{int}(\theta b)+1\}/\{B-b+\text{int}(\theta b)+1\}$, with int(*a*) the integer part of *a*. If *b** = 0 then no hypotheses are rejected; otherwise the *b** null hypotheses with the smallest p-values are rejected. The formula for the constant *c*(*B*, θ) is provided in Romano and Shaikh (2006). Lehmann and Romano (2005) and Romano and Shaikh (2004) provide stepdown algorithms that give strong control of the gFWER(*k*) and PFP(θ).
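The stepup rule can be sketched as follows; the constant *c*(*B*, θ) from Romano and Shaikh (2006) is passed in rather than computed (in the toy usage below it is simply set to 1, so the thresholds are illustrative rather than the exact PFP-controlling ones):

```python
def alpha_prime(b, B, theta):
    """alpha'_b = {int(theta*b) + 1} / {B - b + int(theta*b) + 1}."""
    k = int(theta * b) + 1          # int() = integer part for nonnegative input
    return k / (B - b + k)

def stepup_pfp(pvalues, alpha, theta, c_B_theta):
    """Stepup sketch: with p-values ordered p_(B) >= ... >= p_(1), b* is the
    largest b with p_(b) <= {alpha / c(B, theta)} * alpha'_b; the b*
    hypotheses with the smallest p-values are rejected.  c(B, theta) is
    taken as given (its formula appears in Romano and Shaikh, 2006)."""
    B = len(pvalues)
    desc = sorted(pvalues, reverse=True)   # desc[0] = p_(B), desc[-1] = p_(1)
    for b in range(B, 0, -1):              # scan b = B, B-1, ..., 1
        if desc[B - b] <= (alpha / c_B_theta) * alpha_prime(b, B, theta):
            return b                       # number of rejections, b*
    return 0                               # b* = 0: reject nothing

# With theta = 0 and c = 1 the thresholds reduce to alpha / (B - b + 1):
n_rej = stepup_pfp([0.01, 0.04, 0.2], alpha=0.1, theta=0.0, c_B_theta=1.0)
```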

**6. Model Selection Strategy**

The following is an approach that attempts to identify the best-fitting working survival model with the smallest number of regression terms at each failure time of interest. The algorithm works in two general steps. The first is to identify the model comparisons of interest at each failure time under consideration; once these have been identified, an MTP from Section 5 is then used to simultaneously test all corresponding null hypotheses across all failure times considered. To identify the model comparisons of interest, the algorithm first compares the fit of all working models to the null model, where only an intercept term is fit. If working models are identified that provide a significantly better fit to the data than the null model, then the reference model is re-defined from the null model to the smallest working model among these with the best fit. Working models offering a better fit than this new reference are then compared to it. The algorithm iterates in this fashion, continuing to re-define the reference model, until no working models are determined to be significantly better than the reference. A step-by-step description is provided below, where for simplicity *t* is considered fixed.

- 0. Set *i* = 1.
- 1. Define ${H}_{\ell}^{(i)}:{d}_{\ell}={d}_{0}^{(i)}$ against the one-sided alternative ${d}_{\ell}<{d}_{0}^{(i)}$. Here ${d}_{0}^{(1)}=E{\{I(T\le t)-{g}_{0}^{(1)}({\beta}_{0}^{(1)})\}}^{2}$ corresponds to the null model with only an intercept. Test the sequence of null hypotheses $({H}_{\ell}^{(i)})$ using an MTP as outlined in Section 5, denoting test statistics by $({\widehat{S}}_{\ell}^{(i)})$ and p-values by $({\widehat{\pi}}_{\ell}^{(i)})$.
- 2. If none of $({H}_{\ell}^{(i)})$ are rejected then stop. Otherwise, re-define the reference model to be the working model rejected from $({H}_{\ell}^{(i)})$ with the smallest number of regressors and, among those, the smallest ${\widehat{\pi}}_{\ell}^{(i)}$; denote this model index by ${\ell}_{i}$ and its population average squared prediction error by ${d}_{0}^{(i+1)}$, so that ${d}_{0}^{(i+1)}={d}_{{\ell}_{i}}$.
- 3. Define ${H}_{\ell}^{(i+1)}:{d}_{\ell}={d}_{0}^{(i+1)}$, against ${d}_{\ell}<{d}_{0}^{(i+1)}$, where the index ℓ corresponds to those working models rejected from $({H}_{\ell}^{(i)})$ with ${\widehat{\pi}}_{\ell}^{(i)}<{\widehat{\pi}}_{{\ell}_{i}}^{(i)}$; if no ${\widehat{\pi}}_{\ell}^{(i)}<{\widehat{\pi}}_{{\ell}_{i}}^{(i)}$ then stop. Otherwise test this sequence using an MTP as outlined in Section 5, where ${\widehat{S}}_{\ell}^{(i+1)}$ is now defined using the estimate of ${d}_{0}^{(i+1)}$ as a reference, with p-value ${\widehat{\pi}}_{\ell}^{(i+1)}$.
- 4. Set *i* = *i* + 1 and repeat steps 2 and 3.

The algorithm above is repeated for each *t* ∈ (*t*_{1}, … , *t*_{*R*}); let *NH*_{*r*} denote the number of null hypotheses identified at time *t*_{*r*}, *r* = 1, … , *R*, so that all *NH* = *NH*_{1} + ⋯ + *NH*_{*R*} hypotheses are then tested simultaneously using a single MTP.
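The iterative strategy above can be sketched generically; `compare` and `mtp_reject` are placeholders for the prediction-error comparisons of Section 4 and the multiple testing procedures of Section 5, and the toy p-values below are made up purely for illustration:

```python
def forward_model_search(models, compare, mtp_reject):
    """Sketch of the iterative selection strategy described above.
    models:     dict mapping model name -> number of regression terms.
    compare:    compare(m, ref) -> p-value for H: d_m = d_ref vs d_m < d_ref
                (ref is None for the intercept-only null model).
    mtp_reject: maps a dict of p-values to the set of rejected model names."""
    ref = None                      # current reference (None = null model)
    candidates = set(models)
    while candidates:
        pvals = {m: compare(m, ref) for m in candidates}
        rejected = mtp_reject(pvals)
        if not rejected:
            break                   # step 2: nothing rejected -> stop
        # step 2: fewest regressors, ties broken by smallest p-value
        new_ref = min(rejected, key=lambda m: (models[m], pvals[m]))
        # step 3: carry forward only models that beat the new reference
        candidates = {m for m in rejected if pvals[m] < pvals[new_ref]}
        ref = new_ref
    return ref

# Toy illustration with made-up p-values:
models = {"A": 1, "B": 1, "AB": 2}
table = {("A", None): 0.001, ("B", None): 0.05,
         ("AB", None): 0.0005, ("AB", "A"): 0.5}
selected = forward_model_search(models,
                                lambda m, ref: table[(m, ref)],
                                lambda pv: {m for m, p in pv.items() if p < 0.01})
```

In the toy run, the one-term model "A" becomes the reference at stage 1, the larger model "AB" fails to beat it at stage 2, and "A" is returned as the most parsimonious adequate model.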

**7. Simulation Study**

A simulation study was performed to investigate the small-sample properties of the proposed model selection methodology. For simplicity, only comparisons to the null model were considered (stage *i* = 1). Two probability distributions were initially used. First, for *P*_{1}(*T*, *C*, *Z*), the vector

The second probability distribution, *P*_{2}(.), was defined as *P*_{1}(.) except for ** Z**. Here,

In all cases, the working model for *P*(*C* > *x*|*Z*) was $h({\gamma}_{0,x}+{\gamma}_{1,x}^{\top}\mathit{Z})$, for each *x* at which the weights are evaluated.

The distributions *P*_{1}(.) and *P*_{2}(.) have 7 and 12 covariates, respectively, so that the total number of candidate working models for *I*(*T* ≤ 3), using only one-way terms, is 2^{7} − 1 = 127 and 2^{12} − 1 = 4095. All working models for survival used the anti-logit link function. The best-fitting, most parsimonious model for *I*(*T* ≤ 3) is the one that has *Z*_{7} as the only predictor. For both *P*_{1}(.) and *P*_{2}(.), the population value of *d*_{ℓ} is 0.225 for models containing *Z*_{7}.

The simulations were conducted as follows. Separately for each sample size and each distribution, a random sample ${({\mathit{Z}}_{i})}_{i=1}^{n}$ was generated from *P*_{*j*}(.), *j* = 1, 2.

The empirical power of several different MTPs corresponding to *H*_{7} : *d*_{7} = *d*_{0}, where ℓ = 7 indexes the model with *Z*_{7} as the only term, is shown in Table 1 for several sample sizes and for *P*_{1}(.) and *P*_{2}(.). The target error level was α = 0.1 in all cases. The MTP with the largest power was the stepup procedure using the PFP(θ = 0.1). Figure 1 plots these powers for the stepup algorithm. Notice that, when increasing the number of null hypotheses tested, there is only a small reduction in power of the MTP using the PFP(θ = 0.1); however, the reduction when using the PFP(θ = 0) (FWER) is large. For *P*_{1}(.), the empirical FWER for all MTPs was 0.004 or less for *n* = 100, and was 0.002 or less for *n* = 200 through 1000, with the vast majority equal to 0. For *P*_{2}(.), the empirical FWER of all MTPs was less than 0.024 for *n* = 100, less than 0.009 for *n* = 200, and 0.005 or less for *n* = 300 through 1000, with the vast majority equal to 0. Note that with 1000 simulation iterations and a nominal error level of α = 0.1, a normal-theory based 95% confidence interval for the population PFP(θ) extends ±0.0186 from the empirical PFP(θ).

**Figure 1.** Empirical power to reject *H*_{7} for the stepup procedure with PFP(θ = 0) (‘o’) and PFP(θ = 0.1) (‘*’) for *P*_{1}(.) (dashed lines) and *P*_{2}(.) (solid lines).

Define the probability distributions *P*_{3}(.) and *P*_{4}(.) as *P*_{1}(.) except with the first and second alternate definitions of *C*_{2} above, respectively. The empirical FWER and powers to reject *H*_{7} corresponding to *P*_{3}(.) and *P*_{4}(.) were all very similar to those presented for *P*_{1}(.), suggesting minor impact from mismodeling *P*(*C* > *x*|* Z*).

Table 2 presents the empirical mean and standard deviation of estimates for the coefficient β_{1,7} and average squared prediction error *d*_{7} for the candidate survival model *P*(*T* ≤ 3|*Z*_{7}) = *g*(β_{0,7} + β_{1,7}*Z*_{7}), for *P*_{1}(.), *P*_{3}(.) and *P*_{4}(.) and several sample sizes. Results for *P*_{2}(.) were very similar to those of *P*_{1}(.). For all distributions, empirical means of the estimates ${\widehat{\beta}}_{1,7}$ and ${\widehat{d}}_{7}$ are close to their population values of 0.70 and 0.225, respectively, with empirical standard deviations decreasing with increasing sample size. Define the working model index ℓ = 1 by *P*(*T* ≤ 3|*Z*_{1}) = *g*(β_{0,1} + β_{1,1}*Z*_{1}); note that *H*_{1} : *d*_{1} = *d*_{0} is true. As shown in Table 2, the empirical mean and standard deviation of *Ŝ*_{1}, corresponding to *H*_{1}, differ somewhat from the target values 0 and 1, with variance overestimation present. The empirical mean of *Ŝ*_{7}, corresponding to *H*_{7}, increases with sample size, and, although variance overestimation is present, its empirical standard deviation is closer to 1 compared to *Ŝ*_{1}. Since the statistic ${\widehat{D}}_{\ell}$ is asymptotically linear, the nonparametric bootstrap can be used to estimate its variance.
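A nonparametric bootstrap variance estimate of the kind mentioned above can be sketched as follows, resampling subjects with replacement; `stat` is any asymptotically linear statistic, and the toy check uses a sample mean:

```python
import random
import statistics

def bootstrap_se(data, stat, n_boot=500, seed=1):
    """Nonparametric bootstrap standard error of a statistic: resample
    subjects with replacement and recompute `stat` on each resample."""
    rng = random.Random(seed)
    n = len(data)
    reps = [stat([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]
    return statistics.stdev(reps)

# Toy check: the SE of the mean of 100 Bernoulli(0.5) values is about 0.05.
se = bootstrap_se([0.0] * 50 + [1.0] * 50, lambda xs: sum(xs) / len(xs))
```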

**8. Analysis of ACTG 398**

Consider model selection for the risk of going off-track at or before week 8 (early failure) and at or before week 24 (late failure). The time scale used in this analysis was weeks, as subjects were scheduled for follow-up visits at weeks 2, 4, 8, 16 and every 8 weeks thereafter until week 48. Note that the actual visit week was used in the analysis, not the scheduled visit week. The censoring time was the time from randomization to going off-study, or study closure, whichever occurred first, and thus was observed in full for each subject. Full baseline data were available for *n* = 450 subjects; of those, a total of 331 subjects were observed to go off-track: 163 at or before week 8 and 284 at or before week 24. There were observed off-track events at weeks 1 through 24; the shortest censoring time was at week 1. For early and late failure, the percent of subjects with Δ = 1 was 98.4% and 96.4%, respectively.

The following four baseline covariates were considered: treatment group (TRT), coded as 1=additional PI, 0=placebo; NNRTI experience (NN), coded as 1=naive, 0=experienced; log_{10} HIV-1 RNA (RNA); and reverse-transcriptase codon 103 (103), coded as 1=wildtype, 0=mutant. Besides these four main-effect terms, it was of interest to identify potential treatment by covariate interactions, denoted by TRT*NN, TRT*RNA and TRT*103. By imposing the restriction that working models that include an interaction term also include the corresponding main-effect terms, at a given study time of interest, *t* ∈ (8, 24), the total number of candidate survival models of one term, two terms, and up through seven terms, is 34.
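The count of 34 candidate models follows from enumerating all nonempty subsets of the seven regression terms that respect the hierarchy restriction; the term names below mirror the abbreviations in the text:

```python
from itertools import combinations

MAIN = ["TRT", "NN", "RNA", "103"]
INTERACTIONS = {"TRT*NN": ("TRT", "NN"),
                "TRT*RNA": ("TRT", "RNA"),
                "TRT*103": ("TRT", "103")}

def candidate_models():
    """All nonempty subsets of the seven regression terms in which every
    included interaction is accompanied by both of its main effects."""
    terms = MAIN + list(INTERACTIONS)
    out = []
    for r in range(1, len(terms) + 1):
        for subset in combinations(terms, r):
            s = set(subset)
            if all(set(mains) <= s
                   for name, mains in INTERACTIONS.items() if name in s):
                out.append(s)
    return out

hierarchical_models = candidate_models()    # 34 models, matching the text
```

Without the restriction there would be 2^{7} − 1 = 127 nonempty subsets; the hierarchy requirement prunes these to 34.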

The working model for the censoring distribution, *P**(*C* > *x*|*Z*), included all seven regression terms, and was separately estimated at each observed off-track time less than or equal to 24 weeks. The working model *P**(*T* ≤ *t*|*Z*) for the augmentation term also included all seven regression terms.

In all cases, the MTP used was the stepup procedure with the PFP(θ = 0.1) and α = 0.1. To identify the model comparisons of interest, first, for week 8, 34 hypotheses were tested with the null model as the reference model (stage *i* = 1). The results are displayed in Table 3. A total of 30 of 34 hypotheses were rejected. The smallest working model rejected with smallest p-value had RNA as the only term $({\widehat{\pi}}_{{\ell}_{1}}^{(1)}=0.001)$. Using this model as the reference model, the second stage (*i* = 2) tested 21 hypotheses, corresponding to those models in Table 3 with ${\widehat{\pi}}_{\ell}^{(1)}<{\widehat{\pi}}_{{\ell}_{1}}^{(1)}$. Of these 21 hypotheses, the smallest model rejected with smallest p-value had 2 terms, NN and RNA $({\widehat{\pi}}_{{\ell}_{2}}^{(2)}=0.0034)$. Using this model as the reference model, stage *i* = 3 tested 13 hypotheses, none were rejected. Thus the total number of model comparisons identified for week 8 was *N H*_{1} = 34 + 21 + 13 = 68.

For week 24, the first stage (*i* = 1) tested 34 hypotheses using the null model as the reference model. The smallest model rejected with smallest p-value had NN as the only term $({\widehat{\pi}}_{{\ell}_{1}}^{(1)}=0.001)$. Using this model as the reference model, stage *i* = 2 tested 21 hypotheses, corresponding to those models with ${\widehat{\pi}}_{\ell}^{(1)}<{\widehat{\pi}}_{{\ell}_{1}}^{(1)}$. Of these 21 hypotheses, the smallest model rejected with smallest p-value had 3 terms, NN, TRT and RNA $({\widehat{\pi}}_{{\ell}_{2}}^{(2)}=0.0088)$. Using this model as the reference model, stage *i* = 3 tested 11 hypotheses, none were rejected. For week 24, the total number of model comparisons identified was *N H*_{2} = 34+ 21 + 11 = 66.

The final step was to simultaneously test all of the *N H* = 68 + 66 = 134 hypotheses corresponding to all model comparisons of interest. For week 8, the smallest model with smallest p-value (0.0034) from the latest stage with rejections (*i* = 2) had 2 terms, NN and RNA. For a fixed value of RNA, being naive to NNRTIs at baseline was estimated to reduce the odds of early failure by 70% compared to being experienced; for a fixed value of NN, a log_{10} increase in baseline viral load was estimated to increase the odds of early failure 2.2-fold. For week 24, the smallest model with smallest p-value (0.0054) from the latest stage with rejections (*i* = 2) had 4 terms, NN, TRT, RNA and TRT*NN. Thus a treatment effect was identified on the risk of late failure but not early failure. For a fixed value of RNA: the additional PI was estimated to reduce the odds of late failure by 29% for NNRTI-naive subjects, and 70% for NNRTI-experienced subjects compared to placebo; compared to having NNRTI experience, being naive to NNRTIs was estimated to reduce the odds of late failure by 75% for subjects receiving an additional PI, and 89% for subjects receiving placebo. For a fixed value of NN and TRT, a log_{10} increase in baseline viral load was estimated to increase the odds of late failure 1.9-fold.
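The odds statements above are the usual transformations of coefficients from an anti-logit (logistic) working model; the coefficients below are hypothetical values backed out from the quoted effects for illustration, not the fitted estimates:

```python
import math

def odds_ratio(beta):
    """Multiplicative change in the odds of failure per unit increase in
    a covariate, for an anti-logit (logistic) working model."""
    return math.exp(beta)

def percent_change_in_odds(beta):
    """Percent change in the odds: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

# Hypothetical coefficients backed out from the quoted effects:
beta_nn  = math.log(0.30)   # NNRTI-naive: ~70% lower odds of early failure
beta_rna = math.log(2.2)    # per log10 copies/mL: ~2.2-fold higher odds
```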

In conclusion, NNRTI experience and HIV-1 RNA level at baseline were found to be significantly associated with the risk of early and late failure. There was a treatment effect detected only on the risk of late failure, where, after adjusting for baseline NNRTI experience and HIV-1 RNA level, the additional PI was more beneficial for NNRTI experienced subjects than for naive subjects.

**9. Discussion**

This paper provides a statistically formal yet flexible approach to objective model selection for censored survival data, which can be especially useful when covariate effects depend on time. Consistent estimators of regression parameters and average prediction error for candidate survival models, that are free of the nuisance censoring variable, are obtained when the conditional censoring distribution given covariates is correctly modeled. When the censoring variable is completely observed for each subject in the dataset, this article shows how flexible working models for the conditional censoring distribution can be formed. When the model for censoring is misspecified, consistent estimators of regression parameters require proper specification of the candidate survival model given covariates and *P**(*T* ≤ *t*|*Z*) when AIPWCC estimators are used; however, estimators of average prediction error are not necessarily consistent in this case. Nevertheless, one may still be able to identify the most parsimonious working survival model and consistently estimate its regression parameters when the censoring distribution is mismodeled, as was evidenced in the simulation study.

The potential censoring time *C* may not be completely observed for each subject, for example when *T* is the time to death. In such cases, one can reverse the roles of *T* and *C* and use the semiparametric modeling approaches mentioned in the Introduction or those presented in this paper to form working models for *P*(*C* > *x*|* Z*). However, having to select two censored-data regression models simultaneously can be burdensome.

Apparent error was used instead of cross-validated error (Shao, 1993; 1996) to estimate average squared prediction error because of the savings in computation. The proposed model selection strategy is designed to potentially accommodate a large number of candidate working survival models. The small-sample correction that cross-validation provides to estimates of average prediction error may be negligible in the test statistics *Ŝ*_{ℓ}, which are based on the difference between two prediction error estimates. However, one may choose to implement cross-validated prediction error estimates, perhaps as a descriptive statistic for a selected survival model.

Although only the PFP(θ) was used in the simulations and analysis of ACTG 398, one could have used the gFWER(*k*). However, unlike the PFP(θ), interpretation of the gFWER(*k*) depends on the total number of null hypotheses tested.

When relatively sparse covariates are considered, for example, treatment by categorical covariate interactions, the derivative of the estimating equation with respect to the parameters of the working models for censoring or survival may not be of full rank. In these settings, one may use the trust-region dogleg method (Moré, Garbow, et al., 1980; Powell, 1970) instead of the Newton-Raphson algorithm for root finding.

This work was supported by grant R01-AI058836 from the U.S. National Institutes of Health. The author is grateful to an Associate Editor and Referee whose comments improved the clarity and presentation of the paper.

**Supplementary Materials**

Web Appendices referenced in Sections 2, 3, 4 and 7 are available under the Paper Information link at the Biometrics website http://www.biometrics.tibs.org

**References**

- Cheng SC, Wei LJ, Ying Z. Analysis of Transformation Models with Censored Data. Biometrika. 1995;82:835–845.
- Cox DR. Regression Models and Life Tables (with Discussion). Journal of the Royal Statistical Society, Series B. 1972;34:187–220.
- Dudoit S, van der Laan M, Pollard K. Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates. Statistical Applications in Genetics and Molecular Biology. 2004;3(1): Article 13. http://www.bepress.com/sagmb/vol3/iss1/art13.
- Hammer S, Vaida F, Bennett K, et al. Dual vs Single Protease Inhibitor Therapy Following Antiretroviral Treatment Failure: A Randomized Trial. Journal of the American Medical Association. 2002;288(2):169–180.
- Huang W, DeGruttola V, Fischl M, et al. Patterns of Plasma Human Immunodeficiency Virus Type-1 RNA Response to Antiretroviral Therapy. Journal of Infectious Diseases. 2001;183(10):1455–1465.
- Kalbfleisch JD, Prentice RL. The Statistical Analysis of Failure Time Data. 2nd ed. New York: John Wiley & Sons, Inc.; 2002.
- Lehmann EL, Romano JP. Generalizations of the Familywise Error Rate. Annals of Statistics. 2005;33:1138–1154.
- Moré J, Garbow B, Hillstrom K. User Guide for MINPACK 1. Rept. ANL-80-74. Argonne National Laboratory; 1980.
- Powell MJD. A Fortran Subroutine for Solving Systems of Nonlinear Algebraic Equations. In: Rabinowitz P, editor. Numerical Methods for Nonlinear Algebraic Equations. 1970.
- Romano J, Shaikh A. On Control of the False Discovery Proportion. Technical Report No. 2004-31. Dept. of Statistics, Stanford Univ.; 2004.
- Romano J, Shaikh A. Stepup Procedures for Control of Generalizations of the Familywise Error Rate. Annals of Statistics. 2006;34:1850–1873.
- Shao J. Linear Model Selection by Cross-Validation. Journal of the American Statistical Association. 1993;88:486–494.
- Shao J. Bootstrap Model Selection. Journal of the American Statistical Association. 1996;91:655–665.
- Sinisi SE, Neugebauer R, van der Laan MJ. Cross-validated Bagged Prediction of Survival. Statistical Applications in Genetics and Molecular Biology. 2006;5(1): Article 12.
- Tian L, Cai T, Goetghebeur E, Wei LJ. Model Evaluation Based on the Distribution of Estimated Absolute Prediction Error. Biometrika. 2007;94:297–311.
- Tsiatis AA. Semiparametric Theory and Missing Data. New York: Springer; 2006.
- Uno H, Cai T, Tian L, Wei LJ. Evaluating Prediction Rules for *t*-Year Survivors with Censored Regression Models. Journal of the American Statistical Association. 2007;102:527–537.
