
**|**HHS Author Manuscripts**|**PMC3633735


Lifetime Data Anal. Author manuscript; available in PMC 2014 April 1.

Published in final edited form as:

Published online 2012 December 23. doi: 10.1007/s10985-012-9235-3

PMCID: PMC3633735

NIHMSID: NIHMS431263

Qian M. Zhou, Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Email: qmzhou/at/stat.sfu.ca;

Email: tcai/at/hsph.harvard.edu

The publisher's final edited version of this article is available at Lifetime Data Anal

In many clinical applications, understanding when measurement of new markers is necessary to provide added accuracy to existing prediction tools could lead to more cost effective disease management. Many statistical tools for evaluating the incremental value (IncV) of the novel markers over the routine clinical risk factors have been developed in recent years. However, most existing literature focuses primarily on global assessment. Since the IncVs of new markers often vary across subgroups, it would be of great interest to identify subgroups for which the new markers are most/least useful in improving risk prediction. In this paper we provide novel statistical procedures for *systematically* identifying potential traditional-marker based subgroups in whom it might be beneficial to apply a new model with measurements of both the novel and traditional markers. We consider various conditional time-dependent accuracy parameters for censored failure time outcome to assess the subgroup-specific IncVs. We provide non-parametric kernel-based estimation procedures to calculate the proposed parameters. Simultaneous interval estimation procedures are provided to account for sampling variation and adjust for multiple testing. Simulation studies suggest that our proposed procedures work well in finite samples. The proposed procedures are applied to the Framingham Offspring Study to examine the added value of an inflammation marker, C-reactive protein, on top of the traditional Framingham risk score for predicting 10-year risk of cardiovascular disease.

Risk models have been applied in medical practice for prediction of long-term incidence or progression of many chronic diseases such as cardiovascular disease (CVD) and cancer. With the advancement in science and technology, a wide range of biological and genomic markers have now become available to assist in risk prediction. However, due to the potential financial and medical costs associated with measuring these markers, their ability to improve the prediction of disease outcomes and treatment response over existing risk models needs to be rigorously assessed.

Effective statistical tools for evaluating the incremental value (IncV) of the novel markers over the routine clinical risk factors are crucial in the field of outcome prediction. Many newly discovered markers, while promising and strongly associated with clinical outcomes, may have limited capacity in improving risk prediction over and above routine clinical variables (Tice et al. 2005; Wacholder et al. 2010). For example, on top of traditional risk variables from the Framingham risk score (FRS) (Wilson et al. 1998), the inflammation biomarker, C-reactive protein (CRP), was shown to provide modest prognostic information (Cook et al. 2006; Blumenthal et al. 2007; Ridker et al. 2007) while a genetic risk score consisting of 101 single nucleotide polymorphisms was reported as not useful (Paynter et al. 2010). Wang et al. (2006) concluded that almost all new contemporary biomarkers for prevention of coronary heart disease added rather moderate *overall* predictive values to the FRS.

One possible explanation for the minimal improvement at the population average level is that the new markers may only be useful for certain subpopulations. For example, while much debate about the clinical utility of CRP remains, there is empirical evidence that CRP may substantially improve the prediction for subjects at intermediate risk (Ridker 2007). Such a finding, if valid, would be extremely useful in clinical practice, since identifying the subgroups where markers can provide valuable improvement in prediction will not only lead to more informed clinical decisions but also reduce cost and effort compared to measuring novel markers on the entire population. However, to ensure the validity of such claims and more precisely pinpoint such specific subgroups, rigorous and systematic analytical tools for IncV evaluation are needed.

To quantify the *global* IncV of new markers for risk prediction, various approaches have been advocated, the most popular being a comparison of summary measures of accuracy under the conventional and new models (Heagerty and Zheng 2005; Uno et al. 2007; Cai and Cheng 2008). Excellent discussions on the choices of different accuracy measures can be found in Gail and Pfeiffer (2005). However, these measures quantify the overall IncV of new markers averaged over the entire study population and do not provide information on how the IncV may vary across different groups of subjects. If there are pre-defined subgroups, these measures could be estimated for each of the subgroups. However, in practice, it is often unclear how to optimally select subgroups for comparisons, and ad-hoc subgroup analyses without careful planning and execution may lead to invalid results (Rothwell 2005; Pfeffer and Jarcho 2006; Wang et al. 2007). Furthermore, it is vitally important to adjust for multiple comparisons when conducting any subgroup analysis. Thus, an important question is how to *systematically* identify the potential subgroups who would benefit from the additional markers while properly adjusting for multiple comparisons. There is a paucity of statistical literature on approaches for identifying such subgroups (D’Agostino 2006). Tian et al. (2009) proposed an inference procedure to estimate the IncVs in absolute prediction error of new markers in various subgroups of patients classified by the conventional markers. However, their method does not incorporate censoring. In addition, the subgroups in their paper were defined as groups of subjects whose conventional risk scores lie in different pre-assigned intervals, and it is unclear how the lengths of these intervals should be chosen. Uno et al. (2011b) proposed estimation procedures for the conditional quantiles of the improvement in the predicted risk separately for the cases and the controls. However, they did not provide procedures for determining which subgroups should be recommended to have the new markers measured. Furthermore, no procedures were provided to account for the sampling variation or control the overall type I error, which is particularly important in subgroup analysis.

In this paper, we propose systematic approaches to analyzing censored event time data for identifying subgroups of patients for whom the new markers have the most or least IncV. We consider two common accuracy measures, the partial area under the ROC curve (pAUC) and the integrated discrimination improvement (IDI) index. Compared with the standard C-statistic, the pAUC is often advocated as a better summary measure in many applications (Dwyer 1996; Dodd and Pepe 2003; Cai and Dodd 2008), since clinical interests often lie only in a specific range of the false positive rates (FPRs) or true positive rates (TPRs). For example, the region with low FPR is of more concern for disease screening (Baker and Pinsky 2001), while the region with high TPR is of more concern for the prognosis of serious disease (Jiang et al. 1996). However, the ROC curve does not capture certain aspects of the predicted absolute risk, since it is scale invariant. Many model performance measures, including the reclassification table (Cook and Ridker 2009), net reclassification improvement (NRI) and IDI (Pencina et al. 2008), proportion of cases followed (PCF) and proportion needed to follow-up (PNF) (Pfeiffer and Gail 2010), have been proposed recently to overcome the limitation of the ROC curve method. Many of these measures, such as the reclassification table, NRI, PCF and PNF, rely on pre-specified clinically meaningful risk or quantile threshold values which may not be available for most diseases. For illustration purposes, we focus primarily on pAUC and IDI in this paper but note that our procedures can be easily extended to accommodate other accuracy measures.

The rest of the paper is organized as follows. In Sect. 2, we present our proposed non-parametric estimation procedure for the subgroup-specific IncV of new markers, along with the corresponding interval estimation procedures. In particular, resampling based simultaneous interval estimation procedures are provided as convenient and effective tools to control for multiple comparisons. We describe results from our simulation studies in Sect. 3 and the analyses of the Framingham Offspring Study using our proposed procedures in Sect. 4. Concluding remarks are given in Sect. 5. All the technical details are included in the appendices.

Let *X* denote a set of conventional markers and let *Z* denote a set of new markers. Due to censoring, for the event time *T*^{†}, one can only observe *T* = min(*T*^{†}, *C*), *Δ* = *I* (*T*^{†} ≤ *C*), where *C* is the censoring time, which is assumed to be independent of *T*^{†} conditional on (*X*, *Z*). See below for more discussions about censoring assumptions. Furthermore, define *Y*^{†} = *I* (*T*^{†} ≤ *t*_{0}), where *t*_{0} is the prediction time of clinical interest, and *Y* = *I* (*T* ≤ *t*_{0}). Let and be the true conditional risk of developing the event by time *t*_{0} conditional on *X* only and (*X*, *Z*), respectively. Suppose a data set for analysis consists of *n* independent realizations of (*T*, *Δ*, *X*, *Z*), {(*T _{i}*,

To estimate these two conditional risks, one may consider a fully non-parametric approach (Li and Doss 1995). However, in practice, such non-parametric estimates may perform poorly when the dimension of *X* or *Z* is not small due to the curse of dimensionality (Robins and Ya’Acov 1997). A feasible alternative is to approximate them by imposing simple working models

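$$\mathrm{pr}(T^{\dagger} \le t_0 \mid X) = g_1(\beta^{\mathsf{T}} V), \qquad \mathrm{pr}(T^{\dagger} \le t_0 \mid X, Z) = g_2(\gamma^{\mathsf{T}} W), \tag{1}$$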

where *V*, a *p* × 1 vector, is a function of *X*, *W*, a *q* × 1 vector, is a function of *X* and *Z*, *β* and *γ* are vectors of unknown regression parameters, and *g*_{1} and *g*_{2} are known, smooth, increasing functions. An estimator of *β* and *γ* can be obtained respectively by solving the following inverse probability weighted (IPW) estimating equations as given in Uno et al. (2007):

(2)

where , and is a root-*n* consistent estimator of *G _{X,Z}(t)* =
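The IPW idea can be illustrated with a minimal Python sketch. It assumes the weight form commonly used with Uno et al. (2007), namely Δ·I(T ≤ t0)/G(T) + I(T > t0)/G(t0), a Kaplan-Meier estimate of the censoring distribution that does not depend on the markers, and a logistic link for the working model. The function names `censoring_survival`, `ipw_weights` and `fit_ipw_logistic` are hypothetical, and the displayed equations of this manuscript were not available when writing it, so details may differ from the authors' implementation.

```python
import numpy as np

def censoring_survival(time, event):
    """Kaplan-Meier estimate of pr(C >= t), treating censoring as the 'event'."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    cens_times = np.unique(time[event == 0])
    at_risk = np.array([(time >= u).sum() for u in cens_times], dtype=float)
    n_cens = np.array([((time == u) & (event == 0)).sum() for u in cens_times],
                      dtype=float)
    factors = 1.0 - n_cens / at_risk

    def G(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        # Left-continuous version: only censoring times strictly before t count.
        return np.array([np.prod(factors[cens_times < ti]) for ti in t])

    return G

def ipw_weights(time, event, t0):
    """Assumed weight form: Delta*I(T <= t0)/G(T) + I(T > t0)/G(t0)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    G = censoring_survival(time, event)
    w = np.zeros(len(time))            # censored before t0: weight 0
    early_event = (event == 1) & (time <= t0)
    late = time > t0
    w[early_event] = 1.0 / G(time[early_event])
    w[late] = 1.0 / G(t0)[0]
    return w

def fit_ipw_logistic(V, y, w, n_iter=25):
    """Solve sum_i w_i V_i {y_i - g(beta'V_i)} = 0 with a logistic g (Newton steps)."""
    V = np.column_stack([np.ones(len(y)), V])   # add an intercept column
    beta = np.zeros(V.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-V @ beta))
        score = V.T @ (w * (y - p))
        info = (V * (w * p * (1.0 - p))[:, None]).T @ V
        beta = beta + np.linalg.solve(info, score)
    return beta
```

In use, `y` would be the indicator I(T ≤ t0) and `w` the output of `ipw_weights`; subjects censored before t0 then drop out of the estimating equation, and the remaining subjects are up-weighted to compensate.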

For illustration purposes, we consider two accuracy measures, the pAUC and the IDI index. We first define both concepts in the context of evaluating a risk score/model. Suppose that we use as a risk score for classifying the event status *Y*^{†}, and without loss of generality, we assume that a higher value of is associated with a higher risk and refer to the two states, *Y*^{†} = 1 and *Y*^{†} = 0, as “diseased” and “disease-free” or “cases” and “controls”. The discrimination capacity of can be quantified based on the ROC curve, which is a plot of the TPR function, , against the FPR function, . The ROC curve, describes the inherent capacity of distinguishing “cases” from “controls”. The pAUC with a restricted region of FPR, say FPR ≤ *f*, is given by , for *f* ∈ [0, 1]. The IDI index, is simply .
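As a rough illustration of these two summaries, the following Python sketch computes empirical versions for a generic risk score taking values in [0, 1]. It uses the standard definitions: the pAUC is the integral of the ROC curve over FPR in [0, *f*], which equals the case average of max(*f* − placement value, 0), and the IDI is the integrated TPR minus the integrated FPR, which for a score in [0, 1] equals the mean score among cases minus the mean score among controls. The function names and the optional weight argument are assumptions made for this sketch.

```python
import numpy as np

def pauc(score, y, f=1.0, w=None):
    """Empirical partial AUC over FPR in [0, f]; f = 1 gives the usual AUC."""
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=int)
    w = np.ones_like(score) if w is None else np.asarray(w, dtype=float)
    s1, w1 = score[y == 1], w[y == 1]
    s0, w0 = score[y == 0], w[y == 0]
    # Placement value of each case: weighted share of controls scoring at least as high.
    u = np.array([np.sum(w0 * (s0 >= s)) for s in s1]) / np.sum(w0)
    return float(np.sum(w1 * np.clip(f - u, 0.0, None)) / np.sum(w1))

def idi(score, y, w=None):
    """Integrated TPR minus integrated FPR for a score in [0, 1]."""
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=int)
    w = np.ones_like(score) if w is None else np.asarray(w, dtype=float)
    mean_cases = np.sum(w[y == 1] * score[y == 1]) / np.sum(w[y == 1])
    mean_controls = np.sum(w[y == 0] * score[y == 0]) / np.sum(w[y == 0])
    return float(mean_cases - mean_controls)
```

For a non-informative score the ROC curve is the diagonal, so `pauc` should be close to *f*²/2, which is the reference value subtracted when the pAUC is reported as an incremental value.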

To evaluate how the IncV of *Z* may vary across subgroups defined by *X*, we define new conditional pAUC and IDI index. We propose to use as a scoring system for grouping subjects with potentially similar initial risk estimates and create subgroups _{s} = {*X* : _{1}(*X*) = *s*}. Then we evaluate the IncV of *Z* for each * _{s}* based on how well can further discriminate subjects within

Conditional on , the ROC curve of is , for *u* ∈ [0, 1]. The conditional pAUC is given by , *f* ∈ [0, 1]. Note that *f* = 1 yields the conditional AUC(*s*). If *Z* is non-informative for _{s}, the corresponding ROC curve would be a diagonal line, and we expect that pAUC* _{s}* =

(3)

If *Z* is non-informative for this subgroup _{s}, the conditional IDI index would be 0, and therefore, the subgroup _{s} specific IncV of Z wrt the IDI index is IDI(*s*). Based on these subgroup-specific IncVs, we are able to identify the set of *s* such that *Z* is useful to improve the prediction accuracy for * _{s}*, which is referred to as the

We first discuss the estimation for the conditional TPR and FPR functions since both pAUC_{f}(*s*) and IDI(*s*) are simple functionals of these two functions. Let and . To obtain a consistent estimator of , since is between 0 and 1, we consider a non-parametric local likelihood estimation method (Tibshirani and Hastie 1987) along with IPW accounting for censoring. Specifically, we obtain as the solution to the IPW local likelihood score equation,

(4)

where , *g*(*x*) = exp(*x*)/{1 + exp(*x*)}, *K _{h}*(

Based on can be estimated as

where and (*h*_{0}, *h*_{1}) is the pair of optimal bandwidths for estimating and , respectively. In Appendix A.2, we show that is uniformly consistent for pAUC_{f}(*s*).

As a special case, when both *X* and *Z* are univariate, the ROC curve of conditional on is equivalent to the ROC curve of *Z* conditional on *X* since the ROC curve is scale invariant. A simple local constant IPW estimator of is given by

The resulting estimator of pAUC_{f}(*x*) is

where is the estimated truncated placement value proposed by Cai and Dodd (2008).
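For the univariate special case, a local constant estimator can be sketched in Python as below. This is only an illustration under stated assumptions: an Epanechnikov kernel, separate bandwidths `h0` and `h1` for controls and cases, optional IPW weights supplied by the caller, and the truncated placement value form max(*f* − *u*, 0). The function `conditional_pauc` and its arguments are hypothetical names, not the authors' code.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, zero outside [-1, 1]."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def conditional_pauc(x0, x, z, y, f=1.0, h0=0.5, h1=0.5, ipw=None):
    """Kernel-weighted pAUC of z for subjects whose x is close to x0."""
    x, z, y = (np.asarray(a, dtype=float) for a in (x, z, y))
    ipw = np.ones_like(x) if ipw is None else np.asarray(ipw, dtype=float)
    # Kernel times IPW weights, split into controls (y = 0) and cases (y = 1).
    k0 = ipw * epanechnikov((x - x0) / h0) * (y == 0)
    k1 = ipw * epanechnikov((x - x0) / h1) * (y == 1)
    if k0.sum() == 0 or k1.sum() == 0:
        return float("nan")            # no local cases or no local controls
    cases = np.flatnonzero(k1 > 0)
    # Local placement value: weighted share of nearby controls with z at least as large.
    u = np.array([np.sum(k0 * (z >= z[i])) for i in cases]) / k0.sum()
    # Average the truncated placement values over nearby cases.
    return float(np.sum(k1[cases] * np.clip(f - u, 0.0, None)) / k1.sum())
```

Evaluating `conditional_pauc` over a grid of `x0` values traces out the subgroup-specific curve; subtracting *f*²/2 gives the incremental value relative to a non-informative marker.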

It is difficult to directly estimate the variance of since it involves unknown derivative functions. We propose a perturbation-resampling method to approximate the distribution of . This method has been widely used in survival analyses (see for example, Jin et al. 2001; Park and Wei 2003; Cai et al. 2005). To be specific, let *Ξ* = {*ξ _{i}*,

where and is the perturbed estimator of *G _{X,Z}* (·) with

Then, the perturbed pAUC is given by, , where . In the Appendix A.3, we show that the unconditional distribution of can be approximated by the conditional distribution of

(5)

given the data. With the above resampling method, for any fixed , one may obtain a variance estimator of , , based on the empirical variance of *B* realizations from (5). For any fixed and *α* ∈ (0, 1), a pointwise 100(1 − *α*)% confidence interval (CI) for pAUC_{f}(*s*) can be constructed via , where *c _{α}* is the 100(1 −
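The perturbation-resampling idea can be sketched generically in Python. The sketch assumes that the perturbation variables are independent standard exponential draws (positive, with unit mean and unit variance), that the estimator accepts per-subject weights, and that the pointwise interval uses a normal critical value. These are assumptions made for illustration; `perturbation_ci` and `weighted_mean` are hypothetical names, and a simple weighted mean stands in for the pAUC estimator.

```python
import numpy as np

def perturbation_ci(estimator, data, B=500, z_crit=1.96, seed=0):
    """Point estimate, resampling standard error and pointwise CI.

    estimator(data, weights) must return a scalar; weights of all ones
    reproduce the unperturbed estimate.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    est = estimator(data, np.ones(n))
    # Each replicate re-weights subject i by an independent Exp(1) draw.
    perturbed = np.array(
        [estimator(data, rng.exponential(1.0, size=n)) for _ in range(B)]
    )
    se = perturbed.std(ddof=1)
    return est, se, (est - z_crit * se, est + z_crit * se)

def weighted_mean(data, w):
    """Toy estimator used only to demonstrate the resampling wrapper."""
    return float(np.sum(w * data) / np.sum(w))
```

Because the data are held fixed and only the weights are redrawn, no unknown derivative functions need to be estimated, which is the practical appeal of this scheme over a plug-in variance formula.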

Based on , we may obtain plug-in estimators for and respectively as

Thus, IDI(*s*) can be estimated by . Similar to the derivations given in the Appendix A for , the asymptotic results for can be directly used to establish the consistency and asymptotic normality for . In addition, the unconditional distribution of can be approximated by the conditional distribution of , given the data, where and is the perturbed counterpart of . The pointwise CIs for any fixed are constructed in a similar way as the inference for pAUC_{f}(*s*). As a special case, a kernel local constant estimator of IDI(*s*) is given by

with the perturbed counterpart given by

Selection of the optimal bandwidths for pAUC_{f}(*s*) and IDI(*s*) is illustrated in the Appendix B.

To identify the effective subpopulation, one may simultaneously assess the subgroup-specific IncV wrt a certain accuracy measure, denoted by , for example pAUC_{f}(*s*) − *f*^{2}*/*2 or IDI(*s*), over a range of *s* values by constructing simultaneous CI for . Unfortunately, the distribution of does not converge as a process in *s*, as *n* → ∞. Thus, we cannot apply the standard large sample theory for stochastic processes to approximate the distribution of . Nevertheless, by the strong approximation argument and extreme value limit theorem (Bickel and Rosenblatt 1973), we show in the Appendix A.3 that a standardized version of the sup-statistic converges in distribution to a proper random variable, where denotes the variance estimator of . In practice, for large *n*, one can approximate the distribution of *Γ* based on realizations of , where is the perturbed counterpart of . Therefore, a 100(1 − *α*) % simultaneous CI for can be obtained as , where *d _{α}* is the empirical 100(1 −
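In practice, a simultaneous band of this kind can be computed from the perturbed replicates with a few lines of Python. The sketch below assumes that the point estimates on a grid of *s* values and a *B* × grid matrix of perturbed estimates are already available, and it takes the empirical quantile of the maximum standardized deviation as the critical value. The function name `simultaneous_band` is hypothetical.

```python
import numpy as np

def simultaneous_band(est, perturbed, alpha=0.05):
    """Sup-statistic confidence band over a grid of subgroup values.

    est       : array of shape (m,), point estimates on the grid
    perturbed : array of shape (B, m), perturbed estimates on the same grid
    """
    est = np.asarray(est, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    # Pointwise standard errors from the spread of the perturbed replicates.
    se = perturbed.std(axis=0, ddof=1)
    # For each replicate, the largest standardized deviation across the grid.
    sup_stat = np.max(np.abs(perturbed - est) / se, axis=1)
    d_alpha = float(np.quantile(sup_stat, 1.0 - alpha))
    return est - d_alpha * se, est + d_alpha * se, d_alpha
```

The critical value `d_alpha` is never smaller than the pointwise one, so the band is wider than the pointwise intervals; subgroups where the whole band lies above zero are the ones for which the new marker can be declared useful after adjusting for multiplicity.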

Another question of interest is whether the subgroup-specific , for example pAUC_{f}(*s*), is constant across different values of *s* over a certain interval [*s _{l}, s_{u}*]. We define the average IncV over [

where , and we define the relative subgroup-specific IncV over [*s _{l}, s_{u}*] as . The point estimate of is given by

where . In addition, the unconditional distribution of can be approximated by the conditional distribution of given the data, where with as the perturbed counterpart of and . The variance estimator of can be obtained from realizations of .

If the subgroup-specific IncV of *Z* is constant over [*s _{l}, s_{u}*], i.e., for and for

To examine the finite sample properties of the proposed estimation procedure, we conduct a simulation study where the conventional marker *X* and the new marker *Z* are both univariate and jointly generated from a bivariate normal distribution

In this simulation study, *μ _{X}* =

We investigate the kernel local constant estimator for the conditional pAUC_{f} with *f* = 0.1 representing a low FPR region and *f* = 1 representing the standard AUC. Since *Z* and log *T* are jointly normal conditional on *X* = *x*, it is straightforward to calculate the true values of pAUC* _{f}*(

The performance of the point estimates and pointwise 95% CIs obtained by the resampling method was assessed from 1,000 independent replicates. For all of these scenarios, the non-parametric estimators have very small biases, the estimated standard errors are close to their empirical counterparts, and the empirical coverage levels are close to the nominal level. In Fig. 1, we summarize the performance of the point and interval estimates for pAUC_{0.1} with sample size 10,000. For this scenario, the empirical coverage probabilities of the 95% pointwise CIs range from 92.9 to 95.4%. The empirical coverage levels of the 95% simultaneous confidence bands for the standard AUC are 93.2% for *n* = 1,000, 93.3% for *n* = 5,000, and 94.5% for *n* = 10,000; the empirical coverage levels of the 95% simultaneous confidence bands for the pAUC_{0.1} are 93.3% for *n* = 1,000, 93.4% for *n* = 5,000, and 92.5% for *n* = 10,000.

The Framingham Offspring Study was established in 1971 with 5,124 participants who were monitored prospectively on epidemiological and genetic risk factors of CVD. Here, we use data from 1,687 female participants, of whom 261 had either died or experienced a CVD event by the end of the follow-up period; the 10-year event rate is 6%. The Framingham risk model, based on several clinical risk factors including age, systolic blood pressure, diastolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, current smoking status and diabetes, is widely used in clinical settings but has only moderate accuracy for predicting the 10-year risk of CVD (Cook et al. 2006). The FRS is constructed as the weighted average of the risk factors in the Framingham risk model using *β*-coefficients given in Table 6 of Wilson et al. (1998). The risk estimates are obtained from the FRS through the transformation 1 − exp{− exp(·)}. The density plot of the risk estimates obtained from the FRS is shown in Fig. 2a. The overall gain in C-statistic by adding the CRP on top of FRS is 0.002 (from 0.776 to 0.778, with 95% CI (−0.005, 0.01)). Note that a log transformation is applied on the CRP throughout the analysis. According to the Framingham risk model (Wilson et al. 1998) and the risk threshold values employed by the Adult Treatment Panel III of the National Cholesterol Education Program (Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults 2001), these 1,687 female participants may be classified into three risk groups: 1,462 as low risk (<10%); 193 as intermediate risk (between 10% and 20%); 32 as high risk (>20%). The IncVs wrt C-statistic are 0.00057 (with 95% CI (−0.012, 0.013)) for the low risk group; 0.037 (with 95% CI (−0.054, 0.13)) for the intermediate risk group; 0.034 (with 95% CI (−0.097, 0.16)) for the high risk group. Note that the low risk group consists of about 87% of the entire cohort.
Now we further classify the 1,462 patients of the low risk group into 10 finer subgroups, each covering a risk interval of length 0.01 (0–0.01, 0.01–0.02, and so on). The IncVs wrt C-statistic for these 10 low-risk subgroups as well as the intermediate and high risk groups, with their 95% CIs, are shown in Fig. 2b. This suggests that adding CRP on top of FRS may be most useful for the risk groups around 5%, which is also referred to as the intermediate-low risk group in some literature.
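The risk transformation and the three-group classification described above can be written as a short Python sketch. The transformation 1 − exp{−exp(·)} and the 10% and 20% cutoffs come from the text; the function names and the handling of values falling exactly on a cutoff (assigned to the higher group here) are assumptions.

```python
import numpy as np

def frs_to_risk(frs):
    """Map a Framingham risk score to a 10-year risk via 1 - exp{-exp(score)}."""
    return 1.0 - np.exp(-np.exp(np.asarray(frs, dtype=float)))

def risk_group(risk, cutoffs=(0.10, 0.20)):
    """Label each risk as low (<10%), intermediate (10-20%) or high (>20%)."""
    labels = np.array(["low", "intermediate", "high"])
    # np.digitize returns 0, 1 or 2 depending on which interval the risk falls in.
    return labels[np.digitize(risk, cutoffs)]
```

For instance, a score equal to the sample mean FRS reported in this analysis (−3.74) maps to a 10-year risk of roughly 2.3%, which falls in the low risk group. A finer partition of the low risk group can be obtained by passing a denser `cutoffs` grid and a matching label array.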

Fig. 2 (**a**) The density estimates of the 10-year event risk calculated from the FRS. (**b**) The IncVs wrt C-statistics for the 10 subgroups of low risk as well as the intermediate and high risk groups with their 95% CIs

First, we investigate the IncV of the CRP over the FRS wrt AUC, pAUC_{0.1} and IDI in predicting the 10-year risk of CVD events among subgroups defined by the FRS. For the purpose of kernel smoothing, the transformation function (·) in the local likelihood score equation (4) is , where *μ _{X}* = −3.74 is the sample mean of the FRS and

The point estimates (*solid line*), and its 95% pointwise CIs (*dashed lines*) and the 95% simultaneous confidence bands (*dark shaded region*) for (**I**) the subgroup-specific IncV with respect to AUC, AUC(*x*) − 1/2; (**II**) the subgroup-specific IncV with **...**

It is worth noting that the bandwidth selection procedure is not sensitive to the choice of the number of folds in cross-validation. Using a five-fold cross-validation, the optimal bandwidths are (0.121, 0.394) for the standard AUC, (0.238, 0.614) for pAUC_{0.1}, and (0.016, 0.272) for IDI. The resulting point estimates and CIs are almost the same as the results with the bandwidths selected via a 10-fold cross-validation procedure. In addition, for calculating the weights , the survival function *G*(·) of the censoring time *C* is estimated by a Kaplan-Meier estimator since in the study *C* is likely to be independent of both *T* and (*X*, *Z*). In Sect. 2.1, we commented that if this independence assumption does not hold, we could still provide a correct estimate of *G*(·) via a semi-parametric model, for example a Cox PH model. Here, we also obtained the estimates of *G*(*t*_{0}) via a Cox PH model, i.e., where *W _{c}* consists of the FRS and the CRP. Based on the resulting weights , we obtained the point estimates and CIs for the subgroup-specific IncV wrt AUC, pAUC

The point estimates (*solid line*), and its 95% pointwise CIs (*dashed lines*) and the 95% simultaneous confidence bands (*dark shaded region*) for the subgroup-specific IncV with respect to AUC and pAUC_{0.1} as well as IDI. The results are based on the weights **...**

We are also interested in testing whether the subgroup-specific IncV of the CRP over the FRS is constant over the values [0, 0.4] of the risk estimates obtained from the FRS. The *p* values of testing for constant subgroup-specific IncV are 0.028 for AUC, 0.108 for pAUC_{0.1} and 0.002 for IDI. These results agree with Fig. 5, which shows the point estimates and simultaneous 95% CIs for the relative subgroup-specific IncV wrt AUC, pAUC_{0.1} and IDI. They suggest that the subgroup-specific IncVs wrt AUC and IDI are not constant over the interval [0, 0.4]; on the other hand, there is no significant evidence that the subgroup-specific IncV wrt pAUC_{0.1} varies over this interval. It is worth noting that the asymptotic variance of is larger than that of , and therefore, the test of whether the subgroup-specific IncV is constant over a certain interval has less power than the test of whether the subgroup-specific IncV is above zero over the interval.

In this paper, we propose a non-parametric procedure to estimate the IncVs of new markers in prediction accuracy across different subgroups defined by the conventional scoring system. We also provide the pointwise and simultaneous interval estimates via perturbation resampling. In addition, with proper adjustment for multiple subgroup comparisons, our approach is able to systematically identify the subgroups which would benefit from adding new markers. Unlike global measures, which do not provide information on how the IncV may vary across subgroups, our methods enable the identification of subgroups for which the new markers may or may not be useful. Existing procedures often assess subgroup-specific IncVs empirically. We provide more rigorous and systematic analytical tools to ensure the validity of such claims and more precisely pinpoint such specific subgroups.

Appropriate choice of prediction accuracy summaries is of great importance to capture the usefulness of new markers. It is also motivated by primary research interests. Discrimination is one of the major components in assessing the accuracy of prediction models. The AUC is the most popular summary index which depicts inherent discrimination capacity. However, it is unable to capture how well the predicted risks agree with the actual observed risks (Gail and Pfeiffer 2005). In some cases, alternative summary measures should also be considered, for example, NRI, PCF and PNF. Our approach can be naturally extended to other metrics that may be more appropriate for particular clinical applications.

The subgroup-specific TPR and the subgroup-specific FPR both depend on the time point *t*, which is usually pre-determined. In some applications, new biomarkers might produce relatively better long-term performance in prediction accuracy than short-term. It is straightforward to extend our procedure to different time points over an arbitrary time interval since the non-parametric estimates of the TPR and FPR, , converge to a Gaussian process in time *t*. We could estimate the overall improvement of new markers over a certain time interval by integrating the subgroup-specific pAUC and the subgroup-specific IDI index wrt time *t*. Furthermore, with proper adjustment for multiple comparisons, it is possible to identify the time interval where new markers have the greatest IncV for different subgroups.

Instead of focusing on the prediction of *t*-year survival for a fixed time point, we might also be interested in a global assessment of a fitted prediction model for the continuous event time. One example of such a global measure is the C-statistic of the prediction score (Harrell Jr et al. 1996; Korn and Simon 1990; Pencina and D’Agostino 2004). When the event time *T*^{†} is subject to right censoring which may have finite support [0, *τ*], one may consider a truncated C-statistic,

as considered in Heagerty and Zheng (2005) and Uno et al. (2011a). It is straightforward to extend *C _{τ}* to our subgroup-specific C-statistic

and construct an IPW kernel estimator for *C _{τ} (s)* as for other accuracy measures.
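For orientation, an unconditional truncated C-statistic with inverse probability of censoring weights can be sketched in Python as follows. The sketch follows the general form associated with Uno et al. (2011a), in which each comparable pair anchored at an observed event before *τ* is weighted by the inverse squared censoring survival at that event time. It assumes marker-independent censoring estimated by Kaplan-Meier and does not include the kernel weighting needed for the subgroup-specific version; the function names are hypothetical.

```python
import numpy as np

def censoring_survival(time, event):
    """Kaplan-Meier estimate of pr(C >= t), treating censoring as the 'event'."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    cens_times = np.unique(time[event == 0])
    at_risk = np.array([(time >= u).sum() for u in cens_times], dtype=float)
    n_cens = np.array([((time == u) & (event == 0)).sum() for u in cens_times],
                      dtype=float)
    factors = 1.0 - n_cens / at_risk

    def G(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        return np.array([np.prod(factors[cens_times < ti]) for ti in t])

    return G

def truncated_c(time, event, score, tau):
    """IPCW estimate of pr(score_i > score_j | T_i < T_j, T_i < tau)."""
    time = np.asarray(time, dtype=float)
    score = np.asarray(score, dtype=float)
    event = np.asarray(event, dtype=int)
    G = censoring_survival(time, event)
    num = den = 0.0
    # Only subjects with an observed event before tau can anchor a pair.
    for i in np.flatnonzero((event == 1) & (time < tau)):
        w = 1.0 / G(time[i])[0] ** 2
        later = time > time[i]                 # subjects still event-free at T_i
        den += w * later.sum()
        num += w * np.sum(later & (score[i] > score))
    return num / den
```

A subgroup-specific analogue would additionally multiply each pair by kernel weights centered at the subgroup value *s*, in the same spirit as the other conditional accuracy measures.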

The Framingham Heart Study and the Framingham SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. The Framingham SHARe data used for the analyses described in this manuscript were obtained through dbGaP (access number: phs000007.v3.p2). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI. The work is supported by Grants U01-CA86368, P01-CA053996, R01-GM085047, R01-GM079330, R01-AI052817 and U54-LM008748 awarded by the National Institutes of Health.

Let and denote expectation with respect to (wrt) the empirical probability measure of {(*T _{i}, Δ_{i}, X_{i}, Z_{i}*),

*Uniform convergence rate for*
We first establish the following uniform convergence rate of :

(6)

To this end, we note that for any given *c* and *s*,

is the solution to the estimating equation , where * ζ_{y}* = (

, and . We next establish the convergence rate for , where

We first show that

and

are both *O _{p}*{(

This implies that

where is a class of functions indexed by *β* and *e*. By the maximum inequality of Van der Vaart and Wellner (1996), we have

Together with the fact that from Uno et al. (2007), it implies that . In addition, with the standard arguments used in Bickel and Rosenblatt (1973), it can be shown that

Therefore, for *h* = *n*^{−ν}, 1/5 < ν < 1/2,

is *O _{p}*{(

Thus, . It follows from the same arguments as given above that

Therefore, . In addition, we note that **0** is the unique solution to the equation **ψ**_{y}(* ζ_{y}*;

Asymptotic expansion for Let . It follows from a Taylor series expansion and the convergence rate of * ζ_{y}*(

(7)

where . Furthermore, since sup_{t≤t_{0}}{*Ĝ _{X, Z}*(

We next show that is asymptotically equivalent to

(8)

where . From (8) and the fact that *τ*{*y*; (*s*)} is bounded away from 0 uniformly in *s*, we have

where is the class of functions indexed by *γ, β* and *e*. By the maximum inequality of Van der Vaart and Wellner (1996) and the fact that from Uno et al. (2007), we have and . It follows that . Then, by a delta method,

(9)

where

(10)

Using the same arguments as for establishing the uniform convergence rate of conditional Kaplan-Meier estimators (Dabrowska 1989; Du and Akritas 2002), we obtain (6). Furthermore, following similar arguments as given in Dabrowska (1987, 1997), we have converges weakly to a Gaussian process in *c* for all *s*. Note that as for all kernel estimators, does not converge as a process in *s*.

Next we establish the uniform convergence rate for . To this end, we write

where and . It follows from (6) that . Let . Then . Noting that , we have by the continuity and boundedness of RC(*u*; *s*). Therefore,

which implies

and hence the uniform consistency of .

To derive the asymptotic distribution for , we first derive asymptotic expansions for . From the weak convergence of in *c*, the approximation in (9), and the consistency of given in the Appendix A.2, we have

On the other hand, from the uniform convergence of and the weak convergence of in *c*, we have

This, together with a Taylor series expansion and the expansion given in (9), implies that

It follows that

(11)

where

(12)

It then follows from a central limit theorem that for any fixed *s*, converges to a normal with mean 0 and variance

where is the density function of ,

and

To justify the resampling method, we first note that . It follows from similar arguments given in the Appendix A and Appendix 1 of Cai et al. (2010) that , where is obtained by replacing all theoretical quantities in given in (10) with the estimated counterparts for the *i*th subject. This, together with similar arguments as given above for the expansion of , implies that

where Conditional on the data, is approximately normally distributed with mean 0 and variance

Using the consistency of the proposed estimators along with similar arguments as given above, it is not difficult to show that the above variance converges to as *n* → ∞. Therefore, the empirical distribution obtained from the perturbed sample can be used to approximate the distribution of .

We now show that after proper standardization, the supremum-type statistic *Γ* converges weakly. To this end, we first note that similar arguments as given in the Appendix A can be used to show that and

for some small positive constant *δ*. Using similar arguments in Bickel and Rosenblatt (1973), we have

where *a _{n}* = [2 log{{

where *pr*{sup_{sΩ(h)}|*n ^{ε}*(

where . It follows from similar arguments as given in Tian et al. (2005) and Zhao et al. (2010) that

in probability as *n* → ∞. Thus, the conditional distribution of *a _{n}*(
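In practice, a sup-type statistic of this kind is typically used to calibrate a simultaneous confidence band from the perturbed realizations. The Python sketch below shows one common construction, under stated assumptions: the estimate and its perturbed copies are available on a finite grid of *s* values, pointwise standard errors are taken from the perturbed copies, and the critical value is an empirical quantile of the perturbed sup statistics. It does not reproduce the standardization constants of the theory above, and all names and inputs are hypothetical.

```python
import numpy as np

def simultaneous_band(estimate, perturbed, level=0.95):
    """Sup-type confidence band from perturbed curve realizations.

    estimate  : shape (m,), point estimates on a grid of s values
    perturbed : shape (B, m), perturbed estimates on the same grid
    """
    diff = perturbed - estimate                    # centered realizations, shape (B, m)
    se = diff.std(axis=0, ddof=1)                  # pointwise standard errors
    sup_stat = np.max(np.abs(diff) / se, axis=1)   # one supremum per realization
    crit = np.quantile(sup_stat, level)            # simultaneous critical value
    return estimate - crit * se, estimate + crit * se, crit

# Hypothetical inputs, for illustration only
rng = np.random.default_rng(0)
grid = np.linspace(0.1, 0.9, 17)
estimate = np.sin(2.0 * np.pi * grid)
perturbed = estimate + rng.normal(0.0, 0.05, size=(1000, grid.size))

lower, upper, crit = simultaneous_band(estimate, perturbed)

# Pointwise critical values for comparison: each ignores the other grid points
z = np.abs(perturbed - estimate) / (perturbed - estimate).std(axis=0, ddof=1)
pointwise_crit = np.quantile(z, 0.95, axis=0)
print(f"simultaneous critical value: {crit:.2f}")
```

Because the supremum over the grid dominates the standardized deviation at every single grid point, the simultaneous critical value is never smaller than any pointwise one, which is how the band adjusts for looking at many subgroups at once.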

The choice of the bandwidths *h*_{0} and *h*_{1} is important for making inference about and consequently pAUC_{f}(*s*). Here we propose a two-stage K-fold cross-validation procedure to obtain the optimal bandwidth for and sequentially. Specifically, we randomly split the data into K disjoint subsets of about equal sizes denoted by . The two-stage procedure is described as follows:

- Motivated by the fact that is essentially the (1 − *u*)-th quantile of the conditional distribution of given *Y*^{†} = 0 and , for each *k*, we use all the observations not in to estimate by obtaining , the minimizer of with respect to (*α*_{0}, *α*_{1}), where *ρ*_{τ}(*e*) is a check function defined as *ρ*_{τ}(*e*) = *τe* if *e* ≥ 0, and *ρ*_{τ}(*e*) = (*τ* − 1)*e* otherwise. Let denote the resulting estimator of . With observations in , we obtain Then, we let .
- Next, to find an optimal *h*_{1} for , we choose an error function that directly relates to . Specifically, noting the fact that we use the corresponding mean integrated squared error for as the error function. For each *k*, we use all the observations which are not in to obtain the estimate of , denoted by via (4). Then, with the observations in , we calculate the prediction error

We let .
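The first stage of the procedure above can be sketched in Python as follows. This is a simplified illustration rather than the authors' procedure: it uses a local-constant (kernel-weighted) quantile instead of the local linear fit in (*α*_{0}, *α*_{1}), a Gaussian kernel, and fully observed data with no censoring weights or restriction to controls. The function names, the simulated data, and the bandwidth grid are all hypothetical.

```python
import numpy as np

def check_loss(e, tau):
    """Check function: tau * e if e >= 0, (tau - 1) * e otherwise."""
    e = np.asarray(e, dtype=float)
    return np.where(e >= 0, tau * e, (tau - 1.0) * e)

def local_quantile(x_train, y_train, s, h, tau):
    """Kernel-weighted tau-quantile of y near x = s (local-constant simplification)."""
    w = np.exp(-0.5 * ((x_train - s) / h) ** 2)   # Gaussian kernel weights
    order = np.argsort(y_train)
    cum_w = np.cumsum(w[order])
    # First sorted y whose cumulative weight reaches tau of the total weight;
    # this value minimizes the kernel-weighted check loss over constants.
    idx = np.searchsorted(cum_w, tau * cum_w[-1])
    return y_train[order][min(idx, len(y_train) - 1)]

def cv_bandwidth(x, y, tau, bandwidths, n_folds=5, seed=0):
    """K-fold cross-validation: pick the bandwidth with the smallest held-out check loss."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errors = np.zeros(len(bandwidths))
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False                    # fit on observations outside the fold
        for j, h in enumerate(bandwidths):
            fitted = np.array([local_quantile(x[train], y[train], s, h, tau)
                               for s in x[test_idx]])
            errors[j] += check_loss(y[test_idx] - fitted, tau).sum()
    return bandwidths[int(np.argmin(errors))], errors

# Simulated toy data, for illustration only
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 200)
y = x + rng.normal(0.0, 0.2, 200)
bandwidths = np.array([0.05, 0.1, 0.2, 0.4])
h_best, errors = cv_bandwidth(x, y, tau=0.9, bandwidths=bandwidths)
print("selected bandwidth:", h_best)
```

Using the check loss as the held-out criterion matches the target of the fit, since a conditional quantile is the minimizer of the expected check loss.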

Since the order of is expected to be *n*^{−1/5} (Fan and Gijbels 1995), the bandwidth we use for estimation is with 0 < *d* < 3/10 such that *h _{y}* =

As with bandwidth selection for pAUC, we propose a K-fold cross-validation procedure to choose the optimal bandwidths *h*_{1} for and *h*_{0} for separately. The procedure is as follows: we randomly split the data into K disjoint subsets of about equal size, denoted by . Motivated by (3), for each *k*, we use all the observations not in to estimate by obtaining for *y* = 0, 1, which is the solution to the estimating equation

wrt . Let and . With observations in , we obtain

or

Then, we let and .

R code for the application will be available from the corresponding author upon request.

Qian M. Zhou, Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, Email: qmzhou/at/stat.sfu.ca.

Yingye Zheng, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, Email: yzheng/at/fhcrc.org.

Tianxi Cai, Department of Biostatistics, Harvard University, Boston, MA 02115, USA.

- Baker S, Pinsky P. A proposed design and analysis for comparing digital and analog mammography: special receiver operating characteristic methods for cancer screening. J Am Stat Assoc. 2001;96:421–428.
- Bickel P, Rosenblatt M. On some global measures of the deviations of density function estimates. Ann Stat. 1973;1:1071–1095.
- Blumenthal R, Michos E, Nasir K. Further improvements in CHD risk prediction for women. J Am Med Assoc. 2007;297:641–643.
- Cai T, Cheng S. Robust combination of multiple diagnostic tests for classifying censored event times. Biostatistics. 2008;9:216–233.
- Cai T, Dodd LE. Regression analysis for the partial area under the ROC curve. Stat Sin. 2008;18:817–836.
- Cai T, Tian L, Wei L. Semiparametric Box–Cox power transformation models for censored survival observations. Biometrika. 2005;92(3):619–632.
- Cai T, Tian L, Uno H, Solomon S, Wei L. Calibrating parametric subject-specific risk estimation. Biometrika. 2010;97(2):389–404.
- Cook N, Ridker P. The use and magnitude of reclassification measures for individual predictors of global cardiovascular risk. Ann Intern Med. 2009;150(11):795–802.
- Cook N, Buring J, Ridker P. The effect of including C-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med. 2006;145:21–29.
- Cox D. Regression models and life-tables. J R Stat Soc B. 1972;34(2):187–220.
- Dabrowska D. Non-parametric regression with censored survival time data. Scand J Stat. 1987;14(3):181–197.
- Dabrowska D. Uniform consistency of the kernel conditional Kaplan–Meier estimate. Ann Stat. 1989;17(3):1157–1167.
- Dabrowska D. Smoothed Cox regression. Ann Stat. 1997;25(4):1510–1540.
- D’Agostino R. Risk prediction and finding new independent prognostic factors. J Hypertens. 2006;24(4):643–645.
- Dodd L, Pepe M. Partial AUC estimation and regression. Biometrics. 2003;59:614–623.
- Du Y, Akritas M. Iid representations of the conditional Kaplan–Meier process for arbitrary distributions. Math Methods Stat. 2002;11:152–182.
- Dwyer AJ. In pursuit of a piece of the ROC. Radiology. 1996;201:621–625.
- Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults. Executive summary of the third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). J Am Med Assoc. 2001;285(19):2486–2497.
- Fan J, Gijbels I. Data-driven bandwidth selection in local polynomial regression: variable bandwidth selection and spatial adaptation. J R Stat Soc B. 1995;57:371–394.
- Gail M, Pfeiffer R. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6(2):227–239.
- Gilbert P, Wei L, Kosorok M, Clemens J. Simultaneous inferences on the contrast of two hazard functions with censored observations. Biometrics. 2002;58(4):773–780.
- Harrell F, Jr, Lee K, Mark D. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361–387.
- Heagerty P, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105.
- Jiang Y, Metz C, Nishikawa R. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology. 1996;201:745–750.
- Jin Z, Ying Z, Wei L. A simple resampling method by perturbing the minimand. Biometrika. 2001;88(2):381–390.
- Korn E, Simon R. Measures of explained variation for survival data. Stat Med. 1990;9(5):487–503.
- Li G, Doss H. An approach to nonparametric regression for life history data using local linear fitting. Ann Stat. 1995;23:787–823.
- McIntosh M, Pepe M. Combining several screening tests: optimality of the risk score. Biometrics. 2002;58(3):657–664.
- Park Y, Wei L. Estimating subject-specific survival functions under the accelerated failure time model. Biometrika. 2003;9:717–723.
- Park B, Kim W, Ruppert D, Jones M, Signorini D, Kohn R. Simple transformation techniques for improved non-parametric regression. Scand J Stat. 1997;24(2):145–163.
- Paynter N, Chasman D, Pare G, Buring J, Cook N, Miletich J, Ridker P. Association between a literature-based genetic risk score and cardiovascular events in women. J Am Med Assoc. 2010;303(7):631–637.
- Pencina M, D’Agostino R. Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Stat Med. 2004;23(13):2109–2123.
- Pencina M, D’Agostino RS, D’Agostino RJ, Vasan R. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond (with Commentaries & Rejoinder). Stat Med. 2008;27:157–212.
- Pfeiffer R, Gail M. Two criteria for evaluating risk prediction models. Biometrics. 2010;67(3):1057–1065.
- Pfeffer M, Jarcho J. The charisma of subgroups and the subgroups of CHARISMA. N Engl J Med. 2006;354(16):1744–1746.
- Ridker P. C-reactive protein and the prediction of cardiovascular events among those at intermediate risk: moving an inflammatory hypothesis toward consensus. J Am Coll Cardiol. 2007;49(21):2129–2138.
- Ridker P, Rifai N, Rose L, Buring J, Cook N. Comparison of C-reactive protein and low-density lipoprotein cholesterol levels in the prediction of first cardiovascular events. N Engl J Med. 2007;347:1557–1565.
- Robins J, Ya’Acov R. Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Stat Med. 1997;16(3):285–319.
- Rothwell P. Treating Individuals 1: External validity of randomised controlled trials: “To whom do the results of this trial apply?” Lancet. 2005;365:82–93.
- Tian L, Zucker D, Wei L. On the Cox model with time-varying regression coefficients. J Am Stat Assoc. 2005;100(469):172–183.
- Tian L, Cai T, Wei LJ. Identifying subjects who benefit from additional information for better prediction of the outcome variables. Biometrics. 2009;65:894–902.
- Tibshirani R, Hastie T. Local likelihood estimation. J Am Stat Assoc. 1987;82(398):559–567.
- Tice J, Cummings S, Ziv E, Kerlikowske K. Mammographic breast density and the Gail model for breast cancer risk prediction in a screening population. Breast Cancer Res Treat. 2005;94(2):115–122.
- Uno H, Cai T, Tian L, Wei L. Evaluating prediction rules for t-year survivors with censored regression models. J Am Stat Assoc. 2007;102:527–537.
- Uno H, Cai T, Pencina M, D’Agostino R, Wei L. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011a;30(10):1105–1117.
- Uno H, Cai T, Tian L, Wei LJ. Graphical procedures for evaluating overall and subject-specific incremental values from new predictors with censored event time data. Biometrics. 2011b;67:1389–1396.
- Van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer; New York: 1996.
- Wacholder S, Hartge P, Prentice R, Garcia-Closas M, Feigelson H, Diver W, Thun M, Cox D, Hankinson S, Kraft P, et al. Performance of common genetic variants in breast-cancer risk models. N Engl J Med. 2010;362(11):986–993.
- Wand M, Marron J, Ruppert D. Transformation in density estimation (with comments). J Am Stat Assoc. 1991;86:343–361.
- Wang T, Gona P, Larson M, Tofler G, Levy D, Newton-Cheh C, Jacques P, Rifai N, Selhub J, Robins S. Multiple biomarkers for the prediction of first major cardiovascular events and death. N Engl J Med. 2006;355:2631–2639.
- Wang R, Lagakos S, Ware J, Hunter D, Drazen J. Statistics in medicine-reporting of subgroup analyses in clinical trials. N Engl J Med. 2007;357(21):2189–2194.
- Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97:1837–1847.
- Zhao L, Cai T, Tian L, Uno H, Solomon S, Wei L, Minnier J, Kohane I, Pencina M, D’Agostino R, et al. Stratifying subjects for treatment selection with censored event time data from a comparative study. Harvard University Biostatistics Working Paper Series, Working Paper 122. 2010.
