PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Lifetime Data Anal. Author manuscript; available in PMC 2014 April 1.
Published in final edited form as:
PMCID: PMC3633735
NIHMSID: NIHMS431263

Subgroup specific incremental value of new markers for risk prediction

Abstract

In many clinical applications, understanding when measurement of new markers is necessary to provide added accuracy to existing prediction tools could lead to more cost effective disease management. Many statistical tools for evaluating the incremental value (IncV) of the novel markers over the routine clinical risk factors have been developed in recent years. However, most existing literature focuses primarily on global assessment. Since the IncVs of new markers often vary across subgroups, it would be of great interest to identify subgroups for which the new markers are most/least useful in improving risk prediction. In this paper we provide novel statistical procedures for systematically identifying potential traditional-marker based subgroups in whom it might be beneficial to apply a new model with measurements of both the novel and traditional markers. We consider various conditional time-dependent accuracy parameters for censored failure time outcome to assess the subgroup-specific IncVs. We provide non-parametric kernel-based estimation procedures to calculate the proposed parameters. Simultaneous interval estimation procedures are provided to account for sampling variation and adjust for multiple testing. Simulation studies suggest that our proposed procedures work well in finite samples. The proposed procedures are applied to the Framingham Offspring Study to examine the added value of an inflammation marker, C-reactive protein, on top of the traditional Framingham risk score for predicting 10-year risk of cardiovascular disease.

Keywords: Incremental value, Partial area under the ROC curve, Prognostic accuracy, Risk prediction, Subgroup analysis, Time dependent ROC analysis

1 Introduction

Risk models have been applied in medical practice for prediction of long-term incidence or progression of many chronic diseases such as cardiovascular disease (CVD) and cancer. With the advancement in science and technology, a wide range of biological and genomic markers have now become available to assist in risk prediction. However, due to the potential financial and medical costs associated with measuring these markers, their ability in improving the prediction of disease outcomes and treatment response over existing risk models needs to be rigorously accessed.

Effective statistical tools for evaluating the incremental value (IncV) of the novel markers over the routine clinical risk factors are crucial in the field of outcome prediction. Many of newly discovered markers, while promising and strongly associated with clinical outcomes, may have limited capacity in improving risk prediction over and above routine clinical variables (Tice et al. 2005; Wacholder et al. 2010). For example, on top of traditional risk variables from the Framingham risk score (FRS) (Wilson et al. 1998), the inflammation biomarker, C-reactive protein (CRP), was shown to provide modest prognostic information (Cook et al. 2006; Blumenthal et al. 2007; Ridker et al. 2007) while a genetic risk score consisting of 101 single nucleotide polymorphisms was reported as not useful (Paynter et al. 2010). In a recent paper, Wang et al. (2006) concluded that almost all new contemporary biomarkers for prevention of coronary heart disease added rather moderate overall predictive values to the FRS.

One possible explanation for the minimal improvement at the population average level is that the new markers may only be useful for certain subpopulations. For example, while much debate about the clinical utility of CRP remains, there is empirical evidence that CRP may substantially improve the prediction for subjects at intermediate risk (Ridker 2007). Such finding, if valid, would be extremely useful in clinical practice, since identifying the subgroups where markers can provide valuable improvement in prediction will not only lead to more informed clinical decisions but also reduced cost and effort compared to measuring novel markers on the entire population. However, to ensure the validity of such claims and more precisely pinpoint such specific subgroups, rigorous and systematic analytical tools for IncV evaluation are needed.

To quantify the global IncV of new markers for risk prediction, various approaches have been advocated. For example with the most popular one being focused on a comparison of summary measures of accuracy under a conventional and new models respectively (Heagerty and Zheng 2005; Uno et al. 2007; Cai and Cheng 2008). Excellent discussions on the choices of different accuracy measures can be found in Gail and Pfeiffer (2005). However, these measures quantify the overall IncV of new markers averaged over the entire study population and do not provide information on how the IncV may vary across different groups of subjects. If there are pre-defined subgroups, these measures could be estimated for each of the subgroups. However, in practice, it is often unclear how to optimally select subgroups for comparisons and ad-hoc subgroup analyses without careful planning and execution may lead to invalid results (Rothwell 2005; Pfeffer and Jarcho 2006; Wang et al. 2007). Furthermore, it is vitally important to adjust for multiple comparisons when conducting any subgroup analysis. Thus, an important question is how to systematically identify the potential subgroups who would benefit from the additional markers properly adjusting for multiple comparisons. There is a paucity of statistical literature on approaches for identifying such subgroups (D’Agostino 2006). Tian et al. (2009) proposed an inference procedure to estimate the IncVs in absolute prediction error of new markers in various subgroups of patients classified by the conventional markers. However, their method does not incorporate censoring. In addition, the subgroups in their paper were defined as groups of subjects whose conventional risk scores lie in different pre-assigned intervals. However, how to determine the length of intervals could be an issue. Uno et al. (2011b) proposed estimation procedures for the conditional quantiles of the improvement in the predicted risk separately for the cases and the controls. However, they did not provide procedures for determining which subgroups should be recommended to have the new markers measured. Furthermore, no procedures were provided to account for the sampling variation or control overall type I error which is particularly important in subgroup analysis.

In this paper, we propose systematic approaches to analyzing censored event time data for identifying subgroups of patients for whom the new markers have the most or least IncV. We consider two common accuracy measures, the partial area under the ROC curve (pAUC) and the integrated discrimination improvement (IDI) index. Compared with the standard C-statistic, for many applications, the pAUC is often advocated as a better summary measure (Dwyer 1996; Dodd and Pepe 2003; Cai and Dodd 2008), since clinical interests often lie only in a specific range of the false positive rates (FPRs) or true positive rates (TPRs). For example, the region with low FPR is of more concern for disease screening (Baker and Pinsky 2001); while the region with high TPR is of more concern for the prognosis of serious disease (Jiang et al. 1996). However, the ROC curve does not capture certain aspects of the predicted absolute risk, since it is scale invariant. Many model performance measures, including the reclassification table (Cook and Ridker 2009), net reclassification improvement (NRI) and IDI (Pencina et al. 2008), proportion of case followed (PCF) and proportion needed to follow-up (PNF) (Pfeiffer and Gail 2010), have been proposed recently to overcome the limitation of the ROC curve method. Many of these measures, such as the reclassification table, NRI, PCF and PNF, rely on pre-specified clinically meaningful risk or quantile threshold values which may not be available for most diseases. For illustration purposes, we focus primarily on pAUC and IDI in this paper but note that our procedures can be easily extended to accommodate other accuracy measures.

The rest of paper is organized as follows. In Sect. 2, we present our proposed non-parametric estimation procedure for subgroup-specific IncV of new markers and along with their corresponding interval estimation procedures. In particular, resampling based simultaneous interval estimation procedures are provided as convenient and effective tools to control for multiple comparisons. We describe results from our simulation studies in Sect. 3 and the analyses of the Framingham Offspring Study using our proposed procedures in Sect. 4. Concluding remarks are given in Sect. 5. All the technical details are included in the appendices.

2 Methods

2.1 Risk modeling with and without new markers

Let X denote a set of conventional markers and let Z denote a set of new markers. Due to censoring, for the event time T, one can only observe T = min(T, C), Δ = I (TC), where C is the censoring time, which is assumed to be independent of T conditional on (X, Z). See below for more discussions about censoring assumptions. Furthermore, define Y = I (Tt0), where t0 is the prediction time of clinical interest, and Y = I (Tt0). Let P1(X)=pr(Y=1X) and P2(X,Z)=pr(Y=1X,Z) be the true conditional risk of developing the event by time t0 conditional on X only and (X, Z), respectively. Suppose a data set for analysis consists of n independent realizations of (T, Δ, X, Z), {(Ti, Δi, Xi, Zi)}. Although Y and the conditional risk functions depend on t0, we suppress t0 from the notation throughout for the ease of presentation. From the Neyman–Pearson Lemma and similar arguments as given in McIntosh and Pepe (2002), it is not difficult to show that P1(X) achieves the optimal ROC curve for predicting Y based on X only. Similarly, P2(X,Z) is the optimal score for prediction Y given (X, Z).

To estimate P1(X) and P2(X,Z), one may consider a fully non-parametrical approach (Li and Doss 1995). However, in practice, such non-parametric estimates may perform poorly when the dimension of X or Z is not small due to the curse of dimensionality (Robins and Ya’Acov 1997). An alternative feasible way is approximate P1() and P2() by imposing simple working models

pr(Y=1X)=g1(βV),pr(Y=1X,Z)=g2(γW),
(1)

where V, a p × 1 vector, is a function of X, W, a q × 1 vector, is a function of X and Z, β and γ are vectors of unknown regression parameters, and g1 and g2 are known, smooth, increasing functions. An estimator of β and γ can be obtained respectively by solving the following inverse probability weighted (IPW) estimating equations as given in Uno et al. (2007):

i=1nω^iVi{Yig1(βVi)}=0,i=1nω^iWi{Yig2(γWi)}=0.
(2)

where ω^i=ΔiI(Tit0)G^Xi,Zi(Ti)+I(Ti>t0)G^Xi,Zi(t0), and G^X,Z(t) is a root-n consistent estimator of GX,Z(t) = pr(Ct|X, Z). This IPW estimator may be justified heuristically the argument that E{ω^iI(Yi=y)Ti,Xi,Zi}E{I(Yi=y)Ti,Xi,Zi}, for y = 0, 1. Let β^ and γ^ be the resulting estimator of β and γ, respectively. For a subject with X = x, Z = z whose V = v and W = w, the risk is estimated by p^1(x)=g1(β^ν) based on X alone and by p^2(x,z)=g2(γ^ω) based on X and Z. It has been previously shown in Uno et al. (2007) that regardless of the adequacy of the working model (1), θ^=(β^,γ^) converges in probability to a deterministic vector θ0=(β0,γ0) as n → ∞. Let p1(x)=g1(β0ν) and p2(x,z)=g2(γ0ω). When the models in (1) are correctly specified, p1(x)=P1(x) and p2(x,z)=P2(x,z). To obtain a valid estimator G^X,Z(), one may impose a semi-parametric model, such as the proportional hazards (PHs) model (Cox 1972), for GX,Z(t) and obtain G^X,Z(t) as exp{Λ^0(t)exp(γ^cWc)}, where Wc is a function of (X, Z), γ^c is the maximum partial likelihood estimator and Λ^0(t) is the Breslow’s estimator. When the censoring is independent of both T and (X, Z), one may obtain G^X,Z() simply as the Kaplan–Meier estimator. It is important to note that if the in (1) only hold for a given t0 and the dimension of (X, Z) is not small, root-n consistent estimators of β and γ may not exist without imposing additional modeling assumptions about GX,Z(t) due to the curse of dimensionality (Robins and Ya’Acov 1997).

2.2 Subgroup specific IncVs

For illustration purposes, we consider two accuracy measures, the pAUC and the IDI index. We first define both concepts in the context of evaluating a risk score/model. Suppose that we use p2(X,Z) as a risk score for classifying the event status Y, and without loss of generality, we assume that a higher value of p2(X,Z) is associated with a higher risk and refer to the two states, Y = 1 and Y = 0, as “diseased” and “disease-free” or “cases” and “controls”. The discrimination capacity of p2(X,Z) can be quantified based on the ROC curve, which is a plot of the TPR function, S1(c)pr{p2(X,Z)cY=1}, against the FPR function, S0(c)pr{p2(X,Z)cY=0}. The ROC curve, ROC(u)=S1{S01(u)} describes the inherent capacity of distinguishing “cases” from “controls”. The pAUC with a restricted region of FPR, say FPR ≤ f, is given by pAUCf=0fROC(u)du, for f [set membership] [0, 1]. The IDI index, is simply IDI=01S1(c)dc01S0(c)dc.

To evaluate how the IncV of Z may vary across subgroups defined by X, we define new conditional pAUC and IDI index. We propose to use p1(x) as a scoring system for grouping subjects with potentially similar initial risk estimates and create subgroups [upsilon]s = {X : [p with macron]1(X) = s}. Then we evaluate the IncV of Z for each [upsilon]s based on how well p2(X,Z) can further discriminate subjects within [upsilon]s with Y = 1 from those with Y = 0. Suppose p2(X,Z) is used to classify Y for subjects in [upsilon]s. The TPR and FPR of the classification rule p2(X,Z)c given [upsilon]s S1(c;s) and S0(c;s), respectively, where

Sy(c;s)=pr{p2(X,Z)cY=y,p1(X)=s},andc(0,1),y=0,1.

Conditional on p1(X)=s, the ROC curve of p2(X,Z) is ROC(u;s)=S1{S01(u;s);s}, for u [set membership] [0, 1]. The conditional pAUC is given by pAUCf(s)=0fROC(u;s)du, f [set membership] [0,1]. Note that f = 1 yields conditional AUC(s. If Z is non-informative for [upsilon]s, the corresponding ROC curve would be a diagonal line, and we expect that pAUCs = f2/2, which is the area under a diagonal line. Thus, the subgroup [upsilon]s specific IncV of Z with respect to (wrt) the pAUC is given by pAUCf(s) − f2/2. The IDI index conditional on p1(X)=s is given by

IDI(s)=01S1(c;s)dc01S0(c;s)dc=E{p2(X,Z)Y=1,p1(X)=s}E{p2(X,Z)Y=0,p1(X)=s}.
(3)

If Z is non-informative for this subgroup [upsilon]s, the conditional IDI index would be 0, and therefore, the subgroup [upsilon]s specific IncV of Z wrt the IDI index is IDI(s). Based on these subgroup-specific IncVs, we are able to identify the set of s such that Z is useful to improve the prediction accuracy for [upsilon]s, which is referred to as the effective subpopulation [upsilon]* in our paper. Specifically, the effective subpopulation wrt pAUC is defined as [upsilon]* = {X : pAUCf([p with macron]1(X))−f2/2>c1}; the effective subpopulation wrt IDI are defined as [upsilon]* = {X : IDI([p with macron]1(X))>c2ý, where c1 and c2 are some possibly data dependent threshold values. For the subjects in the effective subpopulation, measurement of new markers would provide added accuracy to the conventional risk model.

2.3 Inference about subgroup-specific IncVs

We first discuss the estimation for the conditional TPR and FPR functions {Sy(c;s),y=0,1} since both pAUCf(s) and IDI(s) are simple functionals of these two functions. Let p^1i=p^1(Xi)=g1(β^Vi) and p^2i=p^2(Xi,Zi)=g2(γ^Wi). To obtain a consistent estimator of Sy(c;s), since Sy(C;s) is between 0 and 1, we consider a non-parametric local likelihood estimation method (Tibshirani and Hastie 1987) along with IPW accounting for censoring. Specifically, we obtain {a^y,hy(c;s),b^y,hy(c;s)} as the solution to the IPW local likelihood score equation,

i:Yi=yω^i[1hy1ε^1i(s)]Khy{ε^1i(s)}[I(p^2ic)g{a+bε^1i(s)}]=0,
(4)

where ε^1i(s)=ϕ(p^1i)ϕ(s), g(x) = exp(x)/{1 + exp(x)}, Kh(x) = K (x/h)/h, K(·) is a known smooth symmetric kernel density function with a bounded support, and the bandwidth hy > 0 is assumed to be O(nν), for 1/5 < ν < 1/2, and [var phi](·) is a known, non-decreasing transformation function that can potentially be helpful in improving the performance of the smoothed estimator (Wand et al. 1991; Park et al. 1997). Then, S(c;s) can be estimated by S^y,hy(c;s)=g{a^y,hy(c;s)} for y = 0, 1. In the Appendix A.1, we show that S^y,hy(c;s)Sy(c;s)0 in probability as n → ∞, uniformly in c [set membership] [0, 1] and sIhy[ϕ1(ρl=hy),ϕ1(ρuhy)] where [ρl, ρu] is a subset of the support of ϕ{g1(β0V)} and β0 is the limit of β^. As a special case, by setting b in (4) to 0, one may obtain a local constant estimator,

S^y,hy(c;s)=i=1nω^iI(Yi=y)Khy{ε^1i(s)}I(p^2ic)i=1nω^iI(Yi=y)Khy{ε^1i(s)},y=0,1.

2.4 Inference procedures for pAUCf(s)

Based on S^y,hy(c;s) can be estimated as

pAUC^f(s)=0fROC^(u;s)du=S^0,h01(f;s)S^1,h1(c;s)d{1S^0,h0(c;s)}.

where ROC^(u;s)=S^1,h1{S^0,h01(u;s);s} and (h0, h1) is the pair of optimal band-widths for estimating S0(c;s) and S1(c;s), respectively. In the Appendix A.2, we show that pAUC^f(s) is uniformly consistent for pAUCf(s).

As a special case, when both X and Z are univariate, the ROC curve of p2(X,Z) conditional on p1(X) is equivalent to the ROC curve of Z conditional on X since the ROC curve is scale invariant. A simple local constant IPW estimator of Sy(z;x) is given by

S^y,hy(z;x)=i=1nω^iI(Yi=y)Khy(Xix)I(Ziz)i=1nω^iI(Yi=y)Khy(Xix).

The resulting estimator of pAUCf(x) is

pAUC^f(x)=0fS^1,h1{S^0,h01(u;x);x}du=i=1nω^iYiKh1(Xix)Ui(x;f,h0)i=1nω^iYiKh1(Xix),

where Ui(x;f,h0)=fmin{f,S^0,h0(Zi;x)} is the estimated truncated placement value proposed by Cai and Dodd (2008).

It is difficult to directly estimate the variance of W^pAUCf(s)=nh1{pAUC^f(s)pAUCf(s)} since it involves unknown derivative functions. We propose a perturbation-resampling method to approximate the distribution of W^pAUCf(s). This method has been widely used in survival analyses (see for example, Jin et al. 2001; Park and Wei 2003; Cai et al. 2005). To be specific, let Ξ = {ξi, i = 1, …, n} be n independent positive random variables following a known distribution with mean 1 and variance 1, and Ξ is independent of the data. For each set of Ξ, we first obtain β* and γ*, as the respective solutions to

i=1nωiVi{Yig1(βVi)}ξi=0,andi=1nωiWi{Yig2(γWi)}ξi=0,

where ωi=ΔiI(Tit0)Gxi,Zi(Ti)+I(Ti>t0)GXiZi(t0) and GX,Z is the perturbed estimator of GX,Z (·) with Ξ being the weights. Let p1i=g1(βVi),ε1i(s)=ϕ(p1i)ϕ(s), p2i=g2(γWi), and Mi(c)=I(p2ic). Subsequently, we obtain the perturbed counterpart of Sy(c;s) as Sy,hy(c;s)=g{ay,hy(c;s)}, where ay,hy(c;s) is the solution to the perturbed score equation

i=1nωiI(Yi=y)[1hy1ε1i(s)]Khy{ε1i(s)}[Mi(c)g{a+bε1i(s)}]ξi=0.

Then, the perturbed pAUC is given by, pAUCf(s)=0fROC(u;s)du, where ROC(u;s)=S1,h1{S0,h01(u;s);s}. In the Appendix A.3, we show that the unconditional distribution of W^pAUCf(s) can be approximated by the conditional distribution of

WpAUCf(s)=nh1{pAUCf(s)pAUC^f(s)},
(5)

given the data. With the above resampling method, for any fixed sIh1, one may obtain a variance estimator of W^pAUCf(s), σ^f2(s), based on the empirical the variance of B realizations from (5). For any fixed sIh1 and α [set membership] (0, 1), a pointwise 100(1 − α)% confidence interval (CI) for pAUCf(s) can be constructed via pAUC^f(s)±(nh1)12cασ^f(s), where cα is the 100(1 − α)th percentile of the standard normal distribution.

2.4.1 Inference for IDI(s)

Based on S^y,hy(c;s), we may obtain plug-in estimators for IS(s)=01S1(c;s)dc and IP(s)=01S0(c;s)dc respectively as

IS^(s)=01S^1,h1(c;s)dc,andIP^(s)=01S^0,h0(c;s)dc.

Thus, IDI(s) can be estimated by IDI^(s)=IS^(s)IP^(s). Similar to the derivations given in the Appendix A for pAUC^f(s), the asymptotic results for S^y,hy(c;s) can be directly used to establish the consistency and asymptotic normality for IDI^(s). In addition, the unconditional distribution of W^IDI(s)=nh1{IDI^(s)IDI(s)} can be approximated the conditional distribution of WIDI(s)=nh1{IDI(s)IDI^(s)}, given the data, where IDI(s)=01S1,h1(c;s)dc01S0,h0(c;s)dc and Sy,hy(c;s) is the perturbed counterpart of Sy,hy(c;s),y=0,1. The pointwise CIs for any fixed sIh1 are constructed in a similar way as the inference for pAUCf(s). As a special case, a kernel local constant estimator of IDI(s) is given by

IDI^(s)=i=1nω^iYiKh1{ε^1i(s)}p^2ii=1nω^iYiKh1{ε^1i(s)}i=1nω^i(1Yi)Kh0{ε^1i(s)}p^2ii=1nω^i(1Yi)Kh0{ε^1i(s)},

with the perturbed counterpart given by

IDI(s)=i=1nωiYiKh1{ε1i(s)}p2iξii=1nωiYiKh1{ε1i(s)}ξii=1nωi(1Yi)Kh0{ε1i(s)}p2iξii=1nωi(1Yi)Kh0{ε1i(s)}ξi.

Selection of the optimal bandwidths for pAUCf(s) and IDI(s) is illustrated in the Appendix B.

2.4.2 Identifying the effective subpopulation

To identify the effective subpopulation, one may simultaneously assess the subgroup-specific IncV wrt a certain accuracy measure, denoted by A(s), for example pAUCf(s) − f2/2 or IDI(s), over a range of s values by constructing simultaneous CI for {A(s),sIh1}. Unfortunately, the distribution of W^A(s) does not converge as a process in s, as n → ∞. Thus, we cannot apply the standard large sample theory for stochastic processes to approximate the distribution of W^A(s). Nevertheless, by the strong approximation argument and extreme value limit theorem (Bickel and Rosenblatt 1973), we show in the Appendix A.3 that a standardized version of the sup-statistic Γ=supsIh1W^A(s)σ^A(s) converges in distribution to a proper random variable, where σ^A2 denotes the variance estimator of W^A(s). In practice, for large n, one can approximate the distribution of Γ based on realizations of Γ=supsIh1WA(s)σ^A(s), where W^A is the perturbed counterpart of W^A. Therefore, a 100(1 − α) % simultaneous CI for A(s) can be obtained as A^(s)±(nh1)12dασ^A(s), where dα is the empirical 100(1 − α)th quantile of Γ*. Thus, to account for sampling variation and multiple testing, the effective subpopulation is chosen as {X:A^(p^1(X))>(nh1)12dασ^A(p^1(X))} in real data analyses.

2.4.3 Test for heterogeneous IncV

Another question of interest is whether the subgroup-specific VA(s), for example pAUCf(s), is constant across different values of s over a certain interval [sl, su]. We define the average IncV over [sl, su] as

A[sl,su]=slsuA(s)dF(s)[F(su)F(sl)]

where F(s)=Pr{p1(X)s}, and we define the relative subgroup-specific IncV over [sl, su] as DA[sl,su](s)=A(s)A[sl,su]. The point estimate of DA[sl,su](s) is given by

D^A[sl,su](s)=A^(s)A^[sl,su],A^[sl,su]=n1i=1nA^(p^1i)I(p^1i[sl,su])F^(su)F^(sl)

where F^(s)=n1Σi=1nI{p^1is}. In addition, the unconditional distribution of W^D(s)=nh1{DAsl,su(s)DA[sl,su](s)} can be approximated by the conditional distribution of WD(s)=nh1{DA[sl,su](s)D^A[sl,su](s)} given the data, where DA[sl,su](s)=A(s)A[sl,su] with A(s) as the perturbed counterpart of A^(s) and A[sl,su]=n1Σi=1nA(p^1i)I(p^1i[sl,su])[F^(su)F^(sl)]. The variance estimator σ^D2 of W^D(s) can be obtained from realizations of WD(s).

If the subgroup-specific IncV of Z is constant over [sl, su], i.e., A(s)c0 for s[sl,su],A[sl,su]=c0 and DA[sl,su](s)=0 for s [set membership] [sl, su]. Testing whether the subgroup specific IncV is constant over [sl, su] is the equivalent to testing the null hypothesis H0:DA[sl,su](s)=0 for s [set membership] [sl, su]. To adjust for multiple testing, we consider the standard version of the sup-statistic ΓD=sups[sl,su]W^D(0)(s)σ^D(s) where W^D(0)(s)=nh1D^A[sl,su](s) is the statistic W^D(s) under the null hypothesis H0. One may approximate the distribution of ΓD based on realizations of ΓD=sups[sl,su]WD(s)σ^D(s). The empirical p value for testing the null hypothesis H0 can be obtained by B1Σb=1BI{ΓD(b)>ΓD}, where {ΓD(b),b=1,,B} are B realizations of ΓD.

3 Simulation studies

To examine the finite sample properties of the proposed estimation procedure, we conduct a simulation study where the conventional marker X and the new marker Z are both univariate and jointly generated from a bivariate normal distribution

(XZ)~BVN([μXμZ],[σX2σXσZρXZσXσZρXZσZ2]).

In this simulation study, μX = μZ = 0, σX = 2 and σZ 0.5, and ρXZ 0.01. The failure time T, given the markers X and Z, is generated from an accelerated failure time model with a log-normal distribution for T, i.e., log T = h(X, Z) + [set membership], where [set membership] is a normal random variable with mean 0 and standard deviation σT = 1.5. In this simulation study, h(X, Z) is a linear model, i.e., h(X, Z) = −βX XβZ ZβXZXZ. We consider a practical situation where the new marker Z may make a major contribution to the underlying mechanism in contrast with the conventional marker X, although it may not be measured routinely. Thus, in this simulation study, we set βX = 0.01 and βZ = βX Z = 1. The censoring time C is generated from an exponential distribution with rate c01. A value of c0 ≈ 20 is chosen such that roughly 80% of the failure time is censored. A time point t0 ≈ 0.2 is set such that the proportion of the “cases” in the sample is approximately 20%.

We investigate the kernel local constant estimator for the conditional pAUCf with f = 0.1 representing a low FPR region and f = 1 representing the standard AUC. Since Z and log T are jointly normal conditional on X = x, it is straightforward to calculate the true values of pAUCf(x). We consider a relatively smaller sample size 1,000, a moderate sample size of 5,000 and a relatively larger sample size of 10,000. Both of the pAUC with FPR ≤ 0.1, i.e., pAUC0.1(x) and the full AUC are estimated at a sequence of values of X. For ease of computation, the pair of the bandwidths (h0, h1) for constructing the non-parametric estimate was fixed at (i) for the full AUC, (2.531, 2.102) for n = 1,000, (1.905,1.534) for n = 5,000, and (1.640, 1.380) for n = 10,000; (ii) for the pAUC0.1, (2.377,2.432) for n = 1,000, (1.843,2.361) for n = 5,000, and (1.507, 2.085) for n = 10,000. Here, (h0opt,h1opt) were chosen as the average of the bandwidths selected from 10 independent simulated datasets using the two-stage of five-fold cross-validation method described in the Appendix B and n−0.1 was multiplied to hyopt to yield the final bandwidths used for simulation. In addition, the kernel function K (·) was chosen as the Epanechnikov kernel. Here, since we assume that the censoring time C is independent of both T and (X, Z), GX,Z(t) = G(t) is estimated by a Kaplan–Meier estimator.

The performance of the point estimates and pointwise 95% CIs obtained by the resampling method was assessed from 1,000 independent replicates. For all of these scenarios, the non-parametric estimators have substantially small biases, the estimated standard errors are close to their empirical counterparts, and empirical coverage levels are close to the nominal level. In Fig. 1, we summarize the performance of the point and interval estimates for pAUC0.1 with sample size 10,000. For this scenario, the empirical coverage probabilities of the 95% pointwise CIs range from 92.9 to 95.4%. The empirical coverage levels of the 95% simultaneous confidence bands for the standard AUC are 93.2% for n = 1,000, 93.3% for n = 5,000, and 94.5% for n = 10,000; the empirical coverage levels of the 95% simultaneous confidence bands for the pAUC0.1 are 93.3% for n = 1,000, 93.4% for n = 5,000, and 92.5% for n = 10,000.

Fig. 1
Performance of the point estimates, the standard error estimates and pointwise CIs for pAUC0.1 with sample size 10,000: (a) the true pAUC0.1(x) (solid) and the average point estimates (dashed) over 1,000 replicates, (b) the empirical standard error estimates ...

4 Example: the Framingham Offspring Study

The Framingham Offspring Study was established in 1971 with 5,124 participants who were monitored prospectively on epidemiological and genetic risk factors of CVD. Here, we use data from 1,687 female participants of which 261 have either died or experienced a CVD event by the end of follow-up period, and the 10-year event rate is 6%. The Framingham risk model, based on several clinical risk factors including age, systolic blood pressure, diastolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, current smoking status and diabetes, is widely used in clinical settings but only with moderate accuracy for predicting the 10-year risk of CVD (Cook et al. 2006). The FRS is constructed as the weighted average of the risk factors in the Framingham risk model using β-coefficients given in Table 6 of Wilson et al. (1998). The risk estimates are obtained from the FRS through the transformation 1 − exp{− exp(·)}. The density plot of the risk estimates obtained from the FRS is shown in Fig. 2a. The overall gain in C-statistic by adding the CRP on top of FRS is 0.002 (from 0.776 to 0.778, with 95% CI (−0.005,0.01)). Note that a log transformation is applied on the CRP throughout the analysis. According to the Framingham risk model (Wilson et al. 1998) and the risk threshold values employed by the Adult Treatment Panel III of the National Cholesterol Eduction Program (Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults 2001), these 1,687 female participants may be classified into three risk groups: 1,462 as low risk (<10 %); 193 as intermediate risk (between 10% and 20%); 32 as high risk (>20 %). The IncVs wrt C-statistic are 0.00057 (with 95% CI (−0.012,0.013)) for the low risk group; 0.037 (with 95% CI (−0.054,0.13)) for the intermediate risk group; 0.034 (with 95% CI (−0.097,0.16)) for the high risk group. Note that the low risk group consists of about 87% of the entire cohort. Now we further classify the 1,462 patients of the low risk group into 10 finer subgroups with the length of the risk interval for each subgroup being 0.01, for example, 0–0.01, 0.01–0.02, and etc. The IncVs wrt C-statistics for these 10 subgroups of low risk as well as the intermediate and high risk groups with their 95% CIs are shown in Fig. 2b. This suggests that adding CRP on top of FRS may be most useful for the risk groups around 5%, which is also referred to as the intermedium low risk group in some literature.

Fig. 2
(a) The density estimates of the 10-year event risk calculated from the FRS. (b) The IncVs wrt C-statistics for the 10 subgroups of low risk as well as the intermediate and high risk groups with their 95% CIs

First, we investigate the IncV of the CRP over the FRS wrt AUC, pAUC0.1 and IDI in predicting the 10-year risk of CVD events among subgroups defined by the FRS. For the purpose of kernel smoothing, the transformation function [var phi](·) in the local likelihood score equation (4) is ϕ(x)=Φ(xμXσX), where μX = −3.74 is the sample mean of the FRS and σ X = 1.35 is the sample standard deviation of the FRS, and Φ(x) is the cumulative distribution function of a standard normal distribution. Here we use local kernel constant estimates with Epanechnikov kernel. The optimal bandwidths (h0opt,h1opt) in [var phi]-scale are chosen via a 10-fold cross validation procedure: (0.117,0.393) for the standard AUC, (0.264,0.721) for pAUC0.1, and (0.018,0.273) for IDI. The point estimates along with the 95% pointwise and simultaneous CIs for the subgroup-specific IncV wrt AUC, pAUC0.1 and IDI are shown in Fig. 3. The point estimate for IDI is obtained via a cross-validation procedure to correct for biases due to overfitting. Based on the pointwise CIs of the subgroup-specific IncV wrt AUC, the addition of CRP appears to improve risk prediction for subjects with the FRS risk ranging from 0.028 to 0.096. The corresponding range is 0.008–0.148 when based on the CIs for the subgroup-specific IncV wrt pAUC0.1; 0.004–0.102 when based on the CIs for the subgroup-specific IncV wrt IDI. After controlling for the overall type I error, inclusion of CRP may significantly improve discrimination for subjects with the FRS risk ranging from 0.034 to 0.070 based on AUC; from 0.010 to 0.078 based on pAUC0.1; from 0.032 to 0.068 based on IDI. The IDI findings and the pAUC findings agree with each other. These results suggest that CPR might be useful to improve risk prediction among patients regarded as having low to moderate risk according to the FRS.

Fig. 3
The point estimates (solid line), and its 95% pointwise CIs (dashed lines) and the 95% simultaneous confidence bands (dark shaded region) for (I) the subgroup-specific IncV with respect to AUC, AUC(x) − 1/2; (II) the subgroup-specific IncV with ...

It is worth to note that the bandwidth selection procedure is not sensitive towards the choice of the number of folds in cross-validation. Using a five-fold cross-validation, the optimal bandwidths (h0opt,h1opt) are (0.121, 0.394) for the standard AUC, (0.238, 0.614) for pAUC0.1, and (0.016, 0.272) for IDI. The resulting point estimates and CIs are almost the same as the results with the bandwidths selected via a 10-fold cross validation procedure. In addition, for calculating the weights ω^i, the survival function G(·) of the censoring time C is estimated by a Kaplan-Meier estimator since in the study C is likely to be independent of both T and X, Z. In Sect. 2.1, we commented that if this independence assumption does not hold, we could still provide a correct estimate of G(·) via a semi-parametric model, for example a Cox PH model. Here, we also obtained the estimates of G(t0) via a Cox PH model, i.e., exp{Λ0(t0)exp(γ^cWc)} where Wc consists of the FRS and the CRP. Based on the resulting weights ω^i, we obtained the point estimates and CIs for the subgroup-specific IncV wrt AUC, pAUC0.1 and IDI, which is presented in Fig. 4. The results are very similar to the results using Kaplan–Meier estimator of G(·), and therefore it implies that the independence assumption about the censoring time C is reasonable.

Fig. 4
The point estimates (solid line), and its 95% pointwise CIs (dashed lines) and the 95% simultaneous confidence bands (dark shaded region) for the subgroup-specific IncV with respect to AUC and pAUC0.1 as well as IDI. The results are based on the weights ...

We are also interested in testing whether the subgroup-specific IncV of the CRP over the FRS is constant over the values [0,0.4] of the risk estimates obtained from the FRS. The p values of testing for constant subgroup-specific IncV are 0.028 for AUC, 0.108 for pAUC0.1 and 0.002 for IDI. These results agree with Fig. 5, which shows the point estimates and simultaneous 95% CIs for the relative subgroup-specific IncV wrt AUC, pAUC0.1 and IDI. It shows that the subgroup-specific IncVs wrt AUC and IDI are not constant over the interval [0,0.4]; on the other hand,the subgroup-specific IncV wrt pAUC0.1 is constant over this interval. It is worth to note that the asymptotic variance of D^A[sl,su](s) is larger than that of A^(s), and therefore, the power of testing whether the subgroup-specific IncV is constant over a certain interval is not as strong as the power of testing whether the subgroup-specific IncV is above zero over the interval.

Fig. 5
The point estimates (solid line), and its 95% simultaneous confidence bands (dark shaded region) for the relative subgroup-specific IncVs, which are used to test for heterogeneous IncVs, with respect to AUC and pAUC0.1 as well as IDI over the interval ...

5 Concluding remarks

In this paper, we propose a non-parametric procedure to estimate the IncVs of new markers in prediction accuracy accross different subgroups defined by the conventional scoring system. We also provide the pointwise and simultaneous interval estimates via perturbation resampling. In addition, with proper adjustment for multiple subgroups comparison, our approach is able to systematically identify the subgroups which would benefit from adding new markers. Unlike global measures which do not provide information on how the IncV may vary across subgroups, our methods enables the identification of subgroups for which the new markers may or may not be useful. Existing procedures often assess subgroup-specific IncVs empirically. We provide more rigorous and systematic analytical tools to ensure the validity of such claims and more precisely pinpoint such specific subgroups.

Appropriate choice of prediction accuracy summaries is of great importance to capture the usefulness of new markers. It is also motived by primary research interests. Discrimination is one of the major components in assessing the accuracy of prediction models. The AUC is the most popular summary index which depicts inherent discrimination capacity. However, it is unable to capture how well the predicted risks agree with the actual observed risks (Gail and Pfeiffer 2005). In some cases, alternative summary measures should be also considered, for example, NRI, PCF and PNF. Our approach can be naturally extended to other metrics that maybe more appropriate for particular clinical applications.

The subgroup-specific TPR S1(c;s,t)=pr{p2(X,Z)cTt,p1(X)=s} and the subgroup-specific FPR S0(c;s,t)=pr{p2(X,Z)cT>t,p1(X)=s} both depend on the time point t, which is usually pre-determined. In some applications, new biomarkers might produce relatively better long-term performance in prediction accuracy than short-term. It is straightforward to extend our procedure to different time points over an arbitrary time interval since the non-parametric estimates of the TPR and FPR, S^(c;s,t), converge to a Gaussian process in time t. We could estimate the overall improvement of new markers over a certain time interval by integrating the subgroup-specific pAUC and the subgroup-specific IDI index wrt time t. Furthermore, with properly adjusting for multiple comparison, it is possible to identify the time interval where new markers have the most IncVs for different subgroups.

Instead of focusing on the prediction of t-year survival for a fixed time point, we might be also interested in a global assessment of a fitted prediction model for the continuous event time. One example of such global measure is the C-statistic of the prediction score P2(X,Z),pr{p2(Xi,Zi)>p2(Xi,Zi)Ti>Ti} (Harrell Jr et al. 1996; Korn and Simon 1990; Pencina and D’Agostino 2004). When the event time T is subject to right censoring which may have finite support [0, τ], one may consider a truncated C-statistic,

Cτ=pr{p2(Xi,Zi)>p2(Xi,Zi)Ti>Ti,Ti<τ},

as considered in Heagerty and Zheng (2005) and Uno et al. (2011a). It is straightforward to extend Cτ to our subgroup-specific C-statistic

Cτ(s)=pr{p2(Xi,Zi)>p2(Xi,Zi)Ti>Ti,Ti<τ,p1(Xi)=p1(Xi)=s}

and construct an IPW kernel estimator for Cτ (s) as for other accuracy measures.

Acknowledgements

The Framingham Heart Study and the Framingham SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University. The Framingham SHARe data used for the analyses described in this manuscript were obtained through dbGaP (access number: phs000007.v3.p2). This manuscript was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or the NHLBI. The work is supported by Grants U01-CA86368, P01-CA053996, R01-GM085047, R01-GM079330, R01-AI052817 and U54-LM008748 awarded by the National Institutes of Health.

Appendix A

Let Pn and P denote expectation with respect to (wrt) the empirical probability measure of {(Ti, Δi, Xi, Zi), i = 1, …, n} and the probability measure of (T, Δ, X, Z}, respectively, and Gn=n(PnP). We use F.(x) to denote dF(x)dx for any function F, [similar, equals] to denote equivalence up to op(1), and [less, similar] to denote being bounded above up to a universal constant. Let β0 and γ0 denote the solution to E[Vi{Yig1(βVi)}]=0 and E[Wi{Yig2(γWi)}]=0, respectively. Let p1i=g1(β0Vi) and p2i=g2(γ0Wi). Let ω = ΔI (Tt0)/GX, Z(T) + I (T > t0)/GX, Z(t0), M^i(c)=I(p^2ic) and Mi(c)=I(p2ic). For y = 0, 1, let fy(c; s) denote the conditional density of p2i given Yi=y and pli=s and we assumed that fy(c; s) is continuous and bounded away from zero uniformly in c and s. This assumption implies that ROC(u; s) has continuous and bounded derivative ROC(u; s) = [partial differential]ROC(u; s)/[partial differential]u. We assume that V and W are bounded, and τ(y;s)=pr[ϕ{p1(X)}s,Y=y]s, is continuously differentiable bounded derivatives and bounded away from zero. Throughout, the bandwidths are assumed to be of order n−ν with ν [set membership] (1/5, 1/2). For ease of presentation and without loss of generality, we assume that h1 = h0, denoted by h, and suppress h from the notations. Without loss of generality, we assume that supt,x,zn12{G^X,Z(t)GX,Z(t)}=Op(1). When C is assumed to be independent of both T and (X, Z), the simple Kaplan–Meier estimator satisfies this condition. When C depends on (X, Z), ĜX, Z obtained under the Cox model also satisfies this condition provided that Wc is bounded. The kernel function K is assumed to be symmetric, smooth with a bounded support on [−1, 1] and we let m2 = ∫ K (x)2dx.

A.1 Asymptotic expansions for S^y(c;s)

Uniform convergence rate for S^y(c;s) We first establish the following uniform convergence rate of S^y(c;s)=g{a^y(c;s)}:

supsIh,cS^y(c;s)Sy(c;s)=Op{(nh)12log(n)}=op(1).
(6)

To this end, we note that for any given c and s,

ζ^y(c;s)=[ζ^ay(c;s)ζ^by(c;s)]=[a^y(c;s)ay(c;s)b^y(c;s)by(c;s)]

is the solution to the estimating equation Ψ^y(ζy,c,s)=0, where ζy = (ζay, ζby)’ and

Ψ^y(ζy;c,s)=[Ψ^y1(ζy,c,s)Ψ^y2(ζy,c,s)]=n1i=Yi=yw^i[1h1ε^i1(s)]Kh{ε^i1(s)}×[M^i(c)G{ζy,c,s;ϕ(p^1i),h}],

ay(c;s)=g1{Sy(c;s)}, by(c;s)=g1{Sy(c;s)}s and G(ζy,c,s;e,h)=g[ay(c;s)+by(c;s){eϕ(s)}+ζay+ζbyh1{eϕ(s)}]. We next establish the convergence rate for supζy,c,sΨ^y(ζy;c,s)Ψy(ζy;c,s), where

Ψ^y(ζy;c,s)=[Ψy1(ζy,c,s)Ψy2(ζy;c,s)]=τ(y;s)[Sy(c;s)K(t)g{ay(c;s)+ζay+ζbyt}dttK(t)g{ay(c;s)+ζay+ζbyt}dt].

We first show that

supsIh,cn1i:Yi=yω^Kh{ε^i1(s)}M^i(c)τ(y;s)Sy(c;s)

and

supζy,sIh,cn1i:Yi=yω^Kh{ε^i1(s)}G{ζy,c,s;ϕ(p^1i),h}τ(y;s)K(t)g{ay(c;s)+ζay+ζbyt}dt

are both Op{(nh)−½ log(n)} where Ih=[ϕ1(ρl+h),ϕ1(ρuh)] and [ρl, ρu] is a subset of the support of ϕ{g1(β0TV)}. To this end, we note that since supu |ĜX, Z(u) − GX, Z(u)| = Op(n−½ and β^β0=Op(n12),

n1i:Yi=y(ω^iωi)Kh{ε^i1(s)}G{ζy,c,s;ϕ(p^1i),h}n1i:Yi=yω^iωiKh{ε^i1(s)}=Op(n12).

This implies that

n1i:Yi=yω^iKh{ε^i1(s)}G{ζy,c,s;ϕ(p^1i),h}τ(y;s)K(t)g{ay(c;s)+ζay+ζbyt}dtn12Kh{eϕ(s)}G{ζy,c,s;ϕ(p^1i),h}×dGn[ωI{ϕ(p^i1)e}ωI{ϕ(pi1)e}]+Kh{eϕ(s)}G{ζy,c,s;ϕ(p^1i),h}dP[ωI{ϕ(pi1)e}]τ(y;s)K(t)g{ay(c;s)+ζay+ζbyt}dt+n12Kh{eϕ(s)}dP[ωG{ζy,c,s;ϕ(p^1i),h}I{ϕ(pi1)e}]+Op(n12)n12h1GnHδ+n12Kh{eϕ(s)}×dP[ωG{ζy,c,s;ϕ(p^1i),h}I{ϕ(pi1)e}]+Op(n12+h2),

where Hδ={ωI[ϕ{g1(βν)}e]ωI[ϕ{g1(β0ν)}e]:ββ0δ,e} is a class of functions indexed by β and e. By the maximum inequality of Van der Vaart and Wellner (1996), we have

EGnHδδ12{log(δ)+log(h)}[1+δ12{log(δ)+log(h)}δn12]

Together with the fact that β^β0=Op(n12) from Uno et al. (2007), it implies that n12h1GnHδ=Op{(nh)12(nh2)14log(n)}. In addition, with the standard arguments used in Bickel and Rosenblatt (1973), it can be shown that

n12Kh{eϕ(s)}dP[ωG{ζy,c,s;ϕ(p^1i),h}I{ϕ(pi1)e}]=Op{(nh)12log(n)}.

Therefore, for h = nν, 1/5 < ν < 1/2,

supζy,sIh,cn1i:Yi=yω^iKh{ε^i1(s)}G{ζy,c,s;ϕ(p^1i),h}τ(y;s)K(t)g{ay(c;s)+ζay+ζbyt}dt

is Op{(nh)−½log(n)}. Following with similar arguments as given above, coupled with the fact that γ^γ0=Op(n12), we have

supsIh,c[0,1]n1i:Yi=yω^iKh{ε^i1(s)}M^i(c)τ(y;s)Sy(c;s)=Op{(nh)12log(n)}.

Thus, supζy,c,sΨ^y1(ζy;c,s)Ψy1(ζy;c,s)=Op{(nh)12log(n)}=op(1). It follows from same arguments as given above that

supζy,c,sΨ^y2(ζy;c,s)Ψy2(ζy;c,s)=Op{(nh)12log(n)+h}=op(1).

Therefore, supζy,c,sΨ^y(ζy;c,s)Ψy(ζy;c,s)=op(1). In addition, we note that 0 is the unique solution to the equation ψy(ζy; c, s) = 0 wrt ζy. It suggests that sups,cζ^ay(c;s)=Op{(nh)12log(n)}=op(1), which implies the consistency of Sy(c;s),

supsIh,c[0,1]S^y(c;s)Sy(c;s)=Op{(nh)12log(n)}=op(1).

Asymptotic expansion for S^y(c;s) Let d^y(c;s)=nh{a^y(c;s)ay(c;s)}. It follows from a Taylor series expansion and the convergence rate of ζy(c; s) that

d^y(c;s)=nhPn(ω^I(Y=y)Kh{ε^1(s)}[M^(c)Gy0{c,s;ϕ(p^1)}])τ{y;ϕ(s)}g.{ay(c;s)}+op(1),
(7)

where Gy0(c,s;e)=g[ay(c;s)+by(c;s){eϕ(s)}]. Futhermore, since suptto{ĜX, Z(t) − GX, Z(t)| = Op(n−1/2),

d^y(c;s)=nhPn(ωI(Y=y)Kh{ε^1(s)}[M^(c)Gy0{c,s;ϕ(p^1)}])τ{y;ϕ(s)}g.{ay(c;s)}+op(1).

We next show that d^y(c;s) is asymptotically equivalent to

d~y(c;s)=nhPn(ωI(Y=y)Kh{ε1(s)}[M(c)Gy0{c,s;ϕ(p1)}])τ{y;ϕ(s)}g.{ay(c;s)},
(8)

where ε1(s)=ϕ(p1)ϕ(s). From (8) and the fact that τ{y; [var phi](s)} is bounded away from 0 uniformly in s, we have

d^y(s)d~y(s)h12Kh{eϕ(s)}dGn(I(Y=y)ω[M^(c)I{ϕ(p^1)e}M(c)I{ϕ(p1)e}])+h12Kh{eϕ(s)}Gy(c,s;e)dGn(I(Y=y)[ωI{ϕ(p^1)e}ωI{ϕ(p1)e}])+nhKh{eϕ(s)}dP(I(Y=y)[ωM^(c)I{ϕ(p^1)e}ωM(c)I{ϕ(p1)e}])+nhKh{eϕ(s)}dP(I(Y=y)[ωGy{c,s;ϕ(p^1)}I{ϕ(p^1)e}ωGy{c,s;ϕ(p1)}I{ϕ(p1)e}])h12GnFδ+h12GnHδ+Op{(nh)12β^β0+γ^γ0+h2},

where Fδ={ωI{g2(γω)c}I[ϕ{g1(βν)}e]ωI{g2(γ0ω)c}I[ϕ{g1(β0ν)}e]:γγ0+ββ0δ,e} is the class of functions indexed by γ, β and e. By the maximum inequality of Van der Vaart and Wellner (1996) and the fact that β^β0+γ^γ0=Op(n12) from Uno et al. (2007), we have h12GnFδ=Op{h12n14log(n)} and h12GnHδ=Op{h12n14log(n)}. It follows that supsd^y(s)d~y(s)=op(1). Then, by a delta method,

W^Sy(c;s)=nh{S^y(c;s)Sy(c;s)}nhPn[Kh{ε1(s)}DSy(c;s)]
(9)

where

DSy(c;s)=τ{y;ϕ(s)}1ωI(Y=y){M(c)Sy(c;s)}
(10)

Using the same arguments as for establishing the uniform convergence rate of conditional Kaplan-Meier estimators (Dabrowska 1989; Du and Akritas 2002), we obtain (6). Furthermore, following similar arguments as given in Dabrowska (1987, 1997), we have W^Sy(c;s) converges weakly to a Gaussian process in c for all s. Note that as for all kernel estimators, W^Sy(c;s) does not converge as a process in s.

A.2 Uniform consistency of pAUC^f(s)

Next we establish the uniform convergence rate for ROC^(u;s). To this end, we write

ROC^(u;s)ROC(u;s)=ε^1(u;s)+ε^0(u;s),

where ε^(u;s)=S^1{S^01(u;s);s}S1{S^01(u;s);s} and ε^(u;s)=S1{S^01(u;s);s}S1{S01(u;s);s}. It follows from (6) that supu;sε^1(u;s)supc;sS^1(c;s)S1(c;s). Let I^(u;s)=S0{S^01(u;s);s}. Then ε^0(u;s)=ROC{I(u;s);s}ROC(u;s). Noting that supuI^(u;s)u=supuI^(u;s)S^0{S^01(u;s);s}+n1supcS0(c;s)S^0(c;s)+n1=Op{(nh)12logn}, we have ε^0(u;s)=Op{(nh)12logn} by the continuity and boundedness of ROC(u; s). Therefore,

supu,sROC^(u;s)ROC(u;s)=Op{(nh)12logn}

which implies

supsIhpAUC^f(s)pAUCf(s)supsIh0fROC^(u;s)ROC(u;s)du=Op{(nh)12logn}.

and hence the uniform consistency of pAUC^f(s).

A.3 Asymptotic distribution of W^pAUCf(s)

To derive the asymptotic distribution for W^pAUCf(s), we first derive asymptotic expansions for W^ROC(u;s)=nh{ROC^(u;s)ROC(u;s)}=nhε^1(u;s)+nhε^0(u;s). From the weak convergence of W^Sy(c;s) in c, the approximation in (9), and the consistency of S^01(c;s) given in the Appendix A.2, we have

nhε^1(u;s)nh[S^1{S01(u;s);s}ROC(u;s)]nhPn[Kh{ε1(s)}DS1{S01(u;s);s}]

On the other hand, from the uniform convergence of I^0(u;s)u and the weak convergence of D^0(c;s) in c, we have

nh{uI^(u;s)}nh[I^1{I^(u;s);s}I^(u;s)]nh{I^1(u;s)u}nh[S^0{S01(u;s);s}u]

This, together with a Taylor series expansion and the expansion given (9), implies that

nhε^0(u;s)RO.C(u;s)Pn[Kh{ε1(s)}DS0{S01(u;s);s}]

It follows that

W^pAUCf(s)nhPn[Kh{ε1(s)}DpAUCf(s)]
(11)

where

DpAUCf(s)=0f[DS1{S01(u;s);s}RO.C(u;s)DS0{S01(u;s);s}]du.
(12)

It then follows from a central limit theorem that for any fixed s, W^pAUCf(s) converges to a normal with mean 0 and variance

σpAUCf2(s)=m2[τ{1;ϕ(s)}F.ϕ(P1)(s)]1σ12(s)+m2[τ{0;ϕ(s)}F.ϕ(P1)(s)]1σ02(s),

where F.ϕ(p1)(s) is the density function of ϕ(p1),

σ12(s)=E(G(T)1[0fM{S01(u;s)}dupAUCf(s)]2p1=s,Y=1),

and

σ02(s)=E(G(t0)1[0fM{S01(u;s)}dROC(u;s)0fudROC(u;s)]2p1=s,Y=0).

A.4 Justification for the resampling methods

To justify the resampling method, we first note that ββ^+γγ^+suptt0GX,Z(t)G^X,Z(t)=Op(n12). It follows from similar arguments given in the Appendix A and Appendix 1 of Cai et al. (2010) that WSy(c;s)=nh{Sy(c;s)S^y(c;s)}n12h12i=1nD^Syi(c;s)ξi, where D^Syi(c;s) is obtained by replacing all theoretical quantities in DSy(c;s) given in (10) with the estimated counterparts for the ith subject. This, together with similar arguments as given above for the expansion of W^ROC(u;s), implies that

WpAUCf(s)=0fnh{ROC(u;s)ROC^(u;s)}dun12h12i=1nKh{ε^1(s)}D^pAUCf(s)ξi,

where D^pAUCf(s)=0f[D^S1i{S^01(u;s);s}RO.C(u;s)D^S0i{S^01(u;s);s}]du Conditional on the data, WpAUCf(s) is approximately normally distributed with mean 0 and variance

σ^pAUCf2(s)=h1i=1nKh{ε^(s)}2D^pAUCf(s)2.

Using the consistency of the proposed estimators along with similar arguments as given above, it is not difficult to show that the above variance converges to σpAUCf2(s) as n → ∞. Therefore, the empirical distribution obtained from the perturbed sample can be used to approximate the distribution of W^pAUCf(s).

We now show that after proper standardization, the supermum type statistics Γ converges weakly. To this end, we first note that, similar arguments as given in the Appendix A can be used to show that supsIhσ^pAUCf2(s)σpAUCf2(s)=op(nδ) and

Γ=supsIhnhPn[Kh{ε1(s)}DpAUCf(s)]σpAUCf(s)+op(nδ),

for some small positive constant δ. Using similar arguments in Bickel and Rosenblatt (1973), we have

pr{an(Γdn)<x}e2ex,

where an = [2 log{{ρuρl)/h}]½ and dn=an+an1log{K.(t)2dt(4m2π)}. Now justify the resampling procedure for constructing the CI, we note that

WpAUCf(s)=n12h12i=1nKh{ε^1i(s)}D^pAUCf(s)(ξi1)+ε(s)

where pr{sups[set membership]Ω(h)|nε*(s)| ≥ e | data} → 0 in probability. Therefore,

Γ=supsIhn12h12i=1nKh{ε^1i(s)}D^pAUCf(s)(ξi1)σpAUCf(s)+εsup.

where pr{nδεsupedata}0. It follows from similar arguments as given in Tian et al. (2005) and Zhao et al. (2010) that

suppr{an(Γdn)<xdata}e2ex0,

in probability as n → ∞. Thus, the conditional distribution of an(Γ* − dn) can be used to approximate the unconditional distribution of an(Γ − dn). When h0 = h1, in general, the standardized Γ does not converge to the extreme value distribution. However, when h0 = h1 = k [set membership] (0, ∞), the distribution of the suitable standardized version of Γ still can be approximated by that of the corresponding standardized Γ* conditional on the data (Gilbert et al. 2002).

Appendix B

B.1 Bandwidth selection for pAUCf(s)

The choice of the bandwidths h0 and h1 is important for making inference about Sy(c;s) and consequently pAUCf(s). Here we propose a two-stage K-fold cross-validation procedure to obtain the optimal bandwidth for S^0,h01(u;s) and S^1,h1(c;s) sequentially. Specifically, we randomly split the data into K disjoint subsets of about equal sizes denoted by {Jk,k=1,,K}. The two-stage procedure is described as follows:

  1. Motivated by the fact that S01(u;s) is essentially the (1 − u)-th quantile of the conditional distribution of p2(X,Z) given Y = 0 and p1(X)=s, for each k, we use all the observations not in Jk to estimate q0,1u(s)=S01(u;s) by obtaining {α^0(s;h),α^1(s;h)}, the minimizer of
    jJl,lkI(Yj=0)w^jKh{ε^1j(s)}ρ1u[p^2jg{α0+α1ε^1j(s)}]
    wrt (α0, α1), where ρτ(e) is a check function defined as ρτ(e) = τ e, if e ≥ 0; = (τ − 1)e, otherwise. Let q^0,1u(k)(s;h)=g{α^0(s;h)} denote the resulting estimator of q0,1u(s)=S01(u;s). With observation in Jk, we obtain
    Errk(q0)(h)=iJk(1Yi)w^i0fρ1u[p^2iq^0,1u(k)(p^1i;h)]du.
    Then, we let h0opt=argminhk=1KErrk(q0)(h).
  2. Next, to find an optimal h1 for S^1,h1(;s), we choose an error function that directly relates to pAUCf(s)=S01(f;s)S1(c;s)dS0(c;s). Specifically, noting the fact that
    E(S01(f;s)[I{g2(γWi)c}S1(c;s)]dS0(c;s)Yi=1,g1(βXi)=s)=0,
    we use the corresponding mean integrated squared error for I{g2(γWi)c}S1(c;s) as the error function. For each k, we use all the observations which are not in Jk to obtain the estimate of S1(c;s), denoted by S^1,h(k)(c;s) via (4). Then, with the observations in Jk, we calculate the prediction error
    Errk(S1)(h)=iJk,Yi=1w^iS^0,h01(f;p^1i){I(p^2ic)S^1,h(k)(c;p^1i)}2dS^0,h0(c;p^1i).

We let h1opt=argminhk=1KErrk(S1)(h).

Since the order of hyopt is expected to be n−1/5 (Fan and Gijbels 1995), the bandwidth we use for estimation is hy=hyopt×nd0 with 0 < d < 3/10 such that hy = n−ν with 1/5 < ν < 1/2. This ensures that the resulting functional estimator Sy,hy(c;s) with the data-dependent smooth parameter has the above desirable large sample properties.

B.2 Bandwidth selection for IDI(s)

Same as bandwidth selection for pAUC, we also propose a K-fold cross validation procedure to choose the optimal bandwidth h1 for IS(s)=01S1(c;s)dc and h0 for IP(s)=01S0(c;s)dc separately. The procedure is described as follows: we randomly split the data into K disjoint subsets of about equal sizes denoted by {Jk,k=1,,K}. Motivated by the fact (3), for each k, we use all the observations not in Jk to estimate 01Sy(c,s)dc by obtaining {φ^0(y)(s;h),φ^1(y)(s;h)} for y = 0, 1, which is the solution to the estimating equation

jJl,lkI(Yj=y)w^jKh{ε^1j(s)}[p^2jg{φ0(y)+φ1(y)ε^1j(s)}]=0,

wrt (φ0(y),φ1(y)). Let IS^(k)(s;h)=g{φ^0(1)(s;h)} and IP^(k)(s;h)=g{φ^0(0)(s;h)}. With observations in Jk, we obtain

Errk(IS)(h)=iJkYiw^i{p^2iIS^(k)(p^1i;h)}2,

or

Errk(IP)(h)=iJk(1Yi)w^i{p^2iIP^(k)(p^1i;h)}2,

Then, we let h1opt=argminhk=1KErrk(IS)(h) and h1opt=argminhk=1KErrk(IP)(h).

Appendix C

R codes for application will be available from the corresponding author upon request.

Contributor Information

Qian M. Zhou, Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada, ac.ufs.tats@uohzmq.

Yingye Zheng, Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA, gro.crchf@gnehzy.

Tianxi Cai, Department of Biostatistics, Harvard University, Boston, MA 02115, USA.

References

  • Baker S, Pinsky P. A proposed design and analysis for comparing digital and analog mammography: special receiver operating characteristic methods for cancer screening. J Am Stat Assoc. 2001;96:421–428.
  • Bickel P, Rosenblatt M. On some global measures of the deviations of density function estimates. Ann Stat. 1973;1:1071–1095.
  • Blumenthal R, Michos E, Nasir K. Further improvements in CHD risk prediction for women. J Am Med Assoc. 2007;297:641–643. [PubMed]
  • Cai T, Cheng S. Robust combination of multiple diagnostic tests for classifying censored event times. Biostatistics. 2008;9:216–233. [PubMed]
  • Cai T, Dodd LE. Regression analysis for the partial area under the ROC curve. Stat Sin. 2008;18:817–836.
  • Cai T, Tian L, Wei L. Semiparametric Box–Cox power transformation models for censored survival observations. Biometrika. 2005;92(3):619–632.
  • Cai T, Tian L, Uno H, Solomon S, Wei L. Calibrating parametric subject-specific risk estimation. Biometrika. 2010;97(2):389–404. [PMC free article] [PubMed]
  • Cook N, Ridker P. The use and magnitude of reclassification measures for individual predictors of global cardiovascular risk. Ann Intern Med. 2009;150(11):795–802. [PMC free article] [PubMed]
  • Cook N, Buring J, Ridker P. The effect of including C-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med. 2006;145:21–29. [PubMed]
  • Cox D. Regression models and life-tables. J R Stat Soc B. 1972;34(2):187–220.
  • Dabrowska D. Non-parametric regression with censored survival time data. Scand J Stat. 1987;14(3):181–197.
  • Dabrowska D. Uniform consistency of the kernel conditional Kaplan–Meier estimate. Ann Stat. 1989;17(3):1157–1167.
  • Dabrowska D. Smoothed Cox regression. Ann Stat. 1997;25(4):1510–1540.
  • D’Agostino R. Risk prediction and finding new independent prognostic factors. J Hypertens. 2006;24(4):643–645. [PubMed]
  • Dodd L, Pepe M. Partial AUC estimation and regression. Biometrics. 2003;59:614–623. [PubMed]
  • Du Y, Akritas M. Iid representations of the conditional Kaplan–Meier process for arbitrary distributions. Math Methods Stat. 2002;11:152–182.
  • Dwyer AJ. In pursuit of a piece of the ROC. Radiology. 1996;201:621–625. [PubMed]
  • Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults Executive summary of the third report of the National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III) J Am Med Assoc. 2001;285(19):2486–2497. [PubMed]
  • Fan J, Gijbels I. Data-driven bandwidth selection in local polynomial regression: variable bandwidth selection and spatial adaptation. J R Stat Soc B. 1995;57:371–394.
  • Gail M, Pfeiffer R. On criteria for evaluating models of absolute risk. Biostatistics. 2005;6(2):227–239. [PubMed]
  • Gilbert P, Wei L, Kosorok M, Clemens J. Simultaneous inferences on the contrast of two hazard functions with censored observations. Biometrics. 2002;58(4):773–780. [PubMed]
  • Harrell F, Jr, Lee K, Mark D. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med. 1996;15(4):361–387. [PubMed]
  • Heagerty P, Zheng Y. Survival model predictive accuracy and ROC curves. Biometrics. 2005;61:92–105. [PubMed]
  • Jiang Y, Metz C, Nishikawa R. A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology. 1996;201:745–750. [PubMed]
  • Jin Z, Ying Z, Wei L. A simple resampling method by perturbing the minimand. Biometrika. 2001;88(2):381–390.
  • Korn E, Simon R. Measures of explained variation for survival data. Stat Med. 1990;9(5):487–503. [PubMed]
  • An approach to nonparametric regression for life history data using local linear fitting. Ann Stat. 23:787–823.
  • McIntosh M, Pepe M. Combining several screening tests: optimality of the risk score. Biometrics. 2002;58(3):657–664. [PubMed]
  • Park Y, Wei L. Estimating subject-specific survival functions under the accelerated failure time model. Biometrika. 2003;9:717–723.
  • Park B, Kim W, Ruppert D, Jones M, Signorini D, Kohn R. Simple transformation techniques for improved non-parametric regression. Scand J Stat. 1997;24(2):145–163.
  • Paynter N, Chasman D, Pare G, Buring J, Cook N, Miletich J, Ridker P. Association between a literature-based genetic risk score and cardiovascular events in women. J Am Med Assoc. 2010;303(7):631–637. [PMC free article] [PubMed]
  • Pencina M, D’Agostino R. Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Stat Med. 2004;23(13):2109–2123. [PubMed]
  • Pencina M, D’Agostino RS, D’Agostino RJ, Vasan R. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond (with Coomentaries & Rejoinder) Stat Med. 2008;27:157–212. [PubMed]
  • Pfeiffer R, Gail M. Two criteria for evaluating risk prediction models. Biometrics. 2010;67(3):1057–1065. [PMC free article] [PubMed]
  • Pfeffer M, Jarcho J. The charisma of subgroups and the subgroups of CHARISMA. N Engl J Med. 2006;354(16):1744–1746. [PubMed]
  • Ridker P. C-reactive protein and the prediction of cardiovascular events among those at intermediate risk: moving an inflammatory hypothesis toward consensus. J Am Coll Cardiol. 2007;49(21):2129–2138. [PubMed]
  • Ridker P, Rifai N, Rose L, Buring J, Cook N. Comparison of C-reactive protein and low-density lipoprotein cholesterol levels in the prediction of first cardiovascular events. N Engl J Med. 2007;347:1557–1565. [PubMed]
  • Robins J, Ya’Acov R. Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Stat Med. 1997;16(3):285–319. [PubMed]
  • Rothwell P. Treating Individuals 1: External validity of randomised controlled trials: “To whom do the results of this trial apply?” Lancet. 2005;365:82–93. [PubMed]
  • Tian L, Zucker D, Wei L. On the Cox model with time-varying regression coefficients. J Am Stat Assoc. 2005;100(469):172–183.
  • Tian L, Cai T, Wei LJ. Identifying subjects who benefit from additional information for better prediction of the outcome variables. Biometrics. 2009;65:894–902. [PMC free article] [PubMed]
  • Tibshirani R, Hastie T. Local likelihood estimation. J Am Stat Assoc. 1987;82(398):559–567.
  • Tice J, Cummings S, Ziv E, Kerlikowske K. Mammographic breast density and the Gail model for breast cancer risk prediction in a screening population. Breast Cancer Res Treat. 2005;94(2):115–122. [PubMed]
  • Uno H, Cai T, Tian L, Wei L. Evaluating prediction rules for t-year survivors with censored regression models. J Am Stat Assoc. 2007;102:527–537.
  • Uno H, Cai T, Pencina M, D’Agostino R, Wei L. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011a;30(10):1105–1117. [PMC free article] [PubMed]
  • Uno H, Cai T, Tian L, Wei LJ. Graphical procedures for evaluating overall and subject-specific incremental values from new predictiors with censored event time data. Biometrics. 2011b;67:1389–1396. [PMC free article] [PubMed]
  • Van der Vaart AW, Wellner JA. Weak convergence and empirical processes. Springer; New York: 1996.
  • Wacholder S, Hartge P, Prentice R, Garcia-Closas M, Feigelson H, Diver W, Thun M, Cox D, Hankinson S, Kraft P, et al. Performance of common genetic variants in breast-cancer risk models. N Engl J Med. 2010;362(11):986–993. [PMC free article] [PubMed]
  • Wand M, Marron J, Ruppert D. Transformation in density estimation (with comments) J Am Stat Assoc. 1991;86:343–361.
  • Wang T, Gona P, Larson M, Tofler G, Levy D, Newton-Cheh C, Jacques P, Rifai N, Selhub J, Robins S. Multiple biomarkers for the prediction of first major cardiovascular events and death. N Engl J Med. 2006;355:2631–2639. [PubMed]
  • Wang R, Lagakos S, Ware J, Hunter D, Drazen J. Statistics in medicine-reporting of subgroup analyses in clinical trials. N Engl J Med. 2007;357(21):2189–2194. [PubMed]
  • Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of cornary heart disease using risk factor categories. Circulation. 1998;97:1837–1847. [PubMed]
  • Zhao L, Cai T, Tian L, Uno H, Solomon S, Wei L, Minnier J, Kohane I, Pencina M, D’Agostino R, et al. Harvard University Biostatistics Working Paper Series 2010: Working Paper 122. 2010. Stratifying subjects for treatment selection with censored event time data from a comparative study.